Abstract :
Diabetes is a chronic health condition that affects millions of people worldwide, causes severe complications, and places a heavy burden on healthcare systems. Current machine learning methods for diabetes prediction face challenges such as data imbalance, limited generalizability, and computational inefficiency. To address these issues, this study proposes a novel method that combines K-Nearest Neighbors (KNN), clustering techniques, the Synthetic Minority Over-sampling Technique (SMOTE), and Random Forest for outcome classification. Evaluated on the PIMA Indian Diabetes Dataset, the approach achieved an accuracy of 87.50%. The study has limitations, however, including dependence on specific datasets and computational complexity. Future work will focus on validating the method across diverse datasets, optimizing computational efficiency, and developing real-time prediction capabilities.
Keywords :
Data Imbalance, Diabetes Prediction, Healthcare, Machine Learning, Random Forest, Synthetic Minority Over-sampling Technique.