Prediction of type 2 diabetes mellitus using medical attributes of the Leo SAC Polyclinic of San Juan de Lurigancho through the Machine Learning approach

Authors

  • Jaime Yelsin Rosales Malpartida, Universidad Nacional de Ingeniería, Lima, Perú

DOI:

https://doi.org/10.53673/jb.v1i1.5

Keywords:

diabetes prediction, Machine Learning, data from the Leo SAC Polyclinic of San Juan de Lurigancho

Abstract

Deaths from diabetes increased by 70% globally between 2000 and 2019, placing the disease among the top ten causes of mortality. In 2019 it was the direct cause of 4.2 million deaths, and approximately 463 million adults (aged 20-79) were living with diabetes, a figure expected to rise to 700 million by 2045. Diabetes is a serious disease marked by high glucose levels in the body, so early diagnosis helps to treat it and prevent its complications, and an easy, fast way to diagnose it is needed. Because the choice of Machine Learning model can affect predictions made from medical attributes, we developed and tested 13 Machine Learning methods, spanning classical models, neural networks and ensemble models, to predict type 2 diabetes mellitus in elderly patients. The data set was obtained from the Leo SAC Polyclinic in San Juan de Lurigancho. Models with optimal hyperparameters were evaluated on the training and test data sets using accuracy, precision, sensitivity, specificity, F1-score, misclassification rate, and AUC. LightGBM consistently outperformed the other models on all seven performance measures. This study demonstrates that the choice of Machine Learning model affects the prediction results.
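The seven performance measures named in the abstract can be illustrated with a minimal sketch. It is not the author's evaluation code: the function names (`binary_metrics`, `auc`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library routine.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    sensitivity: float
    specificity: float
    f1: float
    misclassification_rate: float


def binary_metrics(y_true, y_pred):
    """Compute threshold-based metrics from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    accuracy = (tp + tn) / total
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, sensitivity,
                         specificity, f1, 1.0 - accuracy)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = diabetic, 0 = non-diabetic.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics)
print(f"AUC = {auc_value:.4f}")
```

Computing the same measures on both the training and the test split, as the abstract describes, makes it easier to spot a model that scores well on training data but generalizes poorly.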

Published

2022-09-26

How to Cite

Jaime Yelsin Rosales Malpartida. (2022). Prediction of type 2 diabetes mellitus using medical attributes of the Leo SAC Polyclinic of San Juan de Lurigancho through the Machine Learning approach. Journal BioFab, 1(1), 143–162. https://doi.org/10.53673/jb.v1i1.5