Original scientific article

ADVANCING DIABETES PREDICTION THROUGH MACHINE LEARNING AND DEEP LEARNING MODELS USING PIMA INDIAN AND CLINICAL-BIOLOGICAL DATA

By
Zeeshan Hussain

Jamia Hamdard University, India

Suraiya Parveen

Jamia Hamdard University, India

Ashif Khan

Jamia Hamdard University, India

Ihtiram Raza

Jamia Hamdard University, India

Umnah

Jamia Millia Islamia, India

Abstract

Diabetes mellitus is a major global health problem, and early detection is of paramount importance because it reduces complications and enables timely medical intervention. This paper compares the predictive performance of eight machine learning classifiers (Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, Gradient Boosting, Naive Bayes, k-Nearest Neighbors (k-NN), and an Ensemble model) on the Pima Indian Diabetes dataset and on a collection of clinical-biological patient records. Performance was evaluated using Precision, Recall, F1-Score, and the Area Under the ROC Curve (AUC-ROC). Significant differences were observed among the models: among the individual classifiers, SVM (AUC-ROC: 0.8648) and Logistic Regression (AUC-ROC: 0.8638) showed the best discriminative ability. The comparison also showed that Logistic Regression achieved the highest Precision (0.7632), indicating fewer false-positive predictions, whereas Decision Tree achieved the highest Recall (0.7447), indicating greater sensitivity in detecting diabetes cases. The Ensemble model produced the best overall performance (AUC-ROC: 0.8709), suggesting that combining predictions from multiple models improves reliability and generalization. By contrast, k-NN performed worst, owing to its sensitivity to noise and to the number of features. Overall, the results demonstrate the strong potential of linear-margin and ensemble-based models for structured clinical data. Such models could provide a robust foundation for clinical decision support systems and help broaden the role of ML-based analytics in early diabetes diagnosis and preventive health care planning.
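As a minimal sketch of how the evaluation metrics named in the abstract relate to one another, the Python code below computes Precision, Recall, F1-Score, and AUC-ROC from scratch and builds a simple two-model ensemble. The abstract does not say how the Ensemble model combines its members, so soft voting (averaging predicted probabilities) is an assumption here, as are the 0.5 decision threshold, the helper names (confusion_counts, precision_recall_f1, auc_roc, soft_vote), and the toy data. This is an illustration only, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives (1 = diabetic)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision penalises false positives; Recall (sensitivity) penalises missed cases."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive scores above a random negative.

    Ties count as half a win; this pairwise form equals the area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Assumed ensemble rule: average each model's predicted probability per patient."""
    return [sum(col) / len(col) for col in zip(*prob_lists)]


if __name__ == "__main__":
    # Hypothetical toy labels and per-model probabilities, not data from the paper.
    y_true = [1, 0, 1, 1, 0, 0]
    model_a = [0.875, 0.375, 0.5, 0.5, 0.625, 0.125]
    model_b = [0.75, 0.125, 0.25, 0.75, 0.5, 0.125]

    ensemble = soft_vote([model_a, model_b])
    y_pred = [1 if p >= 0.5 else 0 for p in ensemble]  # assumed 0.5 decision threshold

    precision, recall, f1 = precision_recall_f1(y_true, y_pred)
    print(f"Precision={precision:.4f} Recall={recall:.4f} F1={f1:.4f}")
    print(f"AUC-ROC={auc_roc(y_true, ensemble):.4f}")
```

With these hypothetical inputs the script prints Precision, Recall, and F1 of 0.6667 and an AUC-ROC of 0.8889. Precision and Recall depend on the chosen threshold, whereas AUC-ROC is computed from the raw scores, which is why a model can lead on AUC-ROC without also having the highest Precision or Recall.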



This is an open access article distributed under the Creative Commons Attribution Non-Commercial (CC BY-NC) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


The statements, opinions and data contained in the journal are solely those of the individual authors and contributors and not of the publisher and the editor(s). We stay neutral with regard to jurisdictional claims in published maps and institutional affiliations.