Abstract:
Early diagnosis of diabetes helps mitigate serious health impacts. This study optimizes the hyperparameters of a Random Forest model for diabetes classification on the Pima Indian Diabetes dataset. Random Forest is a popular classification algorithm because of its resistance to overfitting, but its performance depends heavily on the choice of hyperparameters. The study therefore applies Grid Search and Random Search for hyperparameter tuning to improve model accuracy. The methodology comprises data collection, preprocessing, splitting the dataset (80% for training and 20% for testing), feature scaling with Standard Scaler, training the Random Forest algorithm with hyperparameter tuning, and evaluating the model on accuracy, precision, recall, and F1-score. Tuning with either Grid Search or Random Search improved model performance, and Random Search yielded the best results, with an accuracy of 0.75, precision of 0.64, and recall of 0.69. These results indicate that hyperparameter tuning can enhance the performance of the Random Forest model and contribute to the development of machine learning applications for the medical diagnosis of diabetes.
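The two search strategies and the four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical search space (the abstract does not list the values that were searched) and uses toy labels rather than data from the study. Grid Search enumerates every combination of hyperparameter values, while Random Search draws a fixed number of configurations at random.

```python
import itertools
import random

# Hypothetical search space for a Random Forest; the abstract does not
# state which hyperparameters or values were actually searched.
SEARCH_SPACE = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}


def grid_candidates(space):
    """Grid Search: every combination in the Cartesian product."""
    names = list(space)
    return [dict(zip(names, values))
            for values in itertools.product(*(space[n] for n in names))]


def random_candidates(space, n_iter, seed=0):
    """Random Search: n_iter configurations, each value drawn independently."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_iter)]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


grid = grid_candidates(SEARCH_SPACE)                  # all 3 * 3 * 3 combinations
sampled = random_candidates(SEARCH_SPACE, n_iter=10)  # a fixed budget of 10 draws

# Toy labels for illustration only, not results from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In practice, each candidate configuration would be scored by cross-validation on the training split, for example with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`; whether the study used those classes is not stated in the abstract. As a side note on the metrics, F1 is the harmonic mean of precision and recall, so the reported precision of 0.64 and recall of 0.69 would correspond to an F1-score of roughly 0.66, although the abstract does not report that value itself.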
Keywords:
Classification, Diabetes, Hyperparameter Tuning, Machine Learning, Random Forest