Articles

Hyperparameter Tuning of Random Forest Algorithm for Diabetes Classification

This study aims to optimize the hyperparameters of the Random Forest model for diabetes classification using the Pima Indian Diabetes dataset, given the importance of early diabetes diagnosis in mitigating serious health impacts. While Random Forest is a popular classification algorithm due to its resistance to overfitting, the choice of hyperparameters strongly affects its performance. Therefore, this research uses Grid Search and Random Search for hyperparameter tuning to improve model accuracy. The research methodology includes data collection, preprocessing, dataset splitting (80% for training and 20% for testing), feature scaling using StandardScaler, and the application of the Random Forest algorithm with hyperparameter tuning, followed by model evaluation based on accuracy, precision, recall, and F1-Score. The results show that tuning Random Forest with Grid Search and Random Search significantly improved model performance, with Random Search yielding the best results: an accuracy of 0.75, a precision of 0.64, and a recall of 0.69. This study demonstrates that hyperparameter tuning can enhance the performance of the Random Forest model, contributing to the development of machine learning applications for medical diabetes diagnosis.
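The workflow described in this abstract (80/20 split, standard scaling, Random Forest tuned by Grid Search and Random Search, then evaluation on four metrics) can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the authors' code: the data is a synthetic stand-in shaped like the Pima dataset (768 rows, 8 features), and the hyperparameter ranges, the 3-fold cross-validation, the stratified split, and the random seeds are all assumptions chosen for brevity.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 768 rows, 8 features,
# roughly 65/35 class balance. Replace with the actual dataset in practice.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5,
    weights=[0.65, 0.35], random_state=0,
)

# 80% training / 20% testing, as in the abstract (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is re-fit on each cross-validation fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Grid Search: tries every combination in a small, fixed grid (illustrative values).
grid = GridSearchCV(
    pipeline,
    param_grid={"rf__n_estimators": [50, 100], "rf__max_depth": [4, 8, None]},
    cv=3, scoring="accuracy",
)

# Random Search: samples a fixed number of candidates from wider distributions.
rand = RandomizedSearchCV(
    pipeline,
    param_distributions={
        "rf__n_estimators": randint(50, 150),
        "rf__max_depth": [4, 8, None],
        "rf__min_samples_leaf": randint(1, 5),
    },
    n_iter=6, cv=3, scoring="accuracy", random_state=0,
)


def evaluate(search):
    """Fit a search object on the training split and score it on the test split."""
    search.fit(X_train, y_train)
    pred = search.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }


results = {"grid_search": evaluate(grid), "random_search": evaluate(rand)}
for name, scores in results.items():
    print(name, scores)
```

Because the data here is synthetic, the printed scores say nothing about the figures reported in the abstract; the sketch only shows how the two search strategies plug into the same pipeline.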

An Advanced Machine Learning Approach for Enhanced Diabetes Prediction

Diabetes is a chronic health condition affecting millions globally, causing severe complications and burdening healthcare systems. Current machine learning methods for diabetes prediction face challenges such as data imbalance, limited generalizability, and computational inefficiency. This study proposes a novel method that combines K-Nearest Neighbors (KNN), clustering techniques, Synthetic Minority Over-sampling Technique (SMOTE), and Random Forest for outcome classification to address these issues. The PIMA Indian Diabetes Dataset was used to evaluate the approach, achieving an accuracy of 87.50%. However, the study has limitations, such as dependency on specific datasets and computational complexity. Future work will focus on validating the method across diverse datasets, optimizing computational efficiency, and developing real-time prediction capabilities.
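The abstract does not say how its components are combined, so the following is only a sketch of the SMOTE step in its textbook form, which addresses the data imbalance mentioned above: a new minority-class sample is created by interpolating between an existing minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the toy points, and the choice of `k=3` are hypothetical and for illustration only.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority-class samples (made-up values, two features each).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 2.4)]
new_points = smote_sketch(minority, n_new=4)
print(new_points)
```

In practice the oversampling would be applied to the training split only, so that synthetic samples do not leak into the evaluation data.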

Grid Search Optimized Machine Learning based Modeling of CO2 Emissions Prediction from Cars for Sustainable Environment

Carbon emissions have increased dramatically because of industrialization, trapping heat in the atmosphere and hastening climate change. This is a serious threat to the wealth, security, and well-being of the world. The effects are extensive, ranging from severe weather, disease outbreaks, and economic disruption to food insecurity and water scarcity. The World Health Organization (WHO) has determined that climate change poses the greatest threat to public health in the twenty-first century. Thus, precise prediction of CO2 emissions has emerged as a crucial concern in recent times. Several studies have tried to forecast the amount of CO2 from industry and power plants using statistical analysis, but they were limited in efficiency, robustness, and breadth of application. In this study, we propose an AI-based model that predicts the amount of CO2 emissions from cars. We applied a grid search-optimized machine learning approach using the publicly available Canadian dataset. Different statistical analyses and preprocessing techniques, such as duplicate data management, outlier rejection, and scaling, enhanced the quality of the dataset. Grid search was then applied to tune the KNN, RF, and SVR models, which improved the performance of CO2 emissions prediction. We further used the explainability of the random forest model to check the bias and fairness of its predictions. The MSE, RMSE, and R-squared metrics of the proposed approach were the best when compared with the state of the art.
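For reference, the three evaluation metrics named in this abstract can be computed as in the short sketch below. MSE and RMSE measure error, so lower values are better, whereas R-squared measures explained variance, so higher values are better. The emission values are made-up illustrative numbers, not data from the study, and the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R-squared) for paired actual and predicted values."""
    n = len(y_true)
    # Sum of squared residuals between actual and predicted values.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mse = ss_res / n
    rmse = math.sqrt(mse)
    # Total sum of squares around the mean of the actual values.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, r2


# Made-up CO2 emission values (illustrative only).
actual = [200.0, 250.0, 180.0, 300.0]
predicted = [210.0, 240.0, 190.0, 290.0]

mse, rmse, r2 = regression_metrics(actual, predicted)
print(f"MSE={mse:.1f} RMSE={rmse:.1f} R2={r2:.3f}")
```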

Data Analytics for Decision-Making in Evaluating the Top-Performing Product and Developing Sales Forecasting Model in an Oil Service Company

This study addresses the strategic challenges faced by a company specialising in the manufacture of oil and gas equipment. Following organisational restructuring, which involved the dissolution of one business unit and the creation of another, the company is navigating complexities in product focus and manpower allocation within the Asia-Pacific region. The research problem centres on identifying the top-performing product, determining potential countries for establishing a support base facility based on sales performance, and developing a method for forecasting future sales.

The research involved retrieving and pre-processing historical sales data, then performing a thorough descriptive and predictive analysis. The data was partitioned into training and testing sets to facilitate predictive analytics. Several predictive models were developed and tested, including neural networks, linear regression, gradient-boosted trees, random forests, and ARIMA methods. Tableau Public was utilised for descriptive analytics, whereas RapidMiner Studio was employed for predictive analytics.

The study’s results, derived through both descriptive and predictive analytic methods, reveal critical insights. The Blowout Preventer (BOP) emerged as the top-performing product in the Asia-Pacific region. In terms of establishing support base facilities, Malaysia was identified as the ideal location for the BOP, while Indonesia was found suitable for the Manifold product group. Furthermore, the Random Forest model was determined to be the most effective for forecasting future sales. These findings provide strategic guidance for the company in product focus, regional expansion, and resource allocation, contributing significantly to the company’s decision-making process in a competitive industry.