Articles

An Advanced Machine Learning Approach for Enhanced Diabetes Prediction

Diabetes is a chronic health condition affecting millions globally, causing severe complications and burdening healthcare systems. Current machine learning methods for diabetes prediction face challenges such as data imbalance, limited generalizability, and computational inefficiency. To address these issues, this study proposes a novel method that combines K-Nearest Neighbors (KNN), clustering techniques, the Synthetic Minority Over-sampling Technique (SMOTE), and Random Forest for outcome classification. The approach was evaluated on the PIMA Indian Diabetes Dataset and achieved an accuracy of 87.50%. However, the study has limitations, such as dependency on specific datasets and computational complexity. Future work will focus on validating the method across diverse datasets, optimizing computational efficiency, and developing real-time prediction capabilities.
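
The abstract names SMOTE as its remedy for class imbalance but does not say how it is configured. As a rough illustration only, the core idea of SMOTE can be sketched in plain Python: each synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its k nearest minority-class neighbours. The function name `smote`, the parameter values, and the toy feature vectors below are assumptions made for this sketch. They are not taken from the study, and the study's KNN, clustering, and Random Forest stages are not reproduced here.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Assumes the minority class holds at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up feature vectors (hypothetical; not taken from the PIMA dataset).
majority = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (1.3, 1.1), (0.8, 0.9), (1.2, 1.0)]
minority = [(3.0, 3.1), (3.2, 2.9), (2.8, 3.3)]

# Create just enough synthetic samples to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used to report accuracy.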

Grid Search Optimized Machine Learning based Modeling of CO2 Emissions Prediction from Cars for Sustainable Environment

Carbon emissions have increased dramatically because of industrialization, trapping heat in the atmosphere and hastening climate change. This is a serious threat to the wealth, security, and well-being of the world. The effects are extensive, ranging from severe weather, disease outbreaks, and economic disruption to food insecurity and water scarcity. The World Health Organization (WHO) has determined that climate change poses the greatest threat to public health in the twenty-first century. Thus, precise prediction of CO2 emissions has emerged as a crucial concern in recent times. Several studies have tried to forecast the amount of CO2 emitted by industry and power plants using statistical analysis, but they were limited in efficiency, robustness, and breadth of application. In this study, we propose an AI-based model that predicts the amount of CO2 emitted by cars. We applied a grid search-optimized machine learning approach to the publicly available Canadian dataset. Statistical analyses and preprocessing techniques such as duplicate data management, outlier rejection, and scaling enhanced the quality of the dataset. Grid search was then applied to tune the K-Nearest Neighbors (KNN), Random Forest (RF), and Support Vector Regression (SVR) models, which improved the performance of CO2 emissions prediction. We further used the explainability of the random forest model to check the bias and fairness of its predictions. Measured by MSE, RMSE, and R-squared, the proposed approach performed best when compared with the state of the art.
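
To make the tuning step concrete, the following is a minimal sketch of grid search with cross-validation for a single KNN regressor, together with the three reported metrics. It is written in plain Python under stated assumptions: the parameter grid, the four interleaved folds, the helper names, and the engine-size and CO2 figures are all invented for illustration and do not come from the study or from the Canadian dataset. Note that lower values are better for MSE and RMSE, whereas higher values are better for R-squared.

```python
import math
from itertools import product


def knn_predict(train_x, train_y, query, k, weighted):
    """Predict by averaging the targets of the k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    if not weighted:
        return sum(y for _, y in ranked) / len(ranked)
    # Inverse-distance weighting; the small constant avoids division by zero.
    weights = [1.0 / (abs(x - query) + 1e-9) for x, _ in ranked]
    return sum(w * y for w, (_, y) in zip(weights, ranked)) / sum(weights)


def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, R-squared) for paired true and predicted values."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return mse, math.sqrt(mse), r2


def cross_val_mse(xs, ys, k, weighted, n_folds=4):
    """Mean validation MSE over n_folds interleaved folds."""
    fold_errors = []
    for fold in range(n_folds):
        val_idx = set(range(fold, len(xs), n_folds))
        train_x = [x for i, x in enumerate(xs) if i not in val_idx]
        train_y = [y for i, y in enumerate(ys) if i not in val_idx]
        val_x = [xs[i] for i in sorted(val_idx)]
        val_y = [ys[i] for i in sorted(val_idx)]
        preds = [knn_predict(train_x, train_y, q, k, weighted) for q in val_x]
        fold_errors.append(regression_metrics(val_y, preds)[0])
    return sum(fold_errors) / n_folds


# Made-up data: engine size (L) vs. CO2 (g/km); not from the Canadian dataset.
engine_size = [1.0, 1.2, 1.4, 1.5, 1.6, 1.8, 2.0, 2.2, 2.4, 2.5, 3.0, 3.5]
co2 = [110, 118, 127, 131, 136, 147, 158, 166, 178, 182, 208, 236]

# Hypothetical grid: every combination is scored and the lowest CV MSE wins.
grid = {"k": [1, 2, 3, 5], "weighted": [False, True]}
results = {
    (k, weighted): cross_val_mse(engine_size, co2, k, weighted)
    for k, weighted in product(grid["k"], grid["weighted"])
}
best_params = min(results, key=results.get)
print("best (k, weighted):", best_params, "CV MSE:", round(results[best_params], 2))
```

The same exhaustive loop would apply to RF and SVR by swapping the model and its grid; only the KNN case is shown to keep the sketch short.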

Data Analytics for Decision-Making in Evaluating the Top-Performing Product and Developing Sales Forecasting Model in an Oil Service Company

This study addresses the strategic challenges faced by a company specialising in the manufacture of oil and gas equipment. Following organisational restructuring, which involved the dissolution of one business unit and the creation of another, the company is navigating complexities in product focus and manpower allocation within the Asia-Pacific region. The research problem centres on identifying the top-performing product, determining potential countries for establishing a support base facility based on sales performance, and developing a method for forecasting future sales.

The research involved retrieving and pre-processing historical sales data, then performing a thorough descriptive and predictive analysis. The data was partitioned into training and testing sets to facilitate predictive analytics. Several predictive models were developed and tested, including neural networks, linear regression, gradient-boosted trees, random forests, and ARIMA methods. Tableau Public was utilised for descriptive analytics, whereas RapidMiner Studio was employed for predictive analytics.
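
The study carried out this workflow in RapidMiner Studio, so the following Python fragment is only an illustrative sketch of the general procedure: hold out the most recent observations for testing, fit candidate models on the earlier data, and keep the one with the lowest test error. The sales figures are invented, and the two candidates (a naive last-value forecast and a linear trend) are simple stand-ins chosen for brevity, not the neural network, gradient-boosted tree, random forest, or ARIMA models evaluated in the study.

```python
import math


def chronological_split(series, test_size):
    """Hold out the last test_size observations for testing."""
    return series[:-test_size], series[-test_size:]


def fit_linear_trend(train):
    """Ordinary least squares fit of value = intercept + slope * time_index."""
    n = len(train)
    t_mean = (n - 1) / 2
    y_mean = sum(train) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(train)) / sum(
        (t - t_mean) ** 2 for t in range(n)
    )
    return y_mean - slope * t_mean, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up monthly sales figures, for illustration only.
sales = [120, 126, 131, 129, 138, 144, 150, 149, 157, 163, 168, 174]
train, test = chronological_split(sales, test_size=3)

# Candidate 1: naive forecast that repeats the last training value.
naive_forecast = [train[-1]] * len(test)

# Candidate 2: linear trend extrapolated past the end of the training window.
intercept, slope = fit_linear_trend(train)
trend_forecast = [intercept + slope * (len(train) + h) for h in range(len(test))]

# Score each candidate on the held-out months and keep the lowest RMSE.
scores = {"naive": rmse(test, naive_forecast), "linear_trend": rmse(test, trend_forecast)}
best_model = min(scores, key=scores.get)
print(scores, "->", best_model)
```

A chronological split is used here rather than a random one because shuffling sales history would let a forecasting model train on observations that come after the ones it is tested on.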

The study’s results, derived through both descriptive and predictive analytic methods, reveal critical insights. The Blowout Preventer (BOP) emerged as the top-performing product in the Asia-Pacific region. In terms of establishing support base facilities, Malaysia was identified as the ideal location for the BOP, while Indonesia was found suitable for the Manifold product group. Furthermore, the Random Forest model was determined to be the most effective for forecasting future sales. These findings provide strategic guidance for the company in product focus, regional expansion, and resource allocation, contributing significantly to the company’s decision-making process in a competitive industry.