Articles

Hyperparameter Tuning of Random Forest Algorithm for Diabetes Classification

This study aims to optimize the hyperparameters of the Random Forest model in diabetes classification using the Pima Indian Diabetes dataset, given the importance of early diabetes diagnosis to mitigate serious health impacts. While Random Forest is a popular algorithm for classification due to its resistance to overfitting, the selection of the right hyperparameters significantly affects its performance. Therefore, this research utilizes Grid Search and Random Search techniques for hyperparameter tuning to improve model accuracy. The research methodology includes data collection, preprocessing, dataset splitting (80% for training and 20% for testing), feature scaling using Standard Scaler, and the application of the Random Forest algorithm with hyperparameter tuning and model evaluation based on accuracy, precision, recall, and F1-Score. The results show that Random Forest, when tuned with Grid Search and Random Search, significantly improved model performance, with Random Search yielding the best results, achieving an accuracy of 0.75, precision of 0.64, and recall of 0.69. This study demonstrates that hyperparameter tuning can significantly enhance the performance of the Random Forest model, contributing to the development of machine learning applications for medical diabetes diagnosis.
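The difference between the two tuning strategies named in this abstract can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the search space `PARAM_GRID`, the helper names, and `toy_score` (a stand-in for a cross-validated accuracy estimate) are assumptions, since the abstract does not list the hyperparameter values that were searched.

```python
import itertools
import random

# Hypothetical search space; the abstract does not list the values actually tuned.
PARAM_GRID = {
    "n_estimators": [50, 100, 200],
    "max_depth": [4, 8, None],
    "min_samples_split": [2, 5, 10],
}

def grid_candidates(grid):
    """Yield every combination in the grid (exhaustive Grid Search)."""
    names = list(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        yield dict(zip(names, values))

def random_candidates(grid, n_iter, seed=0):
    """Yield n_iter independently sampled combinations (Random Search)."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        yield {name: rng.choice(values) for name, values in grid.items()}

def best_candidate(candidates, score_fn):
    """Return the (params, score) pair with the highest score."""
    return max(((p, score_fn(p)) for p in candidates), key=lambda pair: pair[1])

def toy_score(params):
    """Stand-in for cross-validated accuracy; NOT a real model evaluation."""
    depth = params["max_depth"] or 16  # treat an unlimited depth as 16 here
    return 0.70 + 0.0002 * params["n_estimators"] - 0.001 * abs(depth - 8)

grid_best = best_candidate(grid_candidates(PARAM_GRID), toy_score)
random_best = best_candidate(random_candidates(PARAM_GRID, n_iter=10), toy_score)
```

Grid Search evaluates all 27 combinations here, while Random Search evaluates only the 10 it samples. In scikit-learn the same two strategies are available as `GridSearchCV` and `RandomizedSearchCV`, with a real estimator taking the place of `toy_score`.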

An Advanced Machine Learning Approach for Enhanced Diabetes Prediction

Diabetes is a chronic health condition affecting millions globally, causing severe complications and burdening healthcare systems. Current machine learning methods for diabetes prediction face challenges such as data imbalance, limited generalizability, and computational inefficiency. This study proposes a novel method that combines K-Nearest Neighbors (KNN), clustering techniques, the Synthetic Minority Over-sampling Technique (SMOTE), and Random Forest for outcome classification to address these issues. The PIMA Indian Diabetes Dataset was used to evaluate the approach, which achieved an accuracy of 87.50%. However, the study has limitations, such as dependency on specific datasets and computational complexity. Future work will focus on validating the method across diverse datasets, optimizing computational efficiency, and developing real-time prediction capabilities.
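The core idea of SMOTE, one component of the method described above, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below illustrates only that interpolation step, using the standard library. The function name `smote`, the sample values, and the choice of `k` are assumptions, and the abstract does not specify how SMOTE is combined with KNN, clustering, and Random Forest.

```python
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical two-feature minority samples (for example, scaled glucose and BMI).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote(minority, n_synthetic=6)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. A maintained implementation is available in the imbalanced-learn package.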

Forecasting Cryptocurrency Markets: Predictive Modelling Using Statistical and Machine Learning Approaches

The rapidly evolving landscape of cryptocurrency markets presents unique challenges and opportunities. The significant daily variations in cryptocurrency exchange rates lead to substantial risks associated with investments in crypto assets. This study aims to forecast the prices of cryptocurrencies using advanced machine learning models. Among seven models that were tested for their prediction and validation efficiency, Neural Networks performed the best, with the minimum error. Thus, Long Short-Term Memory (LSTM) neural networks were used for predicting future trends. The LSTM model is well-suited for analyzing complex dependencies in financial data. Starting with historical data collection, data preprocessing, feature engineering, normalization and integrative binning, a comprehensive Exploratory Data Analysis (EDA) was conducted on 50 cryptocurrencies. Top performers were identified based on criteria such as trading volume, market capitalization, and price trends. The LSTM model was implemented in Python to predict 90-day price movements and to capture intricate patterns and relationships in the data. Model performance was validated using metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The findings align with the Adaptive Market Hypothesis (AMH), which suggests that cryptocurrency markets exhibit dynamic efficiency influenced by evolving market conditions and investor behavior. The study shows the potential of machine learning models in financial economics and their role in enhancing risk management strategies and investment decision-making processes.
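Before a price series can be fed to a sequence model such as an LSTM, it is typically normalized and cut into fixed-length input windows, each paired with the value that follows it. The sketch below shows one common way to do this with the standard library. The price values, the lookback of 3, and the function names are assumptions for illustration, since the abstract does not describe the exact preprocessing used.

```python
def min_max_scale(series):
    """Rescale a series linearly to the range [0, 1]."""
    lo, hi = min(series), max(series)
    return [(x - lo) / (hi - lo) for x in series]

def make_windows(series, lookback):
    """Split a series into (input window, next value) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return inputs, targets

prices = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]  # hypothetical daily closes
scaled = min_max_scale(prices)
X, y = make_windows(scaled, lookback=3)
```

With six observations and a lookback of 3, this yields three training pairs. In practice the scaling bounds would be computed on the training portion only, so that no information from the test period leaks into the model.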

Comparative Analysis of Machine Learning Algorithms for Used Car Price Prediction

After 2021, over 90 million passenger automobiles were produced, marking a significant increase in auto production. This growth has led to a flourishing used car market, which has become a highly lucrative sector. One of the most critical and fascinating areas of research within this market is automobile price prediction. Accurate price prediction models can greatly benefit buyers, sellers, and businesses in the used car industry. This paper presents a detailed comparative analysis of two supervised machine learning models, K-Nearest Neighbour (KNN) and Support Vector Machine (SVM) regression, for predicting used car prices. We utilized a comprehensive dataset of used cars sourced from the Kaggle website for training and testing our models. The KNN algorithm is known for its simplicity and effectiveness in regression tasks. The SVM regression technique takes a different approach, finding the hyperplane that best fits the data. Both methods have their strengths and weaknesses, which we explored in this study. Our results indicated that both the KNN and SVM models performed well in predicting used car prices, with slight variations in accuracy: about 83 percent for KNN and 80 percent for SVM. The KNN model therefore slightly outperforms the SVM model in predicting used car prices.
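K-Nearest Neighbour regression, one of the two techniques compared above, predicts a price as the average price of the most similar cars in the training data. The following is a minimal standard-library sketch of that idea. The feature choice (age and mileage), the data rows, and the function name `knn_predict` are assumptions and are not taken from the Kaggle dataset used in the paper.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a price as the mean target of the k nearest training rows."""
    ranked = sorted(zip(train_X, train_y), key=lambda row: math.dist(row[0], query))
    nearest = ranked[:k]
    return sum(price for _, price in nearest) / len(nearest)

# Hypothetical rows: (age in years, mileage in 10,000 km) -> price
train_X = [(1, 1.0), (2, 2.5), (3, 4.0), (5, 6.0), (8, 12.0)]
train_y = [20000.0, 17000.0, 15000.0, 11000.0, 6000.0]
predicted = knn_predict(train_X, train_y, query=(2, 3.0), k=3)
```

Because the prediction depends on Euclidean distance, features on very different scales would normally be standardized first so that no single feature dominates the neighbour search.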

Identifying Fishing Trip Behavior from Vessel Monitoring System (VMS) Data Using Machine Learning Models

Illegal fishing in Indonesian waters poses a serious challenge that requires innovative solutions. This research offers an advanced technological approach by applying the Hidden Markov Model (HMM) in Machine Learning to address this issue. Data from the Vessel Monitoring System (VMS) is utilized to efficiently identify fishing vessel activities. By involving a dataset that encompasses various vessel activities, this model can detect suspicious fishing practices in real-time. The research findings demonstrate that this model consistently identifies fishing vessel activities with a high level of accuracy. This study makes a significant contribution to efforts in preventing Illegal, Unreported, and Unregulated (IUU) Fishing and supports marine resource sustainability initiatives.
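One standard way to use a Hidden Markov Model on track data is to treat the vessel's activity as the hidden state, treat a discretized speed as the observation, and recover the most probable activity sequence with the Viterbi algorithm. The sketch below illustrates that decoding step only. The two states, the speed classes, and every probability are hypothetical values chosen for illustration, and the abstract does not state which inference procedure or parameters the study used.

```python
import math

STATES = ("steaming", "fishing")
# All probabilities below are hypothetical, chosen only for illustration.
START = {"steaming": 0.6, "fishing": 0.4}
TRANS = {
    "steaming": {"steaming": 0.8, "fishing": 0.2},
    "fishing": {"steaming": 0.3, "fishing": 0.7},
}
EMIT = {  # P(observed speed class | hidden activity)
    "steaming": {"slow": 0.1, "fast": 0.9},
    "fishing": {"slow": 0.8, "fast": 0.2},
}

def viterbi(observations):
    """Return the most probable hidden-state sequence (log-space Viterbi)."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

track = ["fast", "fast", "slow", "slow", "slow", "fast"]
decoded = viterbi(track)
```

For this toy track, the decoded sequence labels the three slow observations as fishing and the fast ones as steaming. With real VMS data, the transition and emission probabilities would be estimated from labelled or unlabelled tracks instead of being set by hand.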

A Review of AI-powered Diagnosis of Rare Diseases

The diagnosis of rare diseases presents significant challenges due to their low prevalence, complex symptomatology, and the scarcity of specialized knowledge. However, advancements in Artificial Intelligence (AI) offer promising solutions to these challenges. This review explores the current state of AI-powered diagnostic tools for rare diseases, focusing on the methodologies, algorithms, and platforms utilized in this emerging field. We examine how AI technologies, such as machine learning, deep learning, and natural language processing, are being integrated into clinical practice to enhance diagnostic accuracy and speed. The review also presents examples that highlight the successes and limitations of AI in this domain, offering insights into how AI can be harnessed to improve patient outcomes in rare disease diagnosis and management.

Machine Learning Approaches for Customer Churn Prediction in the Aquaculture Technology Sector

This study investigates the application of advanced machine learning techniques for customer churn prediction in the rapidly evolving aquaculture technology sector. We employ and compare three distinct models—Logistic Regression, Random Forest, and XGBoost—to analyze a synthesized dataset representative of the industry. The research encompasses comprehensive data preprocessing, feature engineering, and model evaluation using standard performance metrics. Our findings demonstrate the superior performance of XGBoost, achieving 88% accuracy in predicting customer churn. Through feature importance analysis, we identify key churn predictors, with the difference between a customer’s last order amount and their mean order amount emerging as the most significant factor. Additionally, we utilize SHAP (SHapley Additive exPlanations) analysis to interpret model outcomes, revealing nuanced relationships between features and churn probability. The study highlights the critical role of consistent engagement, proactive customer support, and personalized retention strategies in reducing churn. Our research contributes to the growing body of knowledge on churn prediction in specialized technology sectors and provides actionable insights for improving customer retention strategies in the aquaculture industry. The paper concludes with recommendations for future research, including the integration of external data sources and exploration of deep learning approaches for temporal dependency analysis in customer behaviour.
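The most significant predictor reported above, the difference between a customer's last order amount and their mean order amount, is straightforward to derive from an order history. The sketch below shows one possible definition. The order values, the customer identifiers, and the decision to include the last order in the mean are assumptions, since the abstract does not give the exact formula.

```python
from statistics import mean

def last_vs_mean_gap(order_amounts):
    """Last order amount minus the customer's mean order amount.

    Assumption: the mean is taken over all orders, including the last one.
    """
    return order_amounts[-1] - mean(order_amounts)

# Hypothetical order histories keyed by customer id.
orders = {
    "cust_a": [120.0, 130.0, 125.0, 40.0],  # sharp drop on the last order
    "cust_b": [80.0, 85.0, 90.0, 95.0],     # steadily growing orders
}
features = {cid: last_vs_mean_gap(amounts) for cid, amounts in orders.items()}
```

A strongly negative value, as for `cust_a`, indicates that the latest order is well below the customer's usual spend, which is the kind of signal a churn model can pick up.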

An Exploratory Data Analysis (EDA) Approach for Analyzing Financial Statements in Pharmaceutical Companies Using Machine Learning

This research investigates the use of Exploratory Data Analysis (EDA) and machine learning techniques to analyze financial statements (FSs) of pharmaceutical companies. The study focuses on three major Indonesian pharmaceutical companies: Kimia Farma, Kalbe Farma, and IndoFarma. By leveraging EDA, this study aims to uncover hidden patterns and insights within financial data, such as earnings per share (EPS), return on capital employed (ROCE), net profit margin, and inventory turnover ratio. Additionally, the study employs machine learning models, including Linear Regression, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Decision Tree, to predict financial performance metrics and trends. The performance of these models is evaluated using metrics such as Root Mean Squared Error (RMSE), Mean Squared Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). Among the models tested, the Decision Tree model demonstrated the highest performance, indicating high accuracy and a strong fit to the data. These results highlight the potential of data-driven approaches in improving the operational efficiency and financial stability of healthcare organizations.
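The four error metrics named in this abstract have standard definitions, which the following standard-library sketch computes side by side. The sample values and the function name `regression_errors` are hypothetical and are not taken from the companies' financial statements.

```python
import math

def regression_errors(actual, predicted):
    """Compute MSE, RMSE, MAE, and MAPE (in percent) for paired values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        # MAPE is undefined when any actual value is zero.
        "MAPE": 100.0 * sum(abs(r / a) for r, a in zip(residuals, actual)) / n,
    }

actual = [100.0, 200.0, 400.0]     # hypothetical reported values
predicted = [110.0, 190.0, 380.0]  # hypothetical model outputs
errors = regression_errors(actual, predicted)
```

MSE and RMSE penalize large errors more heavily than MAE does, while MAPE expresses the error relative to the size of each actual value, which helps when comparing metrics that are measured on different scales.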

Predictive Modeling in Remote Sensing Using Machine Learning Algorithms

Predictive modeling in remote sensing using machine learning (ML) algorithms has emerged as a powerful approach for addressing various environmental and climatic challenges. This paper explores the integration of advanced ML techniques with remote sensing data to enhance predictive capabilities for applications such as land cover classification, crop yield prediction, climate change monitoring, and disaster management. We review related works and existing systems, highlighting platforms like Google Earth Engine (GEE), NASA Earth Exchange (NEX), and Sentinel Hub, which leverage cloud computing to handle large-scale data processing and model deployment. The proposed system incorporates data acquisition, preprocessing, feature extraction, model selection and training, and prediction and visualization to provide accurate and timely predictions. Future enhancements, including deep learning integration, real-time data processing, enhanced user interfaces, and collaboration with Internet of Things (IoT) devices, are discussed to further strengthen the system’s capabilities. The paper concludes by emphasizing the potential of ML algorithms in transforming remote sensing applications, supporting informed decision-making, and improving the management of Earth’s resources.

Predictive Analysis for Personalized Machine: Leveraging Patient Data for Enhanced Healthcare

This research explores predictive analysis for personalized machine: leveraging patient data for enhanced healthcare. By leveraging the power of information and analytics, the healthcare industry can be driven towards a more patient-centric, proactive model that enhances outcomes and improves the overall quality of care. The objectives of the study are to: determine the significance and challenges of predictive analytics in healthcare, ascertain the data analytics techniques used in healthcare to enhance patient care, find out how predictive analytics can be applied for enhanced healthcare, and determine the ethical considerations associated with healthcare predictive analytics. This study employs the case study approach and experimental design. The study analyzes case studies of real-time deployment of predictive analytics models in healthcare centers and examines how these models enhance healthcare delivery in those centers. Experiments were also conducted to understand how predictive analytics works. The C4.5 learning algorithm was employed to predict the presence of chronic kidney disease (CKD) in patients and to distinguish affected patients from those not affected by the condition. The C4.5 classifier shows reasonable strength, evident in the large number of correctly classified instances (396) and the low number of misclassified instances (only 4). This is further demonstrated by a low error rate of 0.37, as shown in Table 5. The superiority of this algorithm is underlined by the large value of KS (0.97), indicating the classifier’s ground-breaking accuracy and performance. The performance of C4.5, characterized by its minimal execution time and its accuracy, makes it a decent classifier. This characteristic makes it specifically well-suited for application in the healthcare sector, particularly for tasks involving prediction and classification.
The application of data analytics methods for predictive analysis holds significant benefits in the health sector, as it gives us the power to predict and address potential threats to human health, covering different age groups, from the young ones to the elderly. This proactive method enables early disease detection, helping in timely interventions and contributing to better decision-making.
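C4.5, the algorithm used in the experiments above, builds a decision tree by choosing at each node the attribute with the highest gain ratio, that is, information gain divided by the split information of the attribute. The sketch below computes that criterion for a single categorical attribute. The attribute name, the eight sample records, and the function names are hypothetical and are not drawn from the study's CKD data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Information gain of a categorical feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after splitting on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical records: one binary attribute and the class label.
hypertension = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]
ckd = ["ckd", "ckd", "ckd", "ckd", "notckd", "notckd", "notckd", "notckd"]
ratio = gain_ratio(hypertension, ckd)
```

Dividing by the split information penalizes attributes that fragment the data into many small groups, which is the main refinement C4.5 makes over the plain information gain used by its predecessor, ID3.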