Articles

Hyperparameter Tuning of Random Forest Algorithm for Diabetes Classification

This study aims to optimize the hyperparameters of the Random Forest model for diabetes classification using the Pima Indian Diabetes dataset, given the importance of early diabetes diagnosis in mitigating serious health impacts. Random Forest is a popular classification algorithm because of its resistance to overfitting, but its performance depends strongly on the choice of hyperparameters. This research therefore uses Grid Search and Random Search for hyperparameter tuning to improve model accuracy. The methodology comprises data collection, preprocessing, dataset splitting (80% for training and 20% for testing), feature scaling with Standard Scaler, application of the Random Forest algorithm with hyperparameter tuning, and model evaluation based on accuracy, precision, recall, and F1-score. The results show that tuning Random Forest with Grid Search and Random Search significantly improved model performance. Random Search yielded the best results, with an accuracy of 0.75, a precision of 0.64, and a recall of 0.69. The study demonstrates that hyperparameter tuning can enhance the performance of the Random Forest model, contributing to the development of machine learning applications for medical diagnosis of diabetes.
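The workflow described in this abstract (80/20 split, standard scaling, Random Forest, Grid Search versus Random Search, four evaluation metrics) can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the authors' code: the data are a synthetic stand-in generated with `make_classification` rather than the Pima Indian Diabetes dataset, and the parameter ranges, the 5-fold cross-validation inside each search, and the random seeds are all assumptions, so the printed scores are not comparable to the reported ones.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 8 numeric features, imbalanced binary label.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, weights=[0.65, 0.35], random_state=0
)

# 80% training / 20% testing, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is re-fit on each training fold only.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Hypothetical search spaces; the abstract does not list the ones actually used.
param_grid = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [4, 8, None],
    "rf__min_samples_leaf": [1, 5],
}
param_dist = {
    "rf__n_estimators": randint(50, 150),
    "rf__max_depth": [4, 8, None],
    "rf__min_samples_leaf": randint(1, 10),
}

searches = {
    "grid": GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy"),
    "random": RandomizedSearchCV(
        pipe, param_dist, n_iter=10, cv=5, scoring="accuracy", random_state=0
    ),
}

results = {}
for name, search in searches.items():
    search.fit(X_train, y_train)      # tune on the training split only
    y_pred = search.predict(X_test)   # the refit best model scores the held-out 20%
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "best_params": search.best_params_,
    }
    print(name, results[name])
```

Grid Search evaluates every combination in `param_grid`, while Random Search draws `n_iter` samples from `param_dist`, which lets it cover wider ranges for the same budget.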

Overview of the problems of VET education: an Attempt for Classification and Cross-Mapping the Problems from 2024: A Case study

This study systematically examines the multifaceted challenges facing vocational education and training (VET) in Bulgaria, highlighting the persistent issues that undermine the effectiveness of the system. Through a comprehensive classification of these challenges, the research identifies key areas such as inadequate funding, outdated curricula, insufficient industry linkages, and a lack of qualified teaching personnel. It summarizes the core of the multi-layered, long-standing problems in Bulgarian vocational education. The analysis reveals that long-term neglect, frequent shifts in the views of political entities with strong influence over vocational education, ill-suited, untimely, and fragmented half-measures, and the lack of participation by established businesses and experts are among the many reasons why today's vocational high school graduates have lower technical and professional skills than their predecessors from the era of Polytechnicism. The absence of a cohesive framework for cooperation between educational institutions and industry stakeholders exacerbates the disconnect between VET programs and labor market needs. The paper also aims to provide a structured overview of the current state of VET in Bulgaria, offering insights that can inform future policy development and strategic interventions.

Detection and Classification of Gastrointestinal Diseases by using Machine Learning: A Review

Currently, gastrointestinal (GI) diseases claim the lives of up to two million people worldwide. GI disease treatment can be challenging, time-consuming, and expensive. One of the most recent advancements in medical imaging is the use of video endoscopy to diagnose gastrointestinal illnesses such as stomach ulcers, bleeding, and polyps. Because video endoscopy produces so many images, doctors need a great deal of time to review them all. This makes manual diagnosis difficult and has encouraged research into computer-aided approaches that can analyze all of the generated images quickly and accurately. The innovative aspect of the suggested methodology is the creation of a system for the diagnosis of digestive disorders. Machine learning techniques have the potential to significantly lower the cost of examination procedures while increasing the accuracy and speed of diagnosis. This paper describes a method for classifying GI illnesses using machine learning techniques.

Predicting Customer Satisfaction through Sentiment Analysis on Online Review

User-generated content, such as user reviews, posts, tags, ratings, and opinions on the internet, can be used as a business indicator if collected and appropriately analyzed. One example is predicting customer satisfaction by applying big data analytics to online reviews. To analyze user-generated content and predict customer satisfaction, the author implements a machine learning approach using the Sentiment Analysis method. Five-fold cross-validation was performed to train the classification model. The training was performed with a combination of text vectorization methods: term frequency-inverse document frequency (tf-idf) and bag-of-words; n-gram types: unigram, bigram, trigram, and a combination of unigram, bigram, and trigram; and machine learning algorithms: linear support vector classification (LinearSVC) and multinomial naïve Bayes (MultinomialNB). The results were then evaluated using classification performance metrics: precision, recall, F1 measure, and AUC score.

The results show that the tf-idf vectorizer performs similarly to the bag-of-words method. A similar result was observed for the choice of machine learning algorithm: MultinomialNB and LinearSVC produced the same performance. Low-order n-grams (such as unigrams and bigrams) tended to have higher precision, recall, F1 measure, and AUC score than high-order n-grams (such as trigrams). The best results were achieved by combining unigrams, bigrams, and trigrams, giving an average performance score of 0.94 across all measurements. From these results, the author concludes that predicting customer satisfaction using text and sentiment analysis methods on user-generated content is feasible. The model's performance in this experiment is decent, with high precision, recall, F1, and AUC scores.
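The experimental grid described in this abstract (two vectorizers, four n-gram settings, two classifiers, five-fold cross-validation, four metrics) could be set up in scikit-learn along the following lines. This is a hedged sketch rather than the author's implementation: the tiny review corpus and its labels are invented purely for illustration, and all estimators are left at their library defaults, which the abstract does not confirm. Scores on such toy data say nothing about the reported 0.94.

```python
from itertools import product

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (assumption): 10 satisfied and 10 dissatisfied reviews.
positive = [
    "great product and works very well",
    "excellent quality and fast delivery",
    "very happy with this purchase",
    "love this item highly recommended",
    "good value and works as described",
    "fantastic service and great product",
    "works perfectly and arrived very fast",
    "really satisfied with the quality",
    "amazing product would buy again",
    "great price and excellent quality",
]
negative = [
    "terrible product and stopped working quickly",
    "poor quality and very slow delivery",
    "very disappointed with this purchase",
    "hate this item not recommended",
    "bad value and not as described",
    "awful service and broken product",
    "arrived damaged and very late",
    "really unhappy with the quality",
    "waste of money would not buy again",
    "high price and terrible quality",
]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

vectorizers = {"bow": CountVectorizer, "tfidf": TfidfVectorizer}
ngram_ranges = {"uni": (1, 1), "bi": (2, 2), "tri": (3, 3), "uni+bi+tri": (1, 3)}
classifiers = {"LinearSVC": LinearSVC, "MultinomialNB": MultinomialNB}
metrics = ["precision", "recall", "f1", "roc_auc"]

results = {}
for (v_name, vec), (n_name, ngram), (c_name, clf) in product(
    vectorizers.items(), ngram_ranges.items(), classifiers.items()
):
    # A fresh pipeline per configuration; the vectorizer is re-fit inside each fold.
    model = make_pipeline(vec(ngram_range=ngram), clf())
    scores = cross_validate(model, texts, labels, cv=5, scoring=metrics)
    results[(v_name, n_name, c_name)] = {
        m: scores[f"test_{m}"].mean() for m in metrics
    }

for config, mean_scores in results.items():
    print(config, {m: round(s, 2) for m, s in mean_scores.items()})
```

Keeping the vectorizer inside the pipeline means the vocabulary is learned only from each training fold, so the cross-validated scores are not inflated by terms seen in the held-out reviews.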