Articles

Machine Learning as Managerial Tool: A Case Study in ADNOC

In the current business environment, managers face challenges in managing different kinds of people. They find it difficult to track, evaluate, and manage employees in a fast-paced work setting. Machine learning is an emerging field concerned with training machines, through supervised and unsupervised learning, to produce usable systems. This paper investigates how companies can leverage machine learning in people management and in improving the performance, productivity, and motivation of employees and managers. The research used both qualitative and quantitative approaches to examine the impact of machine learning in an organizational setting.

Circular Economy Transformation in Chemical Industry: Integrating CRM and AI for Sustainable Growth

This white paper explores the pivotal role of Customer Relationship Management (CRM) in the digital transformation journey of the chemical industry. As customer expectations continue to evolve and competition intensifies, chemical companies are turning to CRM solutions to enhance customer interactions, streamline operations, and drive business growth. The abstract provides an overview of CRM’s significance in the chemical sector, highlighting its role in customer segmentation, sales automation, marketing optimization, and customer service enhancement. By centralizing and optimizing customer-related processes, CRM enables chemical companies to deliver personalized experiences, improve sales productivity, and foster stronger customer relationships. Through a comprehensive examination of CRM implementations and potential applications in the chemical industry, this white paper aims to provide valuable insights for industry professionals seeking to leverage CRM to navigate the challenges and opportunities of the digital age.

A Generic Approach to Entity Resolution Mechanisms for Big Data on Real World Match Problems in the Global Oil and Gas Sector

The global oil and gas industry faces complex challenges. Oil prices are dropping due to OPEC production levels, the US oil boom, and other factors. Many experts believe that oil prices will remain low for years, at an equilibrium of around $40-50 (Blumberg, 2018; Walls and Zheng, 2018; Azar, 2019), although the 2019 oil price is expected to average $65, with a further decline to $62 by 2020 (Amadeo, 2019; Kasim, 2019). In addition, newly commercial resources are extremely expensive to develop, as massive capital investments are required. This research intends to develop a comprehensive entity resolution framework that can search across multiple databases with disparate forms, process large amounts of data very quickly, efficiently resolve multiple entities into one, and find hidden connections without human intervention. Putting in place a system to manage these entities will help to assign resources not only better but also more quickly. Although the necessary information is mostly already available within oil and gas companies, it is spread across different company areas and applications. Entity resolution will help to aggregate these data, identify and exploit connections between entities, and offer holistic, all-in-one information that can help to identify and deal with potential risks. We therefore present an evaluation of existing implementations on challenging real-world match tasks. We consider approaches both with and without machine learning to find a suitable parameterization and combination of similarity functions. In addition to approaches from the research community, we also consider a state-of-the-art commercial entity resolution implementation. Our results indicate significant quality and efficiency differences between the approaches. We also find that some challenging resolution tasks, such as matching product entities from the OPEC database, are not sufficiently solved by conventional approaches based on the similarity of attribute values.
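To make the idea of matching on attribute-value similarity concrete, the following is a minimal sketch and not the framework evaluated in the paper. It assumes a simple average of per-field string similarities, a fixed threshold of 0.85, and a union-find structure to merge matching records; the records, field names, and threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(value.lower().split())


def record_similarity(a: dict, b: dict, fields: tuple) -> float:
    """Average string similarity over the compared attribute values."""
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def resolve(records: list, fields: tuple, threshold: float = 0.85) -> list:
    """Group record indices whose pairwise similarity reaches the threshold."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive all-pairs comparison; real systems usually add a blocking step.
    for i, j in combinations(range(len(records)), 2):
        if record_similarity(records[i], records[j], fields) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical supplier records from two imagined source systems.
records = [
    {"name": "Gulf Drilling Services LLC", "city": "Abu Dhabi"},
    {"name": "Gulf Drilling Services L.L.C.", "city": "Abu Dhabi"},
    {"name": "North Sea Pipelines Ltd", "city": "Aberdeen"},
]
clusters = resolve(records, fields=("name", "city"))
print(clusters)
```

The all-pairs loop grows quadratically with the number of records, which is one reason such a purely similarity-based approach may struggle at the data volumes the abstract describes.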

Outpatient Length of Stay (OLOS) Analysis at Edelweis Hospital using Machine Learning Algorithm

Patient satisfaction may be affected by the length of stay (LOS) that a patient perceives during an outpatient clinic visit. With increasing competition in the healthcare industry and patients’ demands for higher-quality care, hospitals are focusing more on enhancing their quality from both clinical and management perspectives. The Indonesian Ministry of Health has established minimum standards (SPM) for healthcare services that all Indonesian hospitals are required to meet, particularly the hospital waiting time indicator, which must be no longer than 60 minutes. In addition, outpatient length of stay (OLOS) is a healthcare measure that is not yet specified in the SPM. OLOS is defined as the amount of time a patient spends in a hospital from the moment he or she arrives at the administration desk until he or she leaves. Edelweis Hospital is a private hospital in Bandung that has established a 2-hour maximum LOS standard for its outpatient services. Providing accurate information about LOS may increase patient satisfaction by reducing uncertainty. However, effective methods for predicting OLOS in pediatric clinics are not well known. This study’s goal is to design a prediction model for OLOS based on patient characteristics and several other clinical attributes. By identifying the attributes that affect OLOS, the model will help the hospital make relevant decisions. We used machine learning algorithms, namely random forest, decision tree, k-nearest neighbor (kNN), AdaBoost, and gradient boosting, to design prediction models for OLOS. On the validation set, random forest had the highest accuracy at 99.3%, followed by decision tree and gradient boosting at 99.2% each. Furthermore, the machine learning models were used to determine the importance of attributes. These models could eventually be used alongside real-time IT system data to provide accurate real-time estimates of OLOS at the Pediatric Clinic.
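As an illustration of one of the listed algorithms, the sketch below implements a plain k-nearest neighbor classifier and scores it on a small validation set. It is not the study's model: the two features, the "within"/"over" labels relative to the 2-hour standard, and all data values are hypothetical, and the features are left unscaled for brevity.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical features: (patients queued on arrival, lab tests ordered).
# Hypothetical labels: whether the visit stayed within the 2-hour standard.
train_x = [(2, 0), (3, 0), (4, 1), (10, 2), (12, 1), (15, 3)]
train_y = ["within", "within", "within", "over", "over", "over"]

valid_x = [(3, 1), (13, 2)]
valid_y = ["within", "over"]

valid_pred = [knn_predict(train_x, train_y, x) for x in valid_x]
print(valid_pred, accuracy(valid_y, valid_pred))
```

With real attributes on different scales, the features would normally be standardized before computing distances, since kNN is sensitive to units.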

A Literary Review of Pattern Matching Techniques in Network Intrusion Detection

With the exponential growth in devices and services being added to networks, we are also witnessing an increase in the volume and complexity of threats. This calls for greater efficiency in network intrusion detection systems, which primarily rely on pattern matching to identify malicious activity on the network. In this literary review of pattern matching techniques in network intrusion detection, we explore the limitations of both signature-based and anomaly-based intrusion detection systems and the research carried out to overcome them. The review focuses on the performance improvements in signature-based intrusion detection systems achieved through methodologies and technologies such as regular expressions, Hyperscan, RE2, Flashtext, a generalized Aho-Corasick algorithm, Bloom filters, and payload sampling. It also covers machine learning techniques, including genetic algorithms, Support Vector Machines (SVM), and the Improved Self-Adaptive Bayesian Algorithm (ISABA), which are used in anomaly-based network intrusion detection to detect anomalous behavior and identify potential threats, assisting security analysts in carrying out their job functions. Additionally, this review explores the integration of the MITRE ATT&CK framework and Security Information and Event Management (SIEM) systems in network intrusion detection. The framework provides a structured and standardized approach for analyzing and classifying the tactics and techniques used by attackers, while SIEM systems enable the correlation of threat activity across multiple sources, allowing for a more comprehensive and accurate view of network security. Overall, this literary review provides insights into the state-of-the-art techniques and frameworks used in network intrusion detection based on pattern matching, highlighting the significant improvements in performance and detection capabilities.

Detection and Classification of Gastrointestinal Diseases by using Machine Learning: A Review

Currently, gastrointestinal (GI) diseases claim the lives of up to two million people worldwide. GI disease treatment can be challenging, time-consuming, and expensive. One of the most recent advancements in medical imaging is the use of video endoscopy to diagnose gastrointestinal illnesses such as stomach ulcers, bleeding, and polyps. Because medical video endoscopy produces so many images, doctors need a great deal of time to review them all. This makes manual diagnosis difficult and has encouraged research into computer-aided approaches that can assess all of the generated images quickly and accurately. The innovative aspect of the suggested methodology is the creation of a system for the diagnosis of digestive disorders. Machine learning techniques have the potential to significantly lower the cost of examination procedures while increasing the accuracy and speed of diagnosis. This paper describes a method for classifying GI illnesses using machine learning techniques.

E-Commerce Product Demand Modelling Using Machine Learning Algorithm Case Study of Rice Trading Products in PT XYZ

E-commerce XYZ is an Indonesian commerce company that has three types of products in its B2C business line: trading, consignment, and marketplace. From January 2021 until October 2022, the company’s sales of trading products in the rice category generated a negative profit. Although e-commerce has focused on growth rather than profitability for the last several years, the current economic environment is forcing e-commerce companies to focus on profitability as well. For trading products, maximum profit can be pursued in two ways: selling products at a very high margin but in smaller quantities, or selling in large quantities but at a sub-optimal margin. Hence, the company needs to find a demand function model that can be used to generate maximum profit. To find the best model, the researcher first created a baseline model using the median for every product group, with products grouped by their Unit of Measurement. Next, the researcher created demand functions using four other models. Gradient Boosted was found to be the best algorithm for modelling the demand function. Although this model successfully models a demand function for a product category in e-commerce, business context still needs to be added, and other features that might affect the demand function still need to be identified, before the model can be implemented in real life.
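The median baseline and the margin-versus-quantity trade-off can be sketched as follows. This is an assumption-laden illustration, not the researcher's model: the sales observations, the linear demand curve, the unit cost, and the candidate prices are all hypothetical, with prices in thousand rupiah.

```python
from collections import defaultdict
from statistics import median


def median_baseline(sales):
    """Map each unit-of-measurement group to its median quantity sold."""
    groups = defaultdict(list)
    for uom, quantity in sales:
        groups[uom].append(quantity)
    return {uom: median(quantities) for uom, quantities in groups.items()}


def best_price(demand, unit_cost, candidate_prices):
    """Return the candidate price with the highest profit under `demand`."""
    return max(candidate_prices, key=lambda p: (p - unit_cost) * demand(p))


# Hypothetical observations: (unit of measurement, quantity sold per day).
sales = [("5kg", 40), ("5kg", 55), ("5kg", 47), ("25kg", 8), ("25kg", 12)]
baseline = median_baseline(sales)


def demand(price):
    """Hypothetical linear demand curve: quantity falls as price rises."""
    return max(0, 500 - 5 * price)


price = best_price(demand, unit_cost=60, candidate_prices=range(60, 101, 5))
print(baseline, price)
```

Under this assumed curve, the profit-maximizing price sits between the two extremes: a higher price earns more per unit but sells fewer units, and a lower price does the opposite. A fitted demand model would take the place of the hand-written `demand` function.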

Predicting Customer Satisfaction through Sentiment Analysis on Online Review

User-generated content, such as user reviews, posts, tags, ratings, and opinions on the internet, can be used as a business indicator if collected and appropriately analyzed. One example is predicting customer satisfaction by applying big data analytics to online reviews. To analyze user-generated content and predict customer satisfaction, the author implements a machine learning approach using the Sentiment Analysis method. Five-fold cross-validation was performed to train the classification model. The training was performed with combinations of vectorization methods: term frequency-inverse document frequency (tf-idf) and bag-of-words; n-gram types: unigram, bigram, trigram, and a combination of unigram, bigram, and trigram; and machine learning algorithms: linear support vector classification (LinearSVC) and multinomial naïve Bayes (MultinomialNB). The results were then evaluated using classification performance metrics such as precision, recall, F1 measure, and AUC score.
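The feature extraction step can be illustrated with a short sketch, assuming whitespace tokenization and an unsmoothed idf of log(N / df). Both are simplifications chosen here for clarity, and the review texts are hypothetical. Libraries such as scikit-learn use a smoothed and normalized tf-idf variant by default, so their values would differ.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=3):
    """Yield every n-gram (as a string) with n_min <= n <= n_max."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def bag_of_words(docs, n_min=1, n_max=3):
    """Count n-gram occurrences per document (whitespace tokenization)."""
    return [Counter(ngrams(doc.lower().split(), n_min, n_max)) for doc in docs]


def tf_idf(bags):
    """Weight raw counts by log(N / df), one simple idf variant."""
    n_docs = len(bags)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    df = Counter(term for bag in bags for term in bag)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
        for bag in bags
    ]


# Hypothetical reviews, for illustration only.
reviews = [
    "great room and great service",
    "terrible service",
    "great location",
]
bags = bag_of_words(reviews)   # combined unigrams, bigrams, and trigrams
weights = tf_idf(bags)
print(bags[0]["great"], round(weights[0]["great service"], 3))
```

Terms that appear in many reviews, such as "great" here, receive a lower idf than terms unique to one review, which is the main difference from the plain bag-of-words counts.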

The results show that the tf-idf vectorizer performs similarly to the bag-of-words method. A similar result was observed for the choice of machine learning algorithm: MultinomialNB and LinearSVC produced the same performance. Low-order n-grams (such as unigrams and bigrams) tended to have higher precision, recall, F1 measure, and AUC score than high-order n-grams (such as trigrams). The best results were achieved by combining unigrams, bigrams, and trigrams, giving an average performance score of 0.94 across all measurements. From these results and the analysis, the author finds that predicting customer satisfaction using text and sentiment analysis methods on user-generated content is possible. The model’s performance in this experiment is decent, with high precision, recall, F1, and AUC scores.