Outpatient Length of Stay (OLOS) Analysis at Edelweis Hospital using Machine Learning Algorithm

ABSTRACT: Patient satisfaction may be affected by the length of stay (LOS) that a patient perceives during an outpatient clinic visit. With increasing competition in the healthcare industry and patients' demands for higher-quality care, hospitals are focusing more on improving quality from both clinical and management perspectives. The Indonesian Ministry of Health has established minimum service standards (SPM) that all Indonesian hospitals are required to meet, including a waiting time indicator of no longer than 60 minutes. However, outpatient length of stay (OLOS) is not yet specified in the SPM. OLOS is defined as the amount of time a patient spends in a hospital from the moment he or she arrives at the administration until he or she leaves. Edelweis Hospital is a private hospital in Bandung that has established a 2-hour maximum LOS standard for its outpatient services. Providing accurate information about LOS may increase patient satisfaction by reducing uncertainty. However, effective methods for predicting OLOS in pediatric clinics are rarely reported. This study's goal is to design a prediction model for OLOS based on patient characteristics and several other clinical attributes. By identifying the attributes that affect OLOS, the model will help the hospital make relevant decisions. We used machine learning algorithms, namely random forest, decision tree, k-nearest neighbor (kNN), AdaBoost, and gradient boosting, to design prediction models for OLOS. On the validation set, random forest had the highest accuracy at 99.3%, followed by decision tree and gradient boosting at 99.2% each. Furthermore, the machine learning models were used to determine the importance of attributes. These models could eventually be used alongside real-time IT system data to provide accurate real-time estimates of OLOS at the Pediatric Clinic.


INTRODUCTION
Establishing an efficient healthcare system has become more important as healthcare expenses have surged alongside growing demand for healthcare services and rising patient expectations of service quality (Hulshof, et al., 2012). Meanwhile, hospitals increasingly compete to attract patients. This competition compels hospital management to prioritize clinical results and patient satisfaction in order to sustain or enhance revenue and market share (Capkun, et al., 2012). Waiting time for treatment is a prevalent concern in the Indonesian healthcare system, because prolonged waiting can lead to patient dissatisfaction. Buhang (2007) asserts that patient waiting time is a crucial factor in quality management. It significantly affects the quality of healthcare services provided by a health service unit and reflects the hospital's ability to manage service components in line with patient needs and expectations. To address this issue, the Indonesian Ministry of Health has implemented a policy outlined in Minister of Health Decree No. 129 (2008) called "Standard Pelayanan Minimum" (SPM). This policy sets minimum service standards for hospitals in Indonesia, including wait times. According to this policy, hospitals are required to ensure that wait times do not exceed 60 minutes. The wait time is measured as the interval between the patient's registration and the commencement of the doctor's treatment.
However, the SPM does not currently specify a standard for length of stay (LOS). LOS refers to the duration of a patient's stay in a hospital, from arrival at the administration until the time he or she leaves. Outpatient length of stay (OLOS) is crucial in acquiring new clients for healthcare institutions, especially private hospitals in a competitive healthcare market. Hence, the hospital is prioritizing efforts to reduce patient waiting times in order to improve patient perceptions of service quality and boost productivity in health services. Hospitals must establish a standard waiting time for outpatients to ensure that patients can

The application of big data and data analytics in healthcare has provided new opportunities for reforming the healthcare system, particularly in addressing prolonged waiting times for treatment. Predicting the waiting time for treatment can greatly benefit patients. Knowing how long they will wait at the hospital before treatment can help reduce patients' anxiety (Mowen et al., 2016). In their 2015 study, Ang et al. used queueing and service flow variables, along with patient information such as age, to forecast the waiting time in the emergency department of Princess Alexandria Hospital in Australia. The aim was to optimize service time and minimize the number of patients who cannot receive treatment due to capacity constraints, by discouraging them from waiting when the service is already at full capacity. Predictive analytics is a field that combines data mining, statistics, and computer science to forecast future occurrences that cannot be known in advance. This is done by examining the relationship between input and output variables to identify patterns and make predictions (Dowdell, et al., 2017). Machine learning (ML) approaches are employed to identify these patterns and develop predictive models for future outcomes.
This study is conducted at Edelweis Hospital, a privately owned healthcare facility located in Bandung. The hospital's objective is to establish itself as a leading healthcare provider in Bandung within 5 years of its inception. The hospital is currently facing issues with the length of stay in its outpatient services, which is affecting patient satisfaction. Initial data show several deviations, with the average OLOS exceeding the hospital's established standard of no more than 2 hours.
Based on the existing background, the problem formulations are as follows:  1.

Machine learning algorithm
In the healthcare sector, machine learning has shown promise in tasks such as image segmentation, classification, machine translation, and recommender systems, for example, segmenting and classifying radiology images, diagnosing high-risk patients, and predicting length of stay (Andersson, 2019). In this study, we use classification, which is a supervised learning method. Classification is a form of data analysis that produces models describing class labels, which are then used to predict the class of new data (Han & Kamber, 2012). The algorithms used for classification in this study are random forest, decision tree, k-nearest neighbor, AdaBoost, and gradient boosting.

Cross validation
A classification model is created and evaluated by dividing labeled data into training and testing sets. Cross validation is a method used to evaluate the performance of a model or algorithm: the data are divided into training and testing sets, the model is trained on the training set, and it is validated on the testing set. The type of cross validation can be selected based on the dataset size. A commonly recommended choice is 10-fold cross validation, as it provides a less biased accuracy estimate. In 10-fold cross validation, the data are split into 10 subsets, and in each of 10 rounds, 9 folds are used for training and the remaining fold is used for testing.
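The fold rotation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study itself used the Orange tool, and the function name `k_fold_indices`, the shuffling seed, and the way records are assigned to folds are hypothetical choices, not the authors' procedure.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    Hypothetical helper: shuffles record indices once, then rotates
    which fold is held out for testing.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # The other k - 1 folds together form the training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# 6,157 is the sample size reported in the study.
splits = list(k_fold_indices(n_samples=6157, k=10))
```

Each record lands in the test fold exactly once, so every observation contributes to the accuracy estimate without ever being used to train the model that scores it.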

Research gaps and contribution to the literature
Outpatient length of stay (OLOS) in hospitals is complex and varies greatly with factors such as patient characteristics, clinical conditions, medical procedures, examining doctors, and patient management. Factors affecting OLOS include demographics, care management, treatment response, and administrative data (Lubis & Susilawati, 2017). Using machine learning, Aldhoyan & Alobadi (2023) found that gender and age are significant factors affecting patient lateness in outpatient clinics. Patient unpunctuality is a major concern, with travel distance and BMI being significant factors (Kasaie & Rajendran, 2023). Patient visit time is another important factor influencing outpatient no-shows and waiting time (Guorui, et al., 2021). In pediatric ophthalmology outpatient clinics, the examining ophthalmologist and insurance factors also influence wait times (Lin, et al., 2020). The number of treatments and the number of diagnoses (ICD) have not been included as variables in previous research, even though a large number of treatments can affect waiting time. This study focuses on outpatient time from arrival to discharge at the Pediatric Clinic and uses the number of treatments and the number of ICD codes as variables to predict patient wait time.

Research context and available data
This study draws data from a private hospital in Bandung. We extracted data from outpatient EMR systems from January 2022 to June 2023, with a total of 17,167 outpatient visit records at the Pediatric Clinic. Any incomplete and erroneous records were removed to ensure the reliability of the data. After removing invalid records, we obtained a sample size of 6,157.

Dependent variable definition
Prior studies have defined patient waiting time as the interval between the patient's arrival and the moment they receive medical attention from the doctor. We define patient waiting time at the Pediatric Clinic of Edelweis Hospital as the duration between the patient's registration and their departure from the hospital. The hospital has set a maximum OLOS of 2 hours. A visit within this limit is considered standard, and a visit longer than 2 hours is considered not standard, so we categorized the OLOS duration into two classes: Standard and Not Standard.
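A minimal sketch of this labeling rule, assuming each visit record carries a registration timestamp and a departure timestamp. The function name `olos_label` and the example timestamps are hypothetical, and treating a visit of exactly 2 hours as Standard is an assumption based on the "maximum of 2 hours" wording.

```python
from datetime import datetime, timedelta

# Hospital standard stated in the text: OLOS of at most 2 hours.
OLOS_LIMIT = timedelta(hours=2)


def olos_label(registered_at, departed_at, limit=OLOS_LIMIT):
    """Return the OLOS class for one visit (hypothetical helper)."""
    duration = departed_at - registered_at
    if duration < timedelta(0):
        # Such a record would count as erroneous and be dropped in cleaning.
        raise ValueError("departure precedes registration")
    # Assumption: a visit of exactly 2 hours still counts as Standard.
    return "Standard" if duration <= limit else "Not Standard"


# Made-up timestamps for illustration only.
short_visit = olos_label(datetime(2023, 1, 9, 9, 0), datetime(2023, 1, 9, 10, 30))
long_visit = olos_label(datetime(2023, 1, 9, 9, 0), datetime(2023, 1, 9, 11, 30))
```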

Predictor variable
The predictor variables used in this study are day, month, patient visit time, number of ICD, doctor id, number of medical treatments, poly type, and insurance. The class label or target variable is the duration of outpatient length of stay (OLOS). The variables used in this study are listed in Table 2.

Feature selection and development of predictive models
The study uses the chi-square test of independence, also known as Pearson's chi-square test of association, to analyze the effect of the categorical input variables on OLOS. This method tests whether two variables are associated and requires both to be measured on an ordinal or nominal scale. We used SPSS to obtain the results based on the following hypotheses:
H0: There is no effect between the independent variable and the dependent variable.
H1: There is an effect between the independent variable and the dependent variable.

Table 3 shows a summary of the variables' descriptive statistics. We report the frequency distribution for each categorical variable.
For the outpatient length of stay (OLOS) duration, the Not Standard rate was 77.9%, which indicates that OLOS at the Pediatric Clinic generally exceeds the 2-hour maximum service time. The most common visit day was Monday (21.9%) and the most common month was October (12.4%). There were 2,776 (45%) patients who visited the hospital in the afternoon. Most patients had one diagnosis (ICD) (61.7%). Doctor id 5 saw the largest share of patients (38.3%). Most visits (81.6%) involved no treatment, that is, only an examination by the specialist doctor. The pediatric clinic type was used by more patients than the BPJS pediatric clinic. Lastly, most patients used general insurance (52.3%) to pay for the treatment.

Prediction model performance
The performance of the prediction models is shown in Table 5. On the validation data set, the prediction accuracy of the random forest model is 99.3%. Its precision and recall are 99.5% and 97.4%, respectively, with an F1 value of 0.984. The prediction accuracy of the decision tree algorithm is 99.2%, slightly lower than that of random forest. The precision and recall of the decision tree are 99.7% and 97.0%, respectively, and the F1 value is 0.984. For the kNN algorithm, the precision, recall, and accuracy are 99.0%, 94.3%, and 98.5%, respectively, and the F1 value is 0.965. kNN has the lowest accuracy of all the models. The prediction accuracy of the AdaBoost model is 98.9%, slightly lower than that of random forest, decision tree, and gradient boosting. The precision and recall of the AdaBoost model are 98.2% and 97.0%, respectively, with an F1 value of 0.975. Lastly, the prediction accuracy of gradient boosting is 99.2%, the same as that of the decision tree algorithm. The precision and recall of gradient boosting are 99.1% and 97.4%, respectively, with an F1 value of 0.982. These results show that random forest has the highest accuracy, with recall and F1 values that no other model exceeds, followed by the decision tree and gradient boosting models in terms of accuracy.
Table 5. Prediction model result on the validation data set

Important variables
Finally, we evaluated which variables are important for the duration of the outpatient length of stay (OLOS) at Edelweis Hospital using the most effective prediction model, which was random forest. The variable importance values were obtained from the Rank feature in the Orange tool.

DISCUSSION
In this study, we evaluated the applicability of machine learning models for predicting patient wait time in the Pediatric Clinic. The key findings of our study were: (1) machine learning models such as random forest can accurately predict outpatient length of stay (OLOS) in the Pediatric Clinic, and (2) machine learning models can provide insight into the factors associated with patient wait time. Our random forest model provided the most accurate prediction, with an accuracy of 99.3%, which is higher than the 99.2% accuracy of the decision tree and gradient boosting models. Machine learning algorithms are well suited to predicting complex and noisy phenomena such as patient wait time in a Pediatric Clinic. Moreover, the random forest model was able to identify the most important factors associated with outpatient length of stay. The top three important variables identified in the model are patient visit time, doctor id, and insurance. The importance of patient visit time may have two explanations. First, fewer patients come in the evening than in the morning and afternoon. Second, patients arrive late at the hospital, so many patients accumulate at certain times. The importance of doctor id is attributed to many doctors arriving late at the clinic. Most doctors at the Pediatric Clinic work part-time, so they are not consistently available at the hospital, and patients have to wait for the doctor to arrive from another medical facility.

LIMITATION AND CONCLUSION
This study has the following limitations. First, the data used in this study were provided by a specific general hospital, so the generalizability of the results may be limited, and they may not apply to other subspecialties within the Pediatric Clinic or to other healthcare systems. Second, the OLOS time stamps do not always accurately record the provider-patient interaction time, resulting in erroneous wait time durations.

As patient wait time becomes a more serious problem, hospitals must make effective predictions, especially for outpatient clinics. Using machine learning algorithms, this study confirms that a large amount of patient characteristic and clinical data can be used to build predictive models for outpatient length of stay. The prediction models can assist the hospital in optimizing its outpatient appointment system by estimating whether a patient's OLOS will be standard (less than 2 hours) or not standard (more than 2 hours). This allows patients to plan their time more flexibly and to feel more comfortable before they receive treatment from the doctor.
and promptly during working hours. They should also monitor the length of stay for outpatients and assess whether it aligns with the established standards, taking into account factors that influence the outpatient length of stay.

1. Random forest is a popular machine learning algorithm that combines many decision trees for classification tasks. Chrusciel et al. (2021) suggest that the random forest model works well for data with intricate relationships between variables, such as forecasting patient wait times in the emergency room.
2. The k-nearest neighbor (kNN) algorithm is a non-parametric method used for classification and regression in data mining. It classifies a new record by using a distance function to find its nearest neighbors in the training set and assigning the most common class among those neighbors.
3. The decision tree is another commonly used data mining classification technique. It is a flowchart-like structure with internal nodes representing tests on attributes, branches representing test results, and leaf nodes representing classes or class distributions. Decision trees are easier to use than ANN or Bayesian classifiers, are efficient for large data sets, and do not require information beyond what is contained in the training data.
4. Adaptive Boosting (AdaBoost) is a boosting technique used as an ensemble method in machine learning. It is iterative: it starts by training a weak classifier on the training data and then weights it based on its performance. This process is repeated until the error rate is below a specified threshold. The final classifier is the weighted average of the weak classifiers, with weights determined by the error rate of each weak classifier.
5. Gradient boosting is a machine learning classification algorithm that uses an ensemble of decision trees to predict values. It starts by generating an initial classification tree and continues to add new trees that minimize the loss. In each iteration, it computes the residual error by subtracting the current predictions from the target and adds a new weak learner fitted to that residual. This algorithm has become popular due to its efficacy in handling complex data, including data with noise or errors.
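To make one of the algorithms above concrete, the following is a minimal sketch of the k-nearest neighbor majority vote in plain Python. It is not the implementation used in the study, which relied on the Orange tool. The function name `knn_predict`, the Euclidean distance, the two numeric features (visit hour and number of treatments), and the toy records are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    # Pair every training record's distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Count the class labels of the k closest records.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up toy records: (visit hour, number of treatments).
train_X = [(8, 0), (9, 0), (10, 1), (14, 2), (15, 3), (16, 2)]
train_y = ["Standard", "Standard", "Standard",
           "Not Standard", "Not Standard", "Not Standard"]

morning = knn_predict(train_X, train_y, query=(9, 1), k=3)
afternoon = knn_predict(train_X, train_y, query=(15, 2), k=3)
```

With categorical predictors such as doctor id or insurance, a real application would need an encoding and a distance measure suited to those variables, so the Euclidean distance here is a simplification.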

The decision rules are as follows:
If asymptotic significance (p-value) < 5% (0.05), then reject H0.
If asymptotic significance (p-value) > 5% (0.05), then fail to reject H0.

Predictive models evaluation metrics
Model performance is typically evaluated via accuracy, precision, recall, and F1 score. The metrics are calculated using the outputs of the confusion matrix for each class: TP for true positives, TN for true negatives, FN for false negatives, and FP for false positives. Our evaluated metrics are defined as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)
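The four metrics can be computed directly from the confusion matrix counts. The sketch below uses the standard definitions of accuracy, precision, recall, and F1. The function names and the example counts are hypothetical and are not taken from the study's confusion matrices.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + tn + fp + fn
    # Guard against empty denominators instead of raising ZeroDivisionError.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Made-up counts for illustration only.
example = classification_metrics(tp=90, tn=880, fp=10, fn=20)
```

As a consistency check, `f1_score(0.995, 0.974)` evaluates to about 0.984, which matches the F1 value reported for the random forest model given its reported precision and recall.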

2. LITERATURE REVIEW
2.1 Outpatient Length of Stay (OLOS) Prediction
1. What is the best OLOS prediction model at the Pediatric Clinic of Edelweis Hospital?
2. What are the important attributes that influence OLOS at the Pediatric Clinic of Edelweis Hospital?

accuracy rate. The visit type variable was found to be the most significant factor in predicting service time. Another study by Dunstan et al. (2021) used ML algorithms to predict pediatric patient no-shows and propose cost-effective intervention strategies. Dewi Rosmala and Rizandi Nugro Libranto's C4.5 algorithm predicted registration waiting time in maternity clinics with an accuracy of 84.62%, with age being the most influential variable. Kasaie & Rajendran (2023) used logistic regression to predict late patient arrival patterns in psychiatric clinics, with the random forest method showing the best performance. Lastly, Anton et al. (2021) found the quantile regression method to be the most accurate in predicting waiting times in emergency departments. A summary of the research above, grouped by keywords, can be seen in Table

Table 2. Variable Type

2581-8341 Volume 06 Issue 12 December 2023 DOI: 10.47191/ijcsrr/V6-i12-66, Impact Factor: 6.789 IJCSRR @ 2023 www.ijcsrr.org 8145
Table 4 shows the chi-square results for the variables used to predict OLOS at Edelweis Hospital. The day, month, patient visit time, number of ICDs, doctor id, number of treatments, poly type, and insurance variables all have an asymptotic significance (p-value) of 0.00. Because this is less than 0.05, H0 is rejected, which means that all variables affect the OLOS duration at Edelweis Hospital.
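For readers who want to see how such a test works, here is a minimal sketch of Pearson's chi-square test of independence on a contingency table. The study obtained its results with SPSS, so this is an illustration only. The function names and the 2×2 counts are hypothetical, every row and column total is assumed to be nonzero, and the closed-form p-value applies only to the case of 1 degree of freedom.

```python
import math


def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows of observed counts; every row and column
    total is assumed to be nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value of the chi-square distribution with 1 dof."""
    return math.erfc(math.sqrt(stat / 2))


# Made-up 2x2 table: rows are two insurance types,
# columns are Standard / Not Standard visit counts.
observed = [[30, 70], [60, 40]]
stat, dof = chi_square_statistic(observed)
p_value = p_value_one_dof(stat)
reject_h0 = p_value < 0.05  # decision rule at the 5% level
```

Tables with more than two rows or columns have more degrees of freedom and need the general chi-square distribution, which statistical packages such as SPSS provide.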

Table 6 shows the importance of all variables. The primary factors that influenced the length of stay for outpatient visits at Edelweis Hospital are patient visit time, doctor id, and insurance, with values of 0.43, 0.038, and 0.013, respectively. The findings regarding the important variables in this study align with the research conducted by Guorui et al. (2021), which emphasized that patient visit time is the most influential factor affecting outpatient waiting time.