Modeling and Forecasting of Tourism Demand in Malaysia

The pattern of foreign tourist demand for Malaysia is analysed and forecast using a time series method and a non-linear technique. Nine countries that contribute substantially to tourist arrivals in Malaysia are selected, namely Australia, Brunei, China, Indonesia, India, Japan, the Philippines, South Korea, and the United Kingdom. The Box-Jenkins time series method and Singular Spectrum Analysis are applied and compared to identify the best model for forecasting foreign tourist demand for Malaysia. Monthly tourist arrival data from 1990 to 2014 were used to fit the models, and the forecasts were compared with the actual data for 2015. Based on the results obtained, the Box-Jenkins time series model is the best model in terms of percentage accuracy in forecasting tourist demand for Malaysia.


INTRODUCTION
Tourism is derived from the word tour or touring, which means visiting while sightseeing. It can also be defined as the act of leaving home for a destination for a certain reason or purpose, with the intention of returning to the original location. The Islamic Tourism Centre under the Ministry of Tourism and Culture Malaysia has defined Islamic tourism as a field or industry related to tours that visit places with Islamic history, getting to know Islamic culture and heritage, and seeking to understand the way of life of the Islamic community.
From the perspective of the national economy, the tourism industry is one of the most important industries, although initially the government did not emphasize it and focused on the country's infrastructure development instead. In the 1960s, the tourism industry had yet to play an important role in the country's economic development. The government also focused on natural resource sectors such as tin and rubber, which contributed 45% of Gross National Product (GNP). Tourism began to improve in the 1970s. Since then, the tourism industry has been a growing industrial sector that generates substantial foreign exchange earnings for the country's economy each year, and tourism is a means of introducing Malaysia globally. In Malaysia, the tourism sector is the fifth largest sector after the financial services, oil palm, wholesale and retail, gas and energy as well as petroleum sectors. According to the history of the Ministry of Tourism and Culture Malaysia, the government showed its support for the tourism sector in 1987 by establishing the Ministry of Culture and Tourism. In April 2004, a dedicated ministry known as the Ministry of Tourism Malaysia was set up to manage all matters concerning tourism and to prove the government's commitment to promoting tourism as one of the contributors of revenue to the country. After the 13th General Election, the Ministry of Tourism Malaysia was renamed the Ministry of Tourism and Culture Malaysia, as the tourism and cultural sectors are closely linked in promoting the country as a tourist destination, in line with the slogan "Malaysia, Truly Asia" (Official Portal of Ministry of Tourism and Culture Malaysia, May 2015). The government and the private sector are promoting Malaysia through various activities, including government-run programs that cover culture, arts, national festivals, sports, conventions and exhibitions.
Programs such as the Tour de Langkawi Bicycle Tournament and the Langkawi Maritime and Aerospace Exhibition (LIMA) were introduced. By hosting international sporting events, the government can introduce Malaysia to delegates and athletes from overseas, attract tourists to Malaysia, and demonstrate that the nation can perform well in both international sports and tourism. As a result, tourists will continue to choose Malaysia as their destination. The "Visit Malaysia Year 1990" campaign was launched and was repeated in 1994, 2007, 2008 and 2014. This campaign has increased the number of tourist arrivals to Malaysia. Fig. 1 shows the total number of tourist arrivals in Malaysia from 1990 to 2014. Table 1 shows the number of tourist arrivals to Malaysia and tourism income. In Table 1, '+' indicates an increase in the number of tourist arrivals compared to the previous year, while '-' indicates a decrease. In 1990, the number of tourist arrivals to Malaysia increased dramatically to 7.4 million people, compared to 4.8 million in 1989. This was growth of 53.64%, and it had a positive impact on tourism revenue, which also increased by 60.56%. In 1999, 2000, 2007 and 2014, tourist arrivals rose steadily, reaching 7.9 million in 1999, 20.9 million in 2007 and 27.4 million in 2014. However, in several years between 1990 and 2014 there were sharp declines due to economic recession, international political turmoil, natural disasters, and disease. According to the Malaysia Tourism Promotion Board, total income also increased from RM 4500 million in 1990 to RM 72000 million in 2014. This shows that the tourism industry can support the country's economic growth.
Indirectly, the tourism industry affects Gross Domestic Product (GDP), employment opportunities, export earnings and investments. According to the World Travel & Tourism Council, WTTC (2015), in 2014 the tourism sector contributed RM 161 billion, representing 14.9% of GDP, and 1,770,000 employment opportunities. In addition, the tourism sector generated RM 74 billion, representing 8.6% of total exports, and investment in the sector amounted to RM 19 billion, representing 6.8% of total investment in 2014. Thus, the tourism sector is one of the largest contributors to the country's income.
Tourists arriving in Malaysia come mostly from Asian and ASEAN countries. The major contributors to the Malaysian market are Australia, Brunei, China, Indonesia, India, Japan, the Philippines, South Korea and the United Kingdom. Figure 3 shows the breakdown of the number of tourist arrivals to Malaysia by country in 2000-2014.
Thus, this study focuses on the nine countries that are the major contributors to the Malaysian tourism market as listed in the Annual Tourism Report of the Malaysian Tourism Promotion Board: Australia, Brunei, China, the Philippines, India, Indonesia, Japan, South Korea and the United Kingdom. Tourist arrivals from Singapore and Thailand were not taken into account, because visitors from these two countries frequently come to Malaysia for reasons of price: oil and Malaysian goods are cheaper, and the Malaysian currency is weaker, than in their own countries. The nine selected countries are therefore reviewed as representative markets of their respective regions. For example, Indonesia is the main contributor to tourist arrivals to Malaysia, followed by Brunei and the Philippines; these are the three ASEAN countries. The East Asian markets are China, Japan and South Korea. India is an important market from South Asia, Australia is an important market from Australasia, and the United Kingdom is an important market from Europe. This study aims to forecast tourism demand for the nine major tourism markets in Malaysia. Forecasting tourism demand in Malaysia is important because tourism generates income for the country, and the findings can help the government to formulate better tourism plans and enhance tourism promotion. The study period runs from 1990 to 2014, while the 2015 data are held out to compare against the predictions and to test the accuracy of the models. Most tourism demand data are non-stationary. However, this non-stationarity is often neglected by researchers in the field of tourism.
Therefore, this study takes into account the non-stationarity of the data and models tourism demand, specifically tourist arrivals, using the Box-Jenkins time series method, namely the Autoregressive Integrated Moving Average (ARIMA) model and the Seasonal Autoregressive Integrated Moving Average (SARIMA) model, as well as Singular Spectrum Analysis (SSA). The study then forecasts foreign tourist arrivals to Malaysia using the Box-Jenkins time series models and the non-linear SSA model, and compares the two approaches to determine the best model.

MATERIALS AND METHODS
Over the last four decades, time series models have been widely used in tourism research, in particular the Autoregressive Integrated Moving Average (ARIMA) model proposed by Box and Jenkins (1976). In the 2000s, ARIMA was modified to the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. In recent years, SARIMA has been a popular time series forecasting technique because seasonality is a dominant feature of the tourism industry and decision makers are keenly interested in seasonal variations in tourism demand. Empirical studies present conflicting evidence on the forecasting performance of ARIMA and SARIMA models. For example, Goh and Law (2002) suggest that the SARIMA model outperforms eight other time series methods, while the ARIMA model shows average performance among all the forecasting models. However, Smeral and Wuger (2005) find that the ARIMA and SARIMA models cannot outperform the Naive 1 model.
According to Witt et al. (1995), the Naive 1 model is the simplest and most often used model, and it can generate more accurate forecasts for the following year than other, more sophisticated models. However, the Naive 1 model does not perform well for long-term prediction or when there is long-term structural change (Witt et al. 1995). Therefore, the Naive 2 model was introduced and is widely used when the data show a continuing trend. Most tourism demand studies undertaken by previous researchers use linear techniques rather than nonlinear techniques. Among the nonlinear techniques that have been used are artificial neural networks, solid polynomial models, fuzzy time series, GARCH models, time-varying parameter models, arc learning models, structural time series models and nonlinear sine wave models (Song and Li 2008).
In recent years, Singular Spectrum Analysis (SSA) has been developed as a powerful technique in the field of time series analysis. SSA is a nonparametric method of time series analysis and forecasting that consists of many different but interconnected methods. There are several books devoted to SSA. Hassani (2007) reviewed the performance of the SSA technique using the monthly series of accidental deaths in the United States. The forecasting results were compared with those of the SARIMA model, the ARAR algorithm and the Holt-Winters algorithm, and they show that the SSA model provides better forecasts than the other methods. Hassani et al. (2015) also evaluate SSA forecasts using monthly data on US tourist arrivals for the period 1996 to 2012. The SSA forecasts were compared with those of other forecasting methods, including ARIMA, exponential smoothing (ES) and artificial neural networks (ANN). The results show that the SSA method yields better forecasts of tourist arrivals to the United States than the alternative methods.
Therefore, the Box-Jenkins time series models (ARIMA and SARIMA) and Singular Spectrum Analysis are considered the more accurate models for large samples. In this study, Singular Spectrum Analysis is applied as an alternative method, and its performance is compared with that of the Box-Jenkins time series forecasting technique.

A. Box-Jenkins Time Series Method
The Box-Jenkins method is one of the univariate methods used for forecasting economic time series. It determines the best model through a systematic statistical procedure. The model uses past observations of the variable and random errors to forecast its future values. In general, ARIMA and SARIMA are extensions of the ARMA model that capture more realistic dynamics. An ARIMA time series is modelled as a combination of past values and involves the parameters (p, d, q), where p is the autoregressive order, d is the order of differencing, and q is the moving average order. Forecasting with ARIMA involves four (4) steps, as in Figure 4 (Newton, 1988). The best model is then used to predict future values of the series. According to Bowerman and Connell (1990), the classic Box-Jenkins model can describe the data well. Therefore, the data used must be stationary. A time series is said to be stationary when it has no upward or downward trend and has constant mean and variance.
The series modelled by ARIMA and SARIMA are usually non-stationary and seasonal. Such series need to be re-modelled by removing the source of non-stationary variation, which is usually done by differencing the series. When x_t is not stationary, the ARMA model is fitted to the series w_t constructed by differencing x_t, with the order of differencing d chosen accordingly (generally d = 1). The ARIMA model is therefore an ARMA model defined on the d-th difference of the original process, in which the autoregressive polynomial combined with the differencing operator is known as the generalized autoregressive operator and the d-th difference of X_t is the quantity made stationary by differencing. Seasonal ARIMA, or SARIMA, applies to data with a seasonal component that repeats every L observations.
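The differencing described above can be sketched in a few lines. This is a minimal illustration and not the procedure used in the study (which was carried out in R); the function name `difference` and the toy series are assumptions made for the example.

```python
def difference(series, lag=1):
    """Return the lag-differenced series x[t] - x[t - lag]."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical toy series: a linear trend plus a pattern repeating every 4 steps.
pattern = [0, 5, 2, 7]
toy = [10 * t + pattern[t % 4] for t in range(12)]

# Ordinary first difference (d = 1): the trend becomes a constant level,
# but the repeating pattern is still visible.
first_diff = difference(toy, lag=1)

# Seasonal difference at lag L = 4: the repeating pattern cancels out.
seasonal_diff = difference(toy, lag=4)

print(first_diff)
print(seasonal_diff)
```

For monthly data the seasonal difference would use lag 12 instead of 4; the short period is used here only to keep the toy series small.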
For monthly data, L = 12, since there are 12 months in a year, while for quarterly data, L = 4. The SARIMA model therefore adds seasonal autoregressive, differencing and moving average terms at lag L to the ARIMA model.
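For reference, the two models are commonly written in backshift-operator form as below. This is the standard textbook form and not necessarily the notation of the original equations; the symbols B (backshift operator, B X_t = X_{t-1}), \phi_p, \theta_q, \Phi_P, \Theta_Q, the seasonal orders (P, D, Q) and the white-noise term \varepsilon_t are assumptions of this sketch.

```latex
% ARIMA(p, d, q)
\phi_p(B)\,(1 - B)^d X_t = \theta_q(B)\,\varepsilon_t

% SARIMA(p, d, q)(P, D, Q)_L
\phi_p(B)\,\Phi_P(B^L)\,(1 - B)^d\,(1 - B^L)^D X_t
    = \theta_q(B)\,\Theta_Q(B^L)\,\varepsilon_t

% with the polynomials
\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad
\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q,
\Phi_P(B^L) = 1 - \Phi_1 B^L - \dots - \Phi_P B^{PL}, \qquad
\Theta_Q(B^L) = 1 - \Theta_1 B^L - \dots - \Theta_Q B^{QL}.
```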

Figure 4. The four steps of the Box-Jenkins procedure.
i. Identify models: the parameters p, d and q are identified based on the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF).
ii. Model estimates: the values of the model parameters are estimated from the past time series data for the tentatively identified model.
iii. Diagnostic test: this test is performed to verify the adequacy of the model used.
iv. Forecasting.

To determine the orders (p, d, q) of the model, the ACF and PACF are examined for the way their coefficients decline to zero. Here p is the autoregressive (AR) order, d is the number of differences, and q is the moving average (MA) order. The value of p is determined from the PACF plot of the stationary series: if the partial autocorrelations cut off after lag k and remain insignificant thereafter, that lag is used as p. The value of q is determined in the same way from the ACF plot of the stationary series, using the lag after which the autocorrelations cut off. The value of d is the number of times the series is differenced when the data are non-stationary.
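The identification step can be illustrated with a small sketch that computes the sample ACF and PACF and reports the last lag outside an approximate 95% band. The function names, the Durbin-Levinson recursion for the PACF, the band of 1.96 divided by the square root of n, and the toy data are all assumptions of this example. The cut-off rule is a crude heuristic and not the study's procedure, which relied on inspecting the plots.

```python
def acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k - 1) fit
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)]
        phi_curr.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi_curr
    return result


def last_significant_lag(coeffs, n):
    """Heuristic cut-off: largest lag whose coefficient lies outside +/-1.96/sqrt(n)."""
    bound = 1.96 / n ** 0.5
    lags = [k for k in range(1, len(coeffs)) if abs(coeffs[k]) > bound]
    return max(lags) if lags else 0


toy = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data, far too short for real use
r = acf(toy, 2)
p_hat = last_significant_lag(pacf(toy, 2), len(toy))  # candidate AR order
q_hat = last_significant_lag(r, len(toy))             # candidate MA order
```

In practice the series would first be differenced until stationary, and the candidate orders would be checked against the plots and the AIC instead of being taken directly from this rule.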
If both the ACF and PACF plots cut off, they are compared to see which plot cuts off more sharply and which dies down (Bowerman et al., 1987). The non-seasonal Box-Jenkins model is the autoregressive moving average model of order (p, q), or ARMA(p, q). At the seasonal level, the Box-Jenkins model consists of a seasonal autoregressive model of order p and a seasonal moving average model of order q. Lastly, the non-seasonal and seasonal levels are combined to obtain a tentative model. Once a suitable model is obtained, its parameters are estimated. The best model is the one with the smallest value of the Akaike Information Criterion (AIC). The significant parameter estimates are then included in the ARIMA or SARIMA equations.
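The three models named above are usually written as follows in textbook treatments. The notation is an assumption of this sketch and may differ from the original equations: z_t is the stationary (differenced) series, a_t the random shock, \delta a constant, L the seasonal period, and capital P and Q are used for the seasonal orders to distinguish them from the non-seasonal p and q.

```latex
% Non-seasonal ARMA(p, q)
z_t = \delta + \phi_1 z_{t-1} + \dots + \phi_p z_{t-p}
      + a_t - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q}

% Seasonal autoregressive model of order P
z_t = \delta + \phi_{1,L}\, z_{t-L} + \phi_{2,L}\, z_{t-2L} + \dots
      + \phi_{P,L}\, z_{t-PL} + a_t

% Seasonal moving average model of order Q
z_t = \delta + a_t - \theta_{1,L}\, a_{t-L} - \theta_{2,L}\, a_{t-2L} - \dots
      - \theta_{Q,L}\, a_{t-QL}

% Akaike Information Criterion for a model with k estimated parameters
% and maximised likelihood \hat{\mathcal{L}}
\mathrm{AIC} = 2k - 2 \ln \hat{\mathcal{L}}
```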
The diagnostic test is performed to check the suitability of the model before forecasting. The best way to check a Box-Jenkins model as a whole is to analyse its residuals. Before using the estimated Box-Jenkins model, the researcher should examine the residuals for serial correlation. One method that can be used is the Ljung-Box statistic. The model specification is adequate if the p-value of the Ljung-Box statistic is greater than the significance level of 0.05.
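A minimal sketch of the Ljung-Box check follows, using only the standard library. The function names and the toy residuals are assumptions of this example. The chi-square tail probability is computed with a closed form that is valid only for an even number of degrees of freedom, which keeps the sketch short. For residuals of a fitted ARMA(p, q) model the degrees of freedom are usually reduced to h - p - q, which is not done here.

```python
import math


def ljung_box_q(residuals, h):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability, closed form for EVEN df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Hypothetical residuals that alternate in sign, so they are strongly autocorrelated.
residuals = [1.0 if t % 2 == 0 else -1.0 for t in range(20)]
q_stat = ljung_box_q(residuals, h=2)
p_value = chi2_sf_even(q_stat, df=2)
adequate = p_value > 0.05  # False here: serial correlation remains
```

With these toy residuals the test rejects adequacy, which is the situation in which the tentative model would be revised before forecasting.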
Forecasting can be performed once an appropriate model is identified. The forecast values are then compared with the actual data. According to Lewis (1982), a Mean Absolute Percentage Error (MAPE) of less than 10% indicates a very accurate prediction, so the forecasting model is very good.

B. Singular Spectrum Analysis (SSA)
Singular Spectrum Analysis (SSA) is a powerful time series analysis technique that combines elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems and signal processing (Hassani, 2007).
The purpose of SSA is to decompose the original time series into a small number of independent and interpretable components, such as slowly varying trends, oscillatory components and unstructured noise. SSA is a very useful method for solving problems such as the following:
i. Finding trends of different resolution;
ii. Simultaneous extraction of cycles with small and large periods;
iii. Extraction of periodicities with varying amplitudes;
iv. Finding structure in short time series;
v. Change-point detection.
The main purpose of SSA is to decompose the original time series into several series that can be identified as trend, oscillatory components (possibly amplitude-modulated) or noise. The original time series is then reconstructed. The SSA method consists of two (2) complementary stages, decomposition and reconstruction, each of which includes two (2) separate steps. The first stage decomposes the time series, and the second stage reconstructs the original series and uses the reconstructed, noise-free series to forecast new data points. Monthly data on tourist arrivals from each country of origin to Malaysia from 1990 to 2014 are used. The data for the SSA model are transformed to facilitate the comparison between models in forecasting, because without transformation the forecast error measures for the SSA model become too large to be compared. In addition, the SSA method is applied to these data to illustrate its ability to extract trend, oscillation and noise components and to forecast. The results are obtained using the R software.
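In the usual description of SSA, the decomposition stage begins with an embedding step that turns the series into a trajectory matrix, and the reconstruction stage ends with diagonal averaging, which turns a matrix back into a series. The sketch below shows only these two steps; the SVD and grouping steps that sit between them are omitted. The function names and the toy series are assumptions of this example, and the study itself used R.

```python
def embed(series, window):
    """Embedding: build the L x K trajectory matrix, with K = N - L + 1.

    Row i holds series[i], series[i + 1], ..., so every anti-diagonal
    of the matrix is constant (a Hankel matrix).
    """
    n = len(series)
    if not 1 < window < n:
        raise ValueError("window length must satisfy 1 < L < N")
    k = n - window + 1
    return [[series[i + j] for j in range(k)] for i in range(window)]


def diagonal_average(matrix):
    """Diagonal averaging: average each anti-diagonal to recover a series."""
    rows, cols = len(matrix), len(matrix[0])
    n = rows + cols - 1
    sums = [0.0] * n
    counts = [0] * n
    for i in range(rows):
        for j in range(cols):
            sums[i + j] += matrix[i][j]
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


toy = [1, 2, 3, 4, 5, 6, 7]          # hypothetical series, N = 7
trajectory = embed(toy, window=3)    # 3 rows, 5 columns
recovered = diagonal_average(trajectory)
```

Applied directly to the trajectory matrix, diagonal averaging returns the original series. In a full SSA run it is applied to the matrices built from the selected groups of eigentriples, which gives the reconstructed trend and seasonal components.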
SSA forecasting is divided into two methods, namely recurrent forecasting and vector forecasting. In recurrent forecasting, diagonal averaging is applied to obtain the reconstructed series, which is then continued using a linear recurrence relation (LRR); LRRs are associated with autoregressive (AR) models. In vector forecasting, these two steps are exchanged. Vector forecasting is generally more stable than recurrent forecasting but has a greater computational cost.
If the time series component is separable from the noise and is governed by a linear recurrent formula (LRF), then the two forecasts coincide and provide an exact continuation. When separability is only approximate, the two methods give different continuations. Since the LRF provides the basis for recurrent forecasting, this study uses the LRR-based recurrent method. It is very useful in selecting parameters and understanding the forecasting behaviour. SSA forecasting can be applied to a time series that satisfies a linear recurrent formula (LRF).
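The idea of continuing a series with a linear recurrence relation can be shown on a toy case. In SSA the recurrence coefficients are derived from the leading eigenvectors of the decomposition; in this sketch they are simply assumed to be known, and the function name `continue_with_lrr` is hypothetical.

```python
def continue_with_lrr(series, coeffs, steps):
    """Continue a series with x_n = a_1 x_{n-1} + ... + a_d x_{n-d}.

    coeffs[0] multiplies the most recent value, coeffs[1] the one before it,
    and so on. Each new value is appended and reused for the next step.
    """
    d = len(coeffs)
    if len(series) < d:
        raise ValueError("need at least as many observations as coefficients")
    out = list(series)
    for _ in range(steps):
        out.append(sum(coeffs[i] * out[-1 - i] for i in range(d)))
    return out[len(series):]


# A straight line satisfies x_n = 2 x_{n-1} - x_{n-2}, so this recurrence
# continues it exactly.
forecast = continue_with_lrr([3, 5, 7, 9], coeffs=[2, -1], steps=3)
```

Because each forecast value feeds into the next one, errors in the coefficients accumulate with the horizon, which is one reason the separability of signal and noise matters for this method.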

C. Measuring Model Precision
To evaluate the accuracy of the time series and non-linear models, the forecast horizon is set to t = 12 months, that is, the first year after the estimation period, because this study uses monthly data. In this ex-post forecasting exercise, the monthly data from 1990 to 2014 are used to forecast the monthly tourist arrivals to Malaysia for 2015.
Forecasting accuracy can be assessed with various measures of the precision of a model's forecasts. In this study, MAPE is used, as it is the most suitable measure for assessing the forecasting performance of tourism demand models. The measure is defined in terms of y_t, the original observation at time t, ŷ_t, the forecast for time t, and the number of forecasts.
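In its usual form, MAPE is the average of the absolute errors |y_t - ŷ_t| divided by |y_t|, multiplied by 100. The sketch below computes it for a pair of made-up series; the function name `mape` and the numbers are assumptions of this example and are not taken from the study's data.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 / len(actual) * sum(
        abs(a - f) / abs(a) for a, f in zip(actual, forecast)
    )


# Hypothetical values: each forecast is off by 10% of the actual value.
score = mape([100.0, 200.0], [110.0, 180.0])
very_accurate = score < 10.0  # the "less than 10%" criterion cited from Lewis (1982)
```

Because the error is scaled by the actual value, MAPE allows markets of very different sizes to be compared on the same percentage scale.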
MAPE is a relative measure of forecasting performance. Table 2 shows the accuracy levels of the MAPE test. Forecasts with a MAPE value of less than 10% can be regarded as very precise (Lewis, 1982).
Table 6 shows the estimated equations of the ARIMA time series models for tourist arrivals from the nine (9) major markets. Based on Table 6, all Box-Jenkins models have MAPE values of less than 10%, so by the criterion of Lewis (1982) they are very precise and suitable for generating forecasts. This is especially so for the Philippines, Indonesia, and Japan, which have MAPE values close to 0%. A lower MAPE value means smaller percentage errors and more precise forecasts from the model. All the models are therefore suitable for forecasting future tourist arrivals in Malaysia.

B. Singular Spectrum Analysis (SSA)
Singular Spectrum Analysis is used to extract the trend, seasonal and noise components and to forecast. The analysis uses the monthly data from 1990 to 2014 for the nine (9) major markets that most influence tourist arrivals in Malaysia. In general, the purpose of the analysis is to decompose the original series into a number of series, each identified as a trend, a seasonal component, or noise. The original series is then rebuilt. The analysis is divided into two (2) stages, each with two (2) separate steps. In this study, the first stage involves the decomposition of the tourist arrivals series and the second stage involves its reconstruction; the reconstructed, noise-free series is used to forecast future tourist arrivals in Malaysia. The data are log-transformed so that the forecasts of the two models are easy to compare. Figure 1 presents the tourist arrivals in Malaysia from the nine (9) major markets by country of origin. Following the stages described by Hassani (2007), the decomposition stage has a single parameter, the window length, L. Periodic components are expected in this study because of seasonal factors in the time series of tourist arrivals in Malaysia. To obtain good separation of the seasonal components, the window length is chosen to be proportional to the seasonal period, so L = 144. With this window length, the matrix decomposed by the SVD is (144 × 144), and the decomposition step yields 144 eigentriples. The SVD step is illustrated by the principal components of the first 12 eigentriples for each of the nine (9) major tourism markets, in which the percentage share of the components decreases and the coordinates of the leading component are almost constant.
Additionally, proper grouping makes it possible to obtain the trend, harmonic components and noise, and improves the ability to develop an appropriate model. Such additional information serves as a bridge between the decomposition and reconstruction steps. Each harmonic component with a different frequency yields two (2) eigentriples with close singular values. A pure noise series, by contrast, typically yields a slowly decreasing sequence of singular values. Thus, the L = 12 plot is used. For all countries, the first eigentriple corresponds to the trend. The other eigentriples contain high frequencies and are therefore not related to the trend.
After the trend is identified, pairwise scatter plots of the singular vectors are used to visually identify the eigentriples corresponding to the seasonal components of the series, provided that the seasonal components are separable from the other signal components. Table 8 presents the trend and seasonal eigentriples of the tourist arrivals series in Malaysia for the nine (9) major markets. The eigentriples listed in Table 8 correspond to the periods of 12, 6, 2.4, 3 and 4 produced by the seasonal components, which are clearly confirmed by the periodogram analysis.
Periodogram analysis of the original series and of the eigenvectors assists in creating proper groupings. The periodogram is used to check whether the frequencies of the eigentriples coincide with the frequencies of the original series. If the periodogram of an eigenvector has sharp peaks around a few frequencies, then the eigentriple is related to the signal component. Generally, the eigentriples are found to have periods of no more than 12, which corresponds to the monthly seasonal cycle. If a pair of eigentriples has a period of more than 12, it cannot be interpreted for monthly data and is considered noise. The main concept of the SSA model is separability, which describes how well the different components are separated from each other. In this study, the separability for each major market is shown by the w-correlation matrix of the first 30 reconstructed components, displayed on a scale between white and black that corresponds to the absolute value of the correlation from 0 to 1.
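The w-correlation used to judge separability can be sketched as follows. It is an ordinary correlation in which each time point is weighted by the number of times it appears in the trajectory matrix. The function names and the toy components are assumptions of this example; values near 0 indicate well-separated components and values near 1 indicate components that should be grouped together.

```python
import math


def w_weights(n, window):
    """Weight of time point i: how many trajectory-matrix entries contain it."""
    k = n - window + 1
    l_star = min(window, k)
    return [min(i + 1, l_star, n - i) for i in range(n)]


def w_correlation(a, b, window):
    """Weighted correlation between two reconstructed components of equal length."""
    if len(a) != len(b):
        raise ValueError("components must have equal length")
    w = w_weights(len(a), window)

    def inner(u, v):
        return sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))

    return inner(a, b) / math.sqrt(inner(a, a) * inner(b, b))


# Hypothetical components of length N = 7 with window length L = 3.
comp_a = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]
comp_b = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
weights = w_weights(7, 3)
rho = w_correlation(comp_a, comp_b, window=3)
```

Computing this quantity for every pair of reconstructed components gives the matrix that is displayed as the shaded w-correlation plot.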

i. SSA Forecasting
Forecasting in Singular Spectrum Analysis has two (2) methods, namely the recurrent method and the vector method. In this study, the recurrent method is found to be closer to the actual values than the vector method. Hence, the forecasts of tourist arrivals to Malaysia for 2015, based on the data from 1990 to 2014, are produced using the recurrent method. The analysis uses MAPE to measure the forecasting performance. The MAPE results, in percentages, are shown in Table 9. Lewis (1982) states that a MAPE value of less than 10% indicates a very precise forecasting model. Based on Table 9, all SSA models yield MAPE values of less than 10%. Therefore, the SSA model is considered to be a very precise forecasting model. For the countries studied, MAPE values close to zero indicate very small percentage errors, so the model is suitable for forecasting future tourist arrivals in Malaysia.

C. Box-Jenkins and Singular Spectrum Analysis Comparison
In this section, the Box-Jenkins and SSA models are compared using their MAPE values. The lowest MAPE value indicates the more suitable and more accurate forecasting model. Table 10 shows the MAPE values for the two (2) models. Based on Table 10, the forecasts of both models generally have MAPE values of less than 10% and can be considered very precise. Overall, the Box-Jenkins model has a lower MAPE value than the SSA model. Its MAPE values are close to zero, especially for the Philippines, Indonesia, and Japan, which means that its forecasts have smaller percentage errors and are more precise. In conclusion, the best model is the Box-Jenkins model, which takes seasonal factors into account, whereas the SSA model does not, and this is reflected in the accuracy of the models.

CONCLUSIONS
The tourism industry plays an important role as one of the most important contributors to the Malaysian economy and is a source of income through foreign exchange inflows. Malaysia is well known as a halal destination and focuses on foreign tourists, especially from the Middle East and Southeast Asia. The purpose of this study is to determine the best forecasting method for tourism demand in Malaysia. Overall, this study concludes that the Box-Jenkins time series model performs better than the non-linear SSA model. MAPE values below 1% show that the Box-Jenkins model generates smaller percentage errors and that the predictions derived from the model are very accurate.
Empirically, the Box-Jenkins model is the best prediction method for most countries, namely Australia, Brunei, the Philippines, India, Indonesia, South Korea, and the United Kingdom, when compared to the SSA model. For China and Japan, the SSA model is the best model. In terms of planning, this model should be a major research model in the study of international tourism demand. Now that the models have been set up, the prediction results can support development and investment strategies for the tourism industry in the future. Overall, both the Box-Jenkins and SSA models have MAPE values of less than 10%. It can be concluded that both models are appropriate for predicting future tourism demand in Malaysia.
As a basic implication, efforts from all parties are needed to enhance the tourism industry in Malaysia. The government should coordinate and monitor the quality of tourism services and products. Sufficient, comfortable, economical and affordable services attract more foreign tourists to Malaysia. Additionally, the government can reduce the purchase tax on goods, relax immigration regulations and increase the number of foreign exchange service centres, so that foreign tourists can shop in Malaysia with moderate spending. A diverse range of services and tourism products should also be provided to meet the needs of different travelers. At the same time, the government and Malaysians can join forces to organize tourism promotion campaigns both inside and outside the country. Malaysians can also help to enhance the tourism field by showing a friendly and helpful attitude towards tourists. The findings clearly show that this factor is very significant in attracting foreign tourists to Malaysia.