Predicting Customer Satisfaction through Sentiment Analysis on Online Review
User-generated content on the internet, such as reviews, posts, tags, ratings, and opinions, can serve as a business indicator if it is collected and analyzed appropriately. One example is predicting customer satisfaction by applying big data analytics to online reviews. To predict customer satisfaction from user-generated content, the author implements a machine learning approach based on sentiment analysis. The classification model was trained and validated with five-fold cross-validation. Training was performed with combinations of text vectorization methods: term frequency-inverse document frequency (tf-idf) and bag-of-words; n-gram types: unigram, bigram, trigram, and a combination of unigram, bigram, and trigram; and machine learning algorithms: linear support vector classification (LinearSVC) and multinomial naïve Bayes (MultinomialNB). The results were then evaluated using classification performance metrics such as precision, recall, F1 measure, and AUC score.
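The experimental grid described above can be sketched in a few lines of plain Python. This is only an illustration, not the author's code: the helper `ngrams`, the configuration names, and the example sentence are assumptions made here, and the n-gram ranges simply encode the four n-gram settings listed in the abstract.

```python
from itertools import product

def ngrams(tokens, n_min, n_max):
    """Return all contiguous n-grams of length n_min..n_max as strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]

# The three experimental factors named in the abstract.
VECTORIZERS = ["tf-idf", "bag-of-words"]
NGRAM_RANGES = {
    "unigram": (1, 1),
    "bigram": (2, 2),
    "trigram": (3, 3),
    "unigram+bigram+trigram": (1, 3),
}
CLASSIFIERS = ["LinearSVC", "MultinomialNB"]

# Every (vectorizer, n-gram setting, classifier) combination to be trained.
configurations = list(product(VECTORIZERS, NGRAM_RANGES, CLASSIFIERS))

# Hypothetical review text, used only to show what each n-gram setting yields.
tokens = "the room was very clean".split()
for name, (n_min, n_max) in NGRAM_RANGES.items():
    print(name, ngrams(tokens, n_min, n_max))

print(len(configurations), "configurations")
```

Under these assumptions the grid contains 2 x 4 x 2 = 16 configurations, each of which would be trained and scored with five-fold cross-validation.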
The results show that the tf-idf vectorizer performs similarly to the bag-of-words method. A similar result was observed for the choice of machine learning algorithm: MultinomialNB and LinearSVC produce the same performance. Low-order n-grams (such as unigrams and bigrams) tended to have higher precision, recall, F1 measure, and AUC score than high-order n-grams (such as trigrams). The best results were achieved by combining unigrams, bigrams, and trigrams, which gave an average performance score of 0.94 across all measurements. From these results and the analysis, the author finds that predicting customer satisfaction using text and sentiment analysis methods on user-generated content is possible. The model's performance in this experiment is decent, with high precision, recall, F1, and AUC scores.
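For reference, three of the reported metrics can be computed from predicted and true labels as in the minimal sketch below. The function name and the toy labels are assumptions made for illustration and are not data from the study. AUC is left out because it needs classifier scores rather than hard labels.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1

# Toy labels (1 = satisfied, 0 = not satisfied), invented for this example.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```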