Abstract:
With the increasing demand for efficient and responsive customer service in the banking sector, artificial intelligence offers a promising solution. This paper presents a comparative analysis of artificial intelligence methodologies for intent classification in banking customer service. Using a comprehensive dataset of banking service inquiries, we evaluate several machine learning approaches: Naive Bayes, Logistic Regression, a Support Vector Machine with a linear kernel, Random Forest, XGBoost, and the transformer-based DistilBERT model. The models are assessed on accuracy, precision, recall, and F1 score. Our findings indicate that DistilBERT, with its distilled architecture, outperforms the traditional models, achieving accuracy and F1 scores above 92%. We discuss the advantages of deploying such an efficient yet powerful model in real-time customer service settings, where it offers a substantial improvement over conventional methods. These results illustrate how advanced natural language processing can raise customer service standards and operational efficiency in the financial industry, and they provide a benchmark for future research and practical work in the field.
Keywords:
Banking sector, DistilBERT, intent classification, natural language processing (NLP), transformer models.
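As a minimal sketch of the comparative setup described in the abstract (not the authors' exact pipeline), the classical baselines can be built as scikit-learn pipelines over TF-IDF features and scored with the same four metrics. The toy training and test queries below are illustrative placeholders, the macro-averaged metrics are an assumed choice, and XGBoost and the DistilBERT fine-tuning step (typically done with the Hugging Face transformers library) are omitted for brevity.

```python
# Sketch: compare classical intent classifiers on accuracy, precision,
# recall, and macro F1. The tiny inline dataset is a placeholder for a
# real labeled corpus of banking service inquiries.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Placeholder data: replace with the full train/test split of the
# banking-inquiry dataset used in the study.
train_texts = ["I lost my debit card", "My card was stolen",
               "How do I reset my PIN", "I forgot my PIN code"]
train_labels = ["card_lost", "card_lost", "pin_reset", "pin_reset"]
test_texts = ["Please block my missing card", "Change my PIN number"]
test_labels = ["card_lost", "pin_reset"]

models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=200),
}

for name, clf in models.items():
    # TF-IDF features feeding each classifier
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(train_texts, train_labels)
    preds = pipeline.predict(test_texts)
    acc = accuracy_score(test_labels, preds)
    prec, rec, f1, _ = precision_recall_fscore_support(
        test_labels, preds, average="macro", zero_division=0)
    print(f"{name}: acc={acc:.3f} precision={prec:.3f} "
          f"recall={rec:.3f} f1={f1:.3f}")
```

With a realistic corpus, the same loop yields the per-model metric table against which a fine-tuned DistilBERT classifier can be compared.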