Automated ticket classification for training ISTAT's PUC chatbot

Authors

  • Samanta Pietropaoli ISTAT
  • Gabriella Fazzi ISTAT

DOI:

https://doi.org/10.71014/sieds.v80i3.475

Abstract

The increasing volume of user requests handled by ISTAT’s contact center, which supports participants in official statistical surveys, has underscored the need for automated solutions to optimise ticket classification and reduce reliance on manual processing.

This study presents the development and evaluation of a supervised classification system that leverages Natural Language Processing (NLP) techniques to enhance the accuracy, efficiency and scalability of request management within a public administration context.

The proposed framework integrates a TF-IDF-based text representation with synthetic oversampling (SMOTE) and three supervised learning algorithms: Random Forest, LightGBM, and Multilayer Perceptron. The methodology also incorporates a tailored preprocessing pipeline—covering tokenisation, lemmatisation, stopword removal, and anonymisation of personal information—to ensure data quality and privacy compliance.
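As an illustration of the steps named above, the following is a minimal, standard-library-only sketch of anonymisation, tokenisation, stopword removal, TF-IDF weighting and SMOTE-style interpolation. It is not the authors' implementation: the regular expressions, the stopword list, the placeholder tokens and all function names are assumptions made for this example, lemmatisation is omitted because it needs a language-specific resource, and the oversampling step pairs random minority samples rather than k-nearest neighbours as SMOTE proper does. The sample tickets are invented.

```python
import math
import random
import re
from collections import Counter

# Hypothetical patterns and stopword list, chosen only for illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")
STOPWORDS = {"the", "a", "to", "my", "is", "i", "in", "at", "me", "for"}


def anonymise(text):
    """Mask e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    return PHONE_RE.sub("<PHONE>", text)


def tokenise(text):
    """Lowercase, anonymise, split into tokens, and drop stopwords."""
    tokens = re.findall(r"<[A-Z]+>|[^\W\d_]+", anonymise(text.lower()))
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenised document.

    Uses raw term frequency and idf = ln(N / df) + 1, one common variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log(n / df[t]) + 1.0) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def smote_like(minority, n_new, rng):
    """Create synthetic points on segments between pairs of minority vectors.

    Simplification: the partner is a random minority sample, not one of the
    k nearest neighbours as in SMOTE proper.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        gap = rng.random()
        synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return synthetic


# Invented example tickets.
tickets = [
    "I cannot log in to the survey portal, write to mario.rossi@example.com",
    "Please call me at +39 06 1234 5678 for the deadline",
]
docs = [tokenise(t) for t in tickets]
weights = tfidf(docs)
extra = smote_like([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]], 4, random.Random(0))
```

In a production setting the same steps would more plausibly be delegated to established libraries for vectorisation, oversampling and model training; the sketch only shows the mechanics. Oversampling should be applied to the training split only, so that synthetic samples do not leak into the evaluation data.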

The classification system was designed to support the training phase of ISTAT’s PUC chatbot, which will provide first-level assistance to citizens and establishments involved in statistical surveys. By generating high-quality labelled data, this approach aims to improve chatbot intent recognition and facilitate self-service interactions for survey respondents.

Model performance was evaluated using standard classification metrics, including accuracy and both weighted and macro-averaged F1 scores.
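To make the difference between the reported metrics concrete, the sketch below computes accuracy, macro-averaged F1 and weighted F1 from scratch on a small invented example. The function names and the class labels ("access", "deadline") are hypothetical and the numbers are not results from the study.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} over every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def summarise(y_true, y_pred):
    """Compute accuracy, macro F1 and support-weighted F1."""
    f1 = f1_per_class(y_true, y_pred)
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "macro_f1": sum(f1.values()) / len(f1),
        "weighted_f1": sum(f1[c] * support[c] for c in f1) / n,
    }


# Invented, imbalanced example: 8 "access" tickets, 2 "deadline" tickets,
# with one "deadline" ticket misclassified as "access".
y_true = ["access"] * 8 + ["deadline"] * 2
y_pred = ["access"] * 8 + ["access", "deadline"]
report = summarise(y_true, y_pred)
```

In this example accuracy is 0.9 and weighted F1 is about 0.89, while macro F1 drops to about 0.80 because the minority class counts as much as the majority class. This is why macro-averaged F1 is informative when ticket categories are imbalanced.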

Among the tested configurations, LightGBM demonstrated the most balanced and robust performance. The results confirm the effectiveness of integrating machine learning and NLP into institutional workflows. Future work will explore the integration of the classifier into the generative architecture of Salesforce Agentforce, contributing to the evolution of intelligent support systems in citizen-facing public services.

References

AOKI N. 2020. An experimental study of public trust in AI chatbots in the public sector. Government information quarterly, Vol. 37, No. 4.

BERZI A., BERÉNYI E., KÉPES Z., ANTAL B., VARGA Á. G., EMRI M. 2025. NLP-based removal of personally identifiable information from Hungarian electronic health records. Frontiers in Artificial Intelligence, Vol. 8, 1585260.

BIANCHI G., BELLINI G., BOSSO P., PAPA P. 2022. A machine learning based help-desk approach for units involved in official surveys. In Proceedings of UNECE Expert Meeting on Statistical Data Collection 2022.

BREIMAN L. 2001. Random forests. Machine learning, Vol. 45, pp. 5-32.

CHEN Y., ZOU J., LIU L., HU C. 2024. Improved Oversampling Algorithm for Imbalanced Data Based on K-Nearest Neighbor and Interpolation Process Optimization. Symmetry, Vol. 16, No. 3, p. 273.

EUROPEAN COMMISSION 2020. AI watch - artificial intelligence in public services. Technical report.

EUROPEAN COMMISSION 2024. Horizon 2020 - Automated, Transparent Citizen-Centric Public Policy Making based on Trusted Artificial Intelligence. Project web page. https://cordis.europa.eu/project/id/101004480/results.

FERNÁNDEZ A., GARCIA S., HERRERA F., CHAWLA N. V. 2018. SMOTE for learning from imbalanced data: progress and challenges, marking the 15-year anniversary. Journal of artificial intelligence research, Vol. 61, pp. 863-905.

GÉRON A. 2022. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, Inc.

HAYKIN S. 1994. Neural networks: a comprehensive foundation. Prentice Hall PTR.

INPS 2024. Linee guida sull’implementazione di sistemi di Intelligenza Artificiale in INPS. Direttiva del Direttore Generale n. 8 del 8 aprile 2024. Allegati circolare 14541.

KE G., MENG Q., FINLEY T., WANG T., CHEN W., MA W., YE Q., LIU T. Y. 2017. LightGBM: A highly efficient gradient boosting decision tree. Advances in neural information processing systems, Vol. 30.

MANNING C. D., RAGHAVAN P., SCHÜTZE H. 2010. An introduction to information retrieval. In Mogotsi, I. C. (Ed.), Information retrieval, Cambridge University Press, pp. 192-195.

MARCEDDU A., MICCOLI M., AMICONE A., MARANGONI L., RISSO A. 2024. Artificial Intelligence for Urban Safety: A Case Study for reducing road accident in Genoa. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. 48, pp. 131-138.

MÉNDEZ J. R., IGLESIAS E. L., FDEZ-RIVEROLA F., DÍAZ F., CORCHADO J. M. 2005. Tokenising, stemming and stopword removal on anti-spam filtering domain. In: Marín, R., Onaindía, E., Bugarín, A., Santos, J. (Eds) Current Topics in Artificial Intelligence. CAEPIA 2005. Lecture Notes in Computer Science, Vol. 4177. Springer, Berlin, Heidelberg.

SALTON G., BUCKLEY C. 1988. Term-weighting approaches in automatic text retrieval. Information processing & management, Vol. 24, No. 5, pp. 513-523.

TWIZEYIMANA J.D., ANDERSSON A. 2019. The public value of E-Government–A literature review. Government information quarterly, Vol. 36, No. 2.

VAN NOORDT C., MISURACA G. 2022. Artificial intelligence for the public sector: results of landscaping the use of AI in government across the European Union. Government information quarterly, Vol. 39, No. 3.

VASSILAKOPOULOU P., HAUG A., SALVESEN L. M., PAPPAS I. O. 2023. Developing human/AI interactions for chat-based customer services: lessons learned from the Norwegian government. European journal of information systems, Vol. 32, No. 1, pp. 10-22.

Published

2026-02-26

Section

Articles