The online services of municipalities in Italy: the digital divide
DOI: https://doi.org/10.71014/sieds.v79i4.386
Abstract
In Italy, enhancing online public services is one of the main goals of the National Recovery and Resilience Plan (NRRP), which includes significant investment in digital identity. In this respect, traditional data collection methods in official statistics may be unable to capture the information needed to assess the digital transition of public institutions, especially in a timely fashion.
The main objective of the paper is to measure systematically the capacity of local institutions to offer online public services, by collecting relevant information directly from municipal websites using Machine Learning (ML) techniques, and to highlight territorial digital gaps in the provision of those services. In particular, the study aims to develop an automatic classification framework to investigate whether, and to what extent, Italian municipalities implement the digital identity system. This is achieved by comparing the effectiveness of random forest and naive Bayes, two supervised ML algorithms commonly used for text classification. The classification procedure is based on two approaches: 1) integrating auxiliary online sources with official statistics sources, such as the Permanent Census of Public Institutions conducted by the Italian National Institute of Statistics (ISTAT) in 2023, and 2) gathering information on relevant features of municipalities’ websites by means of web scraping techniques.
By combining official statistics with big data, the analysis draws attention to the digital divide among municipalities, comparing the online access to public services available to citizens living in different areas of the country, e.g. regions and provinces.
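The text classification step mentioned in the abstract can be illustrated with a minimal sketch of one of the two algorithms named there, multinomial naive Bayes with Laplace smoothing, written in plain Python. This is not the authors' implementation: the class name, the two labels, the tokenizer, and the six toy sentences (which use SPID and CIE as example digital identity keywords) are all assumptions introduced here for illustration only.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-zàèéìòù]+", text.lower())


class MultinomialNaiveBayes:
    """Hypothetical minimal classifier; not taken from the paper."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy snippets standing in for scraped website text.
train_docs = [
    "accedi con spid o cie ai servizi online del comune",
    "entra con spid per pagare i tributi online",
    "login con cie carta identita elettronica servizi",
    "orari di apertura degli uffici comunali",
    "notizie ed eventi del comune albo pretorio",
    "contatti telefono e indirizzo del municipio",
]
train_labels = ["digital_id"] * 3 + ["no_digital_id"] * 3

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict("accedi ai servizi con spid"))
print(model.predict("orari degli uffici del comune"))
```

In a real setting the training texts would come from scraped municipal pages and the labels from a trusted source, and the resulting accuracy would be compared against a random forest trained on the same features.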
References
BREIMAN L. 2001. Random Forests. Machine Learning, Vol. 45, No. 1, pp. 5-32. DOI: https://doi.org/10.1023/A:1010933404324
DE FAUSTI F., PUGLIESE F., ZARDETTO D. 2019. Towards automated website classification by deep learning. arXiv preprint arXiv:1910.0999
DE PANIZZA A., LOMBARDI S. 2024. Rapporto sulle istituzioni pubbliche, Istat, ISBN 978-88-458-2146-2.
EFRON B., TIBSHIRANI R. J. 1994. An introduction to the bootstrap. Chapman and Hall/CRC. DOI: https://doi.org/10.1201/9780429246593
EUROPEAN COMMISSION 2023. Communication establishing the Union-level projected trajectories for the digital targets.
ISTAT 2022. Permanent Census of Public Institutions. https://www.istat.it/statistiche-per-temi/censimenti/istituzioni-pubbliche/
ISTAT 2022. Survey on information and communication technologies in the PA.
KOHAVI R. 1995. A study of cross-validation and bootstrap for accuracy estimation and model selection. Morgan Kaufmann Publishing.
MANNING C. D. 2009. An introduction to information retrieval. DOI: https://doi.org/10.1017/CBO9780511809071
MCCALLUM A., NIGAM K. 1998. A comparison of event models for Naive Bayes text classification. In AAAI-98 workshop on learning for text categorization.
PORTER M. F. 2001. An algorithm for suffix stripping. Program, Vol. 14, No. 3, pp. 130-137. http://snowball.tartarus.org/texts/introduction.html DOI: https://doi.org/10.1108/eb046814
PRESIDENZA DEL CONSIGLIO DEI MINISTRI 2021. Piano Nazionale di Ripresa e Resilienza.
License
Copyright (c) 2025 Chiara Orsini, Fabrizio De Fausti, Sergio Leonardi

This work is licensed under a Creative Commons Attribution 4.0 International License.