Web data sources and official statistical standards: the WIN experience

Authors

  • Giuseppina Ruocco, Istat
  • Renato Magistro, Istat
  • Giulio Massacci, Istat

DOI:

https://doi.org/10.71014/sieds.v80i2.426

Keywords:

official statistical standards, web data, process standardization, use cases experience

Abstract

To promote the integration of web data sources into official statistics, Eurostat launched a four-year initiative, the Web Intelligence Network (WIN) project. One of the project's main objectives is to create a network within the European Statistical System (ESS) and to develop a common infrastructure, the Web Intelligence Hub (WIH), which provides services and tools for web data collection and management. The WIH is designed to support National Statistical Institutes (NSIs) at all stages of web data collection and processing. In the long term, it can be further developed to explore the potential of additional innovative data sources for official statistics. Within the WIN project, the architectural task dealt with the Enhancement and Enrichment (E&E) of the architectural standard Big Data REference Architecture and Layers (BREAL). BREAL is an architectural framework developed in the Big Data II project to support NSIs in planning their investments in big data. It provides a set of tools for defining the business objectives, application components and data models needed to develop statistical processes based on big data. One of the main outcomes of the architectural task is the specialization of BREAL for web data. The adoption and E&E of the BREAL framework made it possible to harmonize the requirements of the project use cases with the services provided by the WIH. The BREAL specialization draws on the experience gained during the project and is intended to promote the deployment of web data workflows in production environments. This enhancement of BREAL enables an NSI to assess the maturity level of a use case. In addition, adopting this approach increases process standardization and fosters the development of shareable tools that can be reused by other statistical organizations and adapted to other use cases or statistical domains.

References

AA.VV. 2025. Chapter 5 National data ecosystems and governance. In The Handbook on Management and Organization of National Statistical Systems. Edited by UN, pp. 97-111. https://projects.officialstatistics.org/hb-mgnt-org-nss/handbook/intro.html.

ASCHERI A., MUSEUX J. M., WIRTHMANN A., GIANNAKOURIS K., KARLBERG M., BALDACCI E. 2022. Innovation in the European Statistical System: Recent achievements and challenges ahead, Statistical Journal of the IAOS, Vol. 38, No. 3, pp. 805-813. https://doi.org/10.3233/SJI-220053.

AUNO V. et al. 2024. Deliverable 4.8: Quality Assessment for the Statistical Use of Web Scraped Data. Final version, 2024-11-20. Edited by Eurostat.

DAAS P. J. H. 2015. Big data as a source for official statistics, Journal of Official Statistics, Vol. 31, No. 1, pp. 249-262. https://doi.org/10.1515/jos-2015-0016.

Generic Statistical Business Process Model – GSBPM. https://statswiki.unece.org/display/GSBPM

Generic Statistical Information Model – GSIM. https://statswiki.unece.org/display/gsim

Generic Activity Model for Statistical Organizations – GAMSO. https://statswiki.unece.org/spaces/GAMSO/pages/105580149/Generic+Activity+Model+for+Statistical+Organizations

KOWARIK A. et al. 2021. Deliverable 4.1: Minimal guidelines and recommendations for implementation. Version 2021-09-23. Edited by Eurostat.

RUOCCO G. et al. 2025. Deliverable D4.7 BREAL - Big Data REference Architecture and Layers for web scraped data. Final version, 2025-03-31. Edited by Eurostat.

SCANNAPIECO M. et al. 2021. (Deliverable F2) BREAL. Big Data Reference Architecture and Layers. Application layer and Information layer. Version 2021-03-31. Edited by Eurostat.

SCANNAPIECO M. et al. 2019. (Deliverable F1) BREAL. Big Data REference Architecture and Layers. Business Layer. Version 2019-12-09. Edited by Eurostat.

SIX M. et al. 2025. Deliverable 4.6: WP4 Methodology report on using web scraped data. Final version, 2025-03-25. Edited by Eurostat.

SIX M. et al. 2025. Deliverable 4.5: Quality Guidelines for acquiring and using web scraped data. Revised final version, 2025-02-20. Edited by Eurostat.

STRUIJS P., BRAAKSMA B., DAAS P. J. 2014. Official statistics and Big Data, Big Data & Society, Vol. 1, No. 1. https://doi.org/10.1177/2053951714538417.

Published

2026-02-19

Section

Articles