Interim Report: Survey of CPI Production Systems
1 Introduction
1.1 Preface
This report presents the findings and takeaways from a survey of National Statistical Organizations (NSOs) around the world, conducted during the winter of 2025. The survey focused primarily on how each NSO structures, develops, and maintains its Consumer Price Index (CPI) production systems, and on how it organizes the teams responsible for these systems.
This work was done by Workstream 4 (CPI Systems and Architecture), which is part of the UN Task Team on Scanner Data. We would like to acknowledge the team members of Workstream 4 who provided insights, helped draft survey questions, and supported this empirical assessment.
We would also like to thank and acknowledge the support of the Steering Group of the UN Task Team on Scanner Data, as well as the UN Committee of Experts on Big Data and Data Science for Official Statistics (UN-CEBD) team in making this work possible. Without the support of both, it would not have been possible to survey NSOs around the world and evaluate their CPI production system architectures and team organizations.
1.2 Overview
Extensive methodological and practical guidance is available to National Statistical Organizations (NSOs) related to the creation of the Consumer Price Index (CPI).1 However, there is not much material on how to create maintainable software systems to carry out this methodological and practical guidance.
Furthermore, many NSOs are starting to modernize their programs by shifting towards new data sources, a shift that is similarly well supported with methodological and practical guidance.2 Again, while some resources exist,3 there is not much direction on how to deal with the complexity introduced by developing and maintaining the systems that operate on these new data sources.
To help address this gap, Workstream 4 of the UN Task Team on Scanner Data conducted a detailed survey during the winter of 2025 to (1) provide a detailed summary of the CPI production systems currently used by NSOs and (2) provide guidance in the form of this report on how NSOs can better manage the complexities of system development and maintenance.
While this report is specific to the state of CPI Production Systems at NSOs around the world, our hope is that some of the content in this report is also useful for a wider audience maintaining similar kinds of systems. As such, we attempt to explain our results in a general way and highlight opportunities where our survey approach and findings could be applied in related settings.
1.3 Motivation
In our time working at NSOs, we have encountered some extremely complicated systems that exist to produce various analytical and data products such as consumer price indexes, national accounts figures, or labour force statistics. These complicated systems and the teams who maintain them are the subjects of this survey and write-up. To reduce ambiguity, we refer to these systems as Complex Analytical Systems throughout this report.
Complex Analytical Systems involve significant amounts of code, documentation, and other non-code artifacts such as Excel Workbooks that carry out complex business logic in order to transform input data into output data. Additionally, they are often developed entirely or in large part by people with backgrounds in Economics, Statistics, Mathematics, or another area related to the domain of Official Statistics.
These Complex Analytical Systems differ from traditional software systems in a number of important aspects4:
Complex Analytical Systems | Typical Software System |
---|---|
Multiple distinct scripts that are run sequentially and perform complex data manipulations. | One code base representing an entire application. |
Running time measured in minutes or hours. | Running time measured in milliseconds. |
Human-in-the-loop activities to interpret results. | Completely autonomous system. |
Ad-hoc (messy) data gathered from whatever data sources are available. | Highly structured data whose schema is designed in lockstep with the rest of the system. |
Batch workloads that are run manually (or semi-manually). | System running continuously in an event loop waiting for user input. |
Operate on a large fraction of an entire table quickly. | Search for one specific record in a large table quickly. |
Due to differences like those mentioned above, there is not a perfect mapping between best practices from the software engineering world and pain points currently experienced by teams maintaining Complex Analytical Systems. However, there are certainly some best practices from software engineering that are highly appropriate to solve some of the problems faced in the development and maintenance of Complex Analytical Systems.
To this end, we hope our survey can help bridge the gap between well-understood industry best practices from the world of software engineering, and those aspects of Complex Analytical Systems that could benefit from these best practices. Our hope is that the insights gained and the survey methodology deployed may be valuable for other Complex Analytical Systems facing similar challenges.
1.4 What Was the Purpose of This Survey?
In our experience, we’ve noticed that many teams who are responsible for Complex Analytical Systems struggle with managing many aspects of system complexity.
The business domain teams behind Complex Analytical Systems are typically composed of individuals with strong analytical skills and significant domain knowledge; however, they often do not have specific training in software engineering concepts. Therefore, they are often not exposed to the significant body of knowledge that has been developed over decades to deal with the kinds of system complexity problems that software developers routinely face.
We have also found that individuals in these business domains are often missing the vocabulary and concepts to articulate the state of their Complex Analytical Systems. As a result, when these individuals try to explain where they are struggling to a more IT-oriented audience, miscommunication often results, and it becomes difficult to arrive at reasonable solutions.
In this survey, we ask questions that capture several germane aspects of system organization, team organization, technology choices, and business outcomes using language, terms, and conceptual models that are more familiar to individuals on these business domain teams. Our rationale for doing this is threefold:
1. Measure and describe the state of many Complex Analytical Systems around the world within a specific business domain (CPI Production Systems).
2. Provide some concrete and practical suggestions to address common areas of struggle within this domain across many NSOs.
3. Expose people from these business domain teams to software engineering concepts that are relevant in the development and maintenance of Complex Analytical Systems.
While this report is tailored towards a Consumer Prices domain audience, we welcome and encourage readers from different domains to read through this report. We make significant efforts to avoid using too much domain-specific jargon, and present findings in a way that should be comprehensible to a more general audience. In Chapter 10, we elaborate on aspects of our survey we believe to have high external validity, provide some practical suggestions that are applicable to Complex Analytical Systems in general, and describe some productive areas of future exploration that are not limited to the Consumer Prices business domain.
1.5 Overview of CPI Production Systems
With the above motivation in mind, we conducted this survey for CPI Production Systems specifically, which are one kind of the Complex Analytical Systems described in Section 1.3. More precisely, these systems take data on the prices of consumer goods and services purchased throughout an economy and calculate period-over-period price changes for these goods and services. These price changes are ultimately mapped to a taxonomy of product categories, with the highest level of the taxonomy being the monthly “all items” CPI that is commonly used when discussing the overall level of inflation.
The recent adoption of alternative data sources in the calculation of CPIs has further increased the complexity of these systems,5 and has increased the importance of skills in newly emerging disciplines such as Data Science, Data Engineering, and Analytics Engineering.
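To make the description of a CPI Production System in this section more concrete, the following is a minimal sketch of the two steps it mentions: calculating period-over-period price changes and mapping them up a category taxonomy to an "all items" figure. It is an illustration only, not the method of any particular NSO. The prices, weights, and two-level taxonomy are made up, and the choice of a geometric mean (Jevons) within categories and a weighted arithmetic mean across categories is an assumption made for the example.

```python
from math import prod

# Hypothetical prices for two consecutive periods, keyed by (category, product).
prices_prev = {
    ("food", "bread"): 2.00,
    ("food", "milk"): 1.00,
    ("transport", "fuel"): 1.50,
}
prices_curr = {
    ("food", "bread"): 2.20,
    ("food", "milk"): 1.10,
    ("transport", "fuel"): 1.50,
}

# Hypothetical expenditure weights for each category (they sum to 1).
weights = {"food": 0.6, "transport": 0.4}


def elementary_indexes(prev, curr):
    """Geometric mean of price relatives within each category (assumed formula)."""
    relatives = {}
    for (category, product), p0 in prev.items():
        relatives.setdefault(category, []).append(curr[(category, product)] / p0)
    return {c: prod(r) ** (1 / len(r)) for c, r in relatives.items()}


def all_items_index(elementary, weights):
    """Weighted arithmetic mean of the category-level price changes."""
    return sum(weights[c] * elementary[c] for c in weights)


elementary = elementary_indexes(prices_prev, prices_curr)
all_items = all_items_index(elementary, weights)
print(elementary)
print(all_items)
```

In a real system each of these steps is typically a separate script or component operating on far larger and messier inputs, which is where much of the complexity discussed in this report arises.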
1.7 Survey Design and Data Collection
We developed this survey during the fall of 2024 and administered it through the winter of 2025.
With support from the UN-CEBD and UNECE, we contacted NSOs around the world. We prioritized reaching out directly to individuals in the Price Statistics divisions at each NSO. Where no contact was known, we requested that the survey link be forwarded to the Price Statistics team of that organization. Each Price Statistics team submitted one response on behalf of its NSO. At the time of writing this interim report, 70 NSOs had responded to our survey.
Every major geographic region of the world is represented in the responses. NSOs that answered ranged from those advanced in their modernization efforts to those that are still focused on traditional methods.
We do not make any claims about the statistical significance of results throughout this survey. Rather, the goal of this survey is to provide descriptive findings about system and team organization, and to relate these findings to prior knowledge in other disciplines such as Software Engineering.
To preserve anonymity, we do not disclose any value that was reported by 2 or fewer respondents. As a result, some tables and figures throughout the report omit certain categories or group them together.
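The disclosure rule above can be sketched as a small function. This is an illustrative sketch, not the code used to produce this report: the function name, the "Other (grouped)" label, and the example responses are all hypothetical. It assumes that small categories are pooled into one group, and that the pooled group is itself omitted if it still has 2 or fewer respondents.

```python
from collections import Counter

# Values reported by this many respondents or fewer are not disclosed.
SUPPRESSION_THRESHOLD = 2


def suppress_small_counts(responses, threshold=SUPPRESSION_THRESHOLD,
                          grouped_label="Other (grouped)"):
    """Count responses per value, pooling values at or below the threshold."""
    counts = Counter(responses)
    disclosed = {value: n for value, n in counts.items() if n > threshold}
    pooled = sum(n for n in counts.values() if n <= threshold)
    # Only show the pooled group if it is large enough to disclose on its own.
    if pooled > threshold:
        disclosed[grouped_label] = pooled
    return disclosed


# Made-up answers to a hypothetical single-choice question (not survey data).
example = ["tool_a"] * 5 + ["tool_b"] * 4 + ["tool_c"] * 2 + ["tool_d"]
table = suppress_small_counts(example)
print(table)
```

Here `tool_c` and `tool_d` each fall at or below the threshold, so they are pooled into a single group of 3 rather than shown individually.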
Some amount of selection bias is possible, which could be in part due to factors such as the time commitment and the complexity of the survey. It is also possible that misinterpretation could have affected some responses.
However, despite these possible shortcomings, we believe the data collected are still relevant. We also believe that the findings in this survey have high external validity with respect to other Complex Analytical Systems beyond CPI production systems.
For readers who are interested, this survey was administered using LimeSurvey, and the questionnaire used can be found in the GitHub repository for this survey.
1.8 How This Report Is Organized
This report is presented in the order that the survey was conducted, with findings presented along the way.
Chapter 2 covers the key conceptual models and terminology used to articulate concepts about system and team organization.
Chapter 3 analyzes our findings with respect to system and team organization.
Chapter 4 covers some high-level questions on the use of tools and technologies required to develop and maintain CPI Production Systems.
Chapter 5 covers questions about the age and update frequency of systems.
Chapter 6 covers questions about the number of individuals required to participate in system changes.
Chapter 7 covers questions about a concept called lead time, which measures the end-to-end time required to implement a change to a software component.
Chapter 8 covers questions about the usage of alternative data in CPI Production Systems.
Chapter 9 covers the challenges CPI Production System teams face with respect to maintaining their systems.
Chapter 10 concludes with a summary of the most notable findings from the survey, some practical insights to address some common areas of struggle, and some areas of future work.
1.9 Note on Confidentiality and Privacy
As part of the administration of this survey, we assured respondents that their data would be treated confidentially. Therefore, no individual response data are made available in this report; all results presented are aggregated over all respondents.
1. Most notable is the 2020 CPI Manual.
2. See the e-handbook, developed and maintained by the UN Task Team on Scanner Data, for guidance on various aspects of leveraging new data sources.
3. The most notable approach being recommended is Reproducible Analytical Pipelines (or RAPs), which are discussed in Section 1.6. The IT system requirements section in the e-handbook also summarizes several considerations and approaches for systems development.
4. We are not implying that all “traditional” software systems have these characteristics. Rather, we are trying to draw a contrast between aspects of Complex Analytical Systems that are most likely to be different from the kinds of systems a software engineer would often develop and maintain.
5. In the context of CPI Production Systems, alternative data sources refer to data such as retailer scanner and web-scraped data that can be used to calculate the component price changes that are used in CPI calculations.
6. We use slightly different terminology to refer to some of these concepts throughout the survey in order to use language that our target audience is most likely familiar with.
7. For readers who want to learn more about RAP, see this training session on RAP for price statistics by ESCAP, which covers an application involving web scraping.