Presented at the 2026 Ottawa Group (Warsaw, Poland)
2026-05-14
Ioannidis (2005), Baker (2016), and many others put the replication crisis on the map.
There have been a number of notable movements that have sought to help solve this problem. For example:
FAIR—Findable, Accessible, Interoperable, Reusable—a set of principles for open science (Wilkinson et al. 2016).
The Turing Way, a set of community-developed guides and processes for reproducible research (The Turing Way Community 2025).
A lot of progress has been made! For example, a recent survey of reproducibility in economics found rates as high as 85%, at least for journals with strict data-sharing mandates and policies (Brodeur et al. 2026).
We can move from a “trust me, it works on my data” toward “here is exactly how I did what I did, and here is everything you need to check my work.”
The movement is actually already underway! For example, the increasing adoption of Reproducible Analytical Pipelines (RAP) is a step in the right direction!
What is left is to establish discipline-specific research practices.
As a workstream in the UN Task Team, we are a group of price statisticians from NSOs and research institutes around the world.
In 2024, we formed the FAIR/Reproducibility workstream to instil reproducibility within the discipline.
We introduced the topic to the community a year ago (at the 2025 CPI Expert Group meeting).
We see two main blockers for reproducibility in price statistics:
Open datasets are few and far between, or are undocumented! To get to reproducibility, we all need access to high-value open datasets. Without that, it is natural to default to internal data at our NSOs that is already processed, classified, and ready to go.
It is unclear how to be reproducible! There is a lot of guidance out there, but what works for our discipline? The learning curve is non-trivial, so why bother if the “how to do this” path isn’t clear and the mission isn’t obvious?
We are working to solve both by:
Developing and supporting a curated catalogue of good research datasets.
Providing guidance on practices that apply to our discipline.
We should aim for reproducibility for methodological studies: the analysis (i.e., the code) is public and the data is open.
Replicability can be the backup: if the same data can’t be used, others can run the same code on other data.
Without either, generalizability is the default. Generalizability, however, means that research is:
Slower—it takes much more time and resources to reach consensus
Sensitive to local patterns—an observed trend may be sensitive to local conditions (data, geography, etc).
More inclined to novelty over robustness or production benefits: the breadth of options to consider is large, but the work of operationalizing them is left to NSOs.
The input data is used to answer research questions. Output data is produced by your research project (such as final indices).
The broader (structured) digital object that tracks all the objects of research (code, documentation, outputs, etc.) is the research compendium:
The active version can be on GitHub.
The final version is archived.
Track the software used in a computational environment file (no more “it works on my machine”).
Store the source for your paper in your research compendium.
Apply the Reproducible Analytical Pipelines (RAP) framework to research.
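For instance, a minimal sketch of a RAP-style entry point in Python (the file names, the input/prices.csv layout, and the choice of a Jevons elementary index are all assumptions made purely for illustration):

    # run_all.py: hypothetical end-to-end entry point for a research compendium.
    # Assumes an illustrative input/prices.csv with columns: period, product_id, price.
    import csv
    import math
    from collections import defaultdict
    from pathlib import Path

    def jevons(base_prices, current_prices):
        """Jevons elementary index: geometric mean of matched price relatives."""
        matched = [p for p in base_prices if p in current_prices]
        return math.exp(
            sum(math.log(current_prices[p] / base_prices[p]) for p in matched)
            / len(matched)
        )

    # Read the (gitignored) input data: period -> {product_id: price}.
    prices = defaultdict(dict)
    with open("input/prices.csv", newline="") as f:
        for row in csv.DictReader(f):
            prices[row["period"]][row["product_id"]] = float(row["price"])

    # Compute a fixed-base index and write the output data the paper uses.
    periods = sorted(prices)
    Path("output").mkdir(exist_ok=True)
    with open("output/index.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["period", "jevons_index"])
        for t in periods:
            writer.writerow([t, round(100 * jevons(prices[periods[0]], prices[t]), 2)])

Running one script from input to output is the point: anyone with the compendium and the data can regenerate the final indices.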
Get started:
As you start, there is a 3-step maturity model that can be followed. Step 1 is very approachable.
Add coherent structure to the compendium (see the sketch after this list)!
include (input) data folders (even though you should ignore the data files themselves) to make it clear how the project runs end-to-end;
group the code;
document the process, from the project design to the project documentation (such as the paper);
include (output) data folders for the data the code produces;
choose a licence to tell others how to use your materials;
include a file that lists your computational environment;
carefully track which packages are used.
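For example, one possible layout (the folder and file names are illustrative, not prescriptive; requirements.txt is the Python convention, while R users might keep an renv.lock instead):

    my-study/
        input/              input data folder (the data files themselves are gitignored)
        code/               the grouped analysis code
        output/             data the code produces (e.g., the final indices)
        doc/                project design notes and the paper source
        LICENSE             tells others how they may use your materials
        requirements.txt    the computational environment: which packages are used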
There are 3 types of (input) data:
Open data—common types include Dominick’s or Turvey.
Proprietary data—commonly used ones include Nielsen and IRI.
Sensitive data—the internal data holdings in NSOs.
Open data is the default, but other data could be okay (next slide).
Irrespective of the data type, structure the research compendium in a standardized way and ignore the data files themselves (e.g., add *.csv to your .gitignore file).
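A minimal .gitignore sketch along those lines (the patterns are illustrative):

    # Never commit the data files themselves, only the structure around them.
    *.csv
    input/*
    # ...but keep a note describing the data and how to obtain it.
    !input/README.md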
The catalogue documents each dataset in a standardized way.
The datasets themselves still belong to the owners.
Ideally, each dataset is in a data repository (e.g., Zenodo).
https://un-task-team-for-scanner-data.github.io/price-stats-data-catalogue/ (or find it embedded in our project site).
Let us know if you know a good dataset to add!
If you still prefer to use proprietary or internal data, you can make light modifications to keep your project reproducible (see the sketch below).
As with the previous guide, make sure to .gitignore the data itself!
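One light modification, sketched here in Python (the file names are hypothetical): have the code check for the data up front and point readers to the access instructions rather than failing cryptically.

    # check_data.py: fail early, with instructions, when the restricted input is absent.
    import sys
    from pathlib import Path

    DATA = Path("input/scanner_data.csv")  # hypothetical, gitignored proprietary file
    if not DATA.exists():
        sys.exit(
            "Input data not found. This project uses proprietary data; "
            "see input/README.md for how to request access."
        )

Everything else about the compendium (structure, code, environment, documentation) stays exactly as it would for open data.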
Archive your research compendium (immutable copy tied to your submission).
Cite the objects of research using their findability metadata:
Find each object with its (permanent) unique digital identifier (typically a DOI).
Cite the unique identifier assigned to each author (sign up for ORCID if you haven’t yet).
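For example, a minimal CITATION.cff sketch that carries this findability metadata inside the compendium itself (every value below is a placeholder, not a real identifier):

    cff-version: 1.2.0
    message: "If you use this research compendium, please cite it."
    title: "Example price index study (research compendium)"
    authors:
      - family-names: Doe
        given-names: Jane
        orcid: "https://orcid.org/0000-0000-0000-0000"
    doi: 10.5281/zenodo.0000000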
Metadata standards (e.g., SDMX, DDI) typically apply to publishing new input data.
Great!
Goussev, S., Martin, S., Lamboray, C., Bontemps, C., Flower, T., Hillman, B., White, C., & Mehroff, J. (2026). Price Statistics Reproducibility Project (v0.1). Zenodo. https://doi.org/10.5281/zenodo.19779579
Tell us: what is standing in your way of doing research reproducibly?
https://forms.gle/DZRzqNbPHCfgdReLA
More than anything—are you convinced?