Modern research increasingly depends on complex software, data, and computational environments. While this enables powerful discoveries, it also introduces challenges in ensuring that results can be reliably reproduced over time.
Reproducibility Debt (RpD) captures the accumulation of factors that hinder the reproducibility of research. As with technical debt, short-term decisions such as incomplete documentation, ad hoc workflows, or untracked environments can lead to long-term costs in validation, reuse, and trust.
Why Reproducibility Debt Matters
In large-scale research environments and infrastructures, unmanaged reproducibility debt can lead to:
- Reduced confidence in research outputs
- Increased cost and effort for validation
- Delays in extending or reusing results
- Barriers to collaboration across teams
- Inefficiencies in research and software workflows
Addressing reproducibility debt is therefore essential for sustainable, scalable, and trustworthy research systems.
A Taxonomy of Reproducibility Debt
Reproducibility debt manifests across multiple dimensions of the research lifecycle. The following taxonomy provides a structured way to identify and manage these challenges:
1. Code Debt
Issues in code quality, structure, or maintainability that make results difficult to reproduce or extend.
2. Data Debt
Problems related to missing metadata, unclear provenance, inaccessible datasets, or lack of standardisation.
3. Documentation Debt
Incomplete, outdated, or inconsistent documentation that prevents understanding and reuse.
4. Human & Organisational Debt
Challenges arising from a lack of training, knowledge silos, poor communication, or misaligned team practices.
5. Tools & Infrastructure Debt
Limitations or inconsistencies in tools, platforms, and computational infrastructure that affect reproducibility.
6. Legal & Policy Debt
Constraints related to licensing, data access, compliance, or unclear ownership that restrict reproducibility and sharing.
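As a rough illustration, the six categories above could be encoded as a small data structure so that individual debt items can be tagged and tallied. This is a minimal sketch rather than the RpD-Manager implementation: the names `DebtCategory`, `DebtItem`, and `count_by_category`, and the example items, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class DebtCategory(Enum):
    """The six taxonomy categories listed above."""
    CODE = "Code"
    DATA = "Data"
    DOCUMENTATION = "Documentation"
    HUMAN_ORGANISATIONAL = "Human & Organisational"
    TOOLS_INFRASTRUCTURE = "Tools & Infrastructure"
    LEGAL_POLICY = "Legal & Policy"


@dataclass(frozen=True)
class DebtItem:
    """One observed reproducibility issue (hypothetical record shape)."""
    category: DebtCategory
    description: str


def count_by_category(items):
    """Return the number of items per category, including empty categories."""
    counts = Counter(item.category for item in items)
    return {category: counts.get(category, 0) for category in DebtCategory}


# Illustrative items only; these are not findings from any real project.
example_items = [
    DebtItem(DebtCategory.DOCUMENTATION, "README does not list the analysis steps"),
    DebtItem(DebtCategory.TOOLS_INFRASTRUCTURE, "Package versions are not pinned"),
    DebtItem(DebtCategory.DATA, "Input dataset has no provenance record"),
    DebtItem(DebtCategory.DATA, "Column units are undocumented"),
]

summary = count_by_category(example_items)
for category, n in summary.items():
    print(f"{category.value}: {n}")
```

A per-category tally like this is only a starting point. It shows where issues cluster, but it says nothing about their severity or how they depend on one another.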
From Research to Infrastructure
My work moves beyond identifying reproducibility issues to providing structured, actionable solutions.
Key contributions include:
- A formalised concept of Reproducibility Debt
- A taxonomy capturing multi-dimensional contributors
- Empirical studies identifying non-technical root causes
- Probabilistic and causal models for analysing dependencies between factors
- A foundation for the RpD-Manager tool, which supports the assessment and mitigation of reproducibility debt
These contributions are designed to support:
- Research teams
- Research software engineers
- National research infrastructures
- Policy and governance bodies
Application Areas
The Reproducibility Debt framework can be applied in:
- Scientific software development
- Climate and environmental modelling infrastructures
- Data-intensive research platforms
- Large-scale collaborative research projects
- Research quality and governance frameworks
Work With Me
I am a researcher specialising in reproducibility, empirical software engineering, and research systems, with a focus on translating research insights into practical, infrastructure-level solutions.
My work combines:
- Empirical software engineering
- Reproducibility science
- Causal and probabilistic modelling
- Research infrastructure practices
I am particularly interested in collaborations that:
- Integrate reproducibility into large-scale systems
- Improve research quality through structured frameworks
- Bridge the gap between research and real-world implementation
Collaboration Opportunities
I welcome collaboration with:
- Research and academic teams
- Research infrastructure organisations
- Research software engineering groups
- Policy and governance stakeholders
