Reproducibility Debt

Modern research increasingly depends on complex software, data, and computational environments. While this enables powerful discoveries, it also introduces challenges in ensuring that results can be reliably reproduced over time.

Reproducibility Debt (RpD) captures the accumulation of factors that hinder the reproducibility of research. As with technical debt, short-term decisions such as incomplete documentation, ad hoc workflows, or untracked environments can lead to long-term costs in validation, reuse, and trust.

Why Reproducibility Debt Matters

In large-scale research environments and infrastructures, unmanaged reproducibility debt can lead to:

  • Reduced confidence in research outputs
  • Increased cost and effort for validation
  • Delays in extending or reusing results
  • Barriers to collaboration across teams
  • Inefficiencies in research and software workflows

Addressing reproducibility debt is therefore essential for sustainable, scalable, and trustworthy research systems.

A Taxonomy of Reproducibility Debt

Reproducibility debt manifests across multiple dimensions of the research lifecycle. The following taxonomy provides a structured way to identify and manage these challenges:

1. Code Debt

Issues in code quality, structure, or maintainability that make results difficult to reproduce or extend.

2. Data Debt

Problems related to missing metadata, unclear provenance, inaccessible datasets, or lack of standardisation.

3. Documentation Debt

Incomplete, outdated, or inconsistent documentation that prevents understanding and reuse.

4. Human & Organisational Debt

Challenges arising from lack of training, knowledge silos, poor communication, or misaligned team practices.

5. Tools & Infrastructure Debt

Limitations or inconsistencies in tools, platforms, and computational infrastructure that affect reproducibility.

6. Legal & Policy Debt

Constraints related to licensing, data access, compliance, or unclear ownership that restrict reproducibility and sharing.
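
To show how the taxonomy could be used in practice, the six categories above can be encoded as a small data structure for tagging and counting observed issues. The following is a minimal Python sketch under stated assumptions, not the RpD-Manager implementation: the names DebtCategory, DebtItem, and count_by_category are hypothetical, and the example entries are illustrative rather than real audit data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class DebtCategory(Enum):
    """The six taxonomy dimensions listed above."""
    CODE = "Code"
    DATA = "Data"
    DOCUMENTATION = "Documentation"
    HUMAN_ORGANISATIONAL = "Human & Organisational"
    TOOLS_INFRASTRUCTURE = "Tools & Infrastructure"
    LEGAL_POLICY = "Legal & Policy"


@dataclass(frozen=True)
class DebtItem:
    """One observed issue. This record shape is an assumption for illustration."""
    category: DebtCategory
    description: str


def count_by_category(items):
    """Return the number of recorded items per category, including zeros."""
    counts = Counter(item.category for item in items)
    return {category: counts.get(category, 0) for category in DebtCategory}


# Illustrative entries only, not real audit data.
example_items = [
    DebtItem(DebtCategory.CODE, "Analysis script depends on hard-coded local paths"),
    DebtItem(DebtCategory.DATA, "Dataset has no provenance metadata"),
    DebtItem(DebtCategory.DATA, "Raw files stored on an inaccessible drive"),
    DebtItem(DebtCategory.DOCUMENTATION, "README describes an outdated workflow"),
]

summary = count_by_category(example_items)
for category, count in summary.items():
    print(f"{category.value}: {count}")
```

Reporting every category, including those with no recorded items, makes it easier to see which dimensions a project has not yet examined as opposed to those where no debt was found.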

From Research to Infrastructure

My work moves beyond identifying reproducibility issues to providing structured, actionable solutions.

Key contributions include:

  • A formalised concept of Reproducibility Debt
  • A taxonomy capturing multi-dimensional contributors
  • Empirical studies identifying non-technical root causes
  • Probabilistic and causal models for analysing dependencies between factors
  • A foundation for the RpD-Manager tool, supporting assessment and mitigation

These approaches are designed to support:

  • Research teams
  • Research software engineers
  • National research infrastructures
  • Policy and governance bodies

Application Areas

The concept of Reproducibility Debt can be applied in:

  • Scientific software development
  • Climate and environmental modelling infrastructures
  • Data-intensive research platforms
  • Large-scale collaborative research projects
  • Research quality and governance frameworks

Work With Me

I am a researcher specialising in reproducibility, empirical software engineering, and research systems, with a focus on translating research insights into practical, infrastructure-level solutions.

My work combines:

  • empirical software engineering
  • reproducibility science
  • causal and probabilistic modelling
  • research infrastructure practices

I am particularly interested in collaborations that:

  • integrate reproducibility into large-scale systems
  • improve research quality through structured frameworks
  • bridge the gap between research and real-world implementation

Collaboration Opportunities

I welcome collaboration with:

  • Research and academic teams
  • Research infrastructure organisations
  • Research software engineering groups
  • Policy and governance stakeholders