DevOps practices
for Reproducible Computational Science
Computational Science Hub, University of Hohenheim, 2026-01-22
Alma Mater Studiorum – Università di Bologna
picture by: Damien Wallace, CC-BY-SA-4.0
Facts about chickens (Gallus gallus domesticus)
- World’s most common domestic bird
- Strict social hierarchy
- Opportunistic omnivore and constant forager
- High reproductive output through frequent egg-laying
- Light body and fast reactions that depend on precise coordination
- Rhythmic walking patterns can run on autopilot
- Movement can persist without conscious control
I’m now making a claim, and I want to see if you believe me
A beheaded chicken can survive for 18 months
Please raise your hand if you believe this claim
A beheaded chicken can survive for 18 months

Please raise your hand if you believe this claim now
A beheaded chicken can survive for 18 months

Please raise your hand if you believe this claim now
A beheaded chicken can survive for 18 months

Please raise your hand if you believe this claim now
A beheaded chicken can survive for 18 months

Please raise your hand if you believe this claim now
Would it be more convincing if I precisely explained how the cut was performed? (protocol)
Would it be more convincing if multiple independent people were able to replicate the experiment? (independent reproduction)
Picture: “Reproducible experiments, or it didn’t happen”

Replication crisis
aka replicability crisis or reproducibility crisis
- an ongoing methodological crisis
- the results of many scientific studies are hard or impossible to reproduce or replicate reliably
- empirical reproductions are essential for the scientific method
- no reproducibility $\Rightarrow$ scientific credibility is undermined
Terms (NASEM 2019)
- Reproducibility: same data + code/steps $\Rightarrow$ consistent computational results
- Replicability: independent study/new data $\Rightarrow$ consistent results
The good news: people working in computer science are lucky!
(on average)
- It’s the only engineering discipline producing mass-less artifacts
- Even if we work in highly unpredictable contexts, we can control randomness in many experiments
- We have support tools to track changes, automate, and manage complexity
DevOps practices for Reproducible Computational Science
Practices, techniques and tools that are common in the DevOps world.
Some of them apply to experiments (or parts of them), some to tools/libraries, some to both.
- Set up a version control system
- Prepare a code repository
- Set up a build automation system
- Make sure you can control randomness
- Work in isolation using containers
- Raise your confidence with continuous integration and continuous delivery
- Pick a good license
- Create archival copies and persistent identifiers (DOIs)
- Document your work
Version Control Systems
- Keeps track of changes
- Fosters collaboration
- Foundational tool for sharing through well-known source code hosting platforms
Dos
- set it up early to support the construction of the artifact
- git is the de-facto standard: use it
- track only non-generated files $\Rightarrow$ set up and maintain a good .gitignore (see the sketch below)
- take your time to understand how to solve conflicts
- learn the basics using a terminal
Don’ts
- start using it when it is time to share the artifact
- pick niche and/or declining tools, even if they are good (hg, svn, …)
- paste commands / fiddle with UIs without understanding what you are doing
- delete the repo and start over
Advanced
- Use it for the paper, too, if you write in $\LaTeX$
- Agree on a branching strategy with your collaborators
- Learn rebase, bisection, cherry-pick, and other advanced features
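As a rough sketch of the .gitignore mentioned above, a JVM- or Python-based project might ignore generated files like these (entries are illustrative; adapt them to your toolchain):

```gitignore
# Build outputs and other generated files
build/
dist/
__pycache__/
*.pyc
*.log

# IDE and editor metadata
.idea/
.vscode/
```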
Public code repositories
- The place where others will search for your code
- Especially if it is a tool or a library
- Fosters collaboration
- Provide many useful ancillary services
- Issue tracking
- Continuous integration
- Documentation hosting
Dos
- prefer cloud-hosted repositories to on-premise ones
- if you can, use GitHub
- a reasonable niche alternative is GitLab
- prepare a good README.md file to guide users (see the sketch below)
Don’ts
- share only through your institutional website, unless you are forced to
- upload archives instead of code: it’s not a 1990s FTP server
Advanced
- use pull requests to contribute upstream or integrate changes if it is a long-lived project
- investigate the additional services, e.g., bots that propose dependency updates
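A minimal README.md sketch for an experiment repository could cover at least the following (project name, paper title, and commands are placeholders):

```markdown
# my-experiment

Companion artifact for “Paper title”, Venue 20XX.

## Requirements
- Docker (or: a JDK / Python, depending on your stack)

## Reproducing the experiments
    ./run-experiments.sh    # placeholder for your actual one-line command

## License
MIT (see the LICENSE file)
```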
Build automation
- Automatic resolution and download of software dependencies
- Automatic compilation and packaging
- Automatic testing
- Automatic execution of the experiments
Dos
- pick tools based on the language/ecosystem, favoring those most used in the development community
- Java / Kotlin $\Rightarrow$ Gradle
- JavaScript $\Rightarrow$ npm
- Python $\Rightarrow$ poetry
- strive to have a single short command that, on a freshly cloned copy, runs the entire experiment, or tests and packs the tool (see the sketch below)
- minimize the pre-requirements
- lock dependency versions
Don’ts
- require manual steps
- rely on dependencies not available in mainstream repositories
- require an IDE to run the experiments
Advanced
- use the build tool to install the toolchains, if possible (see e.g., the Gradle Toolchains)
- build a one-command, zero-setup demo, e.g.:
curl -sL https://bit.ly/dais-2023-loadshift | bash
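For the Python/poetry case, an illustrative pyproject.toml with pinned dependencies and a single entry point might look like this (names and versions are made up):

```toml
[tool.poetry]
name = "my-experiment"                    # placeholder project name
version = "1.0.0"
description = "Reproducible experiments for the paper"
authors = ["Jane Doe <jane@example.org>"]

[tool.poetry.dependencies]
python = "^3.11"
numpy = "1.26.4"                          # exact, locked version

[tool.poetry.scripts]
experiment = "my_experiment.main:run"     # enables `poetry run experiment`

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

With a committed poetry.lock file, `poetry install && poetry run experiment` becomes the single short command on a freshly cloned copy.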
Example: https://github.com/angelacorte/vmc-experiments
Control randomness
- Obtain the same exact results when running the same experiment
- When building a tool or library, make sure that the same input produces the same output
Dos
- always seed your pseudo-random number generator (see the sketch at the end of this section)
- when modeling a random process, always expose an API to set the seed
- when interacting with the real world, use mocks or stubs
(there are many great testing libraries that provide them)
Don’ts
- call Math.random(), random.randint, rand(), or similar functions that rely on a global, implicitly seeded generator
Advanced
- In case of unavoidable randomness, e.g., due to parallelism,
provide a companion experiment (smaller in size) that can be executed deterministically
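A minimal Python sketch of the Dos above (function and parameter names are hypothetical): the seed is an explicit API parameter and a dedicated generator replaces the global one.

```python
import random

def run_experiment(trials: int, seed: int = 42) -> list[float]:
    """Run the (hypothetical) stochastic experiment with a controllable seed."""
    rng = random.Random(seed)   # dedicated generator: no hidden global state
    return [rng.gauss(0.0, 1.0) for _ in range(trials)]

# Same seed, same results, run after run
assert run_experiment(5, seed=1) == run_experiment(5, seed=1)
```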
Isolation and self-containment via containers
- It runs on your machine? Ship your machine!
- Easy-to-share, lightweight, VM-like environments (note: containers are not virtual machines)
- They apply to experiments, much less to tools
Dos
- prepare one or more Dockerfiles packing your experiment and all its dependencies (see the sketch below)
- prefer shipping containers over giving users manual setup instructions
- publish your images on a public registry (e.g., Docker Hub)
Don’ts
- mount and write into user-local folders as the root user
- overlook the auth/visibility quirks of ghcr.io (Docker Hub is generally more straightforward)
Advanced
- Use orchestrators such as docker-compose to manage complex setups
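A minimal Dockerfile sketch for a Python-based experiment (base image, file names, and commands are assumptions, not a prescribed setup):

```dockerfile
FROM python:3.11-slim
WORKDIR /experiment
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
# Run as a non-root user so that files written to mounted folders are not root-owned
RUN useradd --create-home runner
USER runner
CMD ["python", "run_experiment.py"]
```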
Example: https://github.com/nicolasfara/experiments-2024-ACSOS-imageonomics-drones
Continuous integration and continuous delivery
- For each change, automatically run a verification process
- Detect regressions early and keep the artifact continuously runnable
Dos
- set up the CI/CD pipeline as soon as possible
- most code hosting platforms provide it for free within limits
- rely on the build process that you set up before
- deliver the artifacts automatically if the verification succeeds
- produce a reduced version of the experiments that can serve testing purposes and double as a quick demo (see the sketch below)
Don’ts
- run the entire experiments in the pipeline when they take hours
Advanced
- automatically deal with updates and patches using automatic merge rules for PRs
- automatically release new versions (check out Semantic Release)
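A minimal GitHub Actions sketch of such a pipeline, reusing the build automation set up earlier (the workflow path is the conventional one; the Gradle task name is a placeholder):

```yaml
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Run tests plus the reduced experiment/demo via the build tool
      - run: ./gradlew check runReducedExperiment
```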
Example: https://github.com/nicolasfara/experiments-2024-acsos-multi-tier-field-based-applications
Pick a license
- Unlicensed software is proprietary
- copyright applies even if you don’t declare it; a license is an explicit permission grant
- Pick the right license for the job, depending on your goals
Dos
- use a standard license (e.g., MIT, Apache, GPL, LGPL)
- check what your license allows and what it does not
Don’ts
- concoct your own license
- exception: adding a linking/classpath exception to the GPL
- use a funny license, such as Beerware or WTFPL
Rules of thumb
- if you want people to do whatever they want, use MIT or BSD
- if you want to track contributions or protect trademarks, use Apache 2.0
- from personal experience: do not use the GNU LGPL if you want companies to use your library
- it forces the linking software to be partly reverse-engineerable
- a GNU GPL with linking/classpath exception is more permissive
- if you want to be protective and force everyone using your stuff to release theirs, use the GNU GPL
- if your software is networked, consider GNU Affero in place of GNU GPL
- do not use Creative Commons licenses for software
- for any detailed use, ask your legal team
Archival copies and digital object identifiers
- Don’t let your software disappear
- Let others reference specific versions of your software through a DOI
Dos
- connect GitHub to an archival service, e.g., Zenodo
- every GitHub release will be automatically archived and assigned a DOI
- Use services such as DOI2bib to get a BibTeX entry for your DOI
- Clean up the generated entry if you use bots, or they’ll appear as coauthors (see the sketch below)
Don’ts
- upload software manually to archives
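As a sketch, a cleaned-up entry for an archived release, with bot accounts removed from the author list, might look like this (all values, including the DOI, are placeholders; the entry type and fields depend on your bibliography style):

```bibtex
@software{doe_2026_my_experiment,
  author    = {Doe, Jane and Doe, John},
  title     = {my-experiment: companion artifact},
  year      = {2026},
  publisher = {Zenodo},
  version   = {1.0.0},
  doi       = {10.5281/zenodo.0000000}
}
```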
Example: https://github.com/anitvam/dsrt-2024-jakta-alchemist
Document your work
- Help other people understand your work
- Help other people build on your work
Dos
- pick a framework to better organize your documentation
- leverage GitHub Pages or similar services to host your documentation
- with every organization, you get a free static website at https://<org>.github.io/
- you get one with every repository, too, at https://<username>.github.io/<repository>
- use a static site generator (e.g., Hugo, Jekyll) to convert Markdown documentation into a website (see the sketch below)
Don’ts
- use raw HTML/JS/CSS directly: it’s hard to maintain
Advanced
- treat your documentation as you treat your software: version it, review it, test it
- configure the CI/CD pipeline to automatically build and deploy the documentation
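As a sketch of the static-site-generator route with Hugo (site name, theme, and page are examples; exact commands and config file name may vary across Hugo versions):

```sh
hugo new site docs
cd docs
git init
git submodule add https://github.com/theNewDynamic/gohugo-theme-ananke themes/ananke
echo 'theme = "ananke"' >> hugo.toml                      # config.toml on older Hugo versions
echo '# Getting started' > content/getting-started.md    # pages are plain Markdown
hugo server                                               # live preview; `hugo` builds the static site
```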
Example: https://alchemistsimulator.github.io/
Conclusion
- Make sure others can reproduce your experiments
- Lower the maintenance burden of your tools by using DevOps techniques
  - Version Control
  - Build Automation
  - Containerization
  - CI/CD
- Upload your artifacts where others are likely to search for them
- Make your experiments future-proof by archiving them on software-preservation platforms
- Don’t forget to apply a license
And of course, leave chickens’ heads alone
