Can Your Research Notebook Survive a Code Update? On Maintaining Reproducibility with Continuous Integration
Agnieszka Żaba
From the user's perspective, maintaining notebooks that reproduce paper results using the up-to-date project codebase serves to document and exemplify project features and applications. It enables exploratory usage with little-to-no effort needed to set up a working environment. However, this is contingent on a design that embraces modularity and inversion of control: in atmospheric modelling, it is not uncommon for papers to use different simulation flow control or different parameterisations. We present the inversion-of-control solutions from PySDM that enable the choice of formulae and constants from user code, without trade-offs in: (i) performance, (ii) the ability to switch between CPU and GPU backends, and (iii) dimensional analysis of physics-relevant routines for testing unit correctness. Maintaining notebooks within code repositories also poses challenges in handling embedded visuals. The jupyter-utils project helps with embedding GitHub-renderable and Jupyter-book-compatible vector graphics and animations.
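To make the inversion-of-control idea concrete, the following is a minimal sketch in which user code, rather than the library, selects a formula variant and overrides constants. It is not the PySDM API: the `Formulae` class, the variant names, the `simulate` function and all coefficient values are hypothetical and chosen only for illustration.

```python
import math
from types import SimpleNamespace

# Illustrative default constants (assumed values, not taken from PySDM).
DEFAULT_CONSTANTS = {"T0": 273.15, "A": 611.2, "B": 17.62, "C": 243.12}


def pvs_exponential(const, T):
    """Exponential fit for saturation vapour pressure (hypothetical variant)."""
    t = T - const.T0
    return const.A * math.exp(const.B * t / (const.C + t))


def pvs_linearised(const, T):
    """First-order expansion of the exponential fit about T0 (hypothetical variant)."""
    t = T - const.T0
    return const.A * (1 + const.B * t / const.C)


# Registry of interchangeable variants; the library code never hard-codes one.
VARIANTS = {"exponential": pvs_exponential, "linearised": pvs_linearised}


class Formulae:
    """Bundle of formulae and constants selected by the caller."""

    def __init__(self, saturation_vapour_pressure="exponential", constants=None):
        merged = {**DEFAULT_CONSTANTS, **(constants or {})}
        self.constants = SimpleNamespace(**merged)
        self._pvs = VARIANTS[saturation_vapour_pressure]

    def pvs(self, T):
        return self._pvs(self.constants, T)


def simulate(formulae, temperatures):
    """Library-side routine: it only calls whatever the user injected."""
    return [formulae.pvs(T) for T in temperatures]


# User code (for example a notebook reproducing one paper) picks its own setup.
paper_setup = Formulae(saturation_vapour_pressure="linearised", constants={"A": 600.0})
results = simulate(paper_setup, [273.15, 283.15])
```

In this sketch, two notebooks reproducing different papers could share `simulate` unchanged while each passing its own `Formulae` instance. The performance, backend and dimensional-analysis aspects mentioned above are outside the scope of this toy example.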
From the developer's perspective, notebooks within the code repository are a source of test scenarios and constraints for the assertions that constitute a robust regression-test suite. We present the notebook_vars() function from the jupyter-utils package, designed to be used in concert with the fixture logic of the pytest framework. It enables the notebook code to be executed once per test session, allowing use of the final notebook state in multiple automated tests. This requires no modifications to the notebooks themselves. In notebooks pertaining to specific research studies, a direct link from the test code to the subject literature is provided. Overall, we achieve an improvement in code readability and refactoring capability.
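The execute-once, assert-many pattern described above can be sketched with the standard library alone. The code below is a simplified stand-in and not the jupyter-utils implementation: `notebook_state`, `write_demo_notebook` and the two check functions are hypothetical names, the notebook is generated on the fly, and cells containing IPython magics would not run under this plain `exec` approach.

```python
import json
import tempfile
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def notebook_state(path: str) -> dict:
    """Execute all code cells of a notebook once and return the final namespace.

    The cache ensures that repeated calls with the same path reuse one execution.
    """
    notebook = json.loads(Path(path).read_text(encoding="utf-8"))
    namespace = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            source = cell["source"]
            code = "".join(source) if isinstance(source, list) else source
            exec(code, namespace)  # plain Python only; no IPython magics
    return namespace


def write_demo_notebook(directory: str) -> str:
    """Write a two-cell notebook so the sketch can be tried without extra files."""
    notebook = {
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
            {
                "cell_type": "code",
                "metadata": {},
                "outputs": [],
                "execution_count": None,
                "source": ["radius = 2\n", "area = 3 * radius ** 2"],
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }
    path = Path(directory) / "demo.ipynb"
    path.write_text(json.dumps(notebook), encoding="utf-8")
    return str(path)


# Two independent checks sharing a single execution of the notebook.
def check_radius(variables):
    assert variables["radius"] == 2


def check_area(variables):
    assert variables["area"] == 12


with tempfile.TemporaryDirectory() as tmp:
    state = notebook_state(write_demo_notebook(tmp))
    check_radius(state)
    check_area(state)
```

In a pytest suite, the cached call would typically live in a fixture declared with `@pytest.fixture(scope="session")`, and each test function would receive the resulting dictionary as an argument. The notebook file itself stays untouched, since the tests only read its final variables.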
The presented solutions, along with the availability of platforms such as Google Colab, mybinder.org or institutional Jupyter hubs, ensure single-click reproducibility of research-paper results and provide a structure for retaining it through future releases of the codebase. In addition to the benefits for software users and developers, this satisfies the reproducibility requirements of both the scientific method and research journals.