There is a fundamental reproducibility crisis in computational science. Many movements around data and source code discoverability and open access are emerging to bring scientific results back to a more trusted state. There is so much going on that a single topic covering all of it would quickly become unmanageable; to briefly list a few areas:
- terminology battle
- numerical reproducibility
- reproducibility at exascale
- workflow tools
- simulation management tools
- software execution capture: environment, inputs, outputs, dependencies, execution workflow (build, setup, …)
- etc.
This topic focuses only on capturing the environment and dependencies, where Docker can be seen as a great solution.
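To make that concrete, here is a minimal sketch of the idea (purely illustrative: the base image, package versions, and paths are my assumptions, not a real project). By pinning the base image and every dependency version, the Dockerfile itself becomes a record of the computational environment:

```dockerfile
# Illustrative sketch: pin the base image and dependency versions so the
# environment is captured exactly and can be rebuilt later.
FROM debian:8.4

# Pinned package versions (hypothetical values, shown for the pattern).
RUN apt-get update && apt-get install -y \
        gcc=4:4.9.2-2 \
        python=2.7.9-1 \
    && rm -rf /var/lib/apt/lists/*

# Hypothetical simulation code baked into the image.
COPY simulation/ /opt/simulation
CMD ["python", "/opt/simulation/run.py"]
```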
Yet if you are into scientific computing and have dug around a bit, you will have realized how things like:
- code optimization
- compiling parameters
- architecture major shifts (Xeon, Xeon Phi)
can have terrible, very-hard-to-debug non-reproducibility consequences.
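A tiny example of the mechanism behind many of these issues: floating-point addition is not associative, so whenever an optimizing compiler or a different architecture reorders a reduction (vectorization, fused operations, aggressive flags like `-ffast-math`), the result can change. The effect is visible even in pure Python:

```python
# Floating-point addition is not associative: the same three numbers
# summed in a different order give a different result. A compiler that
# reorders a sum for speed can therefore change the output of a simulation.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # 0.0 + 1.0
right = a + (b + c)  # b + c rounds to -1e16, so the 1.0 is absorbed

print(left, right)   # 1.0 0.0
```

In a large stochastic simulation, millions of such tiny discrepancies accumulate and diverge, which is exactly why two builds of the same code can disagree.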
The trend here is that we will always want better architectures, optimized computation, and ever faster hardware and software.
This topic is open to anybody with thoughts, issues and solutions around reproducibility guarantees while executing Dockerized scientific code on various architectures.
There is a very interesting paper, "Numerical Reproducibility, Portability and Performance of Modern Pseudo Random Number Generators", that benchmarks stochastic simulation reproducibility use cases with different random number generators (with preset status) while varying many parameters: compiler, OS, running in a virtual machine or not. You will find it interesting. I am currently investigating the effect of the VirtualBox layer when using boot2docker on non-Linux OSes. I am seriously happy Docker Beta is being spun out and will probably investigate it too. I am also open to collaborations, as I am currently pursuing my PhD in CS.
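As a side note on why generators "with preset status" can be portable at all: a generator whose state transition is pure integer arithmetic produces bit-identical streams on any platform, compiler or container. A minimal sketch (using the classic Park–Miller "minimal standard" LCG as my example, which is not necessarily one of the generators from the paper):

```python
# A pseudo-random generator defined entirely by integer arithmetic
# (here the Park-Miller "minimal standard" LCG) is bit-reproducible:
# the same seed yields the same stream on every machine.
def lcg_stream(seed, n, a=16807, m=2**31 - 1):
    """Return the first n values of x_{k+1} = (a * x_k) mod m."""
    x = seed
    out = []
    for _ in range(n):
        x = (a * x) % m
        out.append(x)
    return out

print(lcg_stream(1, 3))  # [16807, 282475249, 1622650073]
```

Floating-point transformations applied on top of such a stream (e.g. mapping to [0, 1)) are where compiler and architecture differences can creep back in.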
I think I have the longest topic description ever.