
Marie-Josée Cros

Computer science research and development engineer for the life sciences

Since the early 2000s, I have been involved in the agile software development movement.
Since 2013, I have been concerned with the question of reproducible research in scientific computing (see below).
More recently, in 2015, I coordinated a group of people around the theme of practices and tools for scientific computing [wiki] [Poster].

Reproducible research in scientific computing

Posters on the question

In 2016, a collective effort gathered practices and tools useful for scientific computing in a wiki.

I presented a poster at the Journées de math-info de l'INRA, Pont Royal, France, Oct. 3-7, 2016.
In July 2015, I updated my point of view and shared it in a poster at the JDEV 15 conference.

Awareness that change is needed seems more widespread, even if practices change quite slowly. The number of available tools (someone spoke of a 'huge tech soup', see Database of 400+ tools) surprises me: how can one learn about them and choose among them?
Ensuring reproducibility requires work and resources. I increasingly think that individual efforts are necessary, but that a change in research policies (evaluation of work, individuals ...) must also occur.
I had the opportunity to synthesize my readings and point of view in a poster for the 2014 general meeting of the MIA (Applied Mathematics and Computer Science) division at INRA.

Individual improvements in practices (linked to the scientific environment, and to the abilities of individuals and of those around them...), combined with changes in the research environment, can increase reproducibility in scientific computing. Practices can rely on several tools, either generic or domain-specific.

Introduction for a workshop

How to be more reproducible? (Comment être plus reproductible ?) Journées bioinformatiques de l'INRA, Toulouse, France, March 22-24, 2016.


A list of available tools (last update in 2016)

When I presented the posters, I was asked for links to the project sites of the tools mentioned; here they are.

Revision control software

git: a free and open-source distributed version control system.
The most widely used version control system.
Mercurial: a free, distributed source control management tool.
An alternative to git.
GNU Bazaar (formerly Bazaar-NG, command-line tool bzr): a distributed revision control system sponsored by Canonical.
Subversion (svn): an open-source version control system.
Unlike git and Mercurial, it is centralized rather than distributed, but it is still fairly widely used.

Code repository

GitHub: a web-based hosting service for software development projects that use the git revision control system.
SourceForge: find, create, and publish open-source software for free.
BitBucket: free source code hosting for Git and Mercurial.
SourceSup: a French forge for public training and research institutions.

Literate programming

Sweave: a tool for embedding the R code of complete data analyses in LaTeX documents. The purpose is to create dynamic reports that can be updated automatically if the data or the analysis change.
The notebook interface lets you write and run code, display 2D and 3D plots, and organize and share your work.
knitr: elegant, flexible and fast dynamic report generation with R.
Org mode (Emacs): for keeping notes, maintaining TODO lists, planning projects, and authoring documents with a fast and effective plain-text system.
IPython: a command shell for interactive computing in multiple programming languages, especially focused on Python, that offers enhanced introspection, rich media, additional shell syntax, tab completion, and rich history.
Sage: a free, open-source mathematics software system licensed under the GPL. It builds on top of many existing open-source packages: NumPy, SciPy, matplotlib, SymPy, Maxima, GAP, FLINT, R and many more. Access their combined power through a common, Python-based language or directly via interfaces or wrappers.
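To make the literate programming idea behind Sweave and knitr more concrete: the source document interleaves prose with code chunks written in noweb syntax, where a chunk opens with a line such as <<name, options>>= and the prose resumes after a line containing only @. A processor either runs the chunks and weaves their results into the text, or extracts ("tangles") the code alone. The Python sketch below illustrates only the tangling step, assuming this simplified chunk syntax. The function name tangle is hypothetical, and this is not how Sweave's own Stangle is implemented.

```python
import re
from typing import List, Tuple

# Simplified noweb syntax, as used by Sweave: a code chunk starts with a line
# "<<header>>=" and ends at a line holding only "@" or at the next chunk start.
CHUNK_START = re.compile(r"^<<(.*)>>=\s*$")
CHUNK_END = re.compile(r"^@\s*$")


def tangle(source: str) -> List[Tuple[str, str]]:
    """Return (header, code) pairs for every code chunk in a Sweave-style text."""
    chunks = []
    header = None  # header of the chunk being read; None while reading prose
    lines = []
    for line in source.splitlines():
        start = CHUNK_START.match(line)
        # Close the current chunk on "@" or when a new chunk begins.
        if header is not None and (start or CHUNK_END.match(line)):
            chunks.append((header, "\n".join(lines)))
            header = None
        if start:
            header = start.group(1).strip()
            lines = []
        elif header is not None:
            lines.append(line)
    if header is not None:  # chunk left open at the end of the text
        chunks.append((header, "\n".join(lines)))
    return chunks


if __name__ == "__main__":
    document = r"""\documentclass{article}
\begin{document}
<<summary, echo=TRUE>>=
x <- c(1, 2, 3)
mean(x)
@
The mean of x is \Sexpr{mean(x)}.
\end{document}
"""
    for header, code in tangle(document):
        print("# chunk:", header)
        print(code)
```

Running it prints the chunk header "summary, echo=TRUE" followed by the two R lines. Weaving, that is executing the R code and inserting its results into the LaTeX output, is the part that Sweave and knitr actually provide.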

Workflow management system and provenance tracker

VisTrails: an open-source scientific workflow and provenance management system that supports data exploration and visualization.
Taverna: an open-source, domain-independent workflow management system, i.e. a suite of tools used to design and execute scientific workflows and aid in silico experimentation.
Pegasus: the Pegasus project encompasses a set of technologies that help workflow-based applications execute in a number of different environments, including desktops, campus clusters, grids, and clouds.
Kepler: the Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open-source scientific workflow application Kepler.
Galaxy: an open, web-based platform for data-intensive biomedical research.
Sumatra: a tool for managing and tracking projects based on numerical simulation or analysis, with the aim of supporting reproducible research. It can be thought of as an "automated electronic lab notebook" for simulation/analysis projects.
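Provenance trackers such as Sumatra act as an automated lab notebook by recording, for each run, what was executed, with which parameters, on which inputs, and what it produced. The Python sketch below shows that underlying idea under simple assumptions: inputs and outputs are held in memory as bytes and identified by their SHA-256 digests, and each record is a plain dictionary that can be saved as JSON. The names run_and_record and make_provenance_record are hypothetical. This is not Sumatra's API, and a real tool would typically also capture the code version (for example a git commit) and store records durably.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest used to identify a piece of data."""
    return hashlib.sha256(data).hexdigest()


def make_provenance_record(label, parameters, inputs, outputs):
    """Build a minimal record of one computation run.

    inputs/outputs: mapping name -> bytes (a simplifying assumption; a real
    tool would hash files on disk instead of in-memory data).
    """
    return {
        "label": label,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "parameters": dict(parameters),
        "inputs": {name: sha256_of_bytes(data) for name, data in inputs.items()},
        "outputs": {name: sha256_of_bytes(data) for name, data in outputs.items()},
    }


def run_and_record(label, func, parameters, inputs):
    """Run func(parameters, inputs) -> outputs and return (outputs, record)."""
    outputs = func(parameters, inputs)
    record = make_provenance_record(label, parameters, inputs, outputs)
    return outputs, record


if __name__ == "__main__":
    def scale(params, inputs):
        # Toy analysis: multiply every number in "data.csv" by a factor.
        values = [float(x) for x in inputs["data.csv"].decode().split(",")]
        scaled = [v * params["factor"] for v in values]
        return {"result.csv": ",".join(str(v) for v in scaled).encode()}

    outputs, record = run_and_record(
        "scale-run-1", scale, {"factor": 2.0}, {"data.csv": b"1,2,3"}
    )
    print(outputs["result.csv"].decode())  # 2.0,4.0,6.0
    print(json.dumps(record, indent=2))
```

Identifying data by content hash rather than by file name means that a changed input, even under the same name, shows up as a different record, which is what makes later comparison of runs meaningful.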

Environment capture

Virtual machine: a software implementation of a machine (e.g., a computer) that executes programs like a physical machine.
Linux package: in Linux distributions, a package refers to a compressed file archive containing all of the files that come with a particular application. Most packages also contain installation instructions for the OS, as well as a list of any other packages that are dependencies (prerequisites required for installation).
Docker: an open-source project that automates the deployment of applications inside software containers by providing an additional layer of abstraction and automation of operating-system-level virtualization on Linux.
Vagrant: software that creates and configures virtual development environments. It can be seen as a higher-level wrapper around virtualization software such as VirtualBox, VMware, KVM and Linux Containers (LXC), and around configuration management software such as Ansible, Chef, Salt and Puppet.
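Virtual machines, packages, Docker images and Vagrant boxes capture a whole environment. A much lighter, complementary habit is to record at least the interpreter, operating system and library versions next to each result. The Python sketch below, which assumes Python 3.8 or later (for importlib.metadata), builds such a snapshot and turns the installed packages into pip-style name==version pins. The function names are hypothetical, and the snapshot does not cover system libraries, so it does not replace a container or a VM.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment():
    """Return a JSON-serializable snapshot of the current Python environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def requirements_lines(snapshot):
    """Turn the package part of a snapshot into pip-style 'name==version' pins."""
    return [f"{name}=={version}" for name, version in snapshot["packages"].items()]


if __name__ == "__main__":
    snapshot = capture_environment()
    summary = {k: v for k, v in snapshot.items() if k != "packages"}
    print(json.dumps(summary, indent=2))
    for pin in requirements_lines(snapshot)[:10]:  # first few pins only
        print(pin)
```

Saving this snapshot alongside each result costs almost nothing, and the pins can later seed a requirements file for the heavier tools listed above.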

Publication site

figshare: manage your research in the cloud and control who you share it with, or make it publicly available and citable.
Zenodo: a new, simple and innovative service that enables researchers, scientists, EU projects and institutions to share and showcase multidisciplinary research results (data and publications) that are not part of existing institutional or subject-based repositories.
Dryad: an international disciplinary repository of data underlying scientific and medical publications. Dryad is a curated general-purpose repository that makes data discoverable, freely reusable, and citable.
Research Compendia: a web service allowing people to share the research software and data associated with a scientific publication (articles and working papers).
Dataverse: a website dedicated to sharing, archiving and citing research data.
Open Science Framework: the Open Science Framework (OSF) is part network of research materials, part version control system, and part collaboration software. Its purpose is to support the scientist's workflow.
Run&Share: a web service allowing people to run computer code associated with a scientific publication (articles and working papers) using their own data and parameter values. It is a fork of RunMyCode.
myExperiment: makes it easy to find, use and share scientific workflows and other Research Objects, and to build communities.
recomputation.org: a repository for experiments in computational science.