Transparent Execution of Scientific Workflows in Docker Containers | Docker


One of the weak points of e-science is the difficulty of making experiments reproducible across different computing infrastructures. Developers of scientific applications often lack substantial expertise in computer science, and for them, installing and configuring software for distributed environments can become a nightmare.

In this talk, we will present a programming framework to easily develop and distribute scientific applications on a Docker-based computing platform. The proposed framework combines the COMP Superscalar (COMPSs) programming model and runtime with the Docker software stack. On the one hand, COMPSs provides a straightforward way to develop task-based parallel applications from sequential code. Developers only need to identify the functions that are candidates to become tasks and declare the direction (IN, OUT or INOUT) of the data used in these functions. With this information, the COMPSs runtime detects data dependencies between tasks, infers the inherent parallelism of the application, and coordinates its execution on the available computing resources.
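To make the dependency-detection idea concrete, here is a minimal Python sketch of how a runtime could derive a task graph from declared parameter directions. It is not the COMPSs implementation or the PyCOMPSs API: the names `Direction`, `TaskGraph`, `add_task` and `waves` are hypothetical, tasks are assumed to be registered in sequential program order with unique names, and dependencies are tracked conservatively (read-after-write, write-after-write, write-after-read) without any data renaming a production runtime might use to remove false dependencies.

```python
from enum import Enum


class Direction(Enum):
    IN = "IN"
    OUT = "OUT"
    INOUT = "INOUT"


class TaskGraph:
    """Hypothetical toy dependency tracker; tasks are added in program order."""

    def __init__(self):
        self.deps = {}         # task name -> set of tasks it must wait for
        self.last_writer = {}  # data name -> last task that wrote it
        self.readers = {}      # data name -> tasks that read it since the last write

    def add_task(self, name, params):
        """Register a task; params maps each data name to its Direction."""
        deps = set()
        for data, direction in params.items():
            reads = direction in (Direction.IN, Direction.INOUT)
            writes = direction in (Direction.OUT, Direction.INOUT)
            writer = self.last_writer.get(data)
            if reads and writer is not None:
                deps.add(writer)  # read-after-write
            if writes:
                if writer is not None:
                    deps.add(writer)  # write-after-write
                deps.update(self.readers.get(data, set()))  # write-after-read
        # Update the access history only after all dependencies are computed.
        for data, direction in params.items():
            if direction in (Direction.OUT, Direction.INOUT):
                self.last_writer[data] = name
                self.readers[data] = set()
            else:
                self.readers.setdefault(data, set()).add(name)
        self.deps[name] = deps

    def waves(self):
        """Group tasks into waves; tasks in the same wave can run in parallel."""
        level = {}
        for name in self.deps:  # dicts preserve insertion (program) order
            level[name] = 1 + max((level[d] for d in self.deps[name]), default=-1)
        grouped = {}
        for name, lvl in level.items():
            grouped.setdefault(lvl, []).append(name)
        return [grouped[k] for k in sorted(grouped)]


# Example: a "sequential" program expressed as task calls with data directions.
g = TaskGraph()
g.add_task("gen_a", {"a": Direction.OUT})
g.add_task("gen_b", {"b": Direction.OUT})
g.add_task("scale_a", {"a": Direction.INOUT})
g.add_task("scale_b", {"b": Direction.INOUT})
g.add_task("merge", {"a": Direction.IN, "b": Direction.IN, "c": Direction.OUT})
g.add_task("reset_a", {"a": Direction.OUT})
print(g.waves())
# [['gen_a', 'gen_b'], ['scale_a', 'scale_b'], ['merge'], ['reset_a']]
```

The two generator tasks touch disjoint data and land in the same wave, while `reset_a` has to wait both for the last writer of `a` (`scale_a`) and for its reader (`merge`), because this sketch does not version data.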

On the other hand, the Docker software stack provides a set of tools and services to distribute and deploy applications as sets of Docker containers. We have extended the COMPSs tools to support the transparent execution of scientific applications on top of Docker containers.

This extension includes the creation and registration of Docker images from the application code; the decomposition of the application into a set of Docker containers; and their deployment and efficient execution in a Docker-based computing environment. The framework gives scientists a tool to easily implement parallel distributed applications for their experiments, and to deploy and execute them with a single click.
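The image-creation and deployment steps can be pictured with the minimal Python sketch below, which only produces the Dockerfile text and the `docker` CLI commands as lists instead of running them. Everything about the layout is an assumption for illustration, not how the extended COMPSs tools actually work: the helper names `make_dockerfile` and `deployment_commands`, the `/app` working directory, the split into one `master` container plus N `worker` containers, and the placeholder image, registry and network names are all hypothetical.

```python
import shlex
from typing import List


def make_dockerfile(base_image: str, workdir: str = "/app") -> str:
    """Hypothetical helper: layer the application code on top of a runtime image.

    The build context is assumed to be the application directory.
    """
    return "\n".join([
        f"FROM {base_image}",
        f"WORKDIR {workdir}",
        f"COPY . {workdir}/",
    ]) + "\n"


def deployment_commands(app_dir: str, image: str, network: str,
                        n_workers: int) -> List[List[str]]:
    """Hypothetical helper: build and push (register) the image, then start one
    master and n_workers worker containers on a shared user-defined network."""
    cmds = [
        ["docker", "build", "-t", image, app_dir],
        ["docker", "push", image],
        ["docker", "network", "create", network],
        ["docker", "run", "-d", "--name", "master", "--network", network, image],
    ]
    for i in range(1, n_workers + 1):
        cmds.append(["docker", "run", "-d", "--name", f"worker{i}",
                     "--network", network, image])
    return cmds


# Placeholder names; replace with a real base image, registry and directory.
dockerfile = make_dockerfile("example/compss-runtime:latest")
cmds = deployment_commands("./my_app", "registry.example.com/my_app:1.0",
                           "compss-net", 2)
print(dockerfile)
for cmd in cmds:
    print(shlex.join(cmd))
```

Putting all containers on one user-defined Docker network lets them resolve each other by container name, which is one simple way for a master to reach its workers. The sketch deliberately leaves out the container entrypoint, since the actual COMPSs launch command is not described here.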