3 - Configuring Docker

OK, we decided to ditch the local config. Although Docker is supposed to be platform agnostic, it needs some sort of virtualization layer, so some versions of Windows (e.g., Windows Home Edition) are a no-go. Instead, we will use DockerHub connected to GitHub to build images on a remote server.

Let’s talk a little about Docker. Docker, imo, takes us to the extreme end of reproducible research. Docker allows for creation of the entire compute environment as a container. From the docs (see below), Docker is a platform for developers and sysadmins to develop, deploy, and run applications with containers. A container is launched by running an image. An image is an executable package that includes everything needed to run an application – the code, a runtime, libraries, environment variables, and configuration files. The image is everything, except, usually, the data.

The reason we are going this route is two-fold:

  • platform issues plague attempts at teaching parallelization and cause many needless teaching headaches
  • reproducible research is a passion of mine (driven by personal fails :( )

To combat this, we will use Docker to create environments that are reproducibly the same for all of your class computing. We will do this by combining DockerHub with build rules that link to GitHub. This lets us avoid all local configuration issues and have the images build directly in DockerHub using a configuration file (Dockerfile) in a GitHub repository. Once we have a Docker image that builds, we can pull the image to a compute node on our clusters via Singularity and finally use Singularity to instantiate a running container.

As with Git, we don’t need to become experts in this class; we simply need the basics. The basic workflow looks something like this:

Create GitHub repository

  • create new repository
  • add Dockerfile

We are going to fork one of mine. You can customize from there, knowing that the base works on ARC clusters.
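To give a rough idea of what such a repository holds, the sketch below writes a minimal Dockerfile into a scratch directory. The base image `rocker/rstudio:3.6.1`, the `data.table` package, and the `docker-demo` directory name are assumptions for illustration only; they are not the contents of the class repository.

```shell
# Hypothetical scratch directory standing in for your cloned fork.
REPO_DIR="${REPO_DIR:-docker-demo}"
mkdir -p "$REPO_DIR"

# Minimal Dockerfile: a version-tagged base image plus one extra R package.
# rocker/rstudio:3.6.1 and data.table are illustrative choices only.
cat > "$REPO_DIR/Dockerfile" <<'EOF'
FROM rocker/rstudio:3.6.1
RUN R -e "install.packages('data.table', repos = 'https://cloud.r-project.org')"
EOF

# Show what was written.
cat "$REPO_DIR/Dockerfile"
```

In a real fork, you would then commit the Dockerfile and push it to GitHub so that DockerHub can find it.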

Create DockerHub repository

  • give it a name
  • give it a description
  • connect it to GitHub
    • you must first set up the link to GitHub under account settings and linked accounts
    • choose your GitHub repo containing the Dockerfile
  • set up build rules
    • the defaults should work; when I version, I append the version to the Dockerfile name
  • choose create and build

This can take a little while, but it will build on the DockerHub servers. Note that any changes upstream of this image will trigger a new build. Ideally, you would use versioning to protect against upstream changes. If upstream changes are possible and would be a problem, turn off auto build once the image is built to your liking.
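One common way to version against upstream changes is to give the base image in the Dockerfile a fixed tag rather than `latest`. The helper below is a hypothetical sketch (the name `check_pinned` is made up here) that flags `FROM` lines without a fixed tag. It is deliberately simple and does not handle `FROM --platform=...` lines or registries that include a port number.

```shell
# check_pinned FILE: hypothetical helper that flags FROM lines lacking a fixed tag.
# Returns 0 if every base image is pinned, 1 otherwise.
check_pinned() {
    awk '
        BEGIN { bad = 0 }
        toupper($1) == "FROM" {
            image = $2
            # Unpinned if there is no tag or digest at all, or the tag is "latest".
            if (image !~ /[:@]/ || image ~ /:latest$/) {
                print "unpinned base image: " image
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}
```

For example, `check_pinned Dockerfile || echo "consider adding a version tag"` would print the reminder for a Dockerfile that starts with `FROM rocker/rstudio:latest`.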

Now to do some stuff on the cluster.

For this class, we will log in via <ood.arc.vt.edu>. From there, choose clusters -> TinkerCliffs and run:

module load containers/singularity 
singularity pull --force --disable-cache --name rstudio.simg \
    docker://rsettlag/stat5566-test1:latest
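Once the pull finishes, a quick sanity check is to run a single command inside the image. The sketch below assumes the image file is `rstudio.simg` in the current directory and that R is on the image's path (an assumption about the image contents). If Singularity or the image is missing, it only prints a message.

```shell
IMAGE="rstudio.simg"   # file name passed to --name in the pull

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # Run one command inside the container, then exit.
    if singularity exec "$IMAGE" R --version; then
        CHECK_STATUS="ok"
    else
        CHECK_STATUS="failed"
    fi
else
    echo "singularity or $IMAGE not available; run this on the cluster after the pull"
    CHECK_STATUS="skipped"
fi
```

For an interactive look around, `singularity shell rstudio.simg` opens a shell inside the container instead.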

If you are creating your own containers and pushing to DockerHub, I have found the following to work. It assumes you are in the directory containing the Dockerfile and that the image is named ood-rstudio-stat3615 with version 3.6.1:

docker login
docker build --no-cache -t ood-rstudio-stat3615:3.6.1 .
docker tag ood-rstudio-stat3615:3.6.1 rsettlag/ood-rstudio-stat3615:3.6.1
docker push rsettlag/ood-rstudio-stat3615:3.6.1

Before pushing it, you should probably poke around in it, both manually and by starting RStudio:

docker run -it --rm ood-rstudio-stat3615:3.6.1 bash
docker run -e PASSWORD="test" --rm -p 8787:8787 ood-rstudio-stat3615:3.6.1
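If you would rather script the RStudio check than click through a browser, something along these lines may work. It is a sketch under assumptions: the container name `rstudio-smoke-test` and the 10-second wait are arbitrary choices, and the image is the locally built `ood-rstudio-stat3615:3.6.1`. Any HTTP status code printed by `curl` suggests that something is listening on the port.

```shell
IMAGE="ood-rstudio-stat3615:3.6.1"   # local tag from the build step
NAME="rstudio-smoke-test"            # hypothetical container name
PORT=8787

if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
    # Start the container in the background.
    docker run -d --rm --name "$NAME" -e PASSWORD="test" -p "$PORT:8787" "$IMAGE"
    sleep 10   # crude wait for the server to come up
    # Print only the HTTP status code returned on the mapped port.
    curl -s -o /dev/null -w '%{http_code}\n' "http://localhost:$PORT" || echo "no response"
    # Stop the container; --rm removes it afterwards.
    docker stop "$NAME"
    SMOKE_STATUS="ran"
else
    echo "docker or image $IMAGE not available; skipping smoke test"
    SMOKE_STATUS="skipped"
fi
```

Running detached with `--name` makes cleanup a single `docker stop`, which is why the sketch does not run the container in the foreground.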

If you want to read more about this:

https://docs.docker.com/get-started/
https://ropenscilabs.github.io/r-docker-tutorial/
https://colinfay.me/docker-r-reproducibility/

We won’t go too far down this path except to make sure you know the idea and basics.

Some basic install stuff:

https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-centos-7
https://www.andrewheiss.com/blog/2017/04/27/super-basic-practical-guide-to-docker-and-rstudio/

Some less basic Dockerfile stuff:

https://docs.docker.com/docker-hub/repos/
https://www.linuxnix.com/how-to-push-docker-images-to-docker-hub-repository/

Windows users: Windows Home Edition does not contain the secret sauce to allow virtualization. To get a more capable version of Windows, you can upgrade using the VT site license: https://apps.itpals.vt.edu/Apps/WebObjects/NetSoftware.woa/wo/0.0.3.3.21.19.1.123.3.7.1.1