Converting an open source project to Docker: why bother?

Stuart Watt from Princess Margaret Hospital in Toronto has written a tracker app that is intended to replicate the experience of Google Docs spreadsheets for research study project managers in a private, local install. The webapp currently supports spreadsheets for multiple projects, fine-grained permission handling, multiple views on the same sheet, and of course it can export to Excel.

A project manager at OICR leading a multi-centre project approached me about Stuart's tracker a few weeks ago. Currently, she keeps several spreadsheets up to date and passes them around to keep track of hundreds of samples in various stages of sample preparation, sequencing and analysis. For many clinicians, technicians, project and people managers, Excel spreadsheets are the tool of choice to share data and information about samples and projects. They are flexible and simple for presenting and organizing information. Unfortunately, they're time consuming to keep up to date and they can corrupt data. The tracker looked like a possible time and tear saver.

My first concern was assessing the tracker and enabling the project manager to try it out. Plan A was to launch a virtual machine, open a port on it and give her my machine address. Halfway through building the VM (while my machine was crawling along) I realized I didn’t particularly want to email around the image or host the VM on my local machine forever.

So, jump to plan B: Docker

What? Why Docker?

If you’ve been a little behind in your reading, Docker is the cool new thing that will solve one of the greatest woes in bioinformatics: trying to run someone else’s code. It is a lightweight container that contains a teeny operating system and all of the dependencies needed to run a piece of software. If the Docker container worked for the author of the software, in theory it should work the same way for everyone.

I needed to show a bunch of collaborators what the tracker looked like, but I didn’t want to have to host or email a full virtual machine. All you need to get a Docker container running is the Docker software and a Dockerfile. My Dockerfile ended up being 1.2K in size and therefore extremely easy to email. And it also starts up faster and is less resource intensive than a full VM.

The tracker app didn’t have a Dockerfile yet. But I’ve been putting off actually trying Docker for about a year, so I decided to give it a shot.

How hard is it to do?

Starting from nothing but a vague knowledge of what Docker did, it took me about an hour to wrap the tracker app and get it running. There was plenty of Googling and once I cheated and asked someone instead of Googling, but it didn’t take very long.

Requirements:

  • Docker
  • Reasonable knowledge of the command line in some operating system
  • Being able to search Google
  • An application that you know can be installed

I’m somewhat embarrassed I didn’t try it earlier. I thought it would require more time invested to get started.

Tell us everything

This post isn’t intended to be a tutorial on how to run with Docker. This is a record of my experience as a Docker newbie. But I will summarize what I think I learned here. First, I followed the tutorial on the Docker Getting Started guide and made it as far as Step Four: Build your own image. At that point I figured I was an expert and I could get started on a real container.

Dockerfiles are how you configure a Docker container with base images (mini operating systems), installation commands and default launch commands. Each instruction in my file starts with one of three keywords: FROM, RUN or CMD. Once you have your Dockerfile, you first build it and then you can run it.
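
To make the three keywords concrete, here is a minimal sketch, not the tracker's Dockerfile. It writes a three-instruction file into a scratch directory and lists the keywords it uses. The scratch directory, the "hello" tag and the echo command are all made up for illustration.

```shell
# Write a three-instruction Dockerfile into a scratch directory.
# The directory and the "hello" tag below are made up for this sketch.
workdir="$(mktemp -d)"
cat > "$workdir/Dockerfile" <<'EOF'
# Base image: the mini operating system to start from
FROM ubuntu:14.04
# Build-time step: executed once, while the image is built
RUN apt-get -y -q update
# Run-time default: executed each time a container starts
CMD echo "hello from inside the container"
EOF

# Show which instruction keywords the file uses, in order
grep -v '^#' "$workdir/Dockerfile" | cut -d ' ' -f 1

# With Docker installed, the build and run steps would be:
#   sudo docker build -t hello "$workdir"
#   sudo docker run hello
```

The build and run commands are left as comments so the snippet can be tried on a machine without Docker.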

Pick a base image with FROM

I searched Docker Hub and found that ubuntu:14.04 was there, so all I needed to do was put that at the top:

FROM ubuntu:14.04

Kind of like Ubuntu’s apt repositories, Docker knows how to find the image so it will go and get it when you build it.

Install all the things with RUN

In the Dockerfile, everything you need to set up the container goes under RUN commands. These commands are run when the Docker image is built. Every RUN command creates a new layer in the image, and Docker caches those layers. When you rebuild, every step that comes before the first changed instruction is reused from the cache instead of being run again, which is why rebuilds are so much faster than the first build.

Two important things to note:

  1. Docker runs all of the RUN commands as root, hence the omission of sudo.
  2. All commands need to be non-interactive. That’s why you’ll see -y’s and -q’s and --non-interactive flags cropping up now and again.

So let’s set up this container. I had four requirements that I knew about: Java 8, Maven 3.1+, git, and the tracker itself.

Java 8: Not available from official Ubuntu repositories, so I had to add a repository (ppa:webupd8team/java), update apt, install the dependency and then update the Java alternatives to make sure Java was linked correctly. Directly inspired by this question on Ask Ubuntu.

Other requirements: software-properties-common for add-apt-repository

RUN apt-get -y -q update &&\
    apt-get -y -q install software-properties-common &&\
    add-apt-repository -y ppa:webupd8team/java &&\
    apt-get -y -q update &&\
    echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections &&\
    apt-get -y -q install oracle-java8-installer &&\
    update-java-alternatives -s java-8-oracle

Maven 3.1+:  Less easy (!). Not only unavailable from official repositories, but you actually have to download the .deb package and install with gdebi, then link mvn to bin. Taken from this article.

RUN apt-get install -y -q gdebi &&\
    wget http://ppa.launchpad.net/natecarlson/maven3/ubuntu/pool/main/m/maven3/maven3_3.2.1-0~ppa1_all.deb &&\
    gdebi --non-interactive maven3_3.2.1-0~ppa1_all.deb &&\
    ln -s /usr/share/maven3/bin/mvn /usr/bin/mvn

Git: Easiest of all. Just install from official repositories.

RUN apt-get -y -q install git

Tracker: I cloned tracker from the GitHub page, changed into its directory, and then built it. I encountered a few errors.

First, tracker uses Bower, which complains and dies if you try to run it as root. I needed to add --allow-root to the pom.xml file to get around this problem (thanks again StackExchange).

Second, PhantomJS or something it depended on wanted to use make for some reason, so I had to add that to the list of things to install.

RUN git clone https://github.com/morgantaschuk/tracker.git &&\
    cd tracker &&\
    apt-get -y -q install make &&\
    mvn install

Add runtime commands with CMD

The command given to CMD is not executed when building the image, only when you run a container.

Yes, RUN is for build and CMD is for run. Like a boss.

Once tracker is built, it can be deployed in Jetty using Maven, primarily for testing purposes. Since we’re starting a new context, we have to navigate back inside the tracker folder to use Maven. I could have specified the relative path to the POM but Maven sometimes gets tetchy about that.

CMD cd tracker && mvn -Djetty.port=9999 jetty:run

The web server will launch on http://localhost:9999 in the container.
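
As an aside, Docker also has a WORKDIR instruction that could replace the cd step. The sketch below is an alternative, not what the Dockerfile in this post does. It assumes the clone ended up in /tracker, which holds only if the build steps ran from the image's root directory. The snippet writes the two lines to a scratch file so they can be inspected without touching a real Dockerfile.

```shell
# Append an alternative run-time section to a scratch file.
# Assumption: the clone landed in /tracker (build ran from the image's root directory).
dockerfile="$(mktemp)"
cat >> "$dockerfile" <<'EOF'
# Set the working directory for every instruction that follows
WORKDIR /tracker
# Exec form: Docker starts mvn directly instead of going through /bin/sh -c
CMD ["mvn", "-Djetty.port=9999", "jetty:run"]
EOF

# Print the result for inspection
cat "$dockerfile"
```

With WORKDIR in place, Maven starts inside the project folder, so there is no need to pass a path to the POM.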

Build it!

I built the container (continually and incrementally) using

sudo docker build -t tracker .

With a surprisingly small amount of time and minimal frustration, I had a Docker container of my very own.

The full Dockerfile is available on github.

Running the container

Once the container is built, it’s simple to start:

sudo docker run tracker

It started! I can see regular pings showing the server is listening for requests. But I very rapidly ran into two problems:

  1. It’s not showing up in my browser!
  2. HOW DO I KILL IT?

I’ll address the second question first.

How do I kill it??

I immediately discovered that pressing Ctrl-C did not kill my Docker container.

It’s IGNORING me!

After another quick Google, I opened another terminal and typed sudo docker ps. This command gave me the container ID, which I used to sudo docker kill that sucker.

$ sudo docker ps
CONTAINER ID   IMAGE     COMMAND                CREATED              STATUS              PORTS                    NAMES
c10aa59c02e9   tracker   "/bin/sh -c 'cd trac   About a minute ago   Up About a minute   0.0.0.0:9999->9999/tcp   cocky_archimedes
$ sudo docker kill c10aa59c02e9
c10aa59c02e9
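
Copying the ID by hand gets old quickly. Here is a small sketch of how the lookup could be scripted with awk, using a sample listing based on the output above rather than a live Docker daemon. The variable names are made up for illustration, and the filter assumes only one container was started from the tracker image.

```shell
# Sample listing based on the output above; in real use, pipe `sudo docker ps` into awk instead.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c10aa59c02e9 tracker "/bin/sh -c 'cd trac About a minute ago Up About a minute 0.0.0.0:9999->9999/tcp cocky_archimedes
EOF

# Column 1 is the container ID and column 2 the image name; NR > 1 skips the header row
container_id="$(awk 'NR > 1 && $2 == "tracker" { print $1 }' "$sample")"
echo "$container_id"

# With Docker installed, the same filter could feed kill directly:
#   sudo docker kill "$(sudo docker ps | awk 'NR > 1 && $2 == "tracker" { print $1 }')"
```

If all you want is the list of IDs, docker ps -q prints them without the other columns.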

Much better. In order to be able to kill a container with Ctrl-C, you have to add -i and -t to the run command. Source

sudo docker run -i -t tracker

Now we can kill it like any normal beast.

It’s not showing up in my browser!

The webserver launches the app on port 9999 as I specified, but that port is only available inside the container. I need to publish the container’s port to the host machine to have the app appear at localhost:9999. This is the question I cheated on and asked someone, but the answer is pretty easy to find through Google if you have the right keywords.

This is another easy fix. I added -p hport:cport to the run command, where hport is the host’s port number, where the container’s port will be published (9999), and cport is the container’s port number (also 9999).

sudo docker run -i -t -p 9999:9999 tracker
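
Because both numbers are 9999 here, the order of the two ports is easy to overlook. Docker's documentation gives it as host port first, then container port. The sketch below uses a made-up helper, publish_flag, and a made-up host port, 8080, to show a mapping where the two differ. It only prints the command, so it runs without Docker.

```shell
# Hypothetical helper: print a -p argument, host port first, then container port.
publish_flag() {
    printf '%s %s:%s\n' -p "$1" "$2"
}

# Jetty still listens on 9999 inside the container; the host side is free to differ.
flag="$(publish_flag 8080 9999)"
echo "sudo docker run -i -t $flag tracker"
```

Run with that flag, the tracker would show up at localhost:8080 on the host while still using 9999 inside the container.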

TAH DAH

Now the tracker is running from my Docker container and it is in my browser and it is beautiful.

Conclusion

The punch line: Neither Stuart Watt nor I can use Docker at work because it requires root to run. We’re fortunate enough at OICR to have root on our local computers, but I can’t use Docker to deploy tracker on one of our servers.

That said, I still think this exercise is useful for a few reasons. Convincing your code to run in Docker is like giving your code to someone else to run for the first time, but less embarrassing. You’ll find all of the hardcoded paths, implicit dependencies, and broken repositories immediately. I’m not saying it’ll force you into better software practices, but it will force you to compensate for and document suspicious choices. And even if someone can’t launch the Dockerfile directly, they can open the simple text file and see all of the dependencies and where they can acquire them.

Blog posts take longer to write than Docker containers do. If you have a piece of software, consider spending an hour or two and throwing it into Docker. Some poor grad student or staff scientist somewhere will thank you.

References and acknowledgements

are scattered throughout the text. Many thanks to Stuart for his quick fixes and advice.


Even though I’m pretty much a Docker expert already, let me know in the comments if you have any tips or corrections.
