Docker for researchers

Jørgen Aarmo Lund

5/9/23

Introduction

Jørgen Aarmo Lund, industry PhD student at the UiT Machine Learning group for DIPS AS
- “Data-driven pathways”: inferring usage patterns in patient record systems from auditing logs
- Researching explainability, natural language processing
DIPS develops e-health systems: patient records, laboratory services, hospital kiosks, and more
- Gradually moving over applications to containers

Agenda

Part 1: Introduction to containers
- What are containers?
- Why are they useful for ML research?
- Getting started with Docker
Part 2: Putting together our own container images
- Basic Dockerfile syntax
- Where does the model go?
- Debugging tips
Part 3: Deploying containers for ML research
- GPU and device access
- Deploying to UiT’s GPU cluster
- Deploying to NRIS HPC clusters

Follow along

Files available on

https://github.com/jaalu/vigs-docker-workshop

Motivation for software developers

IT around 2008¹: developers handing applications to sysadmins maintaining long-lived servers
- Downtime for manual installation
- Server and application maintenance intertwined
- Conflicts between dependencies
Containers allow isolating applications and running them with their own set of dependencies

Motivation for ML researchers

Replicability: making experimental conditions visible
Flexibility: easing transition from laptop tests to HPC training
Reusability: showing findings work in other settings too!

Docker

Docker allows isolating your script into a container, which:
- Runs isolated from other processes while sharing the OS
- Can package their own set of dependencies
- Can be packaged and started on other servers, including HPC clusters
Maintained by Docker Inc., runtime open source
Docker Desktop packages the software with a GUI, free for researchers

Docker - structure

Key concepts

We separate between containers and images:

A container is a standalone environment with your script and the dependencies it needs
An image is the template for making your container
- Images can be saved to a registry, like Docker Hub

Containers are meant to be disposable: changes you want to keep - like your trained model - should be outside of the container!

Installing Docker - options

Docker Desktop: https://www.docker.com/
Play With Docker: https://labs.play-with-docker.com
- Free online lab with VMs provisioned
Docker also provides an apt repository

Checking that Docker works

When Docker is running, we can get a list of running containers with

$ docker ps 
CONTAINER ID   IMAGE      COMMAND                  CREATED          STATUS         PORTS      NAMES

We can then retrieve an image with docker pull:

$ docker pull hello-world

We can then build a container from the image with docker run:

$ docker run hello-world

Running containers - custom commands

Images specify a default command, but we can specify one ourselves in docker run:

$ docker run ubuntu echo Hello!
Hello!

Running containers - detached

Default: containers do not accept any input, but write to the terminal
More likely you want a container which runs detached in the background, with --detach or -d:

$ docker run -d hello-world
$ docker ps -a
CONTAINER ID   IMAGE         COMMAND         CREATED              STATUS                          PORTS     NAMES
d3a5ee04babd   hello-world   "/hello"        About a minute ago   Exited (0) About a minute ago             elated_feistel
$ docker logs elated_feistel

NOTE: Docker options placed before the container image and the command

Running containers - interactive

Alternatively, we can specify that the container should set up a shell and accept input with --interactive --tty, or -it for short:

$ docker run -it python:3.9
Python 3.9.16 (main, May  4 2023, 06:16:43) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.

Running containers - cleaning up:

Containers will stick around after they finish running
Nice for checking logs, restarting, but list easily clogged
Passing --rm will delete the container after it exits:

$ docker run --rm hello-world
$ docker ps -a

Running containers - configuration

We can set environment variables in the container with --env or -e:

$ docker run -e MODEL_ARCH=resnet ubuntu

If we want to expose network ports (e.g. for dashboards) we can map ports from the container to the host with -p:

$ docker run -p 8080:80 httpd

NOTE: the order is host-container, so -p 8080:80 will connect port 80 on the container to port 8080 on the host

Where do we keep the model?

“Containers are meant to be disposable: changes you want to keep should be outside of the container”

So where do we store the trained models?

Where could we keep the model?

Embed it as part of the image
- Not an option for training, gives us large images
Copy it to/from the container after start
- docker cp can copy files
- Can bump into runtime storage limits
Upload to/download from online server
- Extra warmup time, network traffic
- Weights & Biases, Hugging Face libraries provide functionality for this

Where should we keep the model?

Bind mounts
- Creates a temporary link between a directory on the host PC and a directory in the container
- Pros: Can see the directory, pull files quickly
- Cons: Assumes the storage is on your PC, not as flexible as volumes
Volumes
- Docker creates and manages a persistent directory
- Pros: More flexible, can set up plugins to mount cloud storage as volumes
- Cons: Requires a running (temporary) container to copy files to host

Mapping bind mounts

To mount a directory with a bind mount we can use --mount:

$ docker run --mount type=bind,source=$(pwd)/assets/,target=/pictures/ ubuntu

source points to the folder on the host (assets in the working directory), and target is the folder it will appear as in the container (/pictures/)

Setting up a volume

To set up a volume we run docker volume create:

$ docker volume create my-volume
$ docker volume inspect my-volume

We can then mount it in the same way, but with type=volume:

$ docker run --mount type=volume,source=my-volume,target=/results/ ubuntu

Break

Creating an image - FROM

Template file conventionally named Dockerfile (no extension)
Images start with a FROM statement, which specifies which image (ubuntu) and tag (22.04) to build on:

FROM ubuntu:22.04

If you omit the tag, Docker will automatically use latest - but use one if you can!

Building an image

This turns out to be a complete Dockerfile:

FROM ubuntu:22.04

Save this as Dockerfile (no extension!) in a project directory, and run

$ docker build . -t my-image

docker build expects a build context - the directory to pull project files from when building the image - like the current directory .
-t ties the new image to a name and a tag

Creating an image - RUN

Once we have a base, we specify which commands to run to build the environment
RUN statements are followed by commands to run - for instance, we can install packages:

RUN apt-get update && apt-get install -y python3

Creating an image - COPY

Earlier, we specified the build context (usually the project directory)
COPY copies files from the build context to the container:

COPY train.py /experiment/train.py

We can set the working directory in the container with WORKDIR

WORKDIR /experiment/

The .dockerignore file specifies which files not to copy from the context

Creating an image - ENV

We can also specify which settings the container expects/respects when it runs - conventionally done through environment variables
ENV sets the default value for an environment variable:
Default values can be overridden with docker run -e

Creating an image - CMD

Finally, we specify what should happen when the container starts with CMD:

CMD python train.py

Full image example

Using all of the commands:

FROM ubuntu:22.04
ENV BATCH_SIZE=128
RUN apt-get update && apt-get install -y python3
COPY train.py /experiment/train.py
CMD python3 /experiment/train.py

Debugging tip 1

What if: your image fails to build on step 14?
You can temporarily turn off Buildkit to get intermediate images for each layer: On Linux:

$ DOCKER_BUILDKIT=0 docker build .

On Windows:

set DOCKER_BUILDKIT=0& docker build .

You can create a container from the intermediate image and retry the step:

$ docker exec -it <intermediate-image-id> bash 
# pip install ...

Debugging tip 2

What if: the training suddenly stops and nothing happens?
Find the running container

$ docker ps -a
e593fff04794   postgres   "docker-entrypoint.s…"   10 seconds ago   Up 9 seconds   5432/tcp   stupefied_elion

and start an interactive shell inside it:

$ docker exec -it stupefied_elion bash

Archiving images

You can also save images as archives with docker save

$ docker save mnist-demo > mnist-demo.tar

and load them with docker load

$ docker load < mnist-demo.tar

Making cache-friendly images

Images are composed of multiple layers: each set of changes made by RUN and COPY makes up a layer. Try:

$ docker image inspect python

To avoid new images for every build, Docker saves extra info for each layer:
- For RUN, the command is saved
- For COPY, a checksum of the added files is saved
If the commands run/the files added by the layer are the same, and the last layer is the same, the layer is reused

Making cache-friendly images pt. 2

For this reason, the most frequent and smallest changes should come last in your image, e.g.

Installing system packages
Installing Python packages
Adding your script
Running your script

The commands should, as far as possible, have the same results each time you run them

System package installation

When using apt, group together updating and package installation:

apt-get update && apt-get install -y python3

NOTE: This doesn’t guarantee that you get the latest packages every time you build, but makes sure the package index makes sense when you install the packages
Possible to lock apt packages to specific versions, but specifying the distro usually sufficient

Language package installation

Generally we want a lock file with exact package versions, which we can then restore as part of building the image
In Python, pip freeze produces a list of all packages installed

$ pip freeze > requirements.txt

Good practice to set up a virtual environment with venv to isolate exactly which packages the project needs
In R, renv (formerly packrat) can create similar virtual environments with lock files:

renv::snapshot() # saves project deps to renv.lock
renv::restore() # restores deps from renv.lock

M1 Mac: installing amd64 packages

Common problem on M1 Macs: older packages/libraries without ARM binaries
We can ask Docker to have the container act as an Intel Mac with --platform=linux/amd64:

docker run --platform=linux/amd64 ubuntu uname -a

Building images - conclusion

Order changes from least to most frequent
Commands should be deterministic
Virtual environments and lock files useful to have reproducible containers

GPU/device access

Nvidia Container Runtime lets you run CUDA code in containers:

$ apt-get install nvidia-container-runtime

Can use the --gpus switch to grant access to GPU:

$ docker run --gpus all ubuntu

Windows: requires WSL 2, newer CUDA driver
Premade CUDA Docker images: https://github.com/NVIDIA/nvidia-docker/wiki/CUDA
Also possible to grant access to USB/hardware devices with --device

Packaging for cloud services

You can set up your own container registry with the cloud provider:
- Amazon ECR
- Azure Container Registry
- Google Cloud Container Registry
To push the image to your cloud registry, tag your image with the URL of the registry and push:

$ docker tag mnist-demo jludemo.azurecr.io/mnist-demo
$ docker push jludemo.azurecro.io/mnist.demo

Packaging for Springfield (UiT)

UiT’s Springfield cluster uses Kubernetes for orchestration across multiple nodes
Docker Desktop allows setting up your PC as a single-node Kubernetes cluster
Kubernetes lets us define a Job which runs one or more containers until completion:

kind: Job
apiVersion: batch/v1
metadata:
  name: your-training-job
spec:
  template:
    spec:
      containers:
      - name: your-training
        image: "your-training-image"
        workingDir: /storage
        command: ["sh", "train.sh"]
        volumeMounts:
        - name: storage
          mountPath: /storage
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: storage
      restartPolicy: OnFailure
  backoffLimit: 0

Packaging for NRIS HPC (Betzy, LUMI)

As long as architecture is the same, we can now push images and run the containers on HPC clusters!
However: container libraries most likely not tuned for the HPC cluster
- If libraries are available as Lmod modules, the modules will be faster
But containers are still useful for
- Portability
- Specifying package/library versions

Packaging for NRIS HPC - Singularity (Betzy, LUMI)

Singularity, the container runtime installed on NRIS HPC computers, supports converting Docker images to Singularity .sif images. Running

$ singularity pull --name train.sif docker://jlu015/train:latest

will retrieve the image jlu015/train from Docker Hub and save it as train.sif

To run the default command specified by CMD, we can call singularity run with the image:

$ singularity run train.sif

Alternatively, singularity exec will run a specific command:

$ singularity exec train.sif echo Hello world!

Packaging for NRIS HPC - Writing a SLURM job (Betzy, LUMI)

See https://documentation.sigma2.no/code_development/guides/containers.html#singularity-in-job-scripts

Resources

Docker in Y Minutes:
- https://learnxinyminutes.com/docs/docker/
The Play with Docker exercises:
- https://training.play-with-docker.com/
NRIS’ documentation on containers:
- https://documentation.sigma2.no/