Docker for researchers
5/9/23
Introduction
- Jørgen Aarmo Lund, industry PhD student at the UiT Machine Learning group for DIPS AS
- “Data-driven pathways”: inferring usage patterns in patient record systems from auditing logs
- Researching explainability, natural language processing
- DIPS develops e-health systems: patient records, laboratory services, hospital kiosks, and more
- Gradually moving applications over to containers
Agenda
- Part 1: Introduction to containers
- What are containers?
- Why are they useful for ML research?
- Getting started with Docker
- Part 2: Putting together our own container images
- Basic Dockerfile syntax
- Where does the model go?
- Debugging tips
- Part 3: Deploying containers for ML research
- GPU and device access
- Deploying to UiT’s GPU cluster
- Deploying to NRIS HPC clusters
Follow along
Files available on
https://github.com/jaalu/vigs-docker-workshop
Motivation for software developers
- IT around 2008: developers handing applications to sysadmins maintaining long-lived servers
- Downtime for manual installation
- Server and application maintenance intertwined
- Conflicts between dependencies
- Containers allow isolating applications and running them with their own set of dependencies
Motivation for ML researchers
- Replicability: making experimental conditions visible
- Flexibility: easing transition from laptop tests to HPC training
- Reusability: showing findings work in other settings too!
Docker
- Docker allows isolating your script into a container, which:
- Runs isolated from other processes while sharing the OS
- Packages its own set of dependencies
- Can be packaged and started on other servers, including HPC clusters
- Maintained by Docker Inc., runtime open source
- Docker Desktop packages the software with a GUI, free for researchers
Docker - structure
Key concepts
We distinguish between containers and images:
- A container is a standalone environment with your script and the dependencies it needs
- An image is the template for making your container
- Images can be saved to a registry, like Docker Hub
Containers are meant to be disposable: changes you want to keep - like your trained model - should be outside of the container!
Installing Docker - options
- Docker Desktop: https://www.docker.com/
- Play With Docker: https://labs.play-with-docker.com
- Free online lab with VMs provisioned
- Docker also provides an apt repository
Checking that Docker works
- When Docker is running, we can get a list of running containers with
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
- We can then retrieve an image with docker pull:
$ docker pull hello-world
- We can then create and run a container from the image with docker run:
$ docker run hello-world
Running containers - custom commands
- Images specify a default command, but we can specify one ourselves in docker run:
$ docker run ubuntu echo Hello!
Hello!
Running containers - detached
- By default, containers do not accept any input, but write their output to the terminal
- More likely, you want a container which runs detached in the background, with --detach or -d:
$ docker run -d hello-world
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d3a5ee04babd hello-world "/hello" About a minute ago Exited (0) About a minute ago elated_feistel
$ docker logs elated_feistel
- NOTE: Docker options are placed before the image name and the command
Running containers - interactive
- Alternatively, we can specify that the container should set up a shell and accept input with --interactive --tty, or -it for short:
$ docker run -it python:3.9
Python 3.9.16 (main, May 4 2023, 06:16:43)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
Running containers - cleaning up
- Containers will stick around after they finish running
- Nice for checking logs or restarting, but the list easily gets clogged
- Passing --rm will delete the container after it exits:
$ docker run --rm hello-world
$ docker ps -a
Running containers - configuration
- We can set environment variables in the container with --env or -e:
$ docker run -e MODEL_ARCH=resnet ubuntu
- If we want to expose network ports (e.g. for dashboards), we can map ports from the container to the host with -p:
$ docker run -p 8080:80 httpd
- NOTE: the order is host:container, so -p 8080:80 will connect port 80 in the container to port 8080 on the host
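As a quick sanity check that a variable set with -e is visible inside the container, we can ask for it with printenv:
$ docker run --rm -e MODEL_ARCH=resnet ubuntu printenv MODEL_ARCH
resnet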
Where do we keep the model?
“Containers are meant to be disposable: changes you want to keep should be outside of the container”
So where do we store the trained models?
Where could we keep the model?
- Embed it as part of the image
- Not an option for training, gives us large images
- Copy it to/from the container after it starts
- docker cp can copy files (see the sketch below)
- Can bump into runtime storage limits
- Upload to/download from online server
- Extra warmup time, network traffic
- Weights & Biases, Hugging Face libraries provide functionality for this
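For instance, docker cp can pull a trained model out of a container (the container name and paths here are hypothetical):
$ docker cp my-training-run:/experiment/model.pt ./model.pt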
Where should we keep the model?
- Bind mounts
- Creates a temporary link between a directory on the host PC and a directory in the container
- Pros: Can see the directory, pull files quickly
- Cons: Assumes the storage is on your PC, not as flexible as volumes
- Volumes
- Docker creates and manages a persistent directory
- Pros: More flexible, can set up plugins to mount cloud storage as volumes
- Cons: Requires a running (temporary) container to copy files to host
Mapping bind mounts
To mount a directory with a bind mount we can use --mount:
$ docker run --mount type=bind,source=$(pwd)/assets/,target=/pictures/ ubuntu
source points to the folder on the host (assets in the working directory), and target is the folder it will appear as in the container (/pictures/)
Setting up a volume
To set up a volume we run docker volume create:
$ docker volume create my-volume
$ docker volume inspect my-volume
We can then mount it in the same way, but with type=volume:
$ docker run --mount type=volume,source=my-volume,target=/results/ ubuntu
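To copy results from the volume back to the host - the con noted above - one option is a temporary container plus docker cp (the container name temp is arbitrary):
$ docker create --name temp --mount type=volume,source=my-volume,target=/results/ ubuntu
$ docker cp temp:/results/ ./results/
$ docker rm temp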
Creating an image - FROM
- Template file conventionally named Dockerfile (no extension)
- Images start with a FROM statement, which specifies which image (ubuntu) and tag (22.04) to build on:
FROM ubuntu:22.04
- If you omit the tag, Docker will automatically use latest - but specify one if you can!
Building an image
- This turns out to be a complete Dockerfile:
FROM ubuntu:22.04
- Save this as Dockerfile (no extension!) in a project directory, and run
$ docker build . -t my-image
docker build expects a build context - the directory to pull project files from when building the image - like the current directory .
-t ties the new image to a name and a tag
Creating an image - RUN
- Once we have a base, we specify which commands to run to build the environment
- RUN statements are followed by commands to run - for instance, we can install packages:
RUN apt-get update && apt-get install -y python3
Creating an image - COPY
- Earlier, we specified the build context (usually the project directory)
- COPY copies files from the build context to the container:
COPY train.py /experiment/train.py
- We can set the working directory in the container with WORKDIR:
WORKDIR /experiment/
- The .dockerignore file specifies which files not to copy from the context
Creating an image - ENV
- We can also specify which settings the container expects/respects when it runs - conventionally done through environment variables
- ENV sets the default value for an environment variable:
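ENV BATCH_SIZE=128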
- Default values can be overridden with docker run -e
Creating an image - CMD
- Finally, we specify what should happen when the container starts with CMD:
CMD python3 train.py
Full image example
Using all of the commands:
FROM ubuntu:22.04
ENV BATCH_SIZE=128
RUN apt-get update && apt-get install -y python3
COPY train.py /experiment/train.py
CMD python3 /experiment/train.py
Debugging tip 1
- What if: your image fails to build on step 14?
- You can temporarily turn off BuildKit to get intermediate images for each layer. On Linux:
$ DOCKER_BUILDKIT=0 docker build .
On Windows:
set DOCKER_BUILDKIT=0& docker build .
- You can create a container from the intermediate image and retry the step:
$ docker run -it <intermediate-image-id> bash
# pip install ...
Debugging tip 2
- What if: the training suddenly stops and nothing happens?
- Find the running container
$ docker ps -a
e593fff04794 postgres "docker-entrypoint.s…" 10 seconds ago Up 9 seconds 5432/tcp stupefied_elion
- and start an interactive shell inside it:
$ docker exec -it stupefied_elion bash
Archiving images
You can also save images as archives with docker save:
$ docker save mnist-demo > mnist-demo.tar
and load them with docker load:
$ docker load < mnist-demo.tar
Making cache-friendly images
- Images are composed of multiple layers: each set of changes made by RUN and COPY makes up a layer. Try:
$ docker image inspect python
- To avoid rebuilding every layer on every build, Docker saves extra info for each layer:
- For RUN, the command is saved
- For COPY, a checksum of the added files is saved
- If the commands run/the files added by the layer are the same, and the preceding layer is the same, the layer is reused
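docker history lists the layers of an image directly, along with the command and size of each:
$ docker history python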
Making cache-friendly images pt. 2
- For this reason, the most frequent and smallest changes should come last in your image, e.g.
- Installing system packages
- Installing Python packages
- Adding your script
- Running your script
- The commands should, as far as possible, have the same results each time you run them
System package installation
- When using apt, group together updating the package index and installing packages:
apt-get update && apt-get install -y python3
NOTE: This doesn’t guarantee that you get the latest packages every time you build, but it ensures the package index is fresh when the packages are installed
It is possible to pin apt packages to specific versions, but specifying the distro is usually sufficient (sketch below)
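A sketch of pinning with apt (the version string here is hypothetical - apt-cache policy python3 lists the versions actually available):
RUN apt-get update && apt-get install -y python3=3.10.6-1~22.04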
Language package installation
- Generally we want a lock file with exact package versions, which we can then restore as part of building the image
- In Python, pip freeze produces a list of all installed packages, which we can restore in the image (see the sketch below):
$ pip freeze > requirements.txt
- Good practice to set up a virtual environment with venv to isolate exactly which packages the project needs
- In R, renv (the successor to packrat) can create similar virtual environments with lock files:
renv::snapshot() # saves project deps to renv.lock
renv::restore() # restores deps from renv.lock
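In the Dockerfile, the lock file can then be restored before the rest of the project is copied in - a sketch, assuming a base image with pip available and a requirements.txt in the build context:
COPY requirements.txt /experiment/requirements.txt
RUN pip install -r /experiment/requirements.txt
COPY train.py /experiment/train.py
Copying requirements.txt separately means the pip install layer is reused as long as the lock file is unchanged.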
M1 Mac: installing amd64 packages
- Common problem on M1 Macs: older packages/libraries without ARM binaries
- We can ask Docker to run the container under amd64 (Intel) emulation with --platform=linux/amd64:
docker run --platform=linux/amd64 ubuntu uname -a
Building images - conclusion
- Order changes from least to most frequent
- Commands should be deterministic
- Virtual environments and lock files help keep containers reproducible
GPU/device access
- The NVIDIA Container Runtime lets you run CUDA code in containers:
$ apt-get install nvidia-container-runtime
- Can use the --gpus switch to grant access to GPUs:
$ docker run --gpus all ubuntu
- Windows: requires WSL 2, newer CUDA driver
- Premade CUDA Docker images: https://github.com/NVIDIA/nvidia-docker/wiki/CUDA
- Also possible to grant access to USB/hardware devices with --device
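To check that the GPU is actually visible inside a container, running nvidia-smi in one of the premade CUDA images is a quick test (the tag here is an example - check the link above for available tags):
$ docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi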
Packaging for cloud services
- You can set up your own container registry with the cloud provider:
- Amazon ECR
- Azure Container Registry
- Google Cloud Container Registry
- To push the image to your cloud registry, tag your image with the URL of the registry and push:
$ docker tag mnist-demo jludemo.azurecr.io/mnist-demo
$ docker push jludemo.azurecr.io/mnist-demo
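Pushing requires logging in to the registry first, with docker login and the registry URL:
$ docker login jludemo.azurecr.io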
Packaging for Springfield (UiT)
- UiT’s Springfield cluster uses Kubernetes for orchestration across multiple nodes
- Docker Desktop allows setting up your PC as a single-node Kubernetes cluster
- Kubernetes lets us define a Job which runs one or more containers until completion:
kind: Job
apiVersion: batch/v1
metadata:
  name: your-training-job
spec:
  template:
    spec:
      containers:
        - name: your-training
          image: "your-training-image"
          workingDir: /storage
          command: ["sh", "train.sh"]
          volumeMounts:
            - name: storage
              mountPath: /storage
      volumes:
        - name: storage
          persistentVolumeClaim:
            claimName: storage
      restartPolicy: OnFailure
  backoffLimit: 0
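Assuming the manifest above is saved as job.yaml, the job can be submitted and monitored with kubectl:
$ kubectl apply -f job.yaml
$ kubectl get jobs
$ kubectl logs job/your-training-job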
Packaging for NRIS HPC (Betzy, LUMI)
- As long as the architecture is the same, we can now push images and run the containers on HPC clusters!
- However: container libraries most likely not tuned for the HPC cluster
- If libraries are available as Lmod modules, the modules will be faster
- But containers are still useful for
- Portability
- Specifying package/library versions
Packaging for NRIS HPC - Singularity (Betzy, LUMI)
- Singularity, the container runtime installed on NRIS HPC computers, supports converting Docker images to Singularity .sif images. Running
$ singularity pull --name train.sif docker://jlu015/train:latest
will retrieve the image jlu015/train from Docker Hub and save it as train.sif
- To run the default command specified by CMD, we can call singularity run with the image:
$ singularity run train.sif
- Alternatively, singularity exec will run a specific command:
$ singularity exec train.sif echo Hello world!
Packaging for NRIS HPC - Writing a SLURM job (Betzy, LUMI)
See https://documentation.sigma2.no/code_development/guides/containers.html#singularity-in-job-scripts
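A minimal sketch of a SLURM job script wrapping the container (the account name, resources, and script path are placeholders - see the documentation above for cluster-specific details):
#!/bin/bash
#SBATCH --job-name=train
#SBATCH --account=nnXXXXk
#SBATCH --time=01:00:00
#SBATCH --mem-per-cpu=4G

singularity exec train.sif python3 /experiment/train.py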
Resources
- Docker in Y Minutes:
- https://learnxinyminutes.com/docs/docker/
- The Play with Docker exercises:
- https://training.play-with-docker.com/
- NRIS’ documentation on containers:
- https://documentation.sigma2.no/