Using containers
Containers are a great solution to isolate a software environment, especially in batch systems like lxplus. Currently, two container solutations are supported Apptainer (previously called Singularity), and Docker.
Using Apptainer¶
Quickstart¶
One-line access to TensorFlow + PyTorch + Numpy + JupyterLab on lxplus:
apptainer shell -B /afs -B /eos --nv /cvmfs/unpacked.cern.ch/registry.hub.docker.com/cmsml/cmsml:latest
More information¶
The unpacked.cern.ch service mounts on CVMFS contains many apptainer images, some of which are suitable for machine learning applications. A description of each image is beyond the scope of this document. However, if you find an image useful for your application, you can use it by running an Apptainer container with the appropriate options. For example:
apptainer run --nv --bind <bind_mount_path> /cvmfs/unpacked.cern.ch/<path_to_image>
Examples¶
After installing the package, you can then use GPU-based machine learning algorithms. Two examples are supplied.
-
The first example aims at using a CNN to perform handwritten digits classification with
MNIST
dataset. The notebook can be found at pytorch_mnist. This example is modified from an officialpytorch
example. -
The second example is modified from the simple MLP example from
weaver-benchmark
. The notebook can be found at toptagging_mlp.
Using Docker¶
Docker is currently supported on lxplus9 interactive nodes (through emulation of the CLI with Podman) and on HTCondor for job submission.
This option can be very handy for users, as HTCondor can pull images from any public registry, like DockerHub or GitLab registry. The user can follow this workflow: 1. Define a custom image on top of a commonly available pytorch or tensorflow image 2. Add the desired packages and configuration 3. Push the docker image on a registry 4. Use the image in a HTCondor job
The rest of the page is a step-by-step tutorial for this workflow.
Define the image¶
-
Define a file
Dockerfile
FROM pytorch/pytorch:latest ADD localfolder_with_code /opt/mycode RUN cd /opt/mycode && pip install -e . # or pip install requirements # Install the required Python packages RUN pip install \ numpy \ sympy \ scikit-learn \ numba \ opt_einsum \ h5py \ cytoolz \ tensorboardx \ seaborn \ rich \ pytorch-lightning==1.7 or ADD requirements.txt pip install -r requirements.txt
-
Build the image
docker build -t username/pytorch-condor-gpu:tag .
and push it (after having setup the credentials with
docker login hub.docker.com
)docker push username/pytorch-condor-gpu:tag
-
Setup the condor job with a submission file
submitfile
as:universe = docker docker_image = user/pytorch-condor-gpu:tag executable = job.sh when_to_transfer_output = ON_EXIT output = $(ClusterId).$(ProcId).out error = $(ClusterId).$(ProcId).err log = $(ClusterId).$(ProcId).log request_gpus = 1 request_cpus = 2 +Requirements = OpSysAndVer =?= "CentOS7" +JobFlavour = espresso queue 1
-
For testing purpose one can start a job interactively and debug
condor_submit -interactive submitfile