HTCondor with GPU resources
HTCondor supports GPU jobs provided that some worker nodes are configured with GPU devices. Both CMS Connect and lxplus have access to GPU-equipped worker nodes.
How to request GPUs in HTCondor
Users can request GPUs for their jobs by adding the following line to the HTCondor submit file.
# n is the number of GPUs required
request_gpus = n
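To show where the `request_gpus` line sits in a complete submit file, here is a minimal sketch that renders one from Python. The wrapper script name `run_training.sh` and the output file names are hypothetical placeholders, and the set of submit commands is an assumption about a typical job rather than a site-recommended configuration.

```python
# Minimal sketch: render an HTCondor submit description that requests GPUs.
# The executable name and the log/output file names are hypothetical.

SUBMIT_TEMPLATE = """\
# Hypothetical GPU job; comments are kept on their own lines
executable     = {executable}
output         = job.$(ClusterId).$(ProcId).out
error          = job.$(ClusterId).$(ProcId).err
log            = job.$(ClusterId).log
# Number of GPUs required by each job
request_gpus   = {n_gpus}
queue
"""


def render_submit(executable: str, n_gpus: int = 1) -> str:
    """Return the text of a submit file asking for n_gpus GPUs."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return SUBMIT_TEMPLATE.format(executable=executable, n_gpus=n_gpus)


if __name__ == "__main__":
    # Print the submit description; save it to a file to use it.
    print(render_submit("run_training.sh", n_gpus=1))
```

The printed text can be saved to a file, for example `gpu_job.sub`, and passed to `condor_submit`.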
Further documentation
Detailed documentation on running HTCondor jobs with GPU support is available for both platforms.
- Complete documentation can be found in the GPUs section of the CERN Batch Docs, where a TensorFlow example is supplied. This documentation also contains instructions on advanced HTCondor configuration, for instance constraining the GPU device or CUDA version.
- A good example of submitting a GPU HTCondor job on lxplus is the weaver-benchmark project. It provides a concrete example of how to set up the environment for the weaver framework and run the training and testing process within a single job. A detailed description can be found in the ParticleNet section of this documentation.
  In principle, this example can be run elsewhere as HTCondor jobs. However, the paths to the datasets must be changed to locations that are accessible from the site where the jobs run.
- CMS Connect also provides documentation on GPU job submission, which includes a TensorFlow example as well.
  When submitting GPU jobs on CMS Connect, especially for machine learning purposes, EOS space at CERN is not accessible as a directory. One should therefore consider using xrootd utilities, as documented on this page.
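Since EOS cannot be read as a local directory from CMS Connect, a job typically has to address its input files through xrootd URLs. The sketch below converts an EOS path into such a URL and builds the matching `xrdcp` command. The redirector `root://eoscms.cern.ch` and the example file path are assumptions for illustration; check the linked xrootd documentation for the endpoint that applies to your data.

```python
# Minimal sketch: build xrootd URLs and xrdcp commands for files on EOS.
from typing import List

# Assumption: redirector used for /eos/cms paths; adjust for your storage area.
EOS_REDIRECTOR = "root://eoscms.cern.ch"


def eos_to_xrootd(path: str, redirector: str = EOS_REDIRECTOR) -> str:
    """Turn an absolute EOS path into an xrootd URL (root://host//eos/...)."""
    if not path.startswith("/eos/"):
        raise ValueError("expected an absolute path starting with /eos/")
    # The path keeps its leading slash, which yields the double slash
    # that separates the host from the absolute path.
    return redirector.rstrip("/") + "/" + path


def xrdcp_command(eos_path: str, local_name: str,
                  redirector: str = EOS_REDIRECTOR) -> List[str]:
    """Return the argument list that copies eos_path to local_name."""
    return ["xrdcp", eos_to_xrootd(eos_path, redirector), local_name]


if __name__ == "__main__":
    # Hypothetical dataset location, for illustration only.
    cmd = xrdcp_command("/eos/cms/store/user/someuser/train.root", "train.root")
    print(" ".join(cmd))
```

The returned argument list can be passed to `subprocess.run` inside the job wrapper, so that input files are copied to the worker node before training starts.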