HTCondor with GPU resources
HTCondor can run GPU jobs when the pool includes worker nodes configured with GPU devices. Both CMS Connect and lxplus have access to worker nodes equipped with GPUs.
How to request GPUs in HTCondor
To request GPU support for a job, add the following line to the HTCondor submit file, where n is the number of GPUs required:
request_gpus = n
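As a minimal sketch of how the request_gpus line fits into a complete submit file, the Python helper below renders a small submit description as text. The function name render_gpu_submit, the executable train.sh, and the log file prefix are hypothetical names chosen for illustration; only the submit commands themselves (executable, output, error, log, request_gpus, queue) are standard HTCondor.

```python
def render_gpu_submit(executable, n_gpus=1, log_prefix="gpu_job"):
    """Return the text of an HTCondor submit description that requests GPUs.

    All arguments are illustrative; adapt them to your own job.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1 for a GPU job")
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        # $(ClusterId) and $(ProcId) are expanded by HTCondor at submit time.
        f"output = {log_prefix}.$(ClusterId).$(ProcId).out",
        f"error = {log_prefix}.$(ClusterId).$(ProcId).err",
        f"log = {log_prefix}.$(ClusterId).log",
        # Comments in a submit file go on their own line.
        "# Number of GPUs required by the job",
        f"request_gpus = {n_gpus}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a submit description for a hypothetical training script.
    print(render_gpu_submit("train.sh", n_gpus=1))
```

The printed text can be saved to a file (for example gpu_job.sub, a hypothetical name) and passed to condor_submit.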
Further documentation
Detailed documentation on running HTCondor jobs with GPU support is available for both facilities.
The configuration of the software environment for lxplus-gpu and HTCondor is described in the Software Environments page. In addition, the Using container page explains step by step how to build a Docker image to run in HTCondor jobs.
More available resources
- Complete documentation can be found in the GPUs section of the CERN Batch Docs, which includes a TensorFlow example. This documentation also contains instructions on advanced HTCondor configuration, for instance constraining the GPU device or the CUDA version.
- A good example of submitting a GPU HTCondor job on lxplus is the weaver-benchmark project. It provides a concrete example of how to set up the environment for the weaver framework and run the training and testing process within a single job. A detailed description can be found in the ParticleNet section of this documentation.

  In principle, this example can be run elsewhere as HTCondor jobs. However, the paths to the datasets must be modified to match the local setup.
- CMS Connect also provides documentation on GPU job submission, which includes a TensorFlow example.

  When submitting GPU jobs on CMS Connect, especially for machine learning purposes, EOS space at CERN is not accessible as a directory. One should therefore consider using the xrootd utilities as documented in this page.
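To illustrate the xrootd approach mentioned in the last item, here is a minimal sketch of a stage-in helper that a job could call to copy an input file from EOS with xrdcp before training starts. The redirector root://eoscms.cern.ch/ is assumed here for CMS EOS space at CERN, and the function names and the example path are hypothetical; check the linked documentation for the settings that apply to your storage area.

```python
import shutil
import subprocess

# Assumed redirector for CMS EOS space at CERN; adjust for your storage area.
EOS_REDIRECTOR = "root://eoscms.cern.ch/"


def xrdcp_command(eos_path, local_path, redirector=EOS_REDIRECTOR):
    """Build the xrdcp command that copies eos_path to local_path."""
    if not eos_path.startswith("/eos/"):
        raise ValueError("expected an absolute EOS path starting with /eos/")
    # Joining with "/" gives the usual double slash between host and path.
    url = redirector.rstrip("/") + "/" + eos_path
    # --force overwrites an existing local copy.
    return ["xrdcp", "--force", url, local_path]


def stage_in(eos_path, local_path):
    """Copy a file from EOS to the job's working directory."""
    if shutil.which("xrdcp") is None:
        raise RuntimeError("xrdcp was not found on this worker node")
    subprocess.run(xrdcp_command(eos_path, local_path), check=True)


if __name__ == "__main__":
    # Hypothetical dataset path, shown only to illustrate the command layout.
    print(" ".join(xrdcp_command("/eos/cms/store/user/someuser/train.root",
                                 "train.root")))
```

A valid grid proxy or Kerberos ticket is normally needed on the worker node for the copy to succeed, so the job must be submitted with credentials that the storage endpoint accepts.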