Direct inference with TensorFlow 2¶
TensorFlow 2 is available since CMSSW_11_1_X (cmssw#28711, cmsdist#5525). The integration into the software stack can be found in cmsdist/tensorflow.spec and the interface is located in cmssw/PhysicsTools/TensorFlow.
Available versions¶
| TensorFlow | el8_amd64_gcc10 | el8_amd64_gcc11 |
| --- | --- | --- |
| v2.6.0 | ≥ CMSSW_12_3_4 | - |
| v2.6.4 | ≥ CMSSW_12_5_0 | ≥ CMSSW_12_5_0 |

| TensorFlow | slc7_amd64_gcc900 | slc7_amd64_gcc10 | slc7_amd64_gcc11 |
| --- | --- | --- | --- |
| v2.1.0 | ≥ CMSSW_11_1_0 | - | - |
| v2.3.1 | ≥ CMSSW_11_2_0 | - | - |
| v2.4.1 | ≥ CMSSW_11_3_0 | - | - |
| v2.5.0 | ≥ CMSSW_12_0_0 | ≥ CMSSW_12_0_0 | - |
| v2.6.0 | ≥ CMSSW_12_1_0 | ≥ CMSSW_12_1_0 | ≥ CMSSW_12_3_0 |
| v2.6.4 | - | ≥ CMSSW_12_5_0 | ≥ CMSSW_13_0_0 |
At this time, only CPU support is provided in these versions. While GPU support is generally possible, it was initially disabled due to interference with production workflows and only became available with CMSSW_13_1_X (see the GPU backend section below).
Software setup¶
To run the examples shown below, create a minimal inference setup with the following snippet. Adapt the `SCRAM_ARCH` according to your operating system and desired compiler.
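A minimal sketch of such a setup, assuming an el8 machine with gcc11 and a recent release; adapt both to your needs.

```bash
# adapt the architecture and release to your needs
export SCRAM_ARCH="el8_amd64_gcc11"
export CMSSW_VERSION="CMSSW_12_6_0"

# source the CMS software environment
source "/cvmfs/cms.cern.ch/cmsset_default.sh"

# create and enter the release area
cmsrel "${CMSSW_VERSION}"
cd "${CMSSW_VERSION}/src"

# setup the environment and build
cmsenv
scram b
```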
Below, the `cmsml` Python package is used to convert models from TensorFlow objects (`tf.function`s or Keras models) to protobuf graph files (documentation). It should be available after executing the commands above. You can check its version via
```bash
python -c "import cmsml; print(cmsml.__version__)"
```

and compare it to the released tags. If you want to install a newer version, either from the master branch of the cmsml repository or from the Python package index (PyPI), you can simply do that via pip.
From the repository:

```bash
# into your user directory (usually ~/.local)
pip install --upgrade --user git+https://github.com/cms-ml/cmsml

# _or_

# into a custom directory
pip install --upgrade --prefix "CUSTOM_DIRECTORY" git+https://github.com/cms-ml/cmsml
```

From PyPI:

```bash
# into your user directory (usually ~/.local)
pip install --upgrade --user cmsml

# _or_

# into a custom directory
pip install --upgrade --prefix "CUSTOM_DIRECTORY" cmsml
```
Saving your model¶
After successful training, you should save your model in a protobuf graph file that can be read by the interface in CMSSW. Naturally, you only want to save the part of your model that is required to run the network prediction, i.e., it should not contain operations related to model training or loss functions (unless explicitly required). Also, to reduce the memory footprint and to accelerate the inference, variables should be converted to constant tensors. Both of these model transformations are provided by the `cmsml` package.
Instructions on how to transform and save your model are shown below, depending on whether you use Keras or plain TensorFlow with `tf.function`s.
The code below saves a Keras `Model` instance as a protobuf graph file using `cmsml.tensorflow.save_graph`. In order for Keras to build the internal graph representation before saving, make sure to either compile the model or pass an `input_shape` to the first layer:
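A minimal sketch, assuming a sequential model with ten input features and three output classes; adapt the architecture and names to your model.

```python
import tensorflow as tf
import cmsml

# define your model; the input name is set explicitly, while the output
# name follows from the Keras layer and operation names (see below)
model = tf.keras.Sequential()
model.add(tf.keras.layers.InputLayer(input_shape=(10,), name="input"))
model.add(tf.keras.layers.Dense(100, activation="tanh"))
model.add(tf.keras.layers.Dense(3, activation="softmax", name="output"))

# train it ...

# convert to a binary protobuf graph file,
# converting variables to constant tensors
cmsml.tensorflow.save_graph("graph.pb", model, variables_to_constants=True)
```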
Following the Keras naming conventions for certain layers, the input will be named `"input"` while the output is named `"sequential/output/Softmax"`. To cross-check the names, you can save the graph in text format by using the extension `".pb.txt"`.
Now consider the case where you write your network model as a single `tf.function`.
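A sketch of such a model; the variables are defined outside the function body so that they are created only once, and the output operation is named explicitly.

```python
import tensorflow as tf
import cmsml

# statically define variables outside the function so that they
# are not recreated with every trace
W = tf.Variable(tf.ones([10, 1]))
b = tf.Variable(tf.ones([1]))

@tf.function
def model(x):
    # apply an affine transformation followed by an activation,
    # naming the output operation "y"
    h = tf.add(tf.matmul(x, W), b)
    y = tf.tanh(h, name="y")
    return y
```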
In TensorFlow terms, the `model` function is polymorphic: it accepts different types of the input tensor `x` (`tf.float32`, `tf.float64`, ...). For each type, TensorFlow will create a concrete function with an associated `tf.Graph` object. This mechanism is referred to as signature tracing. For deeper insights into `tf.function`, signature tracing, and polymorphic and concrete functions, see the guide on Better performance with tf.function.
To save the model as a protobuf graph file, you explicitly need to create a concrete function. However, this is fairly easy once you know the exact type and shape of all input arguments:
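Continuing the sketch above, assuming float32 inputs with a fixed batch size of two:

```python
# create a concrete function by declaring the exact type
# and shape of the input argument
cmodel = model.get_concrete_function(
    tf.TensorSpec(shape=[2, 10], dtype=tf.float32),
)

# convert to a binary protobuf graph file,
# converting variables to constant tensors
cmsml.tensorflow.save_graph("graph.pb", cmodel, variables_to_constants=True)
```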
The input will be named `"x"` while the output is named `"y"`. To cross-check the names, you can save the graph in text format by using the extension `".pb.txt"`.
Different method: Frozen signatures
Instead of creating a polymorphic `tf.function` and extracting a concrete one in a second step, you can directly define an input signature upon definition:
```python
@tf.function(input_signature=(tf.TensorSpec(shape=[2, 10], dtype=tf.float32),))
def model(x):
    ...
```
This disables signature tracing since the input signature is frozen, and the decorated function can be passed directly to `cmsml.tensorflow.save_graph`.
Inference in CMSSW¶
The inference can be implemented to run in a single thread. In general, this does not mean that the module cannot be executed with multiple threads (`cmsRun --numThreads <N> <CFG_FILE>`), but rather that its performance in terms of evaluation time and especially memory consumption is likely to be suboptimal. Therefore, for modules to be integrated into CMSSW, the multi-threaded implementation is strongly recommended.
CMSSW module setup¶
If you aim to use the TensorFlow interface in a CMSSW plugin, make sure that your `plugins/BuildFile.xml` file contains (at least) the following lines:
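```xml
<!-- sketch of the minimal content -->
<use name="PhysicsTools/TensorFlow" />
<flags EDM_PLUGIN="1" />
```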
If you are using the interface inside the `src/` or `interface/` directory of your module, make sure to create a global `BuildFile.xml` file next to these directories, containing (at least):
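```xml
<!-- sketch of the minimal content -->
<use name="PhysicsTools/TensorFlow" />

<export>
    <lib name="1" />
</export>
```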
Single-threaded inference¶
Despite `tf.Session` being removed in the Python interface as of TensorFlow 2, the concepts of

- `Graph`s, containing the constant computational structure and trained variables of your model,
- `Session`s, handling execution and data exchange, and
- the separation between them

live on in the C++ interface. Thus, the overall inference approach is 1) include the interface, 2) initialize `Graph` and `Session`, 3) per event create input tensors and run the inference, and 4) cleanup.
1. Includes¶
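A sketch; further framework includes depend on your plugin.

```cpp
#include "FWCore/Framework/interface/one/EDAnalyzer.h"
#include "PhysicsTools/TensorFlow/interface/TensorFlow.h"
// further framework includes ...
```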
2. Initialize objects¶
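A sketch, assuming the location of the graph file is known:

```cpp
// declare the graph and session as members of your plugin
tensorflow::GraphDef* graphDef_;
tensorflow::Session* session_;

// in the constructor or in beginJob(): load the graph definition
// and create a session based on it
graphDef_ = tensorflow::loadGraphDef("/path/to/graph.pb");
session_ = tensorflow::createSession(graphDef_);
```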
3. Inference¶
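A sketch, assuming a model with a float input of shape (1, 10) named `"input"` and an output named `"output"`; adjust names and shapes to your graph.

```cpp
// per event: create an input tensor and fill it with (dummy) values
tensorflow::Tensor input(tensorflow::DT_FLOAT, {1, 10});
for (size_t i = 0; i < 10; i++) {
  input.matrix<float>()(0, i) = float(i);
}

// define a vector to store the output
std::vector<tensorflow::Tensor> outputs;

// run the evaluation
tensorflow::run(session_, {{"input", input}}, {"output"}, &outputs);

// access the first output value
std::cout << outputs[0].matrix<float>()(0, 0) << std::endl;
```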
4. Cleanup¶
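In `endJob()` or the destructor, close the session and delete the graph:

```cpp
tensorflow::closeSession(session_);
delete graphDef_;
```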
Full example¶
The example assumes the following directory structure:
MySubsystem/MyModule/
│
├── plugins/
│ ├── MyPlugin.cpp
│ └── BuildFile.xml
│
├── test/
│ └── my_plugin_cfg.py
│
└── data/
└── graph.pb
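A sketch of what `plugins/MyPlugin.cpp` could look like, combining the steps above; tensor names, shapes, and parameter names are illustrative.

```cpp
/*
 * Sketch of a single-threaded plugin.
 */

#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/Frameworkfwd.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/Framework/interface/one/EDAnalyzer.h"
#include "FWCore/ParameterSet/interface/FileInPath.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "PhysicsTools/TensorFlow/interface/TensorFlow.h"

class MyPlugin : public edm::one::EDAnalyzer<> {
public:
  explicit MyPlugin(const edm::ParameterSet&);
  ~MyPlugin(){};

private:
  void beginJob() override;
  void analyze(const edm::Event&, const edm::EventSetup&) override;
  void endJob() override;

  std::string graphPath_;
  std::string inputTensorName_;
  std::string outputTensorName_;

  tensorflow::GraphDef* graphDef_;
  tensorflow::Session* session_;
};

MyPlugin::MyPlugin(const edm::ParameterSet& config)
    : graphPath_(edm::FileInPath(config.getParameter<std::string>("graphPath")).fullPath()),
      inputTensorName_(config.getParameter<std::string>("inputTensorName")),
      outputTensorName_(config.getParameter<std::string>("outputTensorName")),
      graphDef_(nullptr),
      session_(nullptr) {}

void MyPlugin::beginJob() {
  // load the graph definition and create a session based on it
  graphDef_ = tensorflow::loadGraphDef(graphPath_);
  session_ = tensorflow::createSession(graphDef_);
}

void MyPlugin::endJob() {
  // close the session and delete the graph
  tensorflow::closeSession(session_);
  session_ = nullptr;
  delete graphDef_;
  graphDef_ = nullptr;
}

void MyPlugin::analyze(const edm::Event& event, const edm::EventSetup& setup) {
  // create a (1 x 10) input tensor and fill it with dummy values
  tensorflow::Tensor input(tensorflow::DT_FLOAT, {1, 10});
  for (size_t i = 0; i < 10; i++) {
    input.matrix<float>()(0, i) = float(i);
  }

  // run the inference and store the output
  std::vector<tensorflow::Tensor> outputs;
  tensorflow::run(session_, {{inputTensorName_, input}}, {outputTensorName_}, &outputs);

  // print the first output value
  std::cout << outputs[0].matrix<float>()(0, 0) << std::endl;
}

DEFINE_FWK_MODULE(MyPlugin);
```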
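The accompanying `plugins/BuildFile.xml` might contain:

```xml
<use name="FWCore/Framework" />
<use name="FWCore/PluginManager" />
<use name="FWCore/ParameterSet" />
<use name="PhysicsTools/TensorFlow" />

<flags EDM_PLUGIN="1" />
```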
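A sketch of `test/my_plugin_cfg.py`; the input file is a placeholder.

```python
# coding: utf-8

import FWCore.ParameterSet.Config as cms

# define the process to run
process = cms.Process("TEST")

# minimal configuration
process.load("FWCore.MessageService.MessageLogger_cfi")
process.MessageLogger.cerr.FwkReport.reportEvery = 1
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(10))
process.source = cms.Source("PoolSource",
    fileNames=cms.untracked.vstring("file:/path/to/input.root"))  # placeholder

# setup the plugin; the graph path is resolved via edm::FileInPath
process.myPlugin = cms.EDAnalyzer("MyPlugin",
    graphPath=cms.string("MySubsystem/MyModule/data/graph.pb"),
    inputTensorName=cms.string("input"),
    outputTensorName=cms.string("output"),
)

# define what to run in the path
process.p = cms.Path(process.myPlugin)
```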
Multi-threaded inference¶
Compared to the single-threaded implementation above, the multi-threaded version has one major difference: both the `Graph` and the `Session` are no longer members of a particular module instance, but rather shared between all instances in all threads. See the documentation on the C++ interface of `stream` modules for details.
Recommendation updated
The previous recommendation stated that the `Session` is not constant and should therefore not be placed in the global cache, but rather created once per stream module instance. However, it was discovered that, although not explicitly declared as constant in the `tensorflow::run()` / `Session::run()` interface, the session is actually not changed during evaluation and can be treated as effectively constant.
As a result, it is safe to move it to the global cache, next to the `Graph` object. The TensorFlow interface in CMSSW was adjusted to accept `const` objects in cmssw#40161.
Thus, the overall inference approach is 1) include the interface, 2) let your plugin inherit from `edm::stream::EDAnalyzer` and declare the `GlobalCache`, 3) store a `const Session*` pointing to the cached session, and 4) per event create input tensors and run the inference.
1. Includes¶
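A sketch; further framework includes depend on your plugin.

```cpp
#include "FWCore/Framework/interface/stream/EDAnalyzer.h"
#include "PhysicsTools/TensorFlow/interface/TensorFlow.h"
// further framework includes ...
```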
Note that `stream/EDAnalyzer.h` is included rather than `one/EDAnalyzer.h`.
2. Define and use the global cache¶
The cache definition is done by declaring a simple struct. However, for the purpose of just storing a graph and a session object, a `tensorflow::SessionCache` struct is already provided centrally. It was added in cmssw#40284 and its usage is shown in the following. In case `tensorflow::SessionCache` is not (yet) available in your version of CMSSW, expand the "Custom cache struct" section below.
Use it in the `edm::GlobalCache` template argument and adjust the plugin accordingly.
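A sketch of the resulting class declaration:

```cpp
class MyPlugin : public edm::stream::EDAnalyzer<edm::GlobalCache<tensorflow::SessionCache>> {
public:
  explicit MyPlugin(const edm::ParameterSet&, const tensorflow::SessionCache*);

  // methods for creating and destroying the global cache
  static std::unique_ptr<tensorflow::SessionCache> initializeGlobalCache(const edm::ParameterSet&);
  static void globalEndJob(const tensorflow::SessionCache*);

private:
  void analyze(const edm::Event&, const edm::EventSetup&) override;
};
```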
Implement `initializeGlobalCache` to control how the cache object is created. You also need to implement `globalEndJob`; however, it can remain empty since the destructor of `tensorflow::SessionCache` already handles closing the session and deleting all objects.
```cpp
std::unique_ptr<tensorflow::SessionCache> MyPlugin::initializeGlobalCache(const edm::ParameterSet& config) {
  std::string graphPath = edm::FileInPath(config.getParameter<std::string>("graphPath")).fullPath();
  return std::make_unique<tensorflow::SessionCache>(graphPath);
}

void MyPlugin::globalEndJob(const tensorflow::SessionCache* cache) {}
```
Custom cache struct
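A sketch of such a struct; the atomic members are an assumption to guard the pointers against concurrent access.

```cpp
#include <atomic>

struct CacheData {
  CacheData() : graphDef(nullptr), session(nullptr) {}

  std::atomic<tensorflow::GraphDef*> graphDef;
  std::atomic<const tensorflow::Session*> session;
};
```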
Use it in the `edm::GlobalCache` template argument and adjust the plugin accordingly.
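A sketch of the adjusted class declaration:

```cpp
class MyPlugin : public edm::stream::EDAnalyzer<edm::GlobalCache<CacheData>> {
public:
  explicit MyPlugin(const edm::ParameterSet&, const CacheData*);

  // methods for creating and destroying the global cache
  static std::unique_ptr<CacheData> initializeGlobalCache(const edm::ParameterSet&);
  static void globalEndJob(const CacheData*);

private:
  void analyze(const edm::Event&, const edm::EventSetup&) override;
};
```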
Implement `initializeGlobalCache` and `globalEndJob` to control how the cache object is created and destroyed.
See the full example below for more details.
3. Initialize objects¶
In your module constructor, you can get a pointer to the constant session to perform model evaluation during the event loop.
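A sketch, assuming the `getSession()` accessor of `tensorflow::SessionCache`:

```cpp
// declare the constant session pointer as a member of your plugin
const tensorflow::Session* session_;

// in the constructor, obtain the session from the cache
MyPlugin::MyPlugin(const edm::ParameterSet& config, const tensorflow::SessionCache* cache)
    : session_(cache->getSession()) {}
```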
4. Inference¶
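A sketch, analogous to the single-threaded case, but evaluating through the constant session pointer obtained from the cache:

```cpp
void MyPlugin::analyze(const edm::Event& event, const edm::EventSetup& setup) {
  // create an input tensor and fill it with (dummy) values
  tensorflow::Tensor input(tensorflow::DT_FLOAT, {1, 10});
  for (size_t i = 0; i < 10; i++) {
    input.matrix<float>()(0, i) = float(i);
  }

  // define a vector to store the output
  std::vector<tensorflow::Tensor> outputs;

  // run the evaluation through the const session pointer
  tensorflow::run(session_, {{"input", input}}, {"output"}, &outputs);

  // access the first output value
  std::cout << outputs[0].matrix<float>()(0, 0) << std::endl;
}
```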
Note
If the TensorFlow interface in your CMSSW release does not yet accept `const` sessions, the `tensorflow::run()` call in the example above will cause an error during compilation. In this case, replace `session_` in that call with `const_cast<tensorflow::Session*>(session_)`.
Full example¶
The example assumes the following directory structure:
MySubsystem/MyModule/
│
├── plugins/
│ ├── MyPlugin.cpp
│ └── BuildFile.xml
│
├── test/
│ └── my_plugin_cfg.py
│
└── data/
└── graph.pb
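A sketch of what `plugins/MyPlugin.cpp` could look like, combining the steps above; tensor names, shapes, and parameter names are illustrative.

```cpp
/*
 * Sketch of a multi-threaded stream module using the tensorflow::SessionCache.
 */

#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/Frameworkfwd.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/Framework/interface/stream/EDAnalyzer.h"
#include "FWCore/ParameterSet/interface/FileInPath.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "PhysicsTools/TensorFlow/interface/TensorFlow.h"

class MyPlugin : public edm::stream::EDAnalyzer<edm::GlobalCache<tensorflow::SessionCache>> {
public:
  explicit MyPlugin(const edm::ParameterSet&, const tensorflow::SessionCache*);
  ~MyPlugin(){};

  // methods for handling the global cache
  static std::unique_ptr<tensorflow::SessionCache> initializeGlobalCache(const edm::ParameterSet&);
  static void globalEndJob(const tensorflow::SessionCache*);

private:
  void analyze(const edm::Event&, const edm::EventSetup&) override;

  std::string inputTensorName_;
  std::string outputTensorName_;

  // pointer to the session owned by the global cache
  const tensorflow::Session* session_;
};

std::unique_ptr<tensorflow::SessionCache> MyPlugin::initializeGlobalCache(const edm::ParameterSet& config) {
  // create the cache, which owns both the graph and the session
  std::string graphPath = edm::FileInPath(config.getParameter<std::string>("graphPath")).fullPath();
  return std::make_unique<tensorflow::SessionCache>(graphPath);
}

void MyPlugin::globalEndJob(const tensorflow::SessionCache* cache) {
  // nothing to do here; the cache destructor closes the session and deletes the graph
}

MyPlugin::MyPlugin(const edm::ParameterSet& config, const tensorflow::SessionCache* cache)
    : inputTensorName_(config.getParameter<std::string>("inputTensorName")),
      outputTensorName_(config.getParameter<std::string>("outputTensorName")),
      session_(cache->getSession()) {}

void MyPlugin::analyze(const edm::Event& event, const edm::EventSetup& setup) {
  // create a (1 x 10) input tensor and fill it with dummy values
  tensorflow::Tensor input(tensorflow::DT_FLOAT, {1, 10});
  for (size_t i = 0; i < 10; i++) {
    input.matrix<float>()(0, i) = float(i);
  }

  // run the inference and store the output
  std::vector<tensorflow::Tensor> outputs;
  tensorflow::run(session_, {{inputTensorName_, input}}, {outputTensorName_}, &outputs);

  // print the first output value
  std::cout << outputs[0].matrix<float>()(0, 0) << std::endl;
}

DEFINE_FWK_MODULE(MyPlugin);
```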
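The accompanying `plugins/BuildFile.xml` might contain:

```xml
<use name="FWCore/Framework" />
<use name="FWCore/PluginManager" />
<use name="FWCore/ParameterSet" />
<use name="PhysicsTools/TensorFlow" />

<flags EDM_PLUGIN="1" />
```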
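A sketch of `test/my_plugin_cfg.py`; the input file is a placeholder.

```python
# coding: utf-8

import FWCore.ParameterSet.Config as cms

# define the process to run
process = cms.Process("TEST")

# minimal configuration
process.load("FWCore.MessageService.MessageLogger_cfi")
process.MessageLogger.cerr.FwkReport.reportEvery = 1
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(10))
process.source = cms.Source("PoolSource",
    fileNames=cms.untracked.vstring("file:/path/to/input.root"))  # placeholder

# setup the plugin; the graph path is resolved via edm::FileInPath
process.myPlugin = cms.EDAnalyzer("MyPlugin",
    graphPath=cms.string("MySubsystem/MyModule/data/graph.pb"),
    inputTensorName=cms.string("input"),
    outputTensorName=cms.string("output"),
)

# define what to run in the path
process.p = cms.Path(process.myPlugin)
```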
GPU backend¶
By default, TensorFlow sessions are created for CPU running. Since CMSSW_13_1_X, the TensorFlow GPU backend is available in the CMSSW release.
Minimal changes are needed in the inference code to move the model to the GPU. A `tensorflow::Options` struct is available to set up the backend:
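A sketch; the exact enum values may differ between releases.

```cpp
#include "PhysicsTools/TensorFlow/interface/TensorFlow.h"

// configure the session to run on the CUDA backend
tensorflow::Options options{tensorflow::Backend::cuda};

// create the session using these options
tensorflow::Session* session = tensorflow::createSession(graphDef, options);
```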
CMSSW modules should add an option to the `PSet`s of their producers and analyzers, so that the TensorFlow backend used for the sessions created by the plugins can be configured on the fly.
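A sketch of such a configuration; the `backend` parameter name is purely illustrative and must match whatever your plugin reads to build its `tensorflow::Options`.

```python
process.myPlugin = cms.EDAnalyzer("MyPlugin",
    graphPath=cms.string("MySubsystem/MyModule/data/graph.pb"),
    backend=cms.string("cuda"),  # hypothetical parameter read by the plugin
)
```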
Optimization¶
Depending on the use case, the following approaches can optimize the inference performance. It could be worth checking them out in your algorithm.
Further optimization approaches can be found in the integration checklist.
Reusing tensors¶
In some cases, instead of creating new input tensors for each inference call, you might want to store input tensors as members of your plugin. This is of course only possible if you know their exact shapes a priori, and it comes with the cost of keeping the tensors in memory for the lifetime of your module instance.
You can use

```cpp
tensor.flat<float>().setZero();
```

to reset the values of your tensor prior to each call.
Tensor data access via pointers¶
As shown in the examples above, tensor data can be accessed through methods such as `flat<type>()` or `matrix<type>()`, which return objects that represent the underlying data in the requested structure (`tensorflow::Tensor` C++ API). To read and manipulate particular elements, you can directly call this object with the coordinates of an element:
```cpp
// matrix returns a 2D representation
// set element (b,i) to f
tensor.matrix<float>()(b, i) = float(f);
```
However, doing this for a large input tensor might entail some overhead. Since the data is actually contiguous in memory (C-style "row-major" memory ordering), a faster (though less explicit) way of interacting with tensor data is using a pointer.
```cpp
// get the pointer to the first tensor element
float* d = tensor.flat<float>().data();
```
Now, the tensor data can be filled using simple and fast pointer arithmetic.
```cpp
// fill tensor data using pointer arithmetic;
// memory ordering is row-major, so the outermost loop corresponds to dimension 0
for (size_t b = 0; b < batchSize; b++) {
    for (size_t i = 0; i < nFeatures; i++, d++) {  // note the d++
        *d = float(i);
    }
}
```
Inter- and intra-operation parallelism¶
Debugging and local processing only
Parallelism between (inter) and within (intra) operations can greatly improve the inference performance. However, this allows TensorFlow to manage and schedule threads on its own, possibly interfering with the thread model inherent to CMSSW. For inference code that is to be officially integrated, you should avoid inter- and intra-op parallelism and rather adhere to the examples shown above.
You can configure the number of inter- and intra-op threads via the second argument of the `tensorflow::createSession` method:
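A sketch:

```cpp
// create the session, configured to use nThreads threads
tensorflow::Session* session = tensorflow::createSession(graphDef, nThreads);
```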
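Alternatively, if your release's interface accepts `tensorflow::SessionOptions`, the thread pools can be configured through the underlying TensorFlow configuration proto; a sketch:

```cpp
// configure the session options before creating the session
tensorflow::SessionOptions sessionOptions;
sessionOptions.config.set_inter_op_parallelism_threads(nThreads);
sessionOptions.config.set_intra_op_parallelism_threads(nThreads);

tensorflow::Session* session = tensorflow::createSession(graphDef, sessionOptions);
```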
Then, when calling `tensorflow::run`, pass the internal name of the TensorFlow threadpool, i.e. `"tensorflow"`, as the last argument:
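A sketch, reusing the input tensor from the examples above:

```cpp
// define a vector to store the output
std::vector<tensorflow::Tensor> outputs;

// run the evaluation using the "tensorflow" threadpool
tensorflow::run(session_, {{"input", input}}, {"output"}, &outputs, "tensorflow");
```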
Miscellaneous¶
Logging¶
By default, TensorFlow logging is quite verbose. This can be changed by either setting the `TF_CPP_MIN_LOG_LEVEL` environment variable before calling `cmsRun`, or within your code through `tensorflow::setLogging(level)`.
| Verbosity level | TF_CPP_MIN_LOG_LEVEL |
| --- | --- |
| debug | "0" |
| info | "1" (default) |
| warning | "2" |
| error | "3" |
| none | "4" |
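For example, to only show warnings and above from within your code (a sketch):

```cpp
// corresponds to TF_CPP_MIN_LOG_LEVEL = "2"
tensorflow::setLogging("2");
```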
Forwarding logs to the MessageLogger service is not possible yet.
Links and further reading¶
- cmsml package
- CMSSW
- TensorFlow
- Keras
Authors: Marcel Rieger