Why Deep Learning with C++?

At Jolibrain, we’ve been building deep learning systems for over five years now. Some of us had been in academia working on AI planning for robotics, sometimes for over a decade. There, C++ is a natural fit, because the need for automation meets the need for embedded performance.

So in 2015, C++ felt natural to us for Deep Learning, and thus DeepDetect is written in C++.

This short post shares our experience, good and bad, with C++ for Deep Learning, as of early 2021.

This is not one of those Python vs C++ posts. It is about engineering: making the right choices, reporting in good faith, and sharing accomplishments and frustrations alike!

The points to be reviewed are below:

- Performance: when and how C++ matters
- Cost of coding neural networks with modern C++
- Caveats with C++ for Deep Learning in 2021
- Training and re-training: in which Deep Learning applications is C++ most useful
- Deep Learning in industry with C++

This is the way

Performance: when and how C++ matters

A common view is that C++ matters mostly for deep neural network inference. This is true, of course. Since training is dominated by a for loop over dense GPU operations, a small per-step overhead from the host language is negligible during training. At inference, where calls are typically shorter and often run on small batches, the same overhead can be a noticeable share of the latency.

But there is more, and a closer look reveals some subtleties:

- Python is the dominant data science programming language, and it appears much more malleable and easier for hacking and research purposes than C++. No doubt about this. The community around Python data science is active and shares widely, which is very exciting. So in practice, inference with Python, especially in cloud applications, is fine.
- Most of the dominant Deep Learning frameworks are written in C++, with Python and bindings for other languages on top. So in practice, it’s always compiled C++ running. One may even argue that on NVIDIA GPUs, it’s all cuDNN running, almost independently of the framework!
- The inference claim should be qualified. C++ may run faster sometimes, if not all the time, but possibly by small margins. Also, Python libraries are often written in C and are extremely, if not beautifully, optimized.
- C++ is comforting in one way, if not by its syntax: it’s optimized by default. What we mean is that C++ often reduces the amount of tweaking needed for high performance. This is no law, more of a statistical observation.

So what does this mean in practice?

- In our experience, C++ shines if your data pipeline, your neural net and your training / inference code are generic, reusable and target multiple platforms.
- Python and other higher-level languages are the right choice if you need to research or tweak various techniques first.

Cost of coding neural networks with modern C++

It’s well known that C++ code can take longer to produce than the same functionality in Python, for instance. This of course depends on the engineers’ level of experience, and there are other caveats:

- Development cost is impacted by variables such as salaries and the existing codebases to start from. Here C++ is more costly, no doubt, in our experience. In the mid or long term, this may be less obvious.
- Cross-platform code, and embedded targets especially, are more demanding in both time and know-how. Here C++ has an advantage in our experience, since the codebase is often unchanged across targets and optimized locally. Also, tweaking garbage collectors and other higher-level subtleties can be avoided.
- Modern C++, starting with C++11, has been a big change. Code is now much cleaner, and lambdas and async calls make efficient code quick to write. OpenMP spreads execution over a controllable number of cores with little more than a pragma, such a relief!
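To make the last point concrete, here is a minimal sketch using only the standard library plus an OpenMP pragma. It is not taken from DeepDetect: the function names `normalize` and `normalize_batch` and the image-as-byte-vector layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical pre-processing step: scale 8-bit pixel values into [0, 1].
// The loop is spread across cores when compiled with OpenMP enabled
// (e.g. -fopenmp); without it the pragma is ignored and the loop runs serially.
std::vector<float> normalize(const std::vector<unsigned char>& pixels) {
    std::vector<float> out(pixels.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(pixels.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        out[i] = static_cast<float>(pixels[i]) / 255.0f;
    }
    return out;
}

// Launch one asynchronous task per image with a lambda, then collect the
// results in the original order.
std::vector<std::vector<float>> normalize_batch(
    const std::vector<std::vector<unsigned char>>& images) {
    std::vector<std::future<std::vector<float>>> jobs;
    jobs.reserve(images.size());
    for (const auto& img : images) {
        // img is captured by reference; it outlives the task because
        // every future is waited on before this function returns.
        jobs.push_back(std::async(std::launch::async,
                                  [&img] { return normalize(img); }));
    }
    std::vector<std::vector<float>> batch;
    batch.reserve(jobs.size());
    for (auto& job : jobs) {
        batch.push_back(job.get());
    }
    assert(batch.size() == images.size());
    return batch;
}
```

With OpenMP enabled, the number of threads can be set at run time through the `OMP_NUM_THREADS` environment variable. In a real pipeline one would probably pick a single level of parallelism (per image or per pixel) rather than nest both as this sketch does.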

What to take from this:

- If a deep learning application needs to click well with industrial core applications, C++ might be the way to go. The same applies to highly numerical computations, though numpy and pandas are great libraries.
- If time to market is to be prioritized over production, performance and costs, sticking with the higher-level Deep Learning programming layers is safe.

Caveats with C++ for Deep Learning in 2021

As of 2021, the general trends for C++ in Deep Learning development, as we experience them, are as follows:

- Most frameworks, like TensorFlow and PyTorch, have improved their C++ APIs. We work mostly with libtorch, the C++ API of PyTorch, and it has been improving very quickly.
- These improvements actually reflect a higher rate of adoption among machine learning practitioners and engineers alike. We believe that as Deep Learning has matured, it is now entering a wide range of industries where many professional stacks are actually C++.

There are still many caveats. Mostly, the C++ counterpart to a given functionality, e.g. in Python, often lags behind, sometimes for no real reason. We give two examples below, both related to libtorch.

- Multi-GPU training with libtorch lags behind its Python counterpart. PyTorch now defaults to DDP (DistributedDataParallel) for multi-GPU scaling. This is a great granular implementation of interleaved operations. It is not yet available for C++. However, and this is the point, it’s all written in C++ underneath, from multi-machine gradient sharing to distributed updates! This is actually good news, since it means the gap is simply a matter of popularity of use, not a technological lock of some sort.
- Working with torch::Tensor in C++ used to be ugly compared to the beautiful indexing operations in Python. Fortunately, indexing in C++ with libtorch is now on par with its Python counterpart.

This shows that the trend is toward better C++ exposure for Deep Learning, and the reported range of applications is interesting as well.

Training and re-training: in which Deep Learning applications is C++ most useful

In practice, the main model lifecycle loop is: training, usually with Python; testing; then conversion to C++ (e.g. with PyTorch’s TorchScript). Any update goes back to Python training and runs the full loop again.

Jolibrain has encountered application cases where both the conversion across languages and the model conversion can be avoided:

- Whenever no human labeling is required, the production loop can include retraining the models automatically, through CI and with regression tests. A few examples:
  - In general, any self-supervised setting, in which an update to the dataset can trigger the training step automatically;
  - Time series, which can be fit with models that train on the past to forecast future trends;
  - Whenever training data is synthetic, the training loop can be automated.
- Embedded training/re-training: on embedded platforms, small retraining or gradient steps as model updates can be implemented directly in C++.
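As a rough illustration of such a closed loop, the sketch below applies a few gradient steps to a toy linear model in plain C++ and keeps the update only if it does not regress on a held-out set. Everything here is an assumption made for the example: the `LinearModel` and `Sample` types, the `retrain_if_better` helper and the mean squared error criterion stand in for a real network, dataset and regression test suite.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model: y = w * x + b.
struct LinearModel {
    double w = 0.0;
    double b = 0.0;
};

struct Sample {
    double x;
    double y;
};

// Mean squared error of the model over a non-empty set of samples.
double mse(const LinearModel& m, const std::vector<Sample>& data) {
    assert(!data.empty());
    double sum = 0.0;
    for (const auto& s : data) {
        const double err = m.w * s.x + m.b - s.y;
        sum += err * err;
    }
    return sum / static_cast<double>(data.size());
}

// One gradient-descent step on the MSE loss over a batch.
LinearModel sgd_step(LinearModel m, const std::vector<Sample>& batch, double lr) {
    assert(!batch.empty());
    double grad_w = 0.0;
    double grad_b = 0.0;
    for (const auto& s : batch) {
        const double err = m.w * s.x + m.b - s.y;
        grad_w += 2.0 * err * s.x;
        grad_b += 2.0 * err;
    }
    const double n = static_cast<double>(batch.size());
    m.w -= lr * grad_w / n;
    m.b -= lr * grad_b / n;
    return m;
}

// Retrain on new data, then act as a regression gate: return the candidate
// only if its held-out error is no worse than that of the current model.
LinearModel retrain_if_better(const LinearModel& current,
                              const std::vector<Sample>& new_data,
                              const std::vector<Sample>& holdout,
                              int steps, double lr) {
    LinearModel candidate = current;
    for (int i = 0; i < steps; ++i) {
        candidate = sgd_step(candidate, new_data, lr);
    }
    return mse(candidate, holdout) <= mse(current, holdout) ? candidate : current;
}
```

The gate is what lets the loop run without a human: an update trained on bad data is simply discarded and the deployed model stays in place.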

What to take from this: when the syntactic sugar of higher-level languages is not needed because there’s no human in the loop, C++ can be used directly in production for both training and inference.

At Jolibrain we’ve built closed-loop systems that run onboard flight test airliners, trains and embedded cameras on construction sites.

Deep Learning in industry with C++

There’s a range of industrial applications that are written in C++:

- Large numerical operations in CFD or other settings;
- Critical controllers in autonomous and semi-autonomous systems.

Deep Learning has matured enough to be of very high interest to these industries. In practice, our experience is that C++ is a great way to integrate quickly with these applications.

Technically, linking directly to an application in C++ allows shared memory and zero-copy passing of possibly large input tensors, which matters especially on GPUs.

For these reasons, we observe that C++ for Deep Learning is a growing trend, and it should continue to grow in the coming years.
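The zero-copy idea can be sketched on the CPU with the standard library alone. The `TensorView` type and the `wrap` and `scale_in_place` helpers below are hypothetical names for this illustration. With libtorch, `torch::from_blob` plays a comparable role by building a tensor over memory the application already owns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 2-D view over memory owned by the host application.
struct TensorView {
    float* data;
    std::size_t rows;
    std::size_t cols;

    // Bounds-checked element access (row-major layout).
    float& at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// Wrap an application buffer without copying it.
// The buffer must outlive the view and must not be resized while it is in use.
TensorView wrap(std::vector<float>& buffer, std::size_t rows, std::size_t cols) {
    assert(buffer.size() == rows * cols);
    return TensorView{buffer.data(), rows, cols};
}

// Stand-in for a step of the deep learning code that works in place
// on the application's own memory.
void scale_in_place(TensorView view, float factor) {
    for (std::size_t i = 0; i < view.rows * view.cols; ++i) {
        view.data[i] *= factor;
    }
}
```

Because the view only holds a pointer and a shape, whatever the deep learning side writes is immediately visible to the application, and no input tensor is duplicated. The price is that buffer lifetime becomes the caller’s responsibility.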