DeepDetect v0.15.0

2 April 2021

DeepDetect release v0.15.0

DeepDetect v0.15.0 was released last week. Below we review the main features, fixes and additions.

In summary

  • Stochastic Weight Averaging (SWA) for training with the torch backend
  • All our builds are now using Ubuntu 20.04
  • Many fixes & improvements

Fixes & improvements

  • MAE and MSE metrics for time-series
  • Faster training database creation when training from large time-series datasets
  • Improvements to N-BEATS for time-series: any seasonality is now captured via a dedicated block
  • Torch model publishing with the DeepDetect Platform
  • Fix to a rare spurious object detection decoding issue

Docker images

  • CPU version: docker pull jolibrain/deepdetect_cpu:v0.15.0
  • GPU (CUDA only): docker pull jolibrain/deepdetect_gpu:v0.15.0
  • GPU (CUDA and TensorRT): docker pull jolibrain/deepdetect_gpu_tensorrt:v0.15.0
  • GPU with torch backend: docker pull jolibrain/deepdetect_gpu_torch:v0.15.0

All images are available on https://hub.docker.com/u/jolibrain

Training with SWA

Stochastic Weight Averaging was described in "Averaging Weights Leads to Wider Optima and Better Generalization", published at UAI 2018, and in subsequent works (e.g. https://arxiv.org/abs/1902.02476).

It is a simple optimization trick that helps reach better generalization when training deep neural networks. SWA averages the weights visited by SGD, or by any other optimizer, as it moves through weight space. There are two elements to SWA:

  • The learning rate schedule is modified so that the optimizer keeps exploring new solutions instead of converging onto a single local minimum.
  • The algorithm keeps a running average of the traversed weights.

In practice, SWA finds optima that are wider than those found without it. Wider optima are usually correlated with better test error, generalization and robustness of the trained networks.
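The two elements above can be illustrated with a minimal Python sketch. It is not DeepDetect's C++ implementation: the function names, the toy noisy quadratic loss and the `swa_start` parameter are all assumptions chosen for illustration. The learning rate is simply held constant so that SGD keeps moving, and a running average of the visited weights is maintained once averaging starts.

```python
import random


def sgd_with_swa(grad_fn, w0, lr=0.1, steps=200, swa_start=100, seed=0):
    """Run plain SGD and keep a running average of the weights (SWA).

    grad_fn(w, rng) returns a (possibly noisy) gradient at w.
    Averaging starts at step index `swa_start` (hypothetical parameter name).
    Returns (last SGD weights, averaged weights or None if never averaged).
    """
    rng = random.Random(seed)
    w = list(w0)
    w_swa = None
    n_avg = 0
    for step in range(steps):
        g = grad_fn(w, rng)
        # Constant learning rate: the iterate keeps exploring instead of settling.
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        if step >= swa_start:
            n_avg += 1
            if w_swa is None:
                w_swa = list(w)
            else:
                # Incremental mean: avg += (w - avg) / n
                w_swa = [a + (wi - a) / n_avg for a, wi in zip(w_swa, w)]
    return w, w_swa


def noisy_quadratic_grad(w, rng, noise=1.0):
    """Gradient of the toy loss 0.5 * ||w||^2 with added Gaussian noise."""
    return [wi + rng.gauss(0.0, noise) for wi in w]


if __name__ == "__main__":
    last, averaged = sgd_with_swa(noisy_quadratic_grad, [5.0, -3.0])
    print("last SGD iterate:", last)
    print("SWA average     :", averaged)
```

On this toy problem the last iterate keeps jittering around the optimum because of gradient noise, while the average smooths that jitter out. Real SWA schedules often use a cyclical or elevated constant learning rate during the averaging phase; the constant rate here is the simplest stand-in.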

The PyTorch blog post on SWA has a good description of the inner details, applications and results.

In DeepDetect, SWA is implemented in C++ for the torch backend, and is available with the RANGER and RANGER_PLUS optimizers. These optimizers are already state-of-the-art, so adding SWA makes sense for even better trained models. As a reminder, RANGER is RAdam with Lookahead, and our DeepDetect RANGER_PLUS is RANGER with AdaBelief and gradient centralization.

All these combinations of recent improvements from academia keep DeepDetect optimizers at the cutting edge of the state of the art. At Jolibrain we do this for a reason: automation is our motto, and improving the optimizers directly automates and improves all models and hyper-parameter searches at once. It does not solve every difficulty, but it pushes results overall in the right direction.

Using SWA with DeepDetect Server or Platform is simple: add "swa":true to your training call with the torch backend, and make sure to set your solver_type to either RANGER or RANGER_PLUS.

Note that SWA can be combined with SAM by also setting sam to true, as demonstrated on some simple results below.
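As a rough illustration, the solver options of such a training call could be assembled as in the Python sketch below. Only the swa, sam and solver_type keys and the RANGER / RANGER_PLUS values come from this post. The helper name, the service name, the data path and the nesting of the options under parameters.mllib.solver are assumptions, so check the DeepDetect API documentation for the exact layout expected by your version.

```python
import json


def build_swa_training_call(service, data, solver_type="RANGER_PLUS", sam=False):
    """Build a JSON-ready training payload with SWA enabled (hypothetical helper).

    The placement of the options under parameters.mllib.solver is an
    assumption, not something stated in this post.
    """
    if solver_type not in ("RANGER", "RANGER_PLUS"):
        raise ValueError("SWA requires solver_type RANGER or RANGER_PLUS")
    solver = {"solver_type": solver_type, "swa": True}
    if sam:
        # Optionally combine SWA with SAM.
        solver["sam"] = True
    return {
        "service": service,  # name of an existing torch-backend service (assumed)
        "async": True,
        "parameters": {"mllib": {"solver": solver}},
        "data": data,
    }


if __name__ == "__main__":
    payload = build_swa_training_call(
        "my_torch_service",        # hypothetical service name
        ["/path/to/train_data"],   # hypothetical data location
        sam=True,
    )
    print(json.dumps(payload, indent=2))
```

The resulting JSON would then be sent as the body of the training call, alongside whatever other input, model and output parameters the service already needs.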

Training losses for various combinations of SAM and SWA on small models