DeepDetect release v0.15.0
DeepDetect v0.15.0 was released last week. Below we review the main features, fixes and additions.
DeepDetect v0.15 was released last week with new SWA optimizer for deep nets, fixes and improvements from time-series forecasting with seasonality to object detectors. https://t.co/00WZppZ2xh
jolibrain (@jolibrain), April 2, 2021
DD is our swiss army knife for applied deep learning!
All docker https://t.co/ISsZ3ljAM4
- Stochastic Weight Averaging (SWA) for training with the torch backend
- All our builds are now using Ubuntu 20.04
- Many fixes & improvements
Fixes & improvements
- MAE and MSE metrics for time-series
- Faster training database creation when training from large time-series datasets
- Improvements to N-BEATS for time-series: any seasonality is now captured via a dedicated block
- Torch model publishing with the DeepDetect Platform
- Fix to a rare spurious object detection decoding issue
- CPU version:
docker pull jolibrain/deepdetect_cpu:v0.15.0
- GPU (CUDA only):
docker pull jolibrain/deepdetect_gpu:v0.15.0
- GPU (CUDA and TensorRT):
docker pull jolibrain/deepdetect_gpu_tensorrt:v0.15.0
- GPU with torch backend:
docker pull jolibrain/deepdetect_gpu_torch:v0.15.0
All images available on https://hub.docker.com/u/jolibrain
Training with SWA
Stochastic Weight Averaging was described in "Averaging Weights Leads to Wider Optima and Better Generalization", published at UAI 2018, and in subsequent works (e.g. https://arxiv.org/abs/1902.02476).
It is a simple optimization trick that helps reach better generalization when training deep neural networks. SWA averages the points visited in weight space by SGD or any other optimizer. There are two elements to SWA:
- The learning rate schedule is modified so that the optimizer keeps finding new solutions instead of converging onto a single local minimum
- The algorithm keeps a running average of the traversed weights.
In practice, this leads to optima that are wider than those found without SWA. Wider optima are usually correlated with better test error, generalization and robustness of the trained networks.
This PyTorch blog post on SWA has a good description of the inner details, applications and results.
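The two elements of SWA listed above can be sketched in a few lines of plain Python. This is only an illustration on a toy quadratic loss, not the DeepDetect C++ implementation: the function names, the linear-then-constant learning rate schedule and all numeric values are assumptions chosen for readability.

```python
import random


def learning_rate(step, swa_start, base_lr, swa_lr):
    """Element 1: decay linearly to swa_lr, then hold it constant so the
    optimizer keeps exploring instead of settling on a single point."""
    if step < swa_start:
        return base_lr + (swa_lr - base_lr) * step / swa_start
    return swa_lr


def swa_update(average, weights, n_averaged):
    """Element 2: fold one more snapshot into a running average that
    already contains n_averaged snapshots."""
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(average, weights)]


def noisy_gradient(weights, rng):
    """Gradient of the toy loss 0.5 * ||w||^2, plus Gaussian noise that
    stands in for minibatch sampling (hypothetical setup)."""
    return [w + rng.gauss(0.0, 1.0) for w in weights]


def train(steps=2000, swa_start=1000, base_lr=0.1, swa_lr=0.05, seed=0):
    rng = random.Random(seed)
    weights = [5.0, -3.0]
    swa_weights, n_averaged = None, 0
    for step in range(steps):
        lr = learning_rate(step, swa_start, base_lr, swa_lr)
        grad = noisy_gradient(weights, rng)
        weights = [w - lr * g for w, g in zip(weights, grad)]
        if step >= swa_start:
            if swa_weights is None:
                # The first snapshot initializes the average.
                swa_weights = list(weights)
            else:
                swa_weights = swa_update(swa_weights, weights, n_averaged)
            n_averaged += 1
    return weights, swa_weights


if __name__ == "__main__":
    last, averaged = train()
    print("last SGD iterate:", last)
    print("SWA average:     ", averaged)
```

The last SGD iterate keeps bouncing around the minimum because the learning rate never goes to zero, while the averaged weights are expected to sit closer to the center of the region being explored.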
In DeepDetect, SWA is implemented in C++ for the torch backend and the RANGER and RANGER_PLUS optimizers. These optimizers are already state of the art, so adding SWA makes sense for even better trained models. As a reminder, RANGER is RAdam with lookahead, and our DeepDetect RANGER_PLUS is RANGER with adabelief and gradient centralization.
All these combinations of recent improvements from academia keep DeepDetect optimizers at the forefront of the state of the art. At Jolibrain we do this for a reason: automation is our motto, and improving the optimizers directly automates and improves all models and hyper-parameter finding at once. It does not solve every difficulty, but it pushes results overall in the right direction.
Using SWA with DeepDetect Server or Platform is very simple: add "swa":true to your training call with the torch backend, and make sure to set your solver_type to either RANGER or RANGER_PLUS.
SWA can be combined with SAM by setting sam to true, as demonstrated on some simple results below.
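As a rough illustration, the snippet below builds the JSON body of such a training call. Only swa, sam and solver_type come from the text above. The service name, the data path, the nesting under parameters.mllib.solver and the assumption that the two accepted solver types are RANGER and RANGER_PLUS should all be checked against the DeepDetect API documentation. A real call would also carry the usual input, output and other solver parameters.

```python
import json

# Assumed list of solver types that support SWA.
SWA_SOLVERS = ("RANGER", "RANGER_PLUS")


def build_train_call(service, data, solver_type="RANGER_PLUS", swa=True, sam=False):
    """Build a minimal training request body (hypothetical helper)."""
    if swa and solver_type not in SWA_SOLVERS:
        raise ValueError(f"swa expects solver_type in {SWA_SOLVERS}, got {solver_type!r}")
    return {
        "service": service,
        "async": True,
        "parameters": {
            "mllib": {
                # Assumed location of the solver options in the request.
                "solver": {
                    "solver_type": solver_type,
                    "swa": swa,
                    "sam": sam,
                }
            }
        },
        "data": data,
    }


if __name__ == "__main__":
    # "my_torch_service" and the data path are placeholders.
    body = build_train_call("my_torch_service", ["/path/to/train/data"], sam=True)
    print(json.dumps(body, indent=2))
```

The printed body would then be sent to the training endpoint of a running DeepDetect server with whatever HTTP client you already use.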