DeepDetect v0.13.0


DeepDetect release v0.13.0

DeepDetect v0.13.0 was released a couple of weeks ago. Below we review the main features, fixes, and additions.

In summary:

  • NCNN backend for efficient and lightweight inference on ARM and CPUs:
    • Support for batches of multiple images in inference
    • Ability to use NCNN models with multi-model chains
    • Updated to the latest NCNN code
  • Basic image data augmentation for vision models trained with the Torch backend

Improvements to the NCNN backend

NCNN is a great library for neural network inference on ARM and embedded GPU devices.

DeepDetect has had support for NCNN for a while now, with custom modifications for running even complex recurrent networks for time-series and OCR.

DeepDetect v0.13.0 updates NCNN to its latest version and unlocks two new capabilities.

Inference with multiple images

Sending multiple images at once is mostly useful for performance reasons on GPUs. Since NCNN mostly targets CPUs, and ARM in particular, batching is not a built-in feature of the library. The underlying reason is that on CPUs inference time remains more or less linear in the number of images, so batching by itself brings little speedup.

However, in practice it is convenient to be able to send multiple images at once through the API. With NCNN, DeepDetect dispatches the jobs across the available CPU cores.
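As a rough illustration, a predict call carrying several images can be sketched as below. The overall layout (`service`, `parameters`, `data`) follows the usual DeepDetect predict call, but the service name, the image paths and the parameter values are assumptions made for this example, so check them against the API documentation of your DeepDetect version.

```python
import json


def build_predict_request(service, image_uris, best=3):
    """Build the JSON body of a predict call that carries several images.

    `service` is the name of an already created service (hypothetical here),
    and `image_uris` holds one path or URL per image.
    """
    if not image_uris:
        raise ValueError("at least one image is required")
    return {
        "service": service,
        "parameters": {
            "input": {},
            "output": {"best": best},  # assumed: keep the top `best` classes
            "mllib": {},
        },
        # One entry per image; all of them travel in a single API call.
        "data": list(image_uris),
    }


# Hypothetical service name and image paths, for illustration only.
body = build_predict_request(
    "ncnn_classifier",
    ["/data/img1.jpg", "/data/img2.jpg", "/data/img3.jpg"],
)
print(json.dumps(body, indent=2))
```

The body would then be posted to the server's predict resource. From the client's point of view nothing else changes: the spreading of the images over CPU cores happens on the server side.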

Using NCNN models in chains

DeepDetect chains are multi-model pipelines of interleaved models and actions. A typical chain is a detection model followed by a cropping action, and possibly another model that works from the crops, e.g. an OCR model reading the detected text elements.

DeepDetect chains are very useful in production as more tasks are solved with deep neural networks, and as complicated software pieces are traded away for multiple models.

Using NCNN models in chains brings complex processing down to embedded systems or CPU cloud machines, for lightweight inference. This is very useful to our customers and users alike.
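The detection, crop, second-model pattern described in this section can be sketched conceptually as follows. This is not the DeepDetect chain API: `run_chain`, `toy_detector` and `toy_reader` are hypothetical stand-ins written for this example, and images are plain nested lists to keep the sketch self-contained.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax), max bounds exclusive


def crop(image: Image, box: Box) -> Image:
    """Cropping action: extract the region covered by `box`."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in image[ymin:ymax]]


def run_chain(
    image: Image,
    detector: Callable[[Image], List[Box]],
    reader: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Run the detector, crop each detection, then run the reader on each crop."""
    return [(box, reader(crop(image, box))) for box in detector(image)]


def toy_detector(image: Image) -> List[Box]:
    """Stand-in for a detection model: one box around all non-zero pixels."""
    coords = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not coords:
        return []
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def toy_reader(patch: Image) -> str:
    """Stand-in for a second model (e.g. OCR): it only describes the crop."""
    return f"{len(patch[0])}x{len(patch)} patch"


image = [
    [0, 0, 0, 0],
    [0, 5, 7, 0],
    [0, 0, 0, 0],
]
chain_output = run_chain(image, toy_detector, toy_reader)
print(chain_output)  # [((1, 1, 3, 2), '2x1 patch')]
```

In a real chain, the detector and the reader would be two services, and the crop would be an action placed between them.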

Basic data augmentation with the Torch backend

This blog has already reported on the novel transformer-based architectures, here and here. These architectures require larger data volumes than convolution-based deep networks.

This is mostly due to the global attention layers, which capture patterns better at all stages of the network, with the drawback of requiring more data.

One common way to remedy the lack of data is to rely on synthetic data augmentation. This aspect is absolutely critical for transformer-based architectures, and the literature keeps reporting how difficult these models are to train without very aggressive data augmentation techniques.

DeepDetect v0.13.0 now includes the placeholder for online data augmentation while training computer vision models with the C++-based Torch backend. In addition to the placeholder, a small set of augmentation transforms has already been implemented, including image mirroring, rotation, and cutting out random patches.
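To make the three transforms concrete, here is a minimal pure-Python sketch of online augmentation applied to one image at training time. It is an illustration only and not the C++ implementation: the function names, the probabilities, the patch size and the restriction of rotations to quarter turns are all assumptions made for this example.

```python
import random


def mirror(image):
    """Flip the image horizontally."""
    return [row[::-1] for row in image]


def rotate90(image, k):
    """Rotate the image clockwise by k quarter turns (assumed rotation scheme)."""
    image = [list(row) for row in image]
    for _ in range(k % 4):
        image = [list(row) for row in zip(*image[::-1])]
    return image


def cutout(image, size, rng, fill=0):
    """Overwrite a randomly placed square patch with `fill`."""
    h, w = len(image), len(image[0])
    size = min(size, h, w)  # keep the patch inside the image
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    out = [list(row) for row in image]
    for y in range(top, top + size):
        for x in range(left, left + size):
            out[y][x] = fill
    return out


def augment(image, rng, p_mirror=0.5, p_cutout=0.5, cutout_size=2):
    """Apply the transforms on the fly, with hypothetical probabilities."""
    if rng.random() < p_mirror:
        image = mirror(image)
    image = rotate90(image, rng.randrange(4))
    if rng.random() < p_cutout:
        image = cutout(image, cutout_size, rng)
    return image


rng = random.Random(0)
sample = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
for row in augment(sample, rng):
    print(row)
```

Because the transforms are drawn again every time an image is loaded, the network sees a different variant of each training image at every epoch, which is what makes the augmentation "online".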

This is the first step toward implementing very aggressive data augmentation policies and unlocking the power of transformer architectures while keeping the full simplicity of the DeepDetect API and deployment procedures.