Training high accuracy Object Detectors

7 May 2021

Training an object detector using Pytorch and DeepDetect

Pytorch is the most widely used framework in ML research and industry today. Pytorch also offers one of the most versatile and useful pure C++ APIs among the Deep Learning frameworks.

For these reasons, our focus has been to develop the DeepDetect C++ torch backend so that it is as complete as possible with respect to the automation tasks DeepDetect supports: mostly computer vision, NLP, time-series and CSV data.

This short post details the latest addition to the DeepDetect torch backend: training high accuracy object detectors.

High accuracy object detection models

DeepDetect supports a range of deep neural architectures for object detection. These have been selected and battle-tested so that they cover a useful range of trade-offs between computational requirements (FLOPS) and accuracy.

The rationale is that easier object detection tasks can use lighter architectures and unlock high frame rates, i.e. up to around 1500 FPS on a single GPU. At the other end, some detection tasks are more difficult and/or require very high accuracy that is only within reach of much larger and more computationally demanding deep neural architectures.

Faster-RCNN is a single, unified network for object detection. The RPN (Region Proposal Network) module serves as 'attention' for this unified network, see

For this second type of task, thanks to the new addition of detection via the torch backend, DeepDetect now embeds the three architectures below:

  • RefineDet and VoVNet-RefineDet: single-stage architectures trained with DeepDetect using our custom version of Caffe and exportable to TensorRT for the fastest inference performance
  • Faster-RCNN: a two-stage architecture trained with the DeepDetect torch backend and exportable to ONNX and TensorRT
  • RetinaNet: a single-stage architecture trained with the DeepDetect torch backend and exportable to ONNX and TensorRT

What follows details the few steps needed to train a high accuracy object detector with the DeepDetect Open Source Server, with no code involved.

We are using the Cars dataset that is part of the DeepDetect Open Source Platform.

Model setup

For this example we use the Faster R-CNN model implementation from the Pytorch Vision repository. Pytorch Vision contains multiple models that achieve state-of-the-art performance, and provides everything we need to train these models using DeepDetect Server or Platform.

The first step is to convert the model to Pytorch JIT, since the original implementation is in Python and we want to run the model on our C++ server. The model can be traced using a script available directly from the DeepDetect repository:

python3 tools/torch/ -v fasterrcnn_resnet50_fpn -o /opt/platform/models/training/fasterrcnn_cars/ --num_classes 2

The model head is replaced with an untrained head with the correct number of classes for our problem. Here we use a pretrained Faster R-CNN with a Resnet 50 as a backbone. Faster RCNN can also be exported with other backbones, for example:

python3 tools/torch/ -v fasterrcnn --backbone resnet18 -o /opt/platform/models/training/fasterrcnn_cars/ --num_classes 2


We can create the service by calling the DeepDetect API:

curl -X PUT 'http://localhost:8080/services/fasterrcnn_train' -d '{
    "description": "Torch Faster R-CNN training on cars",
    "mllib": "torch",
    "model": {
        "repository": "/opt/platform/models/training/fasterrcnn_cars/"
    },
    "parameters": {
        "input": {
            "connector": "image",
            "finetuning": true
        },
        "output": {
            "bbox": true,
            "store_config": true
        }
    },
    "mltype": "detection",
    "type": "supervised"
}'
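
For readers who prefer scripting over curl, the same service-creation call can be sketched in Python using only the standard library. This is an illustrative sketch, not part of DeepDetect: the helper name `build_service_request` is hypothetical, and the payload only mirrors the keys visible in the curl call above, with their nesting taken from its indentation. A real service definition may need additional parameters.

```python
import json
import urllib.request

DD_URL = "http://localhost:8080"  # server address used in the curl calls


def build_service_request(name, repository, base_url=DD_URL):
    """Build (but do not send) the PUT request that creates a training service.

    Hypothetical helper: the payload only mirrors the keys visible in the
    curl example; a real service may need additional parameters.
    """
    payload = {
        "description": "Torch Faster R-CNN training on cars",
        "mllib": "torch",
        "model": {"repository": repository},
        "parameters": {
            "input": {"connector": "image", "finetuning": True},
            "output": {"bbox": True, "store_config": True},
        },
        "mltype": "detection",
        "type": "supervised",
    }
    return urllib.request.Request(
        url=f"{base_url}/services/{name}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_service_request(
    "fasterrcnn_train", "/opt/platform/models/training/fasterrcnn_cars/"
)
# To actually send it to a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect or log the JSON body before it reaches the server.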

And start the training with:

curl -X POST 'http://localhost:8080/train' -d '{
   "async": true,

The whole training process can be monitored from the DeepDetect platform or by polling from the command line with:

curl -X GET "http://localhost:8080/train?service=fasterrcnn_train&job=1"
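
The polling call above can also be wrapped in a small loop. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the reply reports the job state under `head.status` with the value `"running"` while training is in progress. Check the actual reply of your server before relying on that field.

```python
import json
import time
import urllib.parse
import urllib.request


def training_status_url(service, job, base_url="http://localhost:8080"):
    """Return the GET /train URL used to poll a training job."""
    query = urllib.parse.urlencode({"service": service, "job": job})
    return f"{base_url}/train?{query}"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_training(service, job, fetch=fetch_json, interval=10.0, max_polls=None):
    """Poll until the job no longer reports itself as running.

    Assumption: the reply carries the job state under reply["head"]["status"]
    and the value "running" means training is still in progress.
    """
    url = training_status_url(service, job)
    polls = 0
    while True:
        reply = fetch(url)
        if reply.get("head", {}).get("status") != "running":
            return reply
        polls += 1
        # Optional safety cap so the loop cannot run forever.
        if max_polls is not None and polls >= max_polls:
            return reply
        time.sleep(interval)
```

Passing the `fetch` function as a parameter keeps the loop testable without a running server; calling `wait_for_training("fasterrcnn_train", 1)` with the default would query the local server every ten seconds.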

Below are the metrics as reported on the Platform’s Web UI:

Training Web UI

Using the trained model

Once the model has finished training, it can be published and used for prediction.

Predict Web UI