Setup of an object detector

This tutorial sets up an image object detector that distinguishes among 21 object categories. For every detected object, the detector returns a bounding box around it along with a label, e.g. person, car, … The tutorial uses a deep neural net pre-trained on the VOC task.

The detection service lets an application send images and receive in return, in JSON format, the set of bounding boxes for each image.

The following presupposes that DeepDetect runs as a Docker container; see the quickstart for how to set that up.

Setting up the pre-trained model

mkdir models

This prepares the model directory.

Setting up the detector service

Let’s start the DeepDetect server:

docker run -d -p 8080:8080 -v /path/to/models:/opt/models/ jolibrain/deepdetect_cpu

and create a service:

curl -X PUT "http://localhost:8080/services/ilsvrc_googlenet" -d '{
    "description": "image classification service",
    "mllib": "caffe",
    "model": {
        "init": "",
        "repository": "/opt/models/detection_voc0712",
        "create_repository": true
    },
    "parameters": {
        "input": {
            "connector": "image"
        }
    },
    "type": "supervised"
}'

This should yield:

{"status":{"code":201,"msg":"Created"}}


And this is all it takes to set up the pre-trained model.
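
The same service creation call can also be issued from a script. The following is a minimal Python sketch using only the standard library, not an official client: the helper names build_service_payload and create_service are hypothetical, and the payload mirrors the curl call above. The init value is left empty, as in that call.

```python
import json
import urllib.request


def build_service_payload(repository, init=""):
    """Build the JSON body of the service creation call (mirrors the curl call)."""
    return {
        "description": "image classification service",
        "mllib": "caffe",
        "model": {
            "init": init,
            "repository": repository,
            "create_repository": True,
        },
        "parameters": {"input": {"connector": "image"}},
        "type": "supervised",
    }


def create_service(host, name, payload):
    """Send the PUT request; needs a running DeepDetect server (not called here)."""
    request = urllib.request.Request(
        f"{host}/services/{name}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_service_payload("/opt/models/detection_voc0712")
print(json.dumps(payload, indent=2))
# With the server from the docker command above running:
# create_service("http://localhost:8080", "ilsvrc_googlenet", payload)
```

Keeping the payload builder separate from the network call makes it possible to inspect the request body without a running server.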

Testing object detection

We can now pass any image file path or URL to our object detector. Here is an example:

curl -X POST "http://localhost:8080/predict" -d '{
    "parameters": {
        "output": {
            "bbox": true,
            "confidence_threshold": 0.1
        }
    }
}'

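A complete predict request also names the service and lists the images to process. Here is a minimal Python sketch of such a body, assuming the usual DeepDetect predict layout with service, parameters and data fields. The helper build_predict_payload is a hypothetical name, the service name reuses the one created above, and the image URL is a placeholder for illustration only.

```python
import json


def build_predict_payload(service, images, confidence_threshold=0.1):
    """Build the JSON body of a /predict call that asks for bounding boxes."""
    return {
        "service": service,
        "parameters": {
            "output": {
                "bbox": True,
                "confidence_threshold": confidence_threshold,
            }
        },
        # Image file paths or URLs, as seen from the server.
        "data": list(images),
    }


# Hypothetical image URL, for illustration only.
payload = build_predict_payload(
    "ilsvrc_googlenet", ["https://example.com/street.jpg"]
)
body = json.dumps(payload)
print(body)
```

The resulting string can be passed to curl with -d or posted to /predict from any HTTP client. The reply to a predict call has the following form: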

{
  "status": {
    "msg": "OK",
    "code": 200
  },
  "body": {
    "predictions": [
      {
        "classes": [
          {
            "cat": "bird",
            "prob": 0.8333460688591003,
            "bbox": {
              "xmin": 67.03402709960938,
              "ymin": 414.25286865234375,
              "ymax": 64.85651397705078,
              "xmax": 354.663330078125
            }
          },
          {
            "cat": "person",
            "prob": 0.5956286191940308,
            "bbox": {
              "xmin": 75.99663543701172,
              "ymin": 475.9880676269531,
              "ymax": 66.72187805175781,
              "xmax": 363.94293212890625
            }
          },
          {
            "cat": "person",
            "prob": 0.2928898334503174,
            "bbox": {
              "xmin": 495.8335876464844,
              "ymin": 735.4041748046875,
              "ymax": 506.434326171875,
              "xmax": 652.080078125
            }
          },
          {
            "cat": "person",
            "prob": 0.24435117840766907,
            "bbox": {
              "xmin": 437.17041015625,
              "ymin": 540.1434936523438,
              "ymax": 111.70045471191406,
              "xmax": 633.19970703125
            }
          },
          {
            "cat": "bird",
            "prob": 0.16601955890655518,
            "bbox": {
              "xmin": 40.96523666381836,
              "ymin": 280.6235046386719,
              "ymax": 71.90843200683594,
              "xmax": 259.4865417480469
            }
          },
          {
            "cat": "person",
            "prob": 0.12583601474761963,
            "bbox": {
              "xmin": 358.8877868652344,
              "ymin": 763.8483276367188,
              "ymax": 532.8911743164062,
              "xmax": 491.5361022949219
            }
          },
          {
            "cat": "person",
            "last": true,
            "prob": 0.11492644995450974,
            "bbox": {
              "xmin": 213.4755096435547,
              "ymin": 793.69287109375,
              "ymax": 545.5011596679688,
              "xmax": 355.6097717285156
            }
          }
        ],
        "uri": ""
      }
    ]
  },
  "head": {
    "method": "/predict",
    "service": "imgserv",
    "time": 1903
  }
}

The resulting JSON contains:

  • bounding boxes as bbox JSON objects
  • the estimated category cat of the object
  • the confidence of the detection as a probability prob, the higher the better
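
Reading these fields back in a client could look like the following Python sketch. The Detection type and the parse_detections helper are hypothetical names, and the sample reply is a shortened copy of the response above with a single detection. Since ymin is larger than ymax in that response, the sketch orders each coordinate pair rather than assuming which value comes first.

```python
import json
from typing import List, NamedTuple


class Detection(NamedTuple):
    cat: str
    prob: float
    x_low: float
    y_low: float
    x_high: float
    y_high: float


def parse_detections(response: dict) -> List[Detection]:
    """Flatten every predictions[].classes[] entry into a Detection."""
    detections = []
    for prediction in response["body"]["predictions"]:
        for entry in prediction["classes"]:
            box = entry["bbox"]
            # Order each pair so that low <= high, whatever the server convention.
            xs = sorted((box["xmin"], box["xmax"]))
            ys = sorted((box["ymin"], box["ymax"]))
            detections.append(
                Detection(entry["cat"], entry["prob"], xs[0], ys[0], xs[1], ys[1])
            )
    return detections


# Shortened copy of the sample response: first detection only.
sample = json.loads("""
{
  "status": {"msg": "OK", "code": 200},
  "body": {"predictions": [{"classes": [
    {"cat": "bird", "prob": 0.8333460688591003,
     "bbox": {"xmin": 67.03402709960938, "ymin": 414.25286865234375,
              "ymax": 64.85651397705078, "xmax": 354.663330078125}}
  ], "uri": ""}]}
}
""")

for detection in parse_detections(sample):
    print(detection)
```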

Note that confidence_threshold removes any prediction whose prob is strictly below the threshold.
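
The effect of the threshold can be reproduced on the client side. This is a minimal Python sketch using the prob values from the response above and a hypothetical helper named apply_threshold. The server applies this filter itself when confidence_threshold is set, so the code is for illustration only.

```python
def apply_threshold(classes, threshold):
    """Keep the entries whose prob is not strictly below the threshold."""
    return [entry for entry in classes if entry["prob"] >= threshold]


# cat and prob values copied from the sample response, bounding boxes omitted.
classes = [
    {"cat": "bird", "prob": 0.8333460688591003},
    {"cat": "person", "prob": 0.5956286191940308},
    {"cat": "person", "prob": 0.2928898334503174},
    {"cat": "person", "prob": 0.24435117840766907},
    {"cat": "bird", "prob": 0.16601955890655518},
    {"cat": "person", "prob": 0.12583601474761963},
    {"cat": "person", "prob": 0.11492644995450974},
]

kept = apply_threshold(classes, 0.2)
print([entry["cat"] for entry in kept])  # ['bird', 'person', 'person', 'person']
```

With the threshold of 0.1 used in the request, all seven detections pass, which matches the response above; raising it to 0.2 keeps four.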

To generate the bounding boxes, you can look at the object detection Python script.
