Setup of an object detector

This tutorial sets up an image object detector that distinguishes among 21 object classes. For every detected object, the detector returns a bounding box around it along with a label, e.g. person, car, … The tutorial uses a deep neural net pre-trained on the VOC task.


The detection service lets an application send images and receive, in return, the set of bounding boxes for each image in JSON format.

The following assumes that DeepDetect runs as a Docker container; see the quickstart for how to set this up.

Setting up the pre-trained model


mkdir models

This creates the directory that will hold the model.

Setting up the detector service

Let’s start the DeepDetect server, replacing /path/to/models with the absolute path of the models directory:


docker run -d -p 8080:8080 -v /path/to/models:/opt/models/ jolibrain/deepdetect_cpu

and create a service:


curl -X PUT "http://localhost:8080/services/imgserv" -d '{
    "description": "object detection service",
    "mllib": "caffe",
    "model": {
        "init": "https://deepdetect.com/models/init/desktop/images/detection/detection_voc0712.tar.gz",
        "repository": "/opt/models/detection_voc0712",
        "create_repository": true
    },
    "parameters": {
        "input": {
            "connector": "image"
        }
    },
    "type": "supervised"
}'

This should yield:


{
  "status": {
    "code": 201,
    "msg": "Created"
  }
}

This is all it takes to set up the pre-trained model.
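
The same service creation call can also be issued from Python. The following is a minimal sketch using only the standard library, not an official DeepDetect client. The helper names build_service_request, create_service and is_created are hypothetical, and the payload only mirrors the fields of the curl call above.

```python
import json
import urllib.request


def build_service_request(base_url, service_name, model_init_url, repository):
    """Return (url, body_bytes) for the service creation PUT call."""
    payload = {
        "description": "object detection service",
        "mllib": "caffe",
        "model": {
            "init": model_init_url,        # archive holding the pre-trained model
            "repository": repository,      # path as seen from inside the container
            "create_repository": True,
        },
        "parameters": {"input": {"connector": "image"}},
        "type": "supervised",
    }
    url = f"{base_url}/services/{service_name}"
    return url, json.dumps(payload).encode("utf-8")


def create_service(base_url, service_name, model_init_url, repository):
    """Send the PUT request and return the decoded JSON response."""
    url, body = build_service_request(
        base_url, service_name, model_init_url, repository
    )
    request = urllib.request.Request(url, data=body, method="PUT")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def is_created(response):
    """True when the response carries the 201 status code shown above."""
    return response.get("status", {}).get("code") == 201
```

Assuming the server started earlier is reachable on port 8080, calling create_service with "http://localhost:8080", the service name, the model archive URL and "/opt/models/detection_voc0712" sends the request, and is_created can then be applied to the result.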

Testing object detection

We can now pass any image file path or URL to our object detector. Here is an example:


curl -X POST "http://localhost:8080/predict" -d '{
    "service": "imgserv",
    "parameters": {
        "output": {
            "bbox": true,
            "confidence_threshold": 0.1
        }
    },
    "data": ["https://photos.wi.gcs.trstatic.net/e9hyHkaRFZdDV_jLZuTS6jcYq1eLUfiFzfl9zNavmNuoyZ-3UCX_EGg6D5TNU--V-f-z2CT8Kg0u3mF9gccUiA"]
}'

yields:


{
  "status": {
    "msg": "OK",
    "code": 200
  },
  "body": {
    "predictions": [
      {
        "classes": [
          {
            "cat": "bird",
            "prob": 0.8333460688591003,
            "bbox": {
              "xmin": 67.03402709960938,
              "ymin": 414.25286865234375,
              "ymax": 64.85651397705078,
              "xmax": 354.663330078125
            }
          },
          {
            "cat": "person",
            "prob": 0.5956286191940308,
            "bbox": {
              "xmin": 75.99663543701172,
              "ymin": 475.9880676269531,
              "ymax": 66.72187805175781,
              "xmax": 363.94293212890625
            }
          },
          {
            "cat": "person",
            "prob": 0.2928898334503174,
            "bbox": {
              "xmin": 495.8335876464844,
              "ymin": 735.4041748046875,
              "ymax": 506.434326171875,
              "xmax": 652.080078125
            }
          },
          {
            "cat": "person",
            "prob": 0.24435117840766907,
            "bbox": {
              "xmin": 437.17041015625,
              "ymin": 540.1434936523438,
              "ymax": 111.70045471191406,
              "xmax": 633.19970703125
            }
          },
          {
            "cat": "bird",
            "prob": 0.16601955890655518,
            "bbox": {
              "xmin": 40.96523666381836,
              "ymin": 280.6235046386719,
              "ymax": 71.90843200683594,
              "xmax": 259.4865417480469
            }
          },
          {
            "cat": "person",
            "prob": 0.12583601474761963,
            "bbox": {
              "xmin": 358.8877868652344,
              "ymin": 763.8483276367188,
              "ymax": 532.8911743164062,
              "xmax": 491.5361022949219
            }
          },
          {
            "cat": "person",
            "last": true,
            "prob": 0.11492644995450974,
            "bbox": {
              "xmin": 213.4755096435547,
              "ymin": 793.69287109375,
              "ymax": 545.5011596679688,
              "xmax": 355.6097717285156
            }
          }
        ],
        "uri": "https://photos.wi.gcs.trstatic.net/e9hyHkaRFZdDV_jLZuTS6jcYq1eLUfiFzfl9zNavmNuoyZ-3UCX_EGg6D5TNU--V-f-z2CT8Kg0u3mF9gccUiA"
      }
    ]
  },
  "head": {
    "method": "/predict",
    "service": "imgserv",
    "time": 1903
  }
}

The resulting JSON contains:

  • bounding boxes as bbox JSON objects
  • the estimated category cat of the object
  • the confidence of the detection as a probability prob, the higher the better
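
To make the structure of the response concrete, here is a small Python sketch that flattens it into one record per detection. The Detection type and the parse_detections function are hypothetical names introduced for illustration. The code assumes only the fields visible in the sample output above (body, predictions, classes, cat, prob, bbox, uri).

```python
from typing import NamedTuple


class Detection(NamedTuple):
    label: str
    prob: float
    left: float
    top: float
    right: float
    bottom: float


def parse_detections(response):
    """Map each image URI of a /predict response to its list of detections."""
    results = {}
    for prediction in response["body"]["predictions"]:
        detections = []
        for cls in prediction.get("classes", []):
            box = cls["bbox"]
            # Sort each coordinate pair instead of trusting the min/max naming:
            # in the sample output above, ymin is larger than ymax.
            left, right = sorted((box["xmin"], box["xmax"]))
            top, bottom = sorted((box["ymin"], box["ymax"]))
            detections.append(
                Detection(cls["cat"], cls["prob"], left, top, right, bottom)
            )
        results[prediction["uri"]] = detections
    return results
```

Sorting the coordinates is a defensive design choice made for this sketch, so that downstream drawing or cropping code always receives left ≤ right and top ≤ bottom.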

Note that confidence_threshold removes any prediction whose prob is strictly below the threshold.
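
The effect of the threshold can be reproduced on the client side, for instance to tighten the selection without calling the server again. This is only an illustrative sketch of the rule stated above, not the server's implementation, and filter_by_confidence is a hypothetical helper name.

```python
def filter_by_confidence(classes, threshold):
    """Drop entries whose prob is strictly below threshold; keep the rest."""
    return [c for c in classes if c["prob"] >= threshold]


# Probabilities rounded from the sample output above.
sample_classes = [
    {"cat": "bird", "prob": 0.83},
    {"cat": "person", "prob": 0.60},
    {"cat": "person", "prob": 0.29},
    {"cat": "person", "prob": 0.24},
    {"cat": "bird", "prob": 0.17},
]
confident = filter_by_confidence(sample_classes, 0.25)
```

With a threshold of 0.25, only the three entries with prob 0.83, 0.60 and 0.29 remain in confident.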

You can look at the object detection Python script to see how to generate the bounding boxes.
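
Independently of that script, the prediction call itself is easy to sketch in Python with the standard library. The code below is an assumption-laden illustration rather than the referenced script: build_predict_request, predict and describe are hypothetical helpers, and the request body simply mirrors the curl example above.

```python
import json
import urllib.request


def build_predict_request(base_url, service_name, image_uris,
                          confidence_threshold=0.1):
    """Return (url, body_bytes) for a /predict call asking for bounding boxes."""
    payload = {
        "service": service_name,
        "parameters": {
            "output": {
                "bbox": True,
                "confidence_threshold": confidence_threshold,
            }
        },
        "data": list(image_uris),
    }
    return f"{base_url}/predict", json.dumps(payload).encode("utf-8")


def predict(base_url, service_name, image_uris, confidence_threshold=0.1):
    """POST the request and return the decoded JSON response."""
    url, body = build_predict_request(
        base_url, service_name, image_uris, confidence_threshold
    )
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(response):
    """Return one human-readable line per detected object."""
    lines = []
    for prediction in response["body"]["predictions"]:
        for cls in prediction.get("classes", []):
            box = cls["bbox"]
            lines.append(
                f'{cls["cat"]} {cls["prob"]:.2f} '
                f'x=({box["xmin"]:.0f}, {box["xmax"]:.0f}) '
                f'y=({box["ymin"]:.0f}, {box["ymax"]:.0f})'
            )
    return lines
```

Assuming the service exists and the server listens on port 8080, passing the result of predict to describe yields one summary line per bounding box, which is a convenient starting point before drawing the boxes on the image.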
