Setup of an object detector
This tutorial sets up an image object detector that distinguishes among 21 classes of objects. For every detected object, the detector returns a bounding box around it along with a label, e.g. person, car, … The tutorial uses a deep neural net pre-trained on the VOC task.
The detection service lets an application send images and receive in return, in JSON format, the set of bounding boxes for each image.
The following assumes that DeepDetect runs as a Docker container; see the quickstart for how to set this up.
Setting up the pre-trained model
mkdir models
This prepares the model directory.
Setting up the detector service
Let’s start the DeepDetect server:
docker run -d -p 8080:8080 -v /path/to/models:/opt/models/ jolibrain/deepdetect_cpu
and create a service:
curl -X PUT "http://localhost:8080/services/imageserv" -d '{
"description": "object detection service",
"mllib": "caffe",
"model": {
"init": "https://deepdetect.com/models/init/desktop/images/detection/detection_voc0712.tar.gz",
"repository": "/opt/models/detection_voc0712",
"create_repository": true
},
"parameters": {
"input": {
"connector": "image"
}
},
"type": "supervised"
}
'
This should yield:
{
"status":{
"code":201,
"msg":"Created"
}
}
And this is all it takes to set up the pre-trained model.
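For applications that would rather create the service programmatically, the same call can be sketched in Python with only the standard library. The helper names `build_create_request` and `send` are hypothetical, and the host, port and model paths simply mirror the curl command above; this is an illustration under those assumptions, not DeepDetect's own client.

```python
import json
import urllib.request


def build_create_request(service, host="http://localhost:8080"):
    """Build (but do not send) the PUT request that creates a detection service.

    `service` is the name the service will be registered under; it must be
    the same name that later /predict calls refer to.
    """
    payload = {
        "description": "object detection service",
        "mllib": "caffe",
        "model": {
            "init": "https://deepdetect.com/models/init/desktop/images/detection/detection_voc0712.tar.gz",
            "repository": "/opt/models/detection_voc0712",
            "create_repository": True,
        },
        "parameters": {"input": {"connector": "image"}},
        "type": "supervised",
    }
    return urllib.request.Request(
        url=f"{host}/services/{service}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
    )


def send(request):
    """Send a prepared request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_create_request("imageserv")
    print(request.get_method(), request.full_url)
    # send(request) would perform the call against a live DeepDetect server.
```

Keeping the request construction separate from the network call makes it possible to inspect the payload before anything is sent.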
Testing object detection
We can now pass any image filepath or URL to our object detector. Here is an example:
curl -X POST "http://localhost:8080/predict" -d '{
"service":"imageserv",
"parameters":{
"output":{
"bbox": true,
"confidence_threshold": 0.1
}
},
"data":["https://photos.wi.gcs.trstatic.net/e9hyHkaRFZdDV_jLZuTS6jcYq1eLUfiFzfl9zNavmNuoyZ-3UCX_EGg6D5TNU--V-f-z2CT8Kg0u3mF9gccUiA"]
}'
yields:
{
"status": {
"msg": "OK",
"code": 200
},
"body": {
"predictions": [
{
"classes": [
{
"cat": "bird",
"prob": 0.8333460688591003,
"bbox": {
"xmin": 67.03402709960938,
"ymin": 414.25286865234375,
"ymax": 64.85651397705078,
"xmax": 354.663330078125
}
},
{
"cat": "person",
"prob": 0.5956286191940308,
"bbox": {
"xmin": 75.99663543701172,
"ymin": 475.9880676269531,
"ymax": 66.72187805175781,
"xmax": 363.94293212890625
}
},
{
"cat": "person",
"prob": 0.2928898334503174,
"bbox": {
"xmin": 495.8335876464844,
"ymin": 735.4041748046875,
"ymax": 506.434326171875,
"xmax": 652.080078125
}
},
{
"cat": "person",
"prob": 0.24435117840766907,
"bbox": {
"xmin": 437.17041015625,
"ymin": 540.1434936523438,
"ymax": 111.70045471191406,
"xmax": 633.19970703125
}
},
{
"cat": "bird",
"prob": 0.16601955890655518,
"bbox": {
"xmin": 40.96523666381836,
"ymin": 280.6235046386719,
"ymax": 71.90843200683594,
"xmax": 259.4865417480469
}
},
{
"cat": "person",
"prob": 0.12583601474761963,
"bbox": {
"xmin": 358.8877868652344,
"ymin": 763.8483276367188,
"ymax": 532.8911743164062,
"xmax": 491.5361022949219
}
},
{
"cat": "person",
"last": true,
"prob": 0.11492644995450974,
"bbox": {
"xmin": 213.4755096435547,
"ymin": 793.69287109375,
"ymax": 545.5011596679688,
"xmax": 355.6097717285156
}
}
],
"uri": "https://photos.wi.gcs.trstatic.net/e9hyHkaRFZdDV_jLZuTS6jcYq1eLUfiFzfl9zNavmNuoyZ-3UCX_EGg6D5TNU--V-f-z2CT8Kg0u3mF9gccUiA"
}
]
},
"head": {
"method": "/predict",
"service": "imageserv",
"time": 1903
}
}
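As a rough illustration of how an application might issue this call and consume the reply, the following Python sketch builds the same POST request and flattens the nested reply into one tuple per detection. The helper names `build_predict_request` and `parse_detections` are hypothetical, the sample reply is an abbreviated copy of the output above, and `image.jpg` is a placeholder URI.

```python
import json
import urllib.request


def build_predict_request(service, image_uris, threshold=0.1,
                          host="http://localhost:8080"):
    """Build (but do not send) a /predict request asking for bounding boxes."""
    payload = {
        "service": service,
        "parameters": {
            "output": {"bbox": True, "confidence_threshold": threshold}
        },
        "data": list(image_uris),
    }
    return urllib.request.Request(
        url=f"{host}/predict",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )


def parse_detections(reply):
    """Flatten a /predict reply into (uri, category, probability, bbox) tuples."""
    detections = []
    for prediction in reply["body"]["predictions"]:
        for detected in prediction["classes"]:
            detections.append(
                (prediction["uri"], detected["cat"],
                 detected["prob"], detected["bbox"])
            )
    return detections


if __name__ == "__main__":
    # Abbreviated reply with the same shape as the output shown above.
    sample = {
        "status": {"msg": "OK", "code": 200},
        "body": {"predictions": [{
            "classes": [{
                "cat": "bird",
                "prob": 0.8333460688591003,
                "bbox": {"xmin": 67.03402709960938, "ymin": 414.25286865234375,
                         "ymax": 64.85651397705078, "xmax": 354.663330078125},
            }],
            "uri": "image.jpg",  # placeholder URI
        }]},
    }
    for uri, cat, prob, bbox in parse_detections(sample):
        print(f"{uri}: {cat} ({prob:.2f}) {bbox}")
```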
The resulting JSON contains:
- bounding boxes as bbox JSON objects
- the estimated category cat of the object
- the confidence of the detection as a probability prob, the higher the better
Note that confidence_threshold removes any prediction that has a prob strictly below the threshold.
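The same thresholding rule can be mirrored on the client side, for instance to apply a stricter cut-off without querying the server again. The sketch below is an assumption-laden illustration: `keep_confident` and `box_size` are hypothetical helpers, and a detection is kept when its prob is greater than or equal to the threshold, the complement of "strictly below". Since ymin is larger than ymax in the sample output above, `box_size` takes absolute differences rather than assuming an ordering.

```python
def keep_confident(classes, threshold):
    """Keep detections whose prob is not strictly below the threshold."""
    return [c for c in classes if c["prob"] >= threshold]


def box_size(bbox):
    """Return (width, height) of a bbox without assuming ymin < ymax."""
    width = abs(bbox["xmax"] - bbox["xmin"])
    height = abs(bbox["ymax"] - bbox["ymin"])
    return width, height


if __name__ == "__main__":
    # Three detections copied from the sample output above.
    classes = [
        {"cat": "bird", "prob": 0.8333460688591003,
         "bbox": {"xmin": 67.03402709960938, "ymin": 414.25286865234375,
                  "ymax": 64.85651397705078, "xmax": 354.663330078125}},
        {"cat": "person", "prob": 0.2928898334503174,
         "bbox": {"xmin": 495.8335876464844, "ymin": 735.4041748046875,
                  "ymax": 506.434326171875, "xmax": 652.080078125}},
        {"cat": "person", "prob": 0.11492644995450974,
         "bbox": {"xmin": 213.4755096435547, "ymin": 793.69287109375,
                  "ymax": 545.5011596679688, "xmax": 355.6097717285156}},
    ]
    # A stricter, client-side threshold of 0.25 drops the last detection.
    for detection in keep_confident(classes, 0.25):
        print(detection["cat"], box_size(detection["bbox"]))
```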
To generate the bounding boxes on an image, you can look at the object detection Python script.