
Setting up an OCR REST API with DeepDetect

02/19/2021

This article shows how to set up a REST API for an OCR system in five minutes.

  • Goal: set up an API endpoint that accepts images and returns the position and characters of the text they contain
  • Technology:
    • A deep neural object detector that locates text in images
    • A deep neural OCR model that reads detected text into a character string

For this, DeepDetect provides:

  • A REST API for Deep Learning applications
  • Pre-trained models that are free to use
  • A simple way to chain models so that a single API call does all the work

DeepDetect setup

Let’s start a ready-to-use Docker image of the DeepDetect server, available with CPU or GPU support.

This is as easy as:

docker pull jolibrain/deepdetect_gpu
docker run -d -p 8080:8080 jolibrain/deepdetect_gpu 

For the CPU-only version, simply replace _gpu with _cpu in the commands above.

To monitor the server logs:

# list running containers to find the container ID
docker ps
docker logs -f <dockerid>

That is all it takes to run the DeepDetect Deep Learning REST API server.
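Before loading models, it can help to confirm that the server answers. The sketch below is a hypothetical Python helper (the name server_info is not part of DeepDetect) that queries the server's /info resource on the port mapped above and returns None when nothing is listening.

```python
import json
import urllib.request


def server_info(base_url="http://localhost:8080", timeout=5.0):
    """Return the parsed JSON from GET /info, or None if the server is unreachable.

    Hypothetical helper: base_url assumes the port mapping of the docker run
    command above.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/info", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections, HTTP errors and timeouts
        # (urllib's URLError is a subclass); ValueError covers a non-JSON body.
        return None
```

Calling server_info() right after docker run may return None for a moment while the container starts, so retrying after a second or two is reasonable.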

Text detector setup

Loading the text detector deep model is as easy as:

curl -X PUT http://localhost:8080/services/word_detect -d '{
 "description": "Word detection",
 "model": {
  "repository": "/opt/word_detect",
  "create_repository": true,
  "init":"https://deepdetect.com/models/init/desktop/images/detection/word_detect_v2.tar.gz"
 },
 "mllib": "caffe",
 "type": "supervised",
 "parameters": {
  "input": {
   "connector": "image"
  }
 }
}'

OCR model setup

Loading the pre-trained OCR model is as easy as executing the command below:

curl -X PUT http://localhost:8080/services/word_ocr -d '{
 "description": "Word ocr",
 "model": {
  "repository": "/opt/multiword_ocr",
  "create_repository": true,
  "init":"https://deepdetect.com/models/init/desktop/images/ocr/multiword_ocr.tar.gz"
 },
 "mllib": "caffe",
 "type": "supervised",
 "parameters": {
  "input": {
   "connector": "image"
  }
 }
}'
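The two service-creation calls above differ only in the service name, description, repository and model archive. As a minimal sketch, the same requests can be issued from Python with the standard library; service_payload, create_service, MODELS_URL and SERVICES are hypothetical names introduced here, while the JSON fields and URLs are copied from the curl commands.

```python
import json
import urllib.request


def service_payload(description, repository, init_url):
    """Build the JSON body used above to create an image service."""
    return {
        "description": description,
        "model": {
            "repository": repository,
            "create_repository": True,
            "init": init_url,
        },
        "mllib": "caffe",
        "type": "supervised",
        "parameters": {"input": {"connector": "image"}},
    }


def create_service(name, payload, base_url="http://localhost:8080"):
    """PUT the payload to /services/<name> and return the decoded JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/services/{name}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Common prefix of the two model archives referenced in the curl commands.
MODELS_URL = "https://deepdetect.com/models/init/desktop/images"

SERVICES = {
    "word_detect": service_payload(
        "Word detection",
        "/opt/word_detect",
        f"{MODELS_URL}/detection/word_detect_v2.tar.gz",
    ),
    "word_ocr": service_payload(
        "Word ocr",
        "/opt/multiword_ocr",
        f"{MODELS_URL}/ocr/multiword_ocr.tar.gz",
    ),
}
```

With the server running, looping over SERVICES.items() and calling create_service(name, payload) for each entry would create both services.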

Using the OCR REST API

The DeepDetect server is now ready to take queries. Let’s try it on the image below:

Share the road!

A typical chain API call carries the following JSON body:

{
    "chain": {
        "calls": [
            {
                "data": ["https://ggwash.org/images/made/images/posts/_resized/sign-share_800_600_90.jpg"],
                "parameters": {
                    "input": {
                        "connector": "image",
                        "keep_orig": true
                    },
                    "mllib": {
                        "gpu": true
                    },
                    "output": {
                        "bbox": true,
                        "confidence_threshold": 0.25
                    }
                },
                "service": "word_detect"
            },
            {
                "action": {
                    "parameters": {
                        "padding_ratio": 0.1
                    },
                    "type": "crop"
                },
                "id": "crop"
            },
            {
                "parameters": {
                    "input": {
                        "connector": "image"
                    },
                    "mllib": {
                        "gpu": true
                    },
                    "output": {
                        "blank_label": 0,
                        "confidence_threshold": 0,
                        "ctc": true
                    }
                },
                "parent_id": "crop",
                "service": "word_ocr"
            }
        ],
        "name": "ocr_api"
    }
}
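The chain has three steps: word_detect finds word boxes, the crop action cuts each box out of the original image (which is why keep_orig is set), and word_ocr reads each crop. The sketch below builds the same body in Python and sends it. ocr_chain and run_chain are hypothetical helpers, and the /chain/<name> resource with POST is an assumption based on the "/chain" method reported in the server response, so check it against the DeepDetect API documentation for your version. The gpu flag is exposed as an argument on the assumption that a CPU-only server would want it turned off.

```python
import json
import urllib.request


def ocr_chain(image_url, gpu=True, detect_threshold=0.25, padding_ratio=0.1):
    """Build the detect -> crop -> OCR chain body for a single image URL."""
    return {
        "chain": {
            "name": "ocr_api",
            "calls": [
                {   # step 1: locate words, keeping the original image for cropping
                    "service": "word_detect",
                    "data": [image_url],
                    "parameters": {
                        "input": {"connector": "image", "keep_orig": True},
                        "mllib": {"gpu": gpu},
                        "output": {
                            "bbox": True,
                            "confidence_threshold": detect_threshold,
                        },
                    },
                },
                {   # step 2: crop every detected box, with a little padding
                    "id": "crop",
                    "action": {
                        "type": "crop",
                        "parameters": {"padding_ratio": padding_ratio},
                    },
                },
                {   # step 3: read each crop with the OCR model
                    "service": "word_ocr",
                    "parent_id": "crop",
                    "parameters": {
                        "input": {"connector": "image"},
                        "mllib": {"gpu": gpu},
                        "output": {
                            "ctc": True,
                            "blank_label": 0,
                            "confidence_threshold": 0,
                        },
                    },
                },
            ],
        }
    }


def run_chain(payload, base_url="http://localhost:8080"):
    """Send the chain body to the server (assumed resource: POST /chain/<name>)."""
    name = payload["chain"]["name"]
    req = urllib.request.Request(
        f"{base_url}/chain/{name}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With both services loaded, run_chain(ocr_chain(image_url)) would return the decoded JSON response for that image.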

The JSON response contains the location and characters of each word:

{
    "body": {
        "predictions": [
            {
                "classes": [
                    {
                        "bbox": {
                            "xmax": 610.9924926757812,
                            "xmin": 566.7879638671875,
                            "ymax": 340.05706787109375,
                            "ymin": 316.2816162109375
                        },
                        "cat": "1",
                        "prob": 0.9992087483406067,
                        "word_ocr": {
                            "classes": [
                                {
                                    "cat": "the",
                                    "last": true,
                                    "prob": 0.9995963592082262
                                }
                            ]
                        }
                    },
                    {
                        "bbox": {
                            "xmax": 626.5315551757812,
                            "xmin": 551.6160888671875,
                            "ymax": 305.5302429199219,
                            "ymin": 278.2040100097656
                        },
                        "cat": "1",
                        "prob": 0.9990863800048828,
                        "word_ocr": {
                            "classes": [
                                {
                                    "cat": "share",
                                    "last": true,
                                    "prob": 0.9981672442518175
                                }
                            ]
                        }
                    },
                    {
                        "bbox": {
                            "xmax": 622.3469848632812,
                            "xmin": 557.6710815429688,
                            "ymax": 383.107177734375,
                            "ymin": 351.66339111328125
                        },
                        "cat": "1",
                        "last": true,
                        "prob": 0.9974628686904907,
                        "word_ocr": {
                            "classes": [
                                {
                                    "cat": "road",
                                    "last": true,
                                    "prob": 0.9990777596831322
                                }
                            ]
                        }
                    }
                ],
                "uri": "https://ggwash.org/images/made/images/posts/_resized/sign-share_800_600_90.jpg"
            }
        ]
    },
    "head": {
        "method": "/chain",
        "time": 4302.0
    },
    "status": {
        "code": 200,
        "msg": "OK"
    }
}
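Note that the detections are not returned in reading order: "the" comes before "share" in the response above. As a minimal sketch, the hypothetical helper extract_words below flattens the response into (text, bbox, probability) tuples and sorts them top-to-bottom, then left-to-right. This naive ordering is an assumption that suits a sign with one word per line; multi-word lines would need proper line grouping. SAMPLE is a trimmed copy of the response above with rounded coordinates.

```python
def extract_words(response):
    """Return (text, bbox, detection_prob) tuples in naive reading order."""
    words = []
    for prediction in response["body"]["predictions"]:
        for detection in prediction.get("classes", []):
            ocr_classes = detection.get("word_ocr", {}).get("classes", [])
            if not ocr_classes:
                continue  # box detected but no OCR result attached
            words.append((ocr_classes[0]["cat"], detection["bbox"], detection["prob"]))
    # Sort by the top edge first, then by the left edge.
    words.sort(key=lambda w: (w[1]["ymin"], w[1]["xmin"]))
    return words


# Trimmed copy of the response above (coordinates rounded).
SAMPLE = {
    "body": {
        "predictions": [
            {
                "classes": [
                    {"bbox": {"xmin": 566.8, "xmax": 611.0, "ymin": 316.3, "ymax": 340.1},
                     "prob": 0.9992,
                     "word_ocr": {"classes": [{"cat": "the", "prob": 0.9996}]}},
                    {"bbox": {"xmin": 551.6, "xmax": 626.5, "ymin": 278.2, "ymax": 305.5},
                     "prob": 0.9991,
                     "word_ocr": {"classes": [{"cat": "share", "prob": 0.9982}]}},
                    {"bbox": {"xmin": 557.7, "xmax": 622.3, "ymin": 351.7, "ymax": 383.1},
                     "prob": 0.9975,
                     "word_ocr": {"classes": [{"cat": "road", "prob": 0.9991}]}},
                ]
            }
        ]
    }
}

print(" ".join(text for text, _, _ in extract_words(SAMPLE)))  # share the road
```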

Simply replace https://ggwash.org/images/made/images/posts/_resized/sign-share_800_600_90.jpg with the URL of your choice.

All of this should have taken no more than five minutes!