Training an image classifier service

In another tutorial we showed how to set up an image classifier from an existing (i.e. pre-trained) neural network model. Here we show how to train such a model with DeepDetect and the underlying Caffe library. This yields a useful example of how to train your own image recognition models.

Setup of the ILSVRC12 subset of the Imagenet dataset

The first step is to acquire and set up the dataset. Building a viable, large-scale classifier requires a decent amount of data. Here we propose to use the famous ILSVRC12 subset of the Imagenet repository. This subset was originally designed for a machine learning competition, contains 1000 classes ranging from dogs to cars, and is convenient for demonstrating the training of a good image classifier.

Unfortunately, the raw data is only available to researchers. Since our tutorials focus on demonstrating the feasibility and ease of the process to everyone, below we give you the means to build the dataset yourself. However, if you have access to the full dataset, just unrar it and jump to the next section.

Keep in mind that the whole dataset is around 138GB. This means that building it by downloading the images from the Internet takes a while, and that you need the corresponding amount of space on your hard drive. However, in what follows you can definitely rely on a smaller portion of the dataset. That is, if you download 5% of it, you will still be able to follow the steps below; the only difference is that your final classifier will have fewer than 1000 classes.

Downloading the dataset

In what follows, a list of URLs of images to be downloaded is provided, along with a Python script to download and organize them properly.

Let’s get the list:

gzip -d ilsvrc12_urls.txt.gz

Now, clone the following repository that tracks the download script:

git clone

Note that I am not the original author of this script, but I have fixed and improved it. Since the author is not responding and has not merged my PR, the repository above is the one you need.

Now let’s build the dataset from the list of images to be fetched online. Bear in mind that some images have moved, so your final dataset will be slightly different from the one used in the ILSVRC12 competition. The script continuously shows you the success rate of image retrieval. Building the full dataset takes around seven hours (~50 images / sec), so again, you may stop after a few percent have completed.

To start the download and the building of the dataset, go to the directory where you want to store the dataset and call the script, adjusting its path depending on where it is on your system:

python ilsvrc12_urls.txt . --jobs 100 --retry 3 --sleep 0

If for some reason the script dies or stalls on you, just restart it; it will skip the already downloaded images. On my machine, I successfully get around 81.2% of the images, and the dataset weighs around 98GB.

Note on building a smaller dataset

Note that on a first trial, it is recommended that you download only the URLs of the first five categories. This allows you to test all the steps below without the burden of dealing with millions of files. Below we assume you are using the smaller dataset made of the following five categories: n01440764, n01443537, n01484850, n01491361, n01494475.
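One way to build such a reduced URL list is to filter the full list on those five synset identifiers. This assumes, as is the case for the usual ILSVRC12 URL lists, that each line carries the synset ID of its image:

```shell
# Keep only the URLs belonging to the five chosen synsets
grep -E 'n01440764|n01443537|n01484850|n01491361|n01494475' \
    ilsvrc12_urls.txt > ilsvrc12_urls_5cats.txt
wc -l ilsvrc12_urls_5cats.txt
```

You can then feed ilsvrc12_urls_5cats.txt to the download script instead of the full list.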

Conversion of gif files

GIF images need to be converted, as the network we use only deals with static images. This is easily done from the command line, so go into your dataset directory, say ilsvrc12:

cd ilsvrc12
find . -name "*.gif" -print0 | xargs -0 mogrify -format jpg
find . -name "*.gif" -print0 | xargs -0 rm

You need the imagemagick package installed in order to use mogrify.
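To check that the conversion step left nothing behind, you can count the remaining GIF files; after the commands above, this should print 0:

```shell
# Count leftover GIF files under the dataset directory
find . -name "*.gif" | wc -l
```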

Creating the service

First, start DeepDetect:

cd deepdetect/build/main
./dede

Create a model directory wherever you like; in the following we refer to the model directory path as imgnet, and to the machine learning service as imageserv. To create the service:

curl -X PUT "http://localhost:8080/services/imageserv" -d '{
       "mllib":"caffe",
       "description":"image classification service",
       "type":"supervised",
       "parameters":{
         "input":{"connector":"image","width":224,"height":224},
         "mllib":{"template":"googlenet","nclasses":1000}
       },
       "model":{"templates":"../templates/caffe/","repository":"imgnet"}
     }'

In the call above, you can adjust the number of classes as needed: up to 1000, but not below 5, due to a specificity of the model (which could itself be adjusted).

Training the classifier

The training phase is a complex one. Luckily, it is fully automated within DeepDetect. Basically, the data flows into an image data connector. The connector prepares the data for the neural net and the deep learning library. The neural net is then trained, and tested regularly until completion. At this stage, the machine learning service has a model it can use for classifying images automatically. More details on each of the hidden steps:

  • building of the training and testing image databases: the image dataset built above is turned into two image databases, one for training, the other for validating the net regularly along the training process. The rationale behind building a database, as used by the Caffe deep learning library, is that each image is passed to the net thousands of times, and reading and re-reading from the hard drive is too slow. The database is much more efficient for non-sequential access.

  • training of the net: batches of random images are passed to the net for training, the process is repeated until the requested number of iterations has been reached. The training job can be stopped at any time through the API.

Below is a training call for a five-class version of the dataset:

curl -X POST "http://localhost:8080/train" -d '{
       "service":"imageserv",
       "async":true,
       "parameters":{
         "mllib":{
           "gpu":true,
           "net":{"batch_size":32},
           "solver":{"test_interval":500,"iterations":30000,"base_lr":0.001}
         },
         "input":{"test_split":0.1,"shuffle":true},
         "output":{"measure":["acc","mcll","f1"]}
       },
       "data":["ilsvrc12"]
     }'

The main options are explained as follows:

  • batch_size: the number of training images sent at once for training
  • iterations: the total number of times a batch of images is sent in for training
  • base_lr: the initial learning rate
  • test_split: the part of the training set used for testing, e.g. 0.1 means 10% is used for testing
  • shuffle: whether to shuffle the dataset before training (recommended)
  • measure: the list of measures to be computed and returned as status by the server

For more details, see the API.

Upon the start of the training, the server will output some image file processing information:

INFO - Processed 1000 files.
INFO - Processed 2000 files.
INFO - Processed 3000 files.
INFO - Processed 4000 files.
INFO - Processed 4198 files.

The bash script below polls the training status every 20 seconds. It should take around 3000 iterations to reach an accuracy of about 64%.

while true; do
    out=$(curl -s -X GET "http://localhost:8080/train?service=imageserv&job=1&timeout=20")
    echo $out
    if [[ $out == *"running"* ]]; then
        continue
    else
        break
    fi
done
Testing the classifier

Once training has completed, the service is immediately available for prediction. A simple prediction call looks like this:

curl -X POST "http://localhost:8080/predict" -d '{
       "service":"imageserv",
       "parameters":{"output":{"best":3}},
       "data":["ambulance.jpg"]
     }'

Replace ambulance.jpg with your own image.

Note that the trained model is saved to file, and that the service can safely be destroyed without hurting the model. Simply create a new, identical service: it will load the existing model and be immediately ready for prediction.

Do not expect great results if you did the quick training run with 5 classes. Instead, take the steps again with the full 1000-class dataset and train a model. Depending on your GPU, this will take up to a few days.