Frugal models: strategies for deep models with small data

19 August 2021

Deep Learning models can be understood as super-human replacements for hand-coded applications.

However, among other bottlenecks, two are very common when building deep neural networks for industrial applications in practice: heavy compute and data volume.

Jolibrain was recently invited to share its results on building Deep Learning applications in industry, especially with little data and sometimes no labels. This post summarizes some of the elements presented.

Deep learning bottlenecks

Heavy compute

Compute requirements, especially for training large neural networks, have kept growing over the past decade. Fortunately, GPU technology has sustained this growth to a large extent. The remaining factors are the number of GPUs and the code needed to distribute the training phase across a large number of machines and GPUs.

So compute is mostly a matter of cost, and maybe of ambition.

Data volume

Data, however, isn’t as universally available as it seems. Typically, many very useful and highly sought-after applications revolve around defect spotting, anomaly detection and, more generally, the detection of rare events. These applications naturally suffer from a lack of data, or at least from extremely imbalanced datasets in which the rare occurrences of interest are under-represented in both volume and variety. At Jolibrain we distinguish between high and low data regimes, and study how to cope with each of them when using Deep Learning techniques.

Even more applications fall into the category where there is data, but few to no labels. This is because, most of the time, labeling cannot be fully automated (as that would defeat the purpose of automation!), and hand-labeling may be very costly, and sometimes even impossible (e.g. in metrology). We refer to these situations as the low label and no label regimes of Deep Learning applications.

Data regimes of Deep Learning

The question that then arises in practice is: taking the required compute for granted and universally available, which strategies apply to each of the data and label regimes of Deep Learning applications?

Jolibrain has extensive practical know-how of these situations, because they are very common when solving real-world problems. A solution path has been found and tested on many combinations of them.

We refer to it as the building of frugal models.

High data regime

In the high data regime, the model building process is constrained mostly by the available labels.

  • A first common technique when data is available, regardless of the label volume (which can be zero), is self-supervision. Self-supervision allows training on unlabeled data by having the model fill in artificially created holes in the data, or solve other self-contained puzzles.

  • A second technique that works in the low label regime with high data is self-distillation. Interestingly, and quite mysteriously, training a model against a former version of itself can improve test accuracy, and allows some more data to be labeled automatically along the way. The gain usually plateaus after a few iterations.
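The self-distillation loop can be sketched in a few lines. This is a minimal sketch under strong simplifying assumptions, not Jolibrain’s pipeline: the "model" is a toy nearest-centroid classifier instead of a deep network, the teacher passes on hard pseudo-labels (a self-training variant) rather than soft predictions, and the confidence rule (`min_margin`) and all function names are hypothetical. Each round, the previous model labels the unlabeled points it is confident about, and a new model is fitted on the enlarged set; the loop stops when no new point passes the confidence threshold, a crude analogue of the plateau mentioned above.

```python
import math
import random

def fit_centroids(points, labels):
    """Toy 'model' (hypothetical): one centroid (mean point) per class."""
    sums, counts = {}, {}
    for (x, y), c in zip(points, labels):
        sx, sy = sums.get(c, (0.0, 0.0))
        sums[c] = (sx + x, sy + y)
        counts[c] = counts.get(c, 0) + 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c]) for c in sums}

def predict(model, p):
    """Return (label, margin); margin = gap between the two nearest centroid distances."""
    dists = sorted((math.dist(p, m), c) for c, m in model.items())
    return dists[0][1], dists[1][0] - dists[0][0]

def self_distill(labeled, labels, unlabeled, rounds=5, min_margin=1.0):
    """Each round the previous model (teacher) pseudo-labels confident points,
    then a new model (student) is fitted on the enlarged labeled set."""
    xs, ys, pool = list(labeled), list(labels), list(unlabeled)
    model = fit_centroids(xs, ys)
    history = [len(xs)]  # labeled-set size after each round
    for _ in range(rounds):
        confident, rest = [], []
        for p in pool:
            c, margin = predict(model, p)
            (confident if margin >= min_margin else rest).append((p, c))
        if not confident:  # plateau: nothing new passes the threshold
            break
        xs += [p for p, _ in confident]
        ys += [c for _, c in confident]
        pool = [p for p, _ in rest]
        model = fit_centroids(xs, ys)  # student replaces teacher
        history.append(len(xs))
    return model, history

rng = random.Random(0)

def blob(cx, cy, n):
    """n points drawn around (cx, cy) with unit standard deviation."""
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)) for _ in range(n)]

# Two labeled points per class, 200 unlabeled points, 100 held-out points.
labeled = blob(0, 0, 2) + blob(4, 4, 2)
labels = ["ok", "ok", "defect", "defect"]
unlabeled = blob(0, 0, 100) + blob(4, 4, 100)
held_out = [(p, "ok") for p in blob(0, 0, 50)] + [(p, "defect") for p in blob(4, 4, 50)]

model, history = self_distill(labeled, labels, unlabeled)
accuracy = sum(predict(model, p)[0] == c for p, c in held_out) / len(held_out)
print(history, accuracy)
```

`history` records how the labeled set grows from the initial 4 points; growth stops once the remaining points sit too close to the decision boundary, or after `rounds` iterations.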

Low data regime

In the low data regime, there can be only a low label volume or no labels at all. The latter case is studied in the next section, so here we look at the low data/low label setup.

  • In this situation, transfer learning is a very common solution path. The key is to find an alternate task, with more data and a larger volume of labels, that is similar to or overlaps with the original target task. This is easily understood with computer vision tasks, since image understanding at large requires near-universal pattern recognition capabilities at the lowest level of detail (e.g. contours, colors, basic shapes, …).

  • Domain randomization, sometimes also known as data augmentation, is a widely used technique to artificially grow the data, and in some cases the labels as well.

  • Another less common, though very powerful, technique is what we at Jolibrain refer to as semantic domain adaptation. Basically, domain adaptation trains a neural model to turn data from a domain A into data from a domain B (e.g. cats to dogs, or synthetic data to real-looking data). It is very powerful since, in practice, it can find even highly non-linear and highly parametrized transforms between the domains of interest.
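To illustrate how data augmentation can grow the labels along with the data, here is a minimal sketch: a horizontal flip must move the bounding-box label together with the image, while a global brightness jitter (a simple form of domain randomization) leaves the box unchanged. Images are plain lists of pixel rows to keep it dependency-free, and `hflip`, `jitter_brightness`, `augment` and `box_is_consistent` are hypothetical helpers, not a library API.

```python
import random

def hflip(image, box):
    """Mirror an image (list of pixel rows) left-right and move its box label to match.
    box = (x0, y0, x1, y1) with exclusive x1/y1."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    x0, y0, x1, y1 = box
    return flipped, (width - x1, y0, width - x0, y1)

def jitter_brightness(image, rng, max_delta=30):
    """Randomize global brightness; the box label is unaffected."""
    delta = rng.randint(-max_delta, max_delta)
    return [[min(255, max(0, v + delta)) for v in row] for row in image]

def augment(image, box, rng, copies=4):
    """Grow one labeled sample into several labeled variants."""
    out = []
    for _ in range(copies):
        img, b = image, box
        if rng.random() < 0.5:
            img, b = hflip(img, b)
        out.append((jitter_brightness(img, rng), b))
    return out

def box_is_consistent(img, box):
    """True if every pixel inside the box is brighter than every pixel outside it."""
    x0, y0, x1, y1 = box
    inside = [v for y, row in enumerate(img) for x, v in enumerate(row)
              if x0 <= x < x1 and y0 <= y < y1]
    outside = [v for y, row in enumerate(img) for x, v in enumerate(row)
               if not (x0 <= x < x1 and y0 <= y < y1)]
    return min(inside) > max(outside)

# A 4x6 grey image with a bright 2x2 "defect" labeled by its bounding box.
image = [[50] * 6 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        image[y][x] = 200
box = (1, 1, 3, 3)

samples = augment(image, box, random.Random(0))
print(all(box_is_consistent(img, b) for img, b in samples))
```

The consistency check is the point of the design: any geometric transform applied to the image must be applied to its labels too, whereas photometric transforms can usually leave them alone.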

Semantic domain adaptation not only transforms data but labels as well. A typical use case takes synthetic data and labels from a hand-written simulator as domain A, and transforms them into data and labels that look like domain B.
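To make "transforming labels as well" concrete, here is a deliberately tiny sketch, assuming a hypothetical simulator that emits 1D scan lines together with a per-pixel defect mask. The A→B translator is an affine map fitted by matching the mean and standard deviation of unlabeled domain-B samples; this is a crude stand-in for a trained generator and is not how joliGAN works. The final check tests semantic consistency: the defect/background split computed in domain A is the one found after translation.

```python
import random
import statistics

def fit_affine_translator(source, target):
    """Toy A->B translator (hypothetical): affine map matching the target's mean and std.
    Stands in for a learned generator; not joliGAN's method."""
    mu_s, sd_s = statistics.fmean(source), statistics.pstdev(source)
    mu_t, sd_t = statistics.fmean(target), statistics.pstdev(target)
    scale = sd_t / sd_s
    shift = mu_t - scale * mu_s
    return lambda v: scale * v + shift

def segment(line, threshold):
    """Per-pixel label: 1 where the signal exceeds the threshold (a 'defect')."""
    return [1 if v > threshold else 0 for v in line]

rng = random.Random(1)

# Domain A: a hypothetical simulator emits a scan line AND its defect mask for free.
mask_a = [1 if 40 <= i < 50 else 0 for i in range(100)]
line_a = [1.0 + 4.0 * m + rng.uniform(-0.5, 0.5) for m in mask_a]

# Domain B: unlabeled "real" values with a different gain and offset.
real_b = [120.0 + 15.0 * rng.uniform(-0.5, 0.5) + (60.0 if rng.random() < 0.1 else 0.0)
          for _ in range(1000)]

translate = fit_affine_translator(line_a, real_b)
line_b = [translate(v) for v in line_a]  # the data moves to domain B...
mask_b = mask_a                          # ...and the labels travel with it

# Semantic check: the split found in A is still the split after translation.
consistent = segment(line_b, translate(3.0)) == segment(line_a, 3.0) == mask_b
print(consistent)
```

Because this toy translator is pointwise and monotonic, the mask carries over trivially. A learned image-to-image generator can move or erase content, which is why semantic variants add constraints, such as a consistency loss between the labels predicted on the input and on the translated output.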

This technique has proven so successful in solving Jolibrain customers’ applications that we’ve built it into an Open Source product, joliGAN. An earlier blog post on joliGAN details a sample application.

Low labels vs no labels

The low labels and most especially the no label situations are actually very common.

The sole technique that works in both cases is the generation of synthetic data. This is widely applicable to physical systems, which can typically be simulated directly from the laws of physics (e.g. Navier-Stokes, …). As mentioned above, the remaining differences from real measurements can be mitigated with domain adaptation.
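As an illustration of labels coming for free from a simulator, the sketch below swaps Navier-Stokes for a much simpler damped mass-spring system, x'' = -k x - c x', integrated with semi-implicit Euler. Each synthetic sample is a trajectory labeled with the damping coefficient c that produced it. The parameter ranges, step size, and function names are assumptions chosen for illustration only.

```python
import random

def simulate_oscillator(damping, stiffness=4.0, x0=1.0, dt=0.01, steps=500):
    """Semi-implicit Euler integration of x'' = -stiffness * x - damping * x'."""
    x, v = x0, 0.0
    traj = []
    for _ in range(steps):
        a = -stiffness * x - damping * v
        v += a * dt  # update velocity first...
        x += v * dt  # ...then position with the new velocity (semi-implicit)
        traj.append(x)
    return traj

def make_dataset(n, rng):
    """Each sample is (trajectory, label); the label is the simulator parameter itself."""
    data = []
    for _ in range(n):
        damping = rng.uniform(0.0, 2.0)  # hypothetical range of physical interest
        data.append((simulate_oscillator(damping), damping))
    return data

dataset = make_dataset(20, random.Random(42))
print(len(dataset), round(dataset[0][1], 3))
```

A model trained to regress the damping coefficient from such trajectories never needs a hand-labeled example; closing the gap between these clean curves and real sensor data is then the domain adaptation step described above.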


There is a rational way to look at the data regimes of Deep Learning applications in practice. It allows devising useful, replicable strategies to unlock automation even with little data and few labels, or no labels at all. This short post summarizes the few solution paths that can be tested, and even mixed.

With a handful of customers now, Jolibrain has obtained remarkably positive results building AI applications from synthetic data alone. A future blog post should go into more detail on our process and its results.