Experimenting with Visformer for computer vision

DeepDetect now supports one of the promising new architectures for computer vision, the Visformer. This short post introduces the current trends around novel neural architectures for computer vision, and shows how to experiment with the Visformer in DeepDetect in just a few minutes. The Visformer is a hybrid architecture that mixes convolutional layers with full-attention transformer layers, and it has good properties: its FLOPs/accuracy ratio is 3x better than that of a ResNet-50, along with a 2% increase in accuracy.
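To make the hybrid idea concrete, here is a toy sketch in plain Python of a convolutional stage feeding a full self-attention stage. It is not the Visformer architecture itself and not DeepDetect code: the function names (`conv2d`, `self_attention`, `hybrid_forward`) are hypothetical, the attention uses identity projections instead of learned weights, and there is no normalization, residual connection, or MLP. It only illustrates how convolution outputs can become tokens for attention.

```python
import math

def conv2d(image, kernel, stride=1):
    """Valid 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V (assumption: no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this query against every token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Output is the attention-weighted average of all tokens.
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out

def hybrid_forward(image, kernels, stride=2):
    """Convolutional stage followed by a full-attention stage."""
    # Convolutional stage: one feature map per kernel (local patterns).
    maps = [conv2d(image, k, stride) for k in kernels]
    h, w = len(maps[0]), len(maps[0][0])
    # Each spatial position becomes a token; its channels are the kernel responses.
    tokens = [[m[i][j] for m in maps] for i in range(h) for j in range(w)]
    # Attention stage: every token attends to every other token (global context).
    return self_attention(tokens)

if __name__ == "__main__":
    image = [[float((r * 6 + c) % 5) for c in range(6)] for r in range(6)]
    edge = [[1.0, 0.0, -1.0]] * 3   # crude vertical-edge kernel
    blur = [[1.0 / 9.0] * 3] * 3    # box-blur kernel
    out = hybrid_forward(image, [edge, blur])
    print(len(out), "tokens of dimension", len(out[0]))
```

The split reflects the usual motivation for hybrids: the strided convolution extracts local features and shrinks the spatial grid cheaply, so the quadratic-cost attention only runs over a small number of tokens.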