Exploring Invariances in Deep Convolutional Neural Networks Using Synthetic Images

Crowdsourced 3D CAD models are becoming easily accessible online and can potentially generate an infinite number of training images for almost any object category. We show that adapting contemporary Deep Convolutional Neural Network (DCNN) models to such data can be effective, especially in the few-shot regime where few or no annotated real images are available, or where the available real images are poorly matched to the target domain. Little is known about the degree of realism necessary to train deep models on CAD-rendered data. In a detailed analysis, we use synthetic images to probe DCNN invariance to variations in object 3D shape, pose, and image photorealism, with surprising findings. In particular, we show that DCNNs used as a fixed representation exhibit a large amount of invariance to these factors, but, if allowed to adapt, can still learn effectively from synthetic data. These findings guide us in designing a method for adaptive training of DCNNs on real and synthetic data. We show that our approach significantly outperforms previous methods on the benchmark PASCAL VOC2007 dataset when learning in the few-shot scenario, and outperforms training with real data in a domain-shift scenario on the Office benchmark.
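
The abstract contrasts two regimes: using a pretrained DCNN as a fixed representation versus allowing it to adapt (fine-tune) on synthetic renders. The sketch below is not the authors' code; it is a minimal PyTorch/torchvision illustration of that distinction, assuming a hypothetical folder `synthetic_renders/` of CAD-rendered images organized one sub-folder per object class.

```python
# Minimal sketch (assumptions: torchvision is available and "synthetic_renders/"
# holds CAD renders in ImageFolder layout). Not the paper's implementation.
import torch
import torch.nn as nn
from torchvision import models, datasets, transforms

def build_model(num_classes: int, adapt: bool) -> nn.Module:
    """Pretrained DCNN with a new head; backbone frozen (fixed) or trainable (adapted)."""
    model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
    if not adapt:
        # Fixed representation: freeze all pretrained weights, train only the new head.
        for p in model.parameters():
            p.requires_grad = False
    model.classifier[6] = nn.Linear(4096, num_classes)  # replace the final layer
    return model

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# Hypothetical dataset of synthetic CAD renders, one sub-folder per class.
synthetic = datasets.ImageFolder("synthetic_renders/", transform=transform)
loader = torch.utils.data.DataLoader(synthetic, batch_size=32, shuffle=True)

# adapt=True corresponds to letting the DCNN learn from synthetic data;
# adapt=False corresponds to treating it as a fixed feature extractor.
model = build_model(num_classes=len(synthetic.classes), adapt=True)
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```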