During recent years, deep learning methods have shown to be effective for image classification, localization and detection. Convolutional Neural Networks (CNN) are used to extract information from images and are the main element of modern machine learning and computer vision methods. CNNs can be used for logo detection and recognition. Logo detection consist on locate and recognize commercial brand logos within an image. These methods are useful in the areas of online brand management or ad placement. The performance of this methods is closely related on the quantity and the quality of the data, typically image/label pairs, used to train the CNNs. Collecting the pair of images and labels, commonly referred as ground truth, can be expensive and time consuming. Multiple techniques try to solve this problem by either transforming the available data using data augmentation methods or by creating new images from scratch or from other images using image synthesis methods. In this paper, we investigate the latter approach. We segment background images, extract depth information and then blend logo images accordingly in order to create new real looking images. This approach allows us to create an indefinite number of images with a minimum manual labeling effort. The synthetic images can later be used to train CNNs for logo detection and recognition.
Recent progress in deep learning methods has shown that key steps in object detection and recognition, including feature extraction, region proposals, and classification, can be done using Convolutional Neural Networks (CNN) with high accuracy. However, the use of CNNs for object detection and recognition has significant technical challenges that still need to be addressed. One of the most daunting problems is the very large number of training images required for each class/label. One way to address this problem is through the use of data augmentation methods where linear and nonlinear transforms are done on the training data to create "new" training images. Typical transformations include spatial flipping, warping and other deformations. An important concept of data augmentation is that the deformations applied to the labeled training images do not change the semantic meaning of the classes/labels. In this paper we investigate several approaches to data augmentation. First, several data augmentation techniques are used to increase the size of the training dataset. Then, a Faster R-CNN is trained with the augmented dataset for detect and recognize objects. Our work is focused on two different scenarios: detecting objects in the wild (i.e. commercial logos) and detecting objects captured using a camera mounted on a computer system (i.e. toy animals).