Using a new materials system, we developed invisible passive infrared markers that can take on various visual foreground patterns and colors, including white. The material can be coated onto many different surfaces, such as paper, plastic, wood, and metal, among others. We demonstrate dual-purpose signs in which the visual foreground is intended for human view while the infrared background is intended for machine view. By hiding digital information in the infrared spectral range, we enable fiducial markers to enter public spaces without introducing intrusive visual features for humans. These fiducial markers are robust and easy to detect with off-the-shelf near-infrared cameras to assist robot positioning and object identification. This can lower the barrier for low-cost robots, currently deployed in warehouses and factories, to enter offices, stores, and other public spaces and to work alongside people.
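As a rough illustration of how such markers might be read with an off-the-shelf camera, the sketch below detects ArUco-style fiducials in a near-infrared frame using OpenCV's ArUco module (4.7+ API). The marker family and input path are assumptions for illustration, not the paper's actual marker design or detection pipeline.

```python
import cv2

# Illustrative sketch: detect ArUco-style fiducials in a near-infrared frame.
# The paper's marker family and detector may differ; ArUco is assumed here.
nir_frame = cv2.imread("nir_frame.png", cv2.IMREAD_GRAYSCALE)  # assumed input path

aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters()
detector = cv2.aruco.ArucoDetector(aruco_dict, params)  # OpenCV >= 4.7 interface

corners, ids, _rejected = detector.detectMarkers(nir_frame)
if ids is not None:
    for marker_id, quad in zip(ids.flatten(), corners):
        # Each marker yields an ID plus four corner points usable for pose/position.
        print(f"marker {marker_id} at corners {quad.reshape(-1, 2)}")
```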
Modern warehouses utilize fleets of robots for inventory management. To ensure efficient and safe operation, real-time localization of each agent is essential. Most robots follow metal tracks buried in the floor and use a grid of precisely mounted RFID tags for localization. As robotic agents in warehouses and manufacturing plants become ubiquitous, it would be advantageous to eliminate the need for these metal wires and RFID tags: not only do they incur significant installation costs, but removing the wires would also allow agents to travel to any area inside the building. Sensors such as cameras and LiDAR have provided meaningful localization information for many positioning system implementations, but fusing localization features from multiple sensor sources is challenging, especially when the target localization task's dataset is small. We propose a deep-learning-based localization system that fuses features from an omnidirectional camera image and a 3D LiDAR point cloud to create a robust robot positioning model. Although the use of vision and LiDAR eliminates the need for precisely installed RFID tags, it does require the collection and annotation of ground-truth training data. Deep neural networks thrive on large amounts of supervised data, and collecting this data can be time-consuming. Using a dataset collected in a warehouse environment, we evaluate the localization accuracy of two individual sensor models. To minimize the need for extensive ground-truth data collection, we introduce a self-supervised pretraining regimen that populates the image feature extraction network with meaningful weights before training on the target localization task with limited data. We demonstrate that this self-supervised pretraining improves the accuracy and convergence of localization models without requiring additional sample annotation.
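For intuition, the following minimal PyTorch sketch shows one plausible late-fusion architecture: an image branch and a point-cloud branch whose features are concatenated and regressed to a 2D position. The backbones, feature dimensions, and output parameterization are illustrative assumptions; the paper's actual network and its self-supervised pretext task are not reproduced here.

```python
import torch
import torch.nn as nn

class FusionLocalizer(nn.Module):
    """Illustrative two-branch model: image and LiDAR features are concatenated
    and regressed to a 2D position (x, y). Stand-in encoders only."""
    def __init__(self, img_dim=512, lidar_dim=256):
        super().__init__()
        self.img_encoder = nn.Sequential(          # stand-in for a pretrained CNN
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, img_dim))
        self.lidar_encoder = nn.Sequential(        # stand-in point-wise MLP encoder
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, lidar_dim))
        self.head = nn.Sequential(
            nn.Linear(img_dim + lidar_dim, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, image, points):
        f_img = self.img_encoder(image)                        # (B, img_dim)
        f_pts = self.lidar_encoder(points).max(dim=1).values   # max-pool over points
        return self.head(torch.cat([f_img, f_pts], dim=1))

model = FusionLocalizer()
xy = model(torch.randn(2, 3, 128, 128), torch.randn(2, 1024, 3))  # predicted (x, y)
```

In a self-supervised setting, the image encoder would first be trained on a pretext task over unlabeled warehouse imagery and its weights reused here before fine-tuning on the small labeled localization set.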
Modern robotic systems make it possible to automate processes and increase employee productivity. Such systems are built from finite state machines (sensor systems) and machine vision systems, and one area of application is the development of robotic systems within the INDUSTRY 4.0 framework. The article proposes an approach based on fusing data obtained from cameras operating in different electromagnetic ranges. The approach fuses data on the shape, boundaries, and parameters of objects. The search for object boundaries and shapes is based on layer-by-layer simplification of the images, with local features extracted at each level. Local features are found by identifying locally stationary sections, extracting object boundaries, determining their parameters, and combining the information in a single information space. Boundary detection uses a combined image-analysis method with a joint analysis of the L2-norm criterion: the measure of discrepancy is the square of the difference between the input value and the resulting estimate. As an example, results of fusing infrared and RGB camera data are presented.
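A minimal sketch of the stated discrepancy measure, assuming a per-pixel comparison between an input image and its simplified (estimated) counterpart; the layer-by-layer simplification itself is only stubbed here.

```python
import numpy as np

def l2_discrepancy(observed, estimate):
    """Squared-difference discrepancy between the input values and the resulting
    estimate, per pixel and summed, as described in the abstract."""
    diff = observed.astype(np.float64) - estimate.astype(np.float64)
    per_pixel = diff ** 2
    return per_pixel, per_pixel.sum()

# Illustrative use: compare a frame against a coarsely simplified estimate.
frame = np.random.rand(64, 64)
simplified = frame.round(1)        # stand-in for a layer-by-layer simplification
per_pixel, total = l2_discrepancy(frame, simplified)
```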
To accomplish one of the tasks required of disaster response robots, this paper proposes a method for locating the points on 3D-structured switches that a robot must press in disaster sites, using RGB-D images acquired by a Kinect sensor attached to our disaster response robot. Our method consists of the following five steps: 1) obtain RGB and depth images using an RGB-D sensor; 2) detect the bounding box of the switch area in the RGB image using YOLOv3; 3) generate 3D point cloud data of the target switch by combining the bounding box and the depth image; 4) detect the center position of the switch button in the RGB image within the bounding box using a convolutional neural network (CNN); 5) estimate the center of the button's face in real space from the detection result in step 4) and the 3D point cloud data generated in step 3). In the experiment, the proposed method is applied to two types of 3D-structured switch boxes to evaluate its effectiveness. The results show that the proposed method can locate the switch button accurately enough for robot operation.
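Step 3 can be illustrated with a standard pinhole back-projection of the depth pixels inside the detected bounding box; the sketch below assumes known camera intrinsics and is not the paper's exact procedure.

```python
import numpy as np

def bbox_to_point_cloud(depth, bbox, fx, fy, cx, cy):
    """Back-project depth pixels inside a detected bounding box into 3D points
    using the pinhole model. Intrinsics (fx, fy, cx, cy) are assumed to come
    from the RGB-D sensor's calibration."""
    x0, y0, x1, y1 = bbox                      # bounding box in pixel coordinates
    points = []
    for v in range(y0, y1):
        for u in range(x0, x1):
            z = depth[v, u]                    # depth in meters
            if z > 0:                          # skip invalid / missing depth
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return np.array(points)

# Example with Kinect-like intrinsics (illustrative values only).
depth = np.full((480, 640), 1.2, dtype=np.float32)
cloud = bbox_to_point_cloud(depth, (300, 200, 340, 260), 525.0, 525.0, 319.5, 239.5)
```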
Face detection is crucial to computer vision and many related applications, and the past decades have witnessed great progress on this problem. In contrast to traditional methods, many researchers have recently proposed a variety of CNN (convolutional neural network) based methods and have reported impressive results in diverse settings. Although many comprehensive evaluations and reviews of face detection are available, very few focus on small-face detection strategies. In this paper, we systematically survey some of the prevailing methods, divide them into two categories, and compare them quantitatively on three real-world image data sets in terms of mAP. The experimental results show that a feature pyramid with multiple predictors produces better performance, which points to a helpful direction for future research.
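To make the "feature pyramid with multiple predictors" strategy concrete, here is a minimal PyTorch sketch of a top-down pyramid with one predictor per level. The channel sizes and head layout are illustrative assumptions, not taken from any surveyed detector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPNHead(nn.Module):
    """Minimal feature pyramid with one predictor per level: fine levels help
    small faces, coarse levels help large ones."""
    def __init__(self, in_channels=(64, 128, 256), out_channels=64, num_anchors=3):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.predict = nn.ModuleList(
            [nn.Conv2d(out_channels, num_anchors * 5, 3, padding=1) for _ in in_channels])

    def forward(self, feats):                       # feats ordered fine -> coarse
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down pathway: upsample coarser maps and add them to finer laterals.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        # One predictor per pyramid level (score + 4 box offsets per anchor).
        return [p(f) for p, f in zip(self.predict, laterals)]

head = TinyFPNHead()
outs = head([torch.randn(1, 64, 80, 80), torch.randn(1, 128, 40, 40),
             torch.randn(1, 256, 20, 20)])
```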
Overweight vehicles are a common source of pavement and bridge damage. Mobile crane vehicles, in particular, are often beyond legal per-axle weight limits, carrying their lifting blocks and ballast on the vehicle instead of on a separate trailer. To prevent road deterioration, detecting overweight cranes is desirable for law enforcement. Because the sources of crane weight are visible, we propose a camera-based detection system built on convolutional neural networks. We label our dataset iteratively to vastly reduce labeling effort, and we extensively investigate the impact of image resolution, network depth, and dataset size to choose optimal parameters during iterative labeling. We show that iterative labeling with intelligently chosen image resolutions and network depths can vastly improve (up to 70×) the speed at which data can be labeled, in order to train classification systems for practical surveillance applications. The experiments also provide an estimate of the optimal amount of data required to train an effective classification system, which is valuable for classification problems in general. The proposed system achieves an AUC of 0.985 for distinguishing cranes from other vehicles and AUCs of 0.92 and 0.77 for lifting block and ballast classification, respectively. The proposed classification system enables effective road monitoring for semi-automatic law enforcement and is attractive for rare-class extraction in general surveillance classification problems.
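The iterative-labeling idea can be sketched as a simple train-propose-verify loop; the classifier, selection rule, and batch size below are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def human_verify(images, proposed_labels):
    """Placeholder for manual verification: an annotator accepts or corrects the
    model's proposed labels. Here the proposals are simply accepted."""
    return proposed_labels

def iterative_labeling(x_labeled, y_labeled, x_pool, rounds=3, batch=100):
    """Train on the current labels, propose labels for the most confident pool
    samples, have a human verify them, grow the training set, and repeat."""
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(x_labeled, y_labeled)
        probs = model.predict_proba(x_pool)[:, 1]
        confident = np.argsort(np.abs(probs - 0.5))[::-1][:batch]   # most confident
        new_y = human_verify(x_pool[confident], (probs[confident] > 0.5).astype(int))
        x_labeled = np.vstack([x_labeled, x_pool[confident]])
        y_labeled = np.concatenate([y_labeled, new_y])
        x_pool = np.delete(x_pool, confident, axis=0)
    return model
```

Verifying proposed labels is far faster than labeling from scratch, which is where the reported speed-up in labeling throughput comes from.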
State departments of transportation often maintain extensive "video logs" of their roadways that include signs and lane markings, as well as non-image-based information such as grade and curvature. In this work we use the Roadway Information Database (RID), developed for the Second Strategic Highway Research Program, as a surrogate for a video log to design and test algorithms that detect rumble strips in roadway images. Rumble strips are grooved patterns at the lane extremities designed to produce an audible cue to drivers who are in danger of lane departure. The RID contains 6,203,576 images of roads at six locations across the United States with extensive ground-truth information and measurements, but the rumble strip measurements (length and spacing) were not recorded. We use an image correction process along with automated feature extraction and convolutional neural networks to detect rumble strip locations and measure their length and pitch. Based on independent measurements, we estimate our true positive rate to be 93% and our false positive rate to be 10%, with errors in length and spacing on the order of 0.09 m RMS and 0.04 m RMS, respectively. Our results illustrate the feasibility of this approach for adding value to video logs after initial capture, as well as identifying potential methods for autonomous navigation.
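As a rough illustration of the length and pitch measurement, the sketch below post-processes detected groove positions that have already been converted from corrected image coordinates to meters along the road; the paper's actual measurement pipeline may differ.

```python
import numpy as np

def strip_length_and_pitch(groove_positions_m):
    """Given along-road positions (meters) of detected rumble-strip grooves,
    estimate the strip length and the pitch (spacing between grooves)."""
    pos = np.sort(np.asarray(groove_positions_m))
    length = pos[-1] - pos[0]
    pitch = np.median(np.diff(pos))    # median is robust to an occasional missed groove
    return length, pitch

# Example: grooves detected roughly every 0.3 m over a 3 m section.
detections = np.linspace(0.0, 3.0, 11) + np.random.normal(0, 0.01, 11)
length, pitch = strip_length_and_pitch(detections)
```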
We present a novel method for super-resolution (SR) of license plate images based on an end-to-end convolutional neural network (CNN) that combines generative adversarial networks (GANs) and optical character recognition (OCR). License plate SR systems play an important role in a number of security applications, such as improving road safety, traffic monitoring, and surveillance. The task requires not only realistic-looking reconstructed images but also preservation of the text information; standard CNN SR methods and GANs fail to meet this requirement. Incorporating the OCR pipeline into the method also allows the network to be trained without ground-truth high-resolution data, which enables easy training on real data with all real image degradations, including compression.
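One way to combine the adversarial and recognition objectives is sketched below: a generator loss that adds a CTC-based OCR term to the adversarial term, so no high-resolution ground-truth image is required, only the known plate text. The loss weights and the recognizer interface are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def generator_loss(disc_fake_logits, ocr_log_probs, target_text, text_lengths,
                   adv_weight=1e-3):
    """Illustrative combined SR-generator objective: adversarial term (fool the
    discriminator) plus a CTC recognition term that keeps the plate readable."""
    # Adversarial term: generated images should be scored as real (label 1).
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # OCR term: CTC loss between recognizer output on the SR image and the known
    # plate string; no high-resolution ground-truth image is needed.
    input_lengths = torch.full((ocr_log_probs.size(1),), ocr_log_probs.size(0),
                               dtype=torch.long)
    ocr = F.ctc_loss(ocr_log_probs, target_text, input_lengths, text_lengths)
    return adv_weight * adv + ocr

# Shapes: T time steps, N batch, C charset size (incl. blank), S text length.
T, N, C, S = 20, 4, 37, 7
loss = generator_loss(torch.randn(N, 1),
                      torch.randn(T, N, C).log_softmax(-1),
                      torch.randint(1, C, (N, S)),
                      torch.full((N,), S, dtype=torch.long))
```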
In this work, we explore the ability to estimate vehicle fuel consumption using imagery from overhead fisheye-lens cameras deployed as traffic sensors. We use this information to simulate vision-based control of a traffic intersection, with the goal of improving fuel economy with minimal impact on mobility. We introduce the ORNL Overhead Vehicle Data set (OOVD), consisting of paired, labeled vehicle images from a ground-based camera and an overhead fisheye-lens traffic camera. The data set includes segmentation masks based on Gaussian mixture models for vehicle detection. We demonstrate the data set's utility through three applications: estimation of fuel consumption based on segmentation bounding boxes, vehicle discrimination for vehicles with large bounding boxes, and fine-grained classification on a limited number of vehicle makes and models using a set of pre-trained convolutional neural network models. We compare these results with estimates based on a large open-source data set of web-scraped imagery. Finally, we show the utility of the approach with reinforcement learning in the open-source Simulation of Urban Mobility (SUMO) traffic simulator. Our results demonstrate the feasibility of controlling traffic lights for better fuel efficiency based solely on visual vehicle estimates from commercial fisheye-lens cameras.
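As an illustration of GMM-based vehicle segmentation, the sketch below applies OpenCV's MOG2 background subtractor (a per-pixel Gaussian mixture model) to an overhead video and extracts bounding boxes from the resulting masks. The video path and post-processing steps are assumptions; the data set's actual segmentation pipeline may differ.

```python
import cv2

# Illustrative GMM-based foreground segmentation for an overhead traffic feed.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)
cap = cv2.VideoCapture("overhead_fisheye.mp4")       # assumed input video path
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                    # per-pixel vehicle/background mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove small noise blobs
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]   # bounding boxes per detected vehicle
cap.release()
```

The resulting bounding boxes are the kind of visual vehicle estimates that the fuel consumption model and the simulated signal controller would consume.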