Volume: 32 | Article ID: art00003
Improving Multimodal Localization Through Self-Supervision
DOI: 10.2352/ISSN.2470-1173.2020.6.IRIACV-014 | Published Online: January 2020

Modern warehouses use fleets of robots for inventory management. To ensure efficient and safe operation, real-time localization of each agent is essential. Most robots follow metal tracks buried in the floor and use a grid of precisely mounted RFID tags for localization. As robotic agents in warehouses and manufacturing plants become ubiquitous, it would be advantageous to eliminate the need for these metal wires and RFID tags. Not only do they incur significant installation costs, but removing the wires would also allow agents to travel to any area inside the building. Sensors such as cameras and LiDAR have provided meaningful localization information in many positioning systems. Fusing localization features from multiple sensor sources is a challenging task, especially when the dataset for the target localization task is small. We propose a deep-learning-based localization system that fuses features from an omnidirectional camera image and a 3D LiDAR point cloud to create a robust robot positioning model. Although the use of vision and LiDAR eliminates the need for precisely installed RFID tags, it does require the collection and annotation of ground truth training data. Deep neural networks thrive on large amounts of labeled data, and collecting this data can be time-consuming. Using a dataset collected in a warehouse environment, we evaluate the localization accuracy of two individual sensor models. To minimize the need for extensive ground truth data collection, we introduce a self-supervised pretraining regimen that populates the image feature extraction network with meaningful weights before training on the target localization task with limited data. In this research, we demonstrate how our self-supervision improves the accuracy and convergence of localization models without the need for additional sample annotation.

Robert Relyea, Darshan Bhanushali, Karan Manghi, Abhishek Vashist, Clark Hochgraf, Amlan Ganguly, Andres Kwasinski, Michael E. Kuhl, Raymond Ptucha, "Improving Multimodal Localization Through Self-Supervision," in Proc. IS&T Int’l. Symp. on Electronic Imaging: Intelligent Robotics and Industrial Applications using Computer Vision, 2020, pp. 14-1 - 14-8.

Copyright © Society for Imaging Science and Technology 2020
Electronic Imaging
Society for Imaging Science and Technology
7003 Kilworth Lane, Springfield, VA 22151 USA