IS&T | Library

Tangible extended reality with sensor fusion

Abstract

This survey provides a comprehensive overview of LiDAR-based panoptic segmentation methods for autonomous driving. We motivate the importance of panoptic segmentation in autonomous vehicle perception, emphasizing its advantages over traditional 3D object detection in capturing a more detailed and comprehensive understanding of the environment. We summarize and categorize 42 panoptic segmentation methods based on their architectural approaches, with a focus on the kind of clustering utilized: machine-learned or non-learned heuristic clustering. We discuss direct methods, most of which use single-stage architectures to predict binary masks for each instance, and clustering-based methods, most of which predict offsets to object centers for efficient clustering. We also highlight relevant datasets and evaluation metrics, and compile performance results on the SemanticKITTI and panoptic nuScenes benchmarks. Our analysis reveals trends in the field, including the effectiveness of attention mechanisms, the competitiveness of center-based approaches, and the benefits of sensor fusion. This survey aims to guide practitioners in selecting suitable architectures and to inspire researchers in identifying promising directions for future work in LiDAR-based panoptic segmentation for autonomous driving.

Digital Library: EI

Published Online: February 2025

Article

Augmented Reality
Haptic
Tangible
Sensor Fusion
Extended Reality
XR
Virtual Reality
Gesture

Yang Cai, Mel Siegel

DOI

10.2352/EI.2023.35.12.ERVR-214

Volume 35

Issue 12

Abstract

Many extended reality systems use controllers, e.g., near-infrared motion trackers or magnetic coil-based hand-tracking devices, for users to interact with virtual objects. These interfaces lack tangible sensation, especially during walking, running, crawling, and manipulating an object. Special devices such as the Tesla suit and omnidirectional treadmills can improve tangible interaction. However, they are bulky, expensive, and not flexible enough for broader applications. In this study, we developed a configurable multi-modal sensor fusion interface for extended reality applications. The system includes wearable IMU motion sensors, gait classification, gesture tracking, and data streaming interfaces to AR/VR systems. This system has several advantages. First, it is reconfigurable for multiple dynamic tangible interactions such as walking, running, crawling, and operating an actual physical object without any controllers. Second, it fuses multi-modal sensor data from the IMU with data from sensors on the AR/VR headset, such as floor detection. Third, it is more affordable than many existing solutions. We have prototyped tangible extended reality in several applications, including medical helicopter preflight walk-around checkups, firefighter search and rescue training, and tool tracking for airway intubation training with haptic interaction with a physical mannequin.

Digital Library: EI

Published Online: January 2023

LiDAR-Camera Fusion for 3D Object Detection

Object detection
Sensor Fusion
Robotics
Computer Vision
LiDAR
camera
3d

Darshan Bhanushali, Robert Relyea, Karan Manghi, Abhishek Vashist, Clark Hochgraf, Amlan Ganguly, Andres Kwasinski, Michael E. Kuhl, Raymond Ptucha

Pages 257-1 - 257-9, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.16.AVM-257

Volume 32

Issue 16

The performance of autonomous agents in both commercial and consumer applications increases along with their situational awareness. Tasks such as obstacle avoidance, agent-to-agent interaction, and path planning are directly dependent upon their ability to convert sensor readings into scene understanding. Central to this is the ability to detect and recognize objects. Many object detection methodologies operate on a single modality such as vision or LiDAR. Camera-based object detection models benefit from an abundance of feature-rich information for classifying different types of objects. LiDAR-based object detection models use sparse point clouds, where each point contains the accurate 3D position of an object surface. Camera-based methods lack accurate object-to-lens distance measurements, while LiDAR-based methods lack dense feature-rich details. By utilizing information from both camera and LiDAR sensors, advanced object detection and identification is possible. In this work, we introduce a deep learning framework that fuses these modalities to produce a robust real-time 3D bounding box object detection network. We demonstrate qualitative and quantitative analysis of the proposed fusion model on the popular KITTI dataset.

Digital Library: EI

Published Online: January 2020

Conference Overview and Papers Program

Intelligent Robots
Industrial Inspection
Computer Vision
Sensing and Imaging Techniques
Sensor Fusion

Pages A06-1 - A06-5, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.6.IRIACV-A06

Volume 32

Issue 6

Digital Library: EI

Published Online: January 2020

Improving Multimodal Localization Through Self-Supervision

Robotics
Deep Learning
Self-supervision
LiDAR
Sensor Fusion
Localization

Robert Relyea, Darshan Bhanushali, Karan Manghi, Abhishek Vashist, Clark Hochgraf, Amlan Ganguly, Andres Kwasinski, Michael E. Kuhl, Raymond Ptucha

Pages 14-1 - 14-8, January 2020, © Society for Imaging Science and Technology 2020

DOI

10.2352/ISSN.2470-1173.2020.6.IRIACV-014

Volume 32

Issue 6

Modern warehouses utilize fleets of robots for inventory management. To ensure efficient and safe operation, real-time localization of each agent is essential. Most robots follow metal tracks buried in the floor and use a grid of precisely mounted RFID tags for localization. As robotic agents in warehouses and manufacturing plants become ubiquitous, it would be advantageous to eliminate the need for these metal wires and RFID tags. Not only do they incur significant installation costs, but their removal would also allow agents to travel to any area inside the building. Sensors including cameras and LiDAR have provided meaningful localization information for many different positioning system implementations. Fusing localization features from multiple sensor sources is a challenging task, especially when the target localization task’s dataset is small. We propose a deep-learning-based localization system which fuses features from an omnidirectional camera image and a 3D LiDAR point cloud to create a robust robot positioning model. Although the use of vision and LiDAR eliminates the need for precisely installed RFID tags, it does require the collection and annotation of ground truth training data. Deep neural networks thrive on large amounts of supervised data, and the collection of this data can be time consuming. Using a dataset collected in a warehouse environment, we evaluate the performance of two individual sensor models for localization accuracy. To minimize the need for extensive ground truth data collection, we introduce a self-supervised pretraining regimen to populate the image feature extraction network with meaningful weights before training on the target localization task with limited data. In this research, we demonstrate how our self-supervision improves accuracy and convergence of localization models without the need for additional sample annotation.

Digital Library: EI

Published Online: January 2020

Conference Overview and Papers Program

Intelligent Robots
Industrial Inspection
Computer Vision
Sensing and Imaging Techniques
Sensor Fusion

DOI

10.2352/ISSN.2470-1173.2019.7.IRIACV-A07

Volume 31

Issue 7

AVM Conference Overview and Papers Program

Digital Library: EI

Published Online: January 2019

Front Matter

Machine Vision
Image Understanding
Automated Machines
Human-machine Interfaces
Perception and Analytics
Localization Mapping & Navigation
Image Processing
Image and Vision Processors
Sensing Technology
Sensor Fusion

DOI

10.2352/ISSN.2470-1173.2018.17.AVM-567

Volume 30

Issue 17

Digital Library: EI

Published Online: January 2018

Multi-sensor fusion for Automated Driving: Selecting model and optimizing on Embedded platform

Sensor Fusion
Kalman Filter
Extended Kalman Filter
Unscented Kalman Filter
Optimization
DSP

Shyam Jagannathan, Mihir Mody, Jason Jones, Pramod Swami, Deepak Poddar

DOI

10.2352/ISSN.2470-1173.2018.17.AVM-256

Volume 30

Issue 17