Many extended reality systems rely on controllers, e.g. near-infrared motion trackers or magnetic coil-based hand-tracking devices, for users to interact with virtual objects. These interfaces lack tangible sensation, especially during walking, running, crawling, and manipulating an object. Special devices such as the Tesla suit and omnidirectional treadmills can improve tangible interaction. However, they are bulky, expensive, and not flexible enough for broader applications. In this study, we developed a configurable multi-modal sensor fusion interface for extended reality applications. The system includes wearable IMU motion sensors, gait classification, gesture tracking, and data streaming interfaces to AR/VR systems. This system has several advantages. First, it is reconfigurable for multiple dynamic tangible interactions such as walking, running, crawling, and operating an actual physical object without any controllers. Second, it fuses multi-modal sensor data from the IMU and from sensors on the AR/VR headset, such as floor detection. Third, it is more affordable than many existing solutions. We have prototyped tangible extended reality in several applications, including preflight walk-around checks for medical helicopters, firefighter search and rescue training, and tool tracking for airway intubation training with haptic interaction with a physical mannequin.
The performance of autonomous agents in both commercial and consumer applications increases along with their situational awareness. Tasks such as obstacle avoidance, agent-to-agent interaction, and path planning depend directly on the agents' ability to convert sensor readings into scene understanding. Central to this is the ability to detect and recognize objects. Many object detection methodologies operate on a single modality such as vision or LiDAR. Camera-based object detection models benefit from an abundance of feature-rich information for classifying different types of objects. LiDAR-based object detection models use sparse point clouds, where each point gives an accurate 3D position on an object surface. Camera-based methods lack accurate object-to-lens distance measurements, while LiDAR-based methods lack dense, feature-rich details. By utilizing information from both camera and LiDAR sensors, advanced object detection and identification is possible. In this work, we introduce a deep learning framework that fuses these modalities to produce a robust real-time 3D bounding box object detection network. We present qualitative and quantitative analysis of the proposed fusion model on the popular KITTI dataset.
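To make the complementary roles of the two sensors concrete, the following is a minimal sketch of one common preprocessing step in camera-LiDAR fusion: projecting LiDAR points onto the image plane so that each point's depth can be paired with pixel features. The abstract does not state that the proposed network does this. The function name `project_points`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the assumption that points are already expressed in the camera frame are all illustrative assumptions.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def project_points(
    points: List[Point3D],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[Tuple[float, float, float]]:
    """Project 3D points to pixel coordinates with a pinhole camera model.

    Assumption: points are already in the camera frame (x right, y down,
    z forward), i.e. the LiDAR-to-camera extrinsic transform was applied.
    Returns (u, v, depth) for points in front of the camera and inside
    the image bounds.
    """
    projected = []
    for x, y, z in points:
        if z <= 0.0:
            # Points behind the camera cannot appear in the image.
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        if 0.0 <= u < width and 0.0 <= v < height:
            projected.append((u, v, z))
    return projected


if __name__ == "__main__":
    # Hypothetical intrinsics and points, for illustration only.
    cloud = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 0.0, -5.0)]
    for u, v, depth in project_points(cloud, 500.0, 500.0, 320.0, 240.0, 640, 480):
        print(f"pixel=({u:.1f}, {v:.1f}) depth={depth:.1f} m")
```

Each returned triple attaches a metric depth to an image location, which is the distance information that camera-only methods lack.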
Modern warehouses utilize fleets of robots for inventory management. To ensure efficient and safe operation, real-time localization of each agent is essential. Most robots follow metal tracks buried in the floor and use a grid of precisely mounted RFID tags for localization. As robotic agents in warehouses and manufacturing plants become ubiquitous, it would be advantageous to eliminate the need for these metal wires and RFID tags. Not only do they incur significant installation costs, but removing the wires would also allow agents to travel to any area inside the building. Sensors including cameras and LiDAR have provided meaningful localization information for many different positioning system implementations. Fusing localization features from multiple sensor sources is a challenging task, especially when the target localization task’s dataset is small. We propose a deep-learning-based localization system which fuses features from an omnidirectional camera image and a 3D LiDAR point cloud to create a robust robot positioning model. Although the use of vision and LiDAR eliminates the need for precisely installed RFID tags, it does require the collection and annotation of ground truth training data. Deep neural networks thrive on large amounts of supervised data, and collecting this data can be time consuming. Using a dataset collected in a warehouse environment, we evaluate the localization accuracy of two individual sensor models. To minimize the need for extensive ground truth data collection, we introduce a self-supervised pretraining regimen to populate the image feature extraction network with meaningful weights before training on the target localization task with limited data. In this research, we demonstrate how our self-supervision improves the accuracy and convergence of localization models without the need for additional sample annotation.
Automated driving requires fusing information from a multitude of sensors, such as cameras, radars, and lidars mounted around the car, to handle various driving scenarios, e.g. highway, parking, urban driving, and traffic jams. Fusion also enables better functional safety by handling challenging conditions such as adverse weather, time of day, and occlusion. The paper gives an overview of popular fusion techniques, namely Kalman filters and their variants, e.g. extended Kalman filters and unscented Kalman filters. The paper proposes a choice of fusion technique for a given sensor configuration and its model parameters. The second part of the paper focuses on an efficient solution for series production on an embedded platform, using Texas Instruments' TDAx automotive SoC. The performance is benchmarked separately for the "predict" and "update" phases for different sensor modalities. For a typical L3/L4 automated driving system consisting of multiple cameras, radars, and lidars, fusion can be supported in real time by a single DSP using the proposed techniques, enabling a cost-optimized solution.
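The "predict" and "update" phases mentioned above can be illustrated with a minimal sketch of a scalar linear Kalman filter that fuses range measurements from two sensors of different noise levels. This is not the paper's embedded implementation. The names `State`, `predict`, `update`, and `fuse_step`, the known-velocity motion model, and all numeric values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class State:
    x: float  # estimated position (m)
    p: float  # variance of the estimate (m^2)


def predict(s: State, velocity: float, dt: float, q: float) -> State:
    """Predict phase: propagate the state with a known-velocity motion
    model and grow the variance by the process noise q."""
    return State(x=s.x + velocity * dt, p=s.p + q)


def update(s: State, z: float, r: float) -> State:
    """Update phase: correct the state with a measurement z whose noise
    variance is r. The measurement observes position directly."""
    k = s.p / (s.p + r)  # Kalman gain
    return State(x=s.x + k * (z - s.x), p=(1.0 - k) * s.p)


def fuse_step(
    s: State, velocity: float, dt: float, q: float,
    measurements: Iterable[Tuple[float, float]],
) -> State:
    """One filter cycle: a single predict followed by one sequential
    update per sensor measurement (z, r)."""
    s = predict(s, velocity, dt, q)
    for z, r in measurements:
        s = update(s, z, r)
    return s


if __name__ == "__main__":
    # Hypothetical values: a noisier radar-like reading and a more
    # precise lidar-like reading of the same range.
    state = State(x=0.0, p=1.0)
    state = fuse_step(state, velocity=1.0, dt=1.0, q=0.1,
                      measurements=[(1.3, 1.0), (1.05, 0.1)])
    print(state)
```

Keeping the two phases as separate functions mirrors how they can be timed independently per sensor modality: the predict cost depends on the motion model, while the update cost is paid once for each measurement that arrives.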