
Human gestures in the real world are complex, ranging from sign language to full-body motion to extremely dynamic poses such as crawling and dancing. This study examines a set of multimodal sensory fusion methods that support real-time operation and training without the need for wearable equipment. We characterize the gesture-tracking sensor modalities by gesture tracking type, accuracy, detection latency, distance, and key point requirements, using LiDAR, webcam, and inertial measurement unit (IMU) sensors for complex gesture recognition. We applied the methodology to gait detection and tracking at high altitude, sign language detection, and background noise removal in a crewed space. Our experiments show that the usability of the multimodal interfaces can be tested in a simulated environment and measured objectively with instruments.

In this paper, we present a novel LiDAR imaging system for heads-up display. The imaging system consists of a one-dimensional laser distance sensor and IMU sensors, including an accelerometer and a gyroscope. By fusing the sensory data as the user moves their head, it creates a three-dimensional point cloud that maps the surrounding space. Compared to prevailing 2D and 3D LiDAR imaging systems, the proposed system has no moving parts; it is simple, lightweight, and affordable. Our tests show that the horizontal and vertical profiles of the points agree with the floor plan to within 3 cm on average. For bump detection, the minimal detectable step height is 2.5 cm. The system can be applied to first-response tasks such as firefighting, and to detecting bumps on pavement for low-vision pedestrians.
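
The fusion step described above can be illustrated with a minimal sketch. It is not the system's actual implementation; it assumes the head orientation has already been estimated from the accelerometer and gyroscope as yaw and pitch angles, that the sensor stays at the origin (head translation is ignored), and that the laser beam points along the head's forward axis. The function names `range_to_point` and `build_point_cloud` are hypothetical.

```python
import math


def range_to_point(distance_m, yaw_rad, pitch_rad):
    """Convert one range reading and a head orientation into an (x, y, z) point.

    Assumed convention (not from the source): x forward, y left, z up;
    yaw rotates about z, pitch tilts the beam up from the horizontal plane.
    """
    horizontal = distance_m * math.cos(pitch_rad)  # projection onto the floor plane
    x = horizontal * math.cos(yaw_rad)
    y = horizontal * math.sin(yaw_rad)
    z = distance_m * math.sin(pitch_rad)
    return (x, y, z)


def build_point_cloud(samples):
    """Map a sequence of (distance_m, yaw_rad, pitch_rad) samples to 3D points."""
    return [range_to_point(d, yaw, pitch) for d, yaw, pitch in samples]


if __name__ == "__main__":
    # Three hypothetical readings taken while the head sweeps left and tilts up.
    samples = [
        (2.0, 0.0, 0.0),
        (2.0, math.radians(30), 0.0),
        (1.5, math.radians(30), math.radians(10)),
    ]
    for point in build_point_cloud(samples):
        print(tuple(round(c, 3) for c in point))
```

Because the single beam is assumed to lie along the forward axis, roll about that axis does not change the beam direction, which is why only yaw and pitch appear in the sketch.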