This survey provides a comprehensive overview of LiDAR-based panoptic segmentation methods for autonomous driving. We motivate the importance of panoptic segmentation in autonomous vehicle perception, emphasizing its advantage over traditional 3D object detection in capturing a more detailed understanding of the environment. We summarize and categorize 42 panoptic segmentation methods by architectural approach, with a focus on the type of clustering used: learned clustering or non-learned, heuristic clustering. We discuss direct methods, most of which use single-stage architectures to predict a binary mask for each instance, and clustering-based methods, most of which predict offsets to object centers for efficient clustering. We also highlight relevant datasets and evaluation metrics, and compile performance results on the SemanticKITTI and Panoptic nuScenes benchmarks. Our analysis reveals trends in the field, including the effectiveness of attention mechanisms, the competitiveness of center-based approaches, and the benefits of sensor fusion. This survey aims to guide practitioners in selecting suitable architectures and to help researchers identify promising directions for future work in LiDAR-based panoptic segmentation for autonomous driving.
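To illustrate the center-offset idea behind many clustering-based methods, here is a minimal NumPy sketch. Each "thing" point is shifted by its predicted offset toward its object center, and the shifted points are then grouped. The greedy radius-based grouping, the function name `cluster_by_center_offsets`, and the `radius` value are all hypothetical choices made for illustration. The surveyed methods use a variety of clustering schemes, and this sketch does not reproduce any specific one.

```python
import numpy as np

def cluster_by_center_offsets(points, offsets, thing_mask, radius=0.5):
    """Group 'thing' points into instances by voting for object centers.

    points:     (N, 3) xyz coordinates
    offsets:    (N, 3) predicted vectors from each point to its object center
    thing_mask: (N,) bool, True for points of countable ("thing") classes
    radius:     hypothetical merge distance in metres (an assumption)
    Returns (N,) int instance ids; 0 means "no instance" (e.g. stuff points).
    """
    ids = np.zeros(len(points), dtype=int)
    shifted = points + offsets  # each point votes for its predicted center
    centers = []                # centers of clusters created so far
    for i in np.flatnonzero(thing_mask):
        if centers:
            # Distance from this point's vote to every existing center.
            d = np.linalg.norm(np.asarray(centers) - shifted[i], axis=1)
            j = int(np.argmin(d))
            if d[j] < radius:
                ids[i] = j + 1  # join the nearest existing cluster
                continue
        centers.append(shifted[i])  # otherwise start a new cluster here
        ids[i] = len(centers)
    return ids
```

Because the offsets pull the points of one object onto roughly the same location, even a simple distance threshold can separate nearby objects. This is why offset prediction makes the subsequent clustering step cheap.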
Can artists be recognized from the way they render certain materials, such as fabric, skin, or hair? In this paper, we study this problem with a focus on recognizing works by Rembrandt, Van Dyck, and other Dutch and Flemish artists of the same era. We propose a novel material-based approach, built on Swin Transformer and Cascade Mask R-CNN, to address the artist recognition task. We report performance on a dataset of 644 images. Additionally, we study the model's robustness to image variations.
This paper presents AInBody, a novel deep learning-based body shape measurement solution. We have devised a user-centered design that automatically tracks changes in the user's body by integrating several methods, including human parsing, instance segmentation, and image matting. When the user takes photos, our system guides their pose by displaying the outline of their latest picture. It then divides the body into several parts and compares before-and-after photos at the body-part level. Parsing performance is improved by an ensemble approach and a denoising phase in our main module, Advanced Human Parser. In evaluation, the proposed method outperforms the best competing model in average precision by 0.1% to 4.8% on 3 of 5 body parts, and by 1.4% in mAP and 2.4% in mean IoU. Furthermore, our framework processes one HD image in approximately three seconds, demonstrating that it can be applied to real-time applications.
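Part of the reported improvement is measured in mean IoU. The sketch below shows how mean IoU is commonly computed over per-pixel label maps, assuming integer class labels. The function name is hypothetical, and the abstract does not state the paper's exact evaluation protocol. One example is how classes absent from both maps are handled; here they are simply skipped.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both maps: skip it (one common convention)
        inter = np.logical_and(p, g).sum()
        ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")
```

For example, with `pred = [[0, 0], [1, 1]]` and `gt = [[0, 1], [1, 1]]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, giving a mean IoU of 7/12.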