Spinal CT image segmentation is an active research topic in medical image processing. However, high variability among spinal CT slices, image artifacts, and other factors make automatic and accurate spinal CT segmentation extremely challenging. To address these issues, we propose MA-WNet, a cascaded U-shaped framework that combines multi-scale features and attention mechanisms for the automatic segmentation of spinal CT images. Specifically, our framework cascades two U-shaped networks that perform coarse and fine segmentation of spinal CT images, respectively. Within each U-shaped network, we add multi-scale feature extraction modules in both the encoding and decoding phases to address variations in spine shape across slices. Additionally, various attention mechanisms are embedded to mitigate the effects of image artifacts and irrelevant information on segmentation outcomes. Experimental results show that the proposed method achieves average Dice similarity coefficients of 94.53% and 91.38% on the CSI 2014 and VerSe 2020 datasets, respectively, indicating highly accurate segmentation that is valuable for potential clinical applications.
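The Dice similarity coefficient reported above can be illustrated with a minimal sketch. It assumes binary masks stored as nested lists of 0/1; the function name and the toy masks are hypothetical and are not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks (nested lists of 0/1)."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_target if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks (illustrative only): 3 foreground pixels each, 2 of them shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)
```

A dataset-level figure such as the percentages quoted above would typically be an average of per-case scores, although the abstract does not state the averaging scheme.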
Sparse representation is a key part of shape registration, compression, and regeneration. Most existing models generate sparse representations by detecting salient points directly from input point clouds, which makes them susceptible to noise, deformations, and outliers. The authors propose a novel alternative solution that combines global distribution probabilities and local contextual features to learn semantic structural consistency and adaptively generate sparse structural representations for arbitrary 3D point clouds. First, they construct a 3D variational auto-encoder network to learn an optimal latent space aligned with multiple anisotropic Gaussian mixture models (GMMs). Then, they combine GMM parameters with contextual properties to construct enhanced point features that effectively resist noise and geometric deformations, better revealing the underlying semantic structural consistency. Next, they design a weight scoring unit that computes a contribution matrix to the semantic structure and adaptively generates sparse structural points. Finally, the authors enforce semantic correspondence and structural consistency to ensure that the generated structural points have stronger discriminative ability in both the feature and distribution domains. Extensive experiments on shape benchmarks show that the proposed network outperforms state-of-the-art methods, with lower cost and better performance in shape segmentation and classification.
Accurate traffic flow forecasting plays a crucial role in alleviating road congestion and optimizing traffic management. Although numerous effective models have been proposed to predict future traffic flow, most exhibit limitations in modeling spatiotemporal dependencies, especially in capturing multiscale spatiotemporal relationships. To address this, we propose a novel model called Spatiotemporal Augmented Interactive Learning and Temporal Attention (STAIL-TA) for traffic flow prediction, which is designed for dynamic and interactive adaptive modeling of spatiotemporal features in traffic flow data. Specifically, we first design a feature augmentation layer that enhances the interaction of time-based features. Next, we introduce an interactive dynamic graph convolutional network, which uses an interactive learning strategy to simultaneously capture spatiotemporal characteristics of traffic data. Additionally, a new dynamic graph generation method is employed to design a dynamic graph convolutional block, which is capable of capturing the spatial correlations that change dynamically within the traffic network. Finally, we construct a novel temporal attention mechanism that effectively leverages local contextual information and is specifically designed for transforming numerical sequence representations. This enables the prediction model to better capture the dynamic temporal dependencies of traffic flow, thus facilitating long-term forecasting. The experimental results show that, compared with the best existing baseline, MRA-BGCN, the STAIL-TA model improves the mean absolute error and root mean squared error on the PEMS-BAY dataset by 7.75% and 3.68% in the 15-minute prediction and by 5.59% and 2.72% in the 30-minute prediction, respectively.
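How a relative improvement in mean absolute error (MAE) and root mean squared error (RMSE) over a baseline can be computed is sketched below. The helper names and the toy speed readings are hypothetical and are not PEMS-BAY data or results from the paper.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy values for illustration only (not taken from any dataset).
observed = [60.0, 58.0, 55.0, 62.0]
baseline_pred = [58.0, 60.0, 51.0, 62.0]
model_pred = [59.0, 59.0, 53.0, 62.0]

mae_gain = relative_improvement(mae(observed, baseline_pred), mae(observed, model_pred))
rmse_gain = relative_improvement(rmse(observed, baseline_pred), rmse(observed, model_pred))
```

Because both metrics are errors, a positive percentage here means the model's error is lower than the baseline's.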
In modern life, with the explosive growth of video, image, and other data, using computers to classify and analyze human actions automatically and efficiently has become increasingly important. Action recognition, the problem of perceiving and understanding the behavioral state of objects in a dynamic scene, is a fundamental task in computer vision. However, videos with multiple objects or irregular shooting angles pose a significant challenge for existing action recognition algorithms. To address these problems, the authors propose a novel deep-learning-based method called SlowFast-Convolutional Block Attention Module (SlowFast-CBAM). Specifically, the training dataset is preprocessed using the YOLOX network, with individual frames of the action videos placed separately in the slow and fast pathways. Then, CBAM is incorporated into both the slow and fast pathways to highlight features and dynamics in the surrounding environment. Subsequently, the authors establish a relationship between the convolutional attention mechanism and the SlowFast network, allowing the model to focus on distinguishing features of objects and behaviors appearing before and after different actions, thereby enabling action detection and performer recognition. Experimental results demonstrate that this approach better emphasizes the features of action performers, leading to more accurate action labeling and improved action recognition accuracy.
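The idea of feeding frames into a slow and a fast pathway can be illustrated with a small frame-sampling sketch. The stride of 16 and the speed ratio of 8 are illustrative assumptions in the spirit of common SlowFast configurations; the abstract does not state the values used, and the function name is hypothetical.

```python
def sample_pathways(num_frames, slow_stride=16, alpha=8):
    """Return (slow, fast) frame indices for a two-pathway video model.

    The slow pathway keeps one frame every `slow_stride` frames; the fast
    pathway samples `alpha` times more densely. Both values are assumptions.
    """
    if slow_stride % alpha != 0:
        raise ValueError("slow_stride must be divisible by alpha")
    fast_stride = slow_stride // alpha
    slow = list(range(0, num_frames, slow_stride))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


# A 64-frame clip: sparse indices for the slow pathway, dense for the fast one.
slow_idx, fast_idx = sample_pathways(64)
```

With these assumed settings the slow pathway sees few frames at full spatial detail, while the fast pathway sees many frames and is better placed to capture motion.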
Purpose: Gliomas, a class of brain tumors, pose significant challenges due to their complex pathology and life-threatening potential. The goal of this study is to introduce LU-net, a novel semantic segmentation algorithm designed to enhance the diagnosis and treatment planning of gliomas. This research seeks to address the limitations of traditional classification and detection methods by improving the accuracy and robustness of tumor boundary delineation in medical images. Methods: LU-net employs a multiscale image pyramid along with a Bayesian-inference-based multiscale probability search to capture complex tumor features. The algorithm is further strengthened by integrating a Conditional Random Field model, enabling more precise segmentation. The performance of LU-net is evaluated against existing segmentation algorithms using standard metrics such as accuracy, Intersection over Union (IoU), and Dice score. Results: The experimental results demonstrate that LU-net outperforms current segmentation algorithms in terms of both accuracy and robustness. Specifically, LU-net achieves an accuracy of 0.9953, an IoU of 0.667, and a Dice score of 0.566, effectively addressing the pathological heterogeneity and invasiveness of gliomas. These results highlight LU-net’s superior ability to delineate tumor boundaries and improve diagnostic accuracy. Conclusion: LU-net sets a new benchmark in glioma lesion detection, offering a more effective approach for brain tumor segmentation. By improving the accuracy, reliability, and interpretability of brain tumor boundary delineation, LU-net enhances diagnostic and treatment strategies, providing significant benefits to patients, clinicians, and healthcare providers. Overall, this work marks a significant contribution to the field of medical imaging and glioma diagnosis.
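A multiscale image pyramid of the kind mentioned in the Methods can be sketched as follows. This is a simple mean pyramid built with 2x2 average pooling, chosen for brevity; the abstract does not specify how LU-net builds its pyramid, and the function names are hypothetical.

```python
def downsample_2x(image):
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(image, levels):
    """Return [full resolution, half resolution, ...] with at most `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        current = pyramid[-1]
        # Stop early once the image is too small to halve again.
        if len(current) < 2 or len(current[0]) < 2:
            break
        pyramid.append(downsample_2x(current))
    return pyramid


# Toy 4x4 image with intensities 0..15 (illustrative only).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(image, levels=3)
```

Coarse levels summarize large structures cheaply, while fine levels retain boundary detail, which is the usual motivation for searching across scales.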
The progressive fusion algorithm enhances image boundary smoothness, preserves details, and improves visual harmony. However, issues with multi-scale fusion and improper color space conversion can lead to blurred details and color distortion, so the output does not meet modern image processing standards for high quality. Therefore, a transparency-guided enhancement algorithm for progressively fused images, based on generative adversarial learning, is proposed. The method combines wavelet transform with gradient field fusion to enhance image details, preserve spectral features, and generate high-resolution true-color fused images. It extracts the image mean, standard deviation, and smoothness features, and feeds these, along with the original image, into the generative adversarial network. The optimization design introduces global context, transparency mask prediction, and a dual-discriminator structure to enhance the transparency of progressively fused images. The experimental results show that the designed method achieves an information entropy of 7.638, a blind image quality index of 24.331, a natural image quality evaluator value of 3.611, and a processing time of 0.036 s. The overall evaluation indices are excellent, and the method effectively restores image detail and spatial color while avoiding artifacts. The processed images exhibit high quality with complete detail preservation.
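The information entropy metric quoted above is commonly computed as the Shannon entropy of the gray-level histogram. The sketch below assumes that definition and a grayscale image stored as nested lists of integers; the abstract does not spell out the exact formula used, and the function name is hypothetical.

```python
import math
from collections import Counter


def information_entropy(image):
    """Shannon entropy (in bits) of the gray-level distribution of an image."""
    pixels = [p for row in image for p in row]
    counts = Counter(pixels)
    n = len(pixels)
    # Sum of -p * log2(p) over the gray levels that actually occur.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy images (illustrative only).
flat = [[128] * 4 for _ in range(4)]          # a single gray level
four_levels = [[0, 64, 128, 255]] * 4         # four equally frequent levels
flat_entropy = information_entropy(flat)
four_level_entropy = information_entropy(four_levels)
```

Under this definition an 8-bit image has an entropy of at most 8 bits, reached when all 256 gray levels are equally frequent.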
This paper presents a self-supervised monocular visual odometry (VO) method guided by attention feature maps, aimed at effectively mitigating the impact of redundant pixels during network training. Existing self-supervised VO methods typically treat all pixels equally when computing photometric error, which can lead to increased sensitivity to noise from irrelevant pixels and, consequently, training errors. To address this issue, the authors adopt a soft-attention mechanism to generate attention feature maps that allow the model to focus on more relevant pixels while downweighting the influence of disruptive ones. This approach enhances the robustness and accuracy of depth estimation and pose tracking. The proposed method achieves competitive results on the KITTI dataset, with Sequences 09 and 10 demonstrating relative rotation errors of 0.022 and 0.032 and relative translation errors of 5.56 and 7.29, respectively.
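The effect of weighting the photometric error with an attention map can be sketched as follows. This is a minimal illustration that assumes an absolute-difference (L1) error normalized by the sum of the weights; the paper's actual loss may differ (for example by including a structural similarity term), and all names and values are hypothetical.

```python
def weighted_photometric_error(target, warped, attention):
    """Attention-weighted mean absolute intensity difference between two images."""
    weighted_sum = 0.0
    weight_total = 0.0
    for t_row, w_row, a_row in zip(target, warped, attention):
        for t, w, a in zip(t_row, w_row, a_row):
            weighted_sum += a * abs(t - w)
            weight_total += a
    # Normalizing by the total weight is an assumption made for this sketch.
    return weighted_sum / weight_total if weight_total > 0 else 0.0


# Toy 2x2 images: the bottom-right pixel is a disruptive outlier.
target = [[0.2, 0.4], [0.6, 0.8]]
warped = [[0.2, 0.4], [0.6, 0.0]]
uniform = [[1.0, 1.0], [1.0, 1.0]]   # every pixel treated equally
soft = [[1.0, 1.0], [1.0, 0.1]]      # outlier downweighted by the attention map

uniform_err = weighted_photometric_error(target, warped, uniform)
soft_err = weighted_photometric_error(target, warped, soft)
```

With uniform weights the single outlier dominates the error, whereas the soft weights shrink its contribution, which is the behavior the abstract attributes to the attention feature maps.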
For the automated analysis of metaphase chromosome images, the chromosomes in the images must first be segmented. However, the segmentation results often contain several non-chromosome objects, and eliminating them is essential in automated chromosome image analysis. This study aims to exclude non-chromosome objects from segmented chromosome candidates for further analysis. A feature-based method was developed to eliminate non-chromosome objects from metaphase chromosome images. In a metaphase chromosome image, the chromosome candidates were first segmented by thresholding. Four classes of features of the segmented candidates, namely area, density-based features, roughness-based features, and widths, were then extracted to discriminate between chromosomes and non-chromosome objects. Seven classifiers that combine the extracted features were applied and compared. The experimental results show the usefulness of the combination of extracted features in distinguishing between chromosomes and non-chromosome objects. The proposed method can effectively separate non-chromosome objects from chromosomes and could be used as a preprocessing procedure for chromosome image analysis.
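The threshold-then-extract-features pipeline described above can be sketched in a few lines. This example assumes foreground pixels are those at or above the threshold, uses 4-connectivity, and substitutes a simple area rule for the seven classifiers; the bounding-box width is a simplification of the width features, and all names and values are hypothetical.

```python
def segment_candidates(image, threshold):
    """Return 4-connected regions of pixels >= threshold as lists of (row, col)."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or seen[r][c]:
                continue
            # Flood fill from this unvisited foreground pixel.
            stack, region = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            regions.append(region)
    return regions


def area_and_width(region):
    """Area in pixels and bounding-box width of one candidate region."""
    xs = [x for _, x in region]
    return len(region), max(xs) - min(xs) + 1


# Toy image: one 3x2 blob and one isolated bright pixel (illustrative only).
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 8],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
regions = segment_candidates(image, threshold=5)
features = [area_and_width(region) for region in regions]
# Stand-in for a trained classifier: discard candidates smaller than 3 pixels.
kept = [f for f in features if f[0] >= 3]
```

In the described method the extracted features would be passed to trained classifiers rather than to a fixed area cutoff.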
Recent advancements in artificial intelligence have significantly impacted many color imaging applications, including skin segmentation and enhancement. Although state-of-the-art methods emphasize geometric image augmentations to improve model performance, the role of color-based augmentations and color spaces in enhancing skin segmentation accuracy remains underexplored. This study addresses this gap by systematically evaluating the impact of various color-based image augmentations and color spaces on skin segmentation models based on convolutional neural networks (CNNs). We investigate the effects of color transformations (including brightness, contrast, and saturation adjustments) in three color spaces: sRGB, YCbCr, and CIELab. To represent CNN models, an existing semantic segmentation model is trained using these color augmentations on a custom dataset of 900 images with annotated skin masks, covering diverse skin tones and lighting conditions. Our findings reveal that current training practices, which primarily rely on single-color augmentation in the sRGB space and focus mainly on geometric augmentations, limit model generalization in color-related applications like skin segmentation. Models trained with a greater variety of color augmentations show improved skin segmentation, particularly under over- and underexposure conditions. Additionally, models trained in YCbCr outperform those trained in the sRGB color space when combined with color augmentation, while CIELab yields performance comparable to sRGB. We also observe significant performance discrepancies across skin tones, highlighting challenges in achieving consistent segmentation under varying lighting. This study highlights gaps in existing image augmentation approaches and provides insights into the role of various color augmentations and color spaces in improving the accuracy and inclusivity of skin segmentation models.
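A brightness augmentation applied in YCbCr rather than sRGB can be sketched as below. The conversion uses the full-range BT.601 coefficients employed by JPEG/JFIF; whether the study used this exact variant is an assumption, and the function names and sample pixels are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) conversion from 8-bit RGB to YCbCr."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr (before rounding and clipping)."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return r, g, b


def clip8(value):
    """Round and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def adjust_brightness_ycbcr(pixel, factor):
    """Scale only the luma channel, leaving the chroma channels untouched."""
    y, cb, cr = rgb_to_ycbcr(*pixel)
    return tuple(clip8(v) for v in ycbcr_to_rgb(y * factor, cb, cr))


# Toy pixels (illustrative only).
brighter_gray = adjust_brightness_ycbcr((100, 100, 100), 1.2)
unchanged = adjust_brightness_ycbcr((200, 150, 120), 1.0)
```

Scaling only Y changes perceived brightness while keeping Cb and Cr fixed, which is one reason a luma-chroma space can be convenient for exposure-style augmentations.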
Seal-related tasks in document processing—such as seal segmentation, authenticity verification, seal removal, and text recognition under seals—hold substantial commercial importance. However, progress in these areas has been hindered by the scarcity of labeled document seal datasets, which are essential for supervised learning. To address this limitation, we propose Seal2Real, a novel generative framework designed to synthesize large-scale labeled document seal data. As part of this work, we also present Seal-DB, a comprehensive dataset containing 20,000 labeled images to support seal-related research. Seal2Real introduces a prompt prior learning architecture built upon a pretrained Stable Diffusion model, effectively transferring its generative capability to the unsupervised domain of seal image synthesis. By producing highly realistic synthetic seal images, Seal2Real significantly enhances the performance of downstream seal-related tasks on real-world data. Experimental evaluations on the Seal-DB dataset demonstrate the effectiveness and practical value of the proposed framework.