Deep neural networks (DNNs) have traditionally relied on conventional computational units such as CPUs and GPUs. However, this approach incurs a significant computational burden, latency, and high power consumption, limiting its effectiveness and motivating lightweight networks such as ExtremeC3Net. Meanwhile, notable advances in optical computational units, particularly metamaterials, offer the exciting prospect of energy-efficient neural networks operating at the speed of light. Yet the digital design of metamaterial neural networks (MNNs) faces precision, noise, and bandwidth challenges, which have so far restricted their application to simple tasks and low-resolution images. In this study, we propose ExtremeMETA, a large-kernel lightweight segmentation model. Built on ExtremeC3Net, ExtremeMETA maximizes the capacity of the first convolution layer by exploring a larger convolution kernel and multiple processing paths. With this large-kernel convolution model, we extend the application boundary of optical neural networks to the segmentation task. To further reduce the computational burden of the digital processing part, a set of model compression methods is applied to improve efficiency at the inference stage. Experimental results on three publicly available datasets show that the optimized design improves segmentation performance from 92.45% to 95.97% mIoU while reducing computation from 461.07 MMACs to 166.03 MMACs. The large-kernel lightweight model ExtremeMETA demonstrates the hybrid design's ability on complex tasks.
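As an illustration of the abstract's core idea, the following is a minimal PyTorch sketch of a first convolution layer with a large kernel and multiple processing paths; the module name MultiPathLargeKernelStem, the kernel sizes, and the channel counts are illustrative assumptions, not the authors' exact design.

```python
# A minimal sketch of a multi-path, large-kernel first convolution layer.
# Kernel sizes and channel counts are illustrative assumptions.
import torch
import torch.nn as nn

class MultiPathLargeKernelStem(nn.Module):
    def __init__(self, in_ch=3, out_ch=32):
        super().__init__()
        # Parallel paths with progressively larger receptive fields.
        self.paths = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=2, padding=k // 2)
            for k in (3, 7, 11)  # hypothetical kernel sizes
        ])
        self.fuse = nn.Sequential(
            nn.Conv2d(out_ch * 3, out_ch, kernel_size=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # Concatenate the parallel responses, then fuse with a 1x1 conv.
        return self.fuse(torch.cat([p(x) for p in self.paths], dim=1))

x = torch.randn(1, 3, 224, 224)
print(MultiPathLargeKernelStem()(x).shape)  # torch.Size([1, 32, 112, 112])
```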
This study applies YOLOv7-tiny object detection to inspect guava coverings and count fruit quantity. Real-time monitoring enhances efficiency and reduces labor costs in agriculture. A custom dataset was created by collecting and labeling guava images. Trained with default parameters, the YOLOv7-tiny model achieved an initial mean Average Precision (mAP) of 66.7%. To improve accuracy, parameter adjustments, data augmentation (mosaic, mixup), and learning-rate strategies (warm-up, decay) were employed, raising the mAP to 76.7%. The optimized model was deployed to mobile devices for convenient detection. This research provides an effective method for guava covering inspection and quantity counting, contributing to advances in agricultural applications.
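To make the learning-rate strategy concrete, here is a minimal sketch of linear warm-up followed by cosine decay; the function lr_at and all hyperparameter values are assumptions for illustration, not the settings used in the study.

```python
# A minimal sketch of a warm-up plus decay learning-rate schedule.
# All hyperparameters here are illustrative assumptions.
import math

def lr_at(step, total_steps, base_lr=0.01, warmup_steps=500, final_lr=1e-4):
    """Linear warm-up followed by cosine decay."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return final_lr + 0.5 * (base_lr - final_lr) * (1 + math.cos(math.pi * progress))

for s in (0, 250, 500, 5000, 10000):
    print(s, round(lr_at(s, total_steps=10000), 5))
```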
The detection of urban appearance violations in unmanned aerial vehicle imagery faces several challenges. To address them, an optimized YOLOv8n-based urban appearance violation detection model is proposed. A custom dataset covering four classes is created owing to the lack of a sufficient existing dataset. The Convolutional Block Attention Module (CBAM) is applied to improve the feature extraction ability of the model. A small-target detection head is added to capture the characteristics and context information of small targets more effectively. The Wise Intersection over Union (WIoU) loss function is applied to improve bounding-box regression performance and detection robustness. Experimental results show that, compared with the YOLOv8n baseline, the Precision, Recall, mAP@0.5, and mAP@0.5:0.95 of the optimized method increase by 3.8%, 2.1%, 3.3%, and 4.8%, respectively. In addition, an intelligent urban appearance violation detection system is developed that generates and delivers warning messages via the WeChat official account platform.
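For readers unfamiliar with CBAM, the following compact PyTorch sketch shows the standard channel-then-spatial attention pattern; the reduction ratio and spatial kernel size are common defaults, not necessarily the paper's settings.

```python
# A compact sketch of the CBAM idea: channel attention followed by
# spatial attention. Defaults shown are common choices, not the paper's.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, ch, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        avg = self.mlp(x.mean((2, 3), keepdim=True))
        mx = self.mlp(x.amax(2, keepdim=True).amax(3, keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: conv over channel-wise avg and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```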
Physicians attempt to detect different types of colonic polyps simultaneously during endoscopic inspection. A deep-learning-based object detection method is proposed for the simultaneous detection of different colonic polyps. This study used a single-shot detector (SSD) with a ResNet50 backbone, called the SSD-Resnet50 model, to detect two types of colonic polyps, adenomas and hyperplastic polyps, in endoscopic images. The Taguchi method was used to optimize the algorithm hyperparameter combination of the SSD-Resnet50 model and thereby improve detection accuracy. The experimental findings revealed that the SSD-Resnet50 model with its optimized hyperparameters achieved an average mAP of 0.8933 on a test set of 300 × 300 × 3 colonic polyp images. Notably, the detection accuracy attained with the hyperparameters derived from the Taguchi method surpassed that obtained with the hyperparameter combination from the MATLAB example. Additionally, the SSD-Resnet50 model achieved higher detection accuracy than the SSD-MobileNetV2, SSD-InceptionV3, SSD-Shufflenet, SSD-Squeezenet, and SSD-VGG16 models. The proposed SSD-Resnet50 model with its Taguchi-optimized hyperparameters thus detects adenomas and hyperplastic polyps in endoscopic images simultaneously with higher accuracy.
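A minimal sketch of Taguchi-style hyperparameter selection with an L9(3^4) orthogonal array follows; the factor names, level values, and the train_and_eval callback are hypothetical stand-ins for training the SSD-Resnet50 model and returning its mAP.

```python
# A minimal sketch of Taguchi hyperparameter selection with an L9(3^4)
# orthogonal array. Factors, levels, and train_and_eval are hypothetical.
import math

L9 = [(1,1,1,1),(1,2,2,2),(1,3,3,3),
      (2,1,2,3),(2,2,3,1),(2,3,1,2),
      (3,1,3,2),(3,2,1,3),(3,3,2,1)]

levels = {  # hypothetical factors and levels
    "lr":       [1e-4, 1e-3, 1e-2],
    "batch":    [8, 16, 32],
    "momentum": [0.8, 0.9, 0.95],
    "epochs":   [30, 50, 70],
}
factors = list(levels)

def sn_larger_is_better(y):
    # Taguchi signal-to-noise ratio for a larger-is-better response (mAP).
    return -10 * math.log10(1 / y**2)

def taguchi_best(train_and_eval):
    # Run only the 9 orthogonal-array trials instead of all 3^4 = 81 combos.
    sn = {f: [0.0, 0.0, 0.0] for f in factors}
    for row in L9:
        cfg = {f: levels[f][lv - 1] for f, lv in zip(factors, row)}
        s = sn_larger_is_better(train_and_eval(cfg))
        for f, lv in zip(factors, row):
            sn[f][lv - 1] += s
    # Best level per factor = largest accumulated S/N.
    return {f: levels[f][max(range(3), key=lambda i: sn[f][i])] for f in factors}
```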
This paper presents a self-supervised monocular visual odometry (VO) method guided by attention feature maps, aimed at mitigating the impact of redundant pixels during network training. Existing self-supervised VO methods typically treat all pixels equally when computing photometric error, which increases sensitivity to noise from irrelevant pixels and, consequently, training errors. To address this issue, the authors adopt a soft-attention mechanism to generate attention feature maps that let the model focus on relevant pixels while down-weighting disruptive ones. This approach enhances the robustness and accuracy of depth estimation and pose tracking. The proposed method achieves competitive results on the KITTI dataset: on Sequences 09 and 10, relative rotation errors are 0.022 and 0.032 and relative translation errors are 5.56 and 7.29, respectively.
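The following minimal PyTorch sketch illustrates an attention-weighted photometric loss in the spirit of the abstract; the plain L1 photometric term and the sigmoid normalization of the attention map are simplifying assumptions, not the authors' exact formulation.

```python
# A minimal sketch of an attention-weighted photometric loss.
# The L1 term and sigmoid weighting are simplifying assumptions.
import torch

def weighted_photometric_loss(target, synthesized, attention_logits):
    """target/synthesized: (B,3,H,W); attention_logits: (B,1,H,W)."""
    # Soft attention: each pixel gets a weight in (0, 1).
    w = torch.sigmoid(attention_logits)
    photometric = (target - synthesized).abs().mean(1, keepdim=True)
    # Down-weight noisy/irrelevant pixels; normalize by the attention mass
    # so the loss scale does not collapse as attention shrinks.
    return (w * photometric).sum() / (w.sum() + 1e-7)
```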
For the automated analysis of metaphase chromosome images, the chromosomes must first be segmented. However, the segmentation results often contain non-chromosome objects, whose elimination is essential for automated chromosome image analysis. This study aims to exclude non-chromosome objects from segmented chromosome candidates before further analysis. A feature-based method was developed to eliminate non-chromosome objects from metaphase chromosome images. In each image, chromosome candidates were first segmented by thresholding. Four classes of features, namely area, density-based features, roughness-based features, and widths, were then extracted from the segmented candidates to discriminate between chromosomes and non-chromosome objects. Seven classifiers were compared for combining the extracted features into a classification. The experimental results show that the combination of extracted features is useful for distinguishing chromosomes from non-chromosome objects. The proposed method effectively separates non-chromosome objects from chromosomes and could serve as a preprocessing procedure for chromosome image analysis.
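A minimal scikit-learn sketch of the classify-then-filter idea follows: hand-crafted shape features per segmented candidate, with several classifiers compared by cross-validation. The candidate_features helper and its exact feature formulas are illustrative assumptions, and only three of the seven compared classifiers are shown.

```python
# A minimal sketch: hand-crafted features per segmented candidate, then
# classifier comparison by cross-validation. Feature details are assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

def candidate_features(mask, gray):
    """mask: boolean object mask; gray: grayscale image patch."""
    area = mask.sum()
    density = gray[mask].mean() if area else 0.0             # density-based
    rows = mask.sum(axis=1)
    width = rows[rows > 0].mean() if area else 0.0           # width-based
    roughness = np.abs(np.diff(rows.astype(float))).mean()   # roughness proxy
    return [area, density, width, roughness]

def compare_classifiers(X, y):
    # X: (n_candidates, n_features); y: 1 = chromosome, 0 = non-chromosome.
    for name, clf in [("SVM", SVC()),
                      ("RandomForest", RandomForestClassifier()),
                      ("kNN", KNeighborsClassifier())]:
        score = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name}: {score:.3f}")
```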
Recent advancements in artificial intelligence have significantly impacted many color imaging applications, including skin segmentation and enhancement. Although state-of-the-art methods emphasize geometric image augmentations to improve model performance, the role of color-based augmentations and color spaces in enhancing skin segmentation accuracy remains underexplored. This study addresses this gap by systematically evaluating the impact of various color-based image augmentations and color spaces on skin segmentation models based on convolutional neural networks (CNNs). We investigate the effects of color transformations, including brightness, contrast, and saturation adjustments, in three color spaces: sRGB, YCbCr, and CIELab. As a representative CNN, an existing semantic segmentation model is trained with these color augmentations on a custom dataset of 900 images with annotated skin masks covering diverse skin tones and lighting conditions. Our findings reveal that current training practices, which primarily rely on single-color augmentation in the sRGB space and focus mainly on geometric augmentations, limit model generalization in color-related applications like skin segmentation. Models trained with a greater variety of color augmentations show improved skin segmentation, particularly under over- and underexposure conditions. Additionally, when combined with color augmentation, models trained in YCbCr outperform those trained in sRGB, while CIELab performs comparably to sRGB. We also observe significant performance discrepancies across skin tones, highlighting the challenge of achieving consistent segmentation under varying lighting. This study highlights gaps in existing image augmentation approaches and provides insights into the role of various color augmentations and color spaces in improving the accuracy and inclusivity of skin segmentation models.
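To make the color-space augmentation concrete, here is a minimal OpenCV/NumPy sketch of a brightness jitter applied in a chosen color space; the function name and jitter range are assumptions for illustration.

```python
# A minimal sketch of a brightness jitter in a chosen color space.
# The jitter range is an illustrative assumption.
import cv2
import numpy as np

def color_jitter_in_space(img_bgr, space="YCbCr", brightness=0.2):
    """Randomly scale the luminance-like channel within +/- brightness."""
    factor = 1.0 + np.random.uniform(-brightness, brightness)
    if space == "YCbCr":
        # OpenCV stores this space as YCrCb; channel 0 is still Y.
        ycc = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb).astype(np.float32)
        ycc[..., 0] = np.clip(ycc[..., 0] * factor, 0, 255)
        return cv2.cvtColor(ycc.astype(np.uint8), cv2.COLOR_YCrCb2BGR)
    if space == "CIELab":
        lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2Lab).astype(np.float32)
        lab[..., 0] = np.clip(lab[..., 0] * factor, 0, 255)  # L channel
        return cv2.cvtColor(lab.astype(np.uint8), cv2.COLOR_Lab2BGR)
    # sRGB: scale all channels uniformly.
    return np.clip(img_bgr.astype(np.float32) * factor, 0, 255).astype(np.uint8)
```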
Seal-related tasks in document processing—such as seal segmentation, authenticity verification, seal removal, and text recognition under seals—hold substantial commercial importance. However, progress in these areas has been hindered by the scarcity of labeled document seal datasets, which are essential for supervised learning. To address this limitation, we propose Seal2Real, a novel generative framework designed to synthesize large-scale labeled document seal data. As part of this work, we also present Seal-DB, a comprehensive dataset containing 20,000 labeled images to support seal-related research. Seal2Real introduces a prompt prior learning architecture built upon a pretrained Stable Diffusion model, effectively transferring its generative capability to the unsupervised domain of seal image synthesis. By producing highly realistic synthetic seal images, Seal2Real significantly enhances the performance of downstream seal-related tasks on real-world data. Experimental evaluations on the Seal-DB dataset demonstrate the effectiveness and practical value of the proposed framework.
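As a rough illustration of the general idea of prompt prior learning (not the Seal2Real architecture itself), the following PyTorch sketch prepends a small bank of learnable prompt embeddings to frozen text-encoder outputs that condition a pretrained diffusion model.

```python
# A conceptual sketch of prompt prior learning: only the prompt embeddings
# are trained; the diffusion backbone and text encoder stay frozen.
# This illustrates the general idea only, not the Seal2Real architecture.
import torch
import torch.nn as nn

class PromptPrior(nn.Module):
    def __init__(self, n_tokens=8, dim=768):
        super().__init__()
        # Learnable "seal" prompt tokens (hypothetical size and dimension).
        self.prompt = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)

    def forward(self, text_embeddings):
        # text_embeddings: (B, T, dim) from the frozen text encoder.
        b = text_embeddings.size(0)
        return torch.cat([self.prompt.expand(b, -1, -1), text_embeddings], dim=1)
```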
Cerebral hemorrhage is a common cerebrovascular disease. Magnetic induction tomography (MIT), a new electromagnetic imaging technique, serves as a useful auxiliary method for preoperative and postoperative bedside monitoring and as a supplement to computed tomography (CT) and magnetic resonance imaging (MRI). In practical applications, space constraints limit the number of coils in an MIT system, which in turn limits the imaging information obtained from the detection coils. To reduce the number of coils while maintaining imaging accuracy, and to explore how the relative positions of coil arrays and lesions affect reconstruction quality, a regional detection method of MIT for cerebral hemorrhage based on a stacked autoencoder (SAE) neural network is proposed. Simulation experiments on a complex brain model divided the imaging domain into regions according to coil position and compared the reconstruction quality of lesions across regions. The results showed that lesions near the excitation coil were reconstructed poorly, whereas lesions near the detection coils were reconstructed well. Phantom experiments further verified these results. The sensitive-region detection approach offers a novel idea for optimizing MIT in detecting cerebral hemorrhage.
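A minimal PyTorch sketch of a stacked autoencoder mapping detection-coil measurements to a conductivity image follows; the layer sizes and measurement count are illustrative assumptions, and a full SAE would typically pretrain each layer as an autoencoder before fine-tuning end to end.

```python
# A minimal sketch of an SAE-style network for MIT image reconstruction.
# Layer sizes and measurement count are illustrative assumptions; a full
# SAE would pretrain each layer as an autoencoder before fine-tuning.
import torch
import torch.nn as nn

class StackedAutoencoder(nn.Module):
    def __init__(self, n_meas=64, img_pixels=32 * 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_meas, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(128, 512), nn.ReLU(),
            nn.Linear(512, img_pixels), nn.Sigmoid(),  # reconstructed image
        )

    def forward(self, v):
        # v: (B, n_meas) detection-coil voltages -> (B, img_pixels) image.
        return self.decoder(self.encoder(v))
```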
The tomato leaf is a significant organ that reflects the health and growth of tomato plants. Early detection of leaf diseases is crucial to both crop yield and farmers' income. However, the global distribution of diseases across tomato leaves, coupled with fine-grained differences among diseases, poses significant challenges for accurate detection. To tackle these obstacles, we propose an accurate tomato leaf disease identification method based on an improved Swin Transformer. The proposed method consists of three parts: the Swin Transformer backbone, a Local Feature Perception (LFP) module, and a Spatial Texture Attention (STA) module. The backbone models long-range dependencies of leaf diseases for representative features, while the LFP module adopts a multi-scale aggregation strategy to strengthen the Swin Transformer's local feature extraction. The STA module integrates hierarchical features from different stages of the Swin Transformer to capture fine-grained features for the classification head and boost overall performance. Extensive experiments on the public LBFtomato dataset demonstrate the superior performance of our method, which achieves 99.28% Accuracy, 99.07% Precision, 99.36% Recall, and a 99.24% F1-score.
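The following minimal PyTorch sketch conveys the multi-scale local-feature aggregation idea behind an LFP-style module; the depthwise branch kernel sizes and the residual fusion are illustrative assumptions, not the paper's exact design.

```python
# A minimal sketch of a multi-scale local-feature module in the spirit of
# LFP; branch kernel sizes and residual fusion are illustrative assumptions.
import torch
import torch.nn as nn

class LocalFeaturePerception(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # Depthwise convs at several scales aggregate local texture cues
        # that complement the Swin Transformer's long-range attention.
        self.branches = nn.ModuleList([
            nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch) for k in (3, 5, 7)
        ])
        self.proj = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        # Sum the multi-scale responses, project, and add a residual path.
        return self.proj(sum(b(x) for b in self.branches)) + x
```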