Crop diseases have always been a major threat to agricultural production, significantly reducing both the yield and quality of agricultural products. Traditional methods of disease recognition suffer from high costs and low efficiency, making them inadequate for modern agricultural requirements. With the continuous development of artificial intelligence technology, deep learning for crop disease image recognition has become a research hotspot. Convolutional neural networks can automatically extract features for end-to-end learning, resulting in better recognition performance; however, they also face challenges such as high computational costs and difficulty of deployment on mobile devices. In this study, we aim to improve recognition accuracy, reduce computational costs, and shrink the model for deployment on mobile platforms. Targeting the recognition of tomato leaf diseases, we propose an image recognition method based on a lightweight MCA-MobileNet and WGAN. By incorporating an improved multiscale feature fusion module and a coordinate attention mechanism into MobileNetV2, we developed the lightweight MCA-MobileNet model. The model attends more closely to disease-spot information in tomato leaves while significantly reducing the parameter count. We employ WGAN for data augmentation to address the insufficiency and imbalance of the original sample data. Experimental results demonstrate that the augmented dataset effectively improves the model's recognition accuracy and robustness. Compared with traditional networks, MCA-MobileNet shows significant improvements in metrics such as accuracy, precision, recall, and F1-score. With only 2.75M trainable parameters, it delivers outstanding performance in recognizing tomato leaf diseases and is well suited to mobile and embedded devices.
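For readers unfamiliar with the attention mechanism named above, the following is a minimal PyTorch sketch of a coordinate attention block in the style of Hou et al. (2021), the kind of module the abstract describes inserting into MobileNetV2; the reduction ratio and layer choices are illustrative assumptions, not the authors' exact design.

```python
# Hedged sketch of a coordinate attention block: direction-aware pooling along
# height and width, a shared bottleneck, then per-direction gating maps.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Aggregate along width and height separately to keep positional cues.
        x_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                        # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))    # (n, c, 1, w)
        return x * a_h * a_w

x = torch.randn(1, 64, 56, 56)
print(CoordinateAttention(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```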
The rapid evolution of modern society has triggered a surge in the production of diverse waste in daily life. Effective implementation of waste classification through intelligent methods is essential for promoting green and sustainable development. Traditional waste classification techniques suffer from inefficiency and limited accuracy. To address these challenges, this study proposes a waste image classification model that adds an attention module to DenseNet-121. To enhance the efficiency and accuracy of waste classification, the publicly available waste datasets TrashNet and Garbage classification were utilized for their comprehensive coverage and balanced distribution of waste categories; 80% of each dataset was allocated for training and the remaining 20% for testing. Within the DenseNet-121 architecture, an enhanced attention module, the series-parallel attention module (SPAM), built upon the convolutional block attention module (CBAM), was integrated, resulting in a new network model called the dense series-parallel attention neural network (DSPA-Net). DSPA-Net was trained and evaluated alongside other CNN models on TrashNet and Garbage classification, where it demonstrated superior performance, achieving accuracies of 90.2% and 92.5%, respectively, and surpassing DenseNet-121 and alternative image classification algorithms. These findings underscore the potential for efficient and accurate intelligent waste classification.
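The abstract does not detail SPAM's series-parallel wiring, so as background here is a minimal PyTorch sketch of the CBAM components it builds on: channel and spatial attention, composed in series as in the original CBAM. The reduction ratio and kernel size are illustrative assumptions.

```python
# Hedged sketch of CBAM, the base module SPAM extends.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, c: int, r: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(), nn.Linear(c // r, c))

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average-pooled branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max-pooled branch
        return x * torch.sigmoid(avg + mx)[:, :, None, None]

class SpatialAttention(nn.Module):
    def __init__(self, k: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))

class CBAM(nn.Module):
    """Baseline CBAM: channel then spatial attention in series; SPAM
    presumably rearranges these branches into a series-parallel layout."""
    def __init__(self, c: int):
        super().__init__()
        self.ca, self.sa = ChannelAttention(c), SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))

print(CBAM(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```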
Accurate and precise classification and quantification of skin pigmentation is critical to addressing health inequities such as racial bias in pulse oximetry. Current skin-tone classification methods rely on measuring or estimating skin color, either with a measurement device or by subjective matching against skin-tone color scales. Robust detection of skin type and melanin index is challenging because these methods require precise calibration, and recent sun exposure may affect the measurements through tanning or erythema. The proposed system differentiates and quantifies skin type and melanin index by exploiting the variance in skin structures and the skin pigmentation network across skin types. Results from a small study show that skin structure patterns provide a robust, color-independent method for skin-tone classification. A real-time system demo shows the practical viability of the method.
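As a purely illustrative sketch of color-independent, structure-based classification (the abstract does not disclose the actual features), the following uses local binary pattern histograms over grayscale patches with a linear classifier; all data and labels below are placeholders.

```python
# Stand-in example: texture (structure) features that ignore color entirely.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.linear_model import LogisticRegression

def lbp_histogram(gray_patch: np.ndarray, p: int = 8, r: float = 1.0) -> np.ndarray:
    """Uniform-LBP histogram: describes micro-structure, not color."""
    lbp = local_binary_pattern(gray_patch, p, r, method="uniform")
    hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
    return hist

rng = np.random.default_rng(0)
patches = (rng.random((20, 64, 64)) * 255).astype(np.uint8)  # placeholder patches
X = np.stack([lbp_histogram(patch) for patch in patches])
y = rng.integers(0, 2, size=20)                              # placeholder labels
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))
```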
The aim of this work is to transfer a model trained on magnetic resonance images of human autosomal dominant polycystic kidney disease (ADPKD) to rat and mouse PKD models. A dataset of 756 MRI images of ADPKD kidneys was employed to train a modified UNet3+ architecture, which incorporated residual layers, switchable normalization, and concatenated skip connections for kidney and cyst segmentation. The trained model was then subjected to transfer learning (TL) using data from two commonly utilized animal PKD models: the Pkhd1^pck (PCK) rat and the Pkd1^RC/RC (RC) mouse. Transfer learning achieved Dice similarity coefficients of 0.93±0.04 and 0.63±0.16 (mean±SD) for kidneys and cysts, respectively, on the combined PCK+RC test datasets of animal images. We showcased the use of TL in situations involving constrained source and target datasets and achieved good accuracy despite class imbalance.
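The 0.93 and 0.63 scores above are Dice similarity coefficients (DSC). A minimal reference implementation for binary masks follows; the `smooth` term guards against division by zero on empty masks and is an implementation choice, not something specified in the abstract.

```python
# DSC = 2|A ∩ B| / (|A| + |B|) for boolean segmentation masks.
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, smooth: float = 1e-6) -> float:
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)

a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True   # 4 foreground pixels
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:4] = True   # 6 foreground pixels
print(round(dice_coefficient(a, b), 3))  # 0.8 = 2*4 / (4 + 6)
```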
To address the large parameter counts, deployment difficulty, low accuracy, and slow speed of facial state recognition models in driver fatigue detection, we propose YOLOv5-fatigue, a lightweight real-time facial state recognition model based on YOLOv5n. First, a bilateral convolution is proposed that fully exploits the feature information within channels. Then, an innovative deep lightweight module is proposed that reduces both the network parameter count and the computational effort by replacing the ordinary convolutions in the neck network. Lastly, a normalization-based attention module is added to counter the accuracy decline caused by the lightweight design while keeping the parameter count unchanged. We first recognize the facial state with YOLOv5-fatigue and then use the proportion of time the eyes are closed and the proportion of time the mouth is closed per unit of time to determine fatigue. In comparison experiments on our self-built VIGP-fatigue dataset against other detection algorithms, the proposed method increased AP50 by 1% over the baseline YOLOv5n, reaching 92.6%; inference time was reduced by 9% to 2.1 ms, and the parameter count decreased by 42.6% to 1.01 M.
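A hedged sketch of the downstream fatigue rule described above: per-frame eye states from the detector are aggregated over a sliding window and the closed-eye proportion is thresholded (PERCLOS-style). The window length and threshold here are illustrative assumptions, not values from the paper.

```python
# Sliding-window eye-closure proportion as a fatigue indicator.
from collections import deque

class FatigueMonitor:
    def __init__(self, window_frames: int = 30, perclos_threshold: float = 0.25):
        self.eye_states = deque(maxlen=window_frames)  # 1 = eyes closed this frame
        self.threshold = perclos_threshold

    def update(self, eyes_closed: bool) -> bool:
        """Feed one frame's detection result; return True if fatigued."""
        self.eye_states.append(1 if eyes_closed else 0)
        perclos = sum(self.eye_states) / len(self.eye_states)
        return perclos > self.threshold

monitor = FatigueMonitor()
for closed in [0, 0, 1, 1, 1, 1, 1, 0, 1, 1]:  # placeholder per-frame states
    fatigued = monitor.update(bool(closed))
print(fatigued)  # True: 7/10 of recent frames had closed eyes
```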
To address the current challenges in detecting solder ball defects in ball grid array (BGA) packaged chips, namely slow detection speed, low efficiency, and poor accuracy, we have designed a detection algorithm that leverages the specific characteristics of these defects and the advantages of deep learning. Building upon the YOLOv8 network model, we made three adaptive improvements. First, we introduced an adaptive weighted downsampling method to boost detection accuracy and make the model more lightweight. Second, to improve image feature extraction, we proposed an efficient multi-scale convolution method. Finally, to enhance convergence speed and regression accuracy, we replaced the traditional Complete Intersection over Union (CIoU) loss function with Minimum Points Distance Intersection over Union (MPDIoU). In controlled experiments, the enhanced model showed significant improvements over the original network: a 1.7% increase in mean average precision, a 1.5% boost in precision, a 0.9% increase in recall, a reduction of 4.3M parameters, and a decrease of 0.4 G floating-point operations (FLOPs). In comparative experiments, the algorithm demonstrated superior overall performance relative to other networks, effectively achieving the goal of solder ball defect detection.
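A sketch of the MPDIoU loss (Ma and Xu, 2023) that the abstract says replaces CIoU: IoU penalized by the squared distances between the two boxes' top-left and bottom-right corners, normalized by the squared image diagonal. The example boxes and image size below are arbitrary.

```python
# MPDIoU loss for axis-aligned boxes given as (x1, y1, x2, y2).
import torch

def mpdiou_loss(pred, target, img_w: int, img_h: int, eps: float = 1e-7):
    """pred, target: (N, 4) tensors; returns per-box losses of shape (N,)."""
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2  # top-left corners
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2  # bottom-right corners
    diag2 = img_w ** 2 + img_h ** 2
    return 1 - (iou - d1 / diag2 - d2 / diag2)

p = torch.tensor([[10., 10., 50., 50.]])
t = torch.tensor([[12., 12., 48., 52.]])
print(mpdiou_loss(p, t, 640, 640))  # small loss: boxes overlap heavily
```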
Archeological textiles can provide invaluable insight into the past. However, they are often highly fragmented, and a puzzle has to be solved to re-assemble the object and recover the original motifs. Unlike common jigsaw puzzles, archeological fragments are highly damaged, and no correct solution to the puzzle is known. Although automatic puzzle solving has long fascinated computer scientists, this work is one of the first attempts to apply modern machine learning to archeological textile re-assembly. First and foremost, it is important to know which fragments belong to the same object. Therefore, features are extracted from digital images of textile fragments using color statistics, classical texture descriptors, and deep learning methods, and these features are used to cluster and identify similar fragments. Four case studies with increasing complexity are discussed in this article: from well-preserved textiles with available ground truth to the actual open problem of the Oseberg archeological tapestry, whose solution is unknown. This work reveals significant knowledge gaps in current machine learning, which help us outline a future avenue toward more specialized, application-specific models.
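An illustrative sketch of the pipeline the abstract outlines: simple color statistics plus a classical texture descriptor per fragment image, followed by clustering to group fragments that plausibly belong to the same object. The specific features (GLCM properties) and the cluster count are assumptions for demonstration, and the images below are random placeholders.

```python
# Color statistics + GLCM texture features, then k-means grouping of fragments.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.cluster import KMeans

def fragment_features(rgb: np.ndarray) -> np.ndarray:
    color = np.concatenate([rgb.mean(axis=(0, 1)), rgb.std(axis=(0, 1))])  # color stats
    gray = rgb.mean(axis=2).astype(np.uint8)
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256, normed=True)
    texture = np.array([graycoprops(glcm, p)[0, 0] for p in ("contrast", "homogeneity")])
    return np.concatenate([color, texture])

rng = np.random.default_rng(0)
fragments = [rng.integers(0, 256, (64, 64, 3)).astype(np.uint8) for _ in range(12)]
X = np.stack([fragment_features(f) for f in fragments])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # fragments sharing a label are candidate same-object groups
```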
The purpose of this work is to present a new dataset of hyperspectral images of historical documents, consisting of 66 historical family tree samples from the 16th and 17th centuries captured in two spectral ranges: VNIR (400–1000 nm) and SWIR (900–1700 nm). In addition, we evaluated different binarization algorithms, both on single spectral bands and on false RGB images generated from the hyperspectral cube.
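A hedged sketch of the two evaluation inputs described above: a false RGB image assembled from three bands of the hyperspectral cube, and a single band binarized directly. The band indices and the use of Otsu thresholding are illustrative choices, not the paper's protocol.

```python
# Build a false RGB view from three bands and binarize a single band.
import numpy as np
from skimage.filters import threshold_otsu

def false_rgb(cube: np.ndarray, bands=(60, 30, 10)) -> np.ndarray:
    """cube: (H, W, num_bands) reflectance; returns (H, W, 3) scaled to [0, 1]."""
    img = cube[:, :, list(bands)].astype(float)
    return (img - img.min()) / (img.max() - img.min() + 1e-9)

def binarize_band(cube: np.ndarray, band: int) -> np.ndarray:
    channel = cube[:, :, band].astype(float)
    return channel > threshold_otsu(channel)  # True = bright background

cube = np.random.rand(128, 128, 100)  # placeholder VNIR-like cube
rgb = false_rgb(cube)
mask = binarize_band(cube, band=50)
print(rgb.shape, mask.mean())
```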
In this paper, we investigate the challenge of image restoration from severely incomplete data, encompassing compressive sensing image restoration and image inpainting. We propose a versatile implementation framework for plug-and-play ADMM image reconstruction that leverages several readily available denoisers, including model-based nonlocal denoisers and deep learning-based denoisers. A comprehensive comparative analysis against state-of-the-art methods showcases superior performance in both qualitative and quantitative terms, including image quality and implementation complexity.
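A minimal sketch of plug-and-play ADMM for a linear measurement model y = Ax + noise: the x-update is a least-squares proximal step (approximated here by a short gradient-descent inner loop), the z-update is any off-the-shelf denoiser, which is the "plug" point the abstract refers to. All operators, step sizes, and the Gaussian-blur "denoiser" are illustrative assumptions, not the paper's configuration.

```python
# Plug-and-play ADMM skeleton: x-step (data fidelity), z-step (denoiser), dual step.
import numpy as np
from scipy.ndimage import gaussian_filter

def pnp_admm(y, A, At, denoise, shape, rho=1.0, iters=50, inner=10, lr=0.1):
    x = At(y).reshape(shape)
    z = x.copy()
    u = np.zeros_like(x)
    for _ in range(iters):
        # x-update: approximately minimize ||A x - y||^2/2 + (rho/2)||x - (z - u)||^2
        v = z - u
        for _ in range(inner):
            grad = At(A(x) - y).reshape(shape) + rho * (x - v)
            x = x - lr * grad
        z = denoise(x + u)          # z-update: the plugged-in denoiser
        u = u + x - z               # dual update
    return z

# Hypothetical usage: random compressive measurements of a square phantom.
rng = np.random.default_rng(0)
n, m = 32 * 32, 512
M = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros((32, 32)); x_true[8:24, 8:24] = 1.0
y = M @ x_true.ravel()
x_hat = pnp_admm(y, lambda x: M @ x.ravel(), lambda r: M.T @ r,
                 lambda img: gaussian_filter(img, 1.0), (32, 32))
print(float(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)))
```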
In this article, we study the properties of quantitative steganography detectors (estimators of the payload size) for content-adaptive steganography. In contrast to non-adaptive embedding, both the estimator's bias and its variance strongly depend on the true payload size. Initially, and depending on the image content, the estimator may not react to embedding at all. As the payload size increases, it starts responding, because the embedding changes begin to "spill" into regions where their detection is more reliable. We quantify this behavior with the concepts of reactive and estimable payloads. To better understand how the payload estimate and its bias depend on image content, we study a maximum likelihood estimator derived for the MiPOD model of the cover image. This model correctly predicts trends observed in the outputs of a state-of-the-art deep learning payload regressor. Moreover, we use the model to demonstrate that the cover bias can be caused by a small number of "outlier" pixels in the cover image. This is also confirmed for the deep learning regressor on a dataset of artificial images via attribution maps.
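As orientation for the MiPOD-based estimator mentioned above, here is a hedged formalization of the setup: independent zero-mean Gaussian residuals, ternary ±1 embedding with per-pixel change rates, and a payload given by the ternary entropy of those rates. The paper's actual estimator may parameterize or constrain the change rates differently.

```latex
% Sketch: MiPOD-style cover model and a payload ML estimator.
% \phi_{\sigma} denotes the N(0, \sigma^2) density; \beta_i are change rates.
\[
  r_i \sim \mathcal{N}(0,\sigma_i^2), \qquad
  \Pr(s_i = +1) = \Pr(s_i = -1) = \beta_i ,
\]
\[
  \alpha(\boldsymbol{\beta}) = \sum_i H_3(\beta_i), \qquad
  H_3(\beta) = -2\beta\ln\beta - (1-2\beta)\ln(1-2\beta),
\]
\[
  \hat{\alpha} = \alpha(\hat{\boldsymbol{\beta}}), \qquad
  \hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} \sum_i
  \ln\!\Big((1-2\beta_i)\,\phi_{\sigma_i}(x_i)
  + \beta_i\,\phi_{\sigma_i}(x_i - 1) + \beta_i\,\phi_{\sigma_i}(x_i + 1)\Big).
\]
```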