Accurate and precise classification and quantification of skin pigmentation is critical to addressing health inequities such as racial bias in pulse oximetry. Current skin tone classification methods rely on measuring or estimating skin color, either with a measurement device or through subjective matching against skin tone color scales. Robust detection of skin type and melanin index is challenging because these methods require precise calibration, and recent sun exposure may affect the measurements through tanning or erythema. The proposed system differentiates and quantifies skin type and melanin index by exploiting the variance in skin structures and the skin pigmentation network across skin types. Results from a small study show that skin structure patterns provide a robust, color-independent method for skin tone classification. A real-time system demo shows the practical viability of the method.
The aim of this work is to transfer a model trained on magnetic resonance images of human autosomal dominant polycystic kidney disease (ADPKD) to rat and mouse PKD models. A dataset of 756 MRI images of ADPKD kidneys was employed to train a modified UNet3+ architecture, which incorporated residual layers, switchable normalization, and concatenated skip connections for kidney and cyst segmentation tasks. The trained model was then subjected to transfer learning (TL) using data from two commonly utilized animal PKD models: the Pkhd1pck (PCK) rat and the Pkd1RC/RC (RC) mouse. Transfer learning achieved Dice similarity coefficients of 0.93±0.04 and 0.63±0.16 (mean±SD) for the combined PCK+RC kidneys and cysts, respectively, on the test datasets of animal images. We showcased the utilization of TL in situations involving constrained source and target datasets and achieved good accuracy in the presence of class imbalance.
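The Dice similarity coefficient reported above measures the overlap between a predicted and a ground-truth segmentation mask. A minimal sketch in plain Python (the function name and the flat-list mask representation are illustrative, not from the paper):

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Dice similarity coefficient between two flat binary masks (0/1 values).

    DSC = 2|P ∩ T| / (|P| + |T|); eps guards against two empty masks.
    """
    assert len(pred) == len(target)
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)
```

A score of 1.0 indicates perfect overlap and 0.0 no overlap; the low cyst score (0.63) relative to kidneys (0.93) reflects how sensitive the metric is to small, class-imbalanced structures.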
To address challenges such as large parameter counts, difficult deployment, low accuracy, and slow speed in facial state recognition models for driver fatigue detection, we propose YOLOv5-fatigue, a lightweight real-time facial state recognition model based on YOLOv5n. First, a bilateral convolution is proposed, which can fully utilize the feature information in the channel. Then, an innovative deep lightweight module is proposed, which reduces both the number of network parameters and the computational effort by replacing the ordinary convolutions in the neck network. Lastly, a normalization-based attention module is added to counteract the accuracy decline caused by lightweighting while keeping the number of parameters unchanged. We first recognize the facial state with YOLOv5-fatigue and then use the proportion of eyes closed per unit of time and the proportion of mouth open per unit of time to determine fatigue. In comparison experiments with other detection algorithms on our self-built VIGP-fatigue dataset, the proposed method achieved a 1% increase in AP50 over the baseline YOLOv5n, reaching 92.6%; inference time was reduced by 9% to 2.1 ms, and the parameter count decreased by 42.6% to 1.01 M.
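The fatigue decision described above can be sketched as a PERCLOS-style rule over per-frame eye and mouth states; the thresholds and helper names below are illustrative assumptions, not values from the paper:

```python
def proportion(states):
    """Fraction of frames in a window for which a binary state (1 = active) holds."""
    return sum(states) / len(states)

def is_fatigued(eye_closed, mouth_open, eye_thresh=0.25, mouth_thresh=0.15):
    """Flag fatigue when the eye-closed or mouth-open proportion over a fixed
    time window exceeds its threshold (both thresholds are illustrative)."""
    return proportion(eye_closed) > eye_thresh or proportion(mouth_open) > mouth_thresh
```

In practice the per-frame states would come from the YOLOv5-fatigue detections over a sliding window of recent frames.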
To address the current challenges in the detection of solder ball defects in ball grid array (BGA) packaged chips, namely slow detection speed, low efficiency, and poor accuracy, we have designed a solder ball defect detection algorithm that leverages the specific characteristics of these defects and the advantages of deep learning. Building upon the YOLOv8 network model, we have made adaptive improvements to the algorithm. First, we have introduced an adaptive weighted downsampling method to boost detection accuracy and make the model more lightweight. Second, to improve the extraction of image features, we have proposed an efficient multi-scale convolution method. Finally, to improve convergence speed and regression accuracy, we have replaced the traditional Complete Intersection over Union (CIoU) loss function with Minimum Points Distance Intersection over Union (MPDIoU). In a series of controlled experiments, the enhanced model showed significant improvements over the original network: a 1.7% increase in mean average precision, a 1.5% boost in precision, a 0.9% increase in recall, a reduction of 4.3 M parameters, and a decrease of 0.4 G floating-point operations (FLOPs). In comparative experiments, our algorithm demonstrated superior overall performance compared to other networks, effectively achieving the goal of solder ball defect detection.
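MPDIoU augments plain IoU with the normalized squared distances between the corresponding top-left and bottom-right corners of the two boxes. The sketch below follows the published MPDIoU definition; the function names are illustrative and boxes are given as (x1, y1, x2, y2):

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU = IoU - d1^2/(w^2 + h^2) - d2^2/(w^2 + h^2), where d1 and d2 are
    the top-left and bottom-right corner distances and (w, h) is the image size."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # plain IoU
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # corner-distance penalties, normalized by the squared image diagonal
    norm = img_w ** 2 + img_h ** 2
    d1 = (ax1 - bx1) ** 2 + (ay1 - by1) ** 2
    d2 = (ax2 - bx2) ** 2 + (ay2 - by2) ** 2
    return iou - d1 / norm - d2 / norm

def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Regression loss: 1 - MPDIoU (0 for a perfect match)."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)
```

Unlike CIoU, both penalty terms vanish only when the two boxes coincide exactly, which is what drives the faster convergence claimed above.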
Archeological textiles can provide invaluable insight into the past. However, they are often highly fragmented, and a puzzle has to be solved to re-assemble the object and recover the original motifs. Unlike common jigsaw puzzles, archeological fragments are highly damaged, and no correct solution to the puzzle is known. Although automatic puzzle solving has fascinated computer scientists for a long time, this work is one of the first attempts to apply modern machine learning solutions to archeological textile re-assembly. First and foremost, it is important to know which fragments belong to the same object. Therefore, features are extracted from digital images of textile fragments using color statistics, classical texture descriptors, and deep learning methods. These features are used to conduct clustering and identify similar fragments. Four different case studies with increasing complexity are discussed in this article: from well-preserved textiles with available ground truth to an actual open problem of Oseberg archeological tapestry with unknown solution. This work reveals significant knowledge gaps in current machine learning, which helps us to outline a future avenue toward more specialized application-specific models.
The purpose of this work is to present a new dataset of hyperspectral images of historical documents consisting of 66 historical family tree samples from the 16th and 17th centuries in two spectral ranges: VNIR (400-1000 nm) and SWIR (900-1700 nm). In addition, we performed an evaluation of different binarization algorithms, both using a single spectral band and generating false RGB images from the hyperspectral cube.
In this paper, we investigate the challenge of image restoration from severely incomplete data, encompassing compressive sensing image restoration and image inpainting. We propose a versatile implementation framework for plug-and-play ADMM image reconstruction that leverages several readily available denoisers, including model-based nonlocal denoisers and deep learning-based denoisers. We conduct a comprehensive comparative analysis against state-of-the-art methods, showcasing superior performance in both qualitative and quantitative aspects, including image quality and implementation complexity.
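A plug-and-play ADMM iteration for inpainting alternates a closed-form data-fidelity proximal step, an off-the-shelf denoiser standing in for the regularizer's proximal operator, and a dual update. The 1-D sketch below uses a simple moving-average filter as a toy denoiser; everything here (names, ρ, iteration count) is an illustrative assumption rather than the paper's implementation:

```python
def moving_average(sig, k=1):
    """Toy stand-in denoiser: windowed mean (any denoiser can be plugged in)."""
    n = len(sig)
    out = []
    for i in range(n):
        window = sig[max(0, i - k):min(n, i + k + 1)]
        out.append(sum(window) / len(window))
    return out

def pnp_admm_inpaint(y, mask, denoise, rho=1.0, iters=50):
    """Plug-and-play ADMM for inpainting: recover x from samples y where mask[i] == 1."""
    n = len(y)
    z = [y[i] if mask[i] else 0.0 for i in range(n)]
    u = [0.0] * n
    for _ in range(iters):
        # x-update: proximal step on the data term (closed form for inpainting)
        v = [z[i] - u[i] for i in range(n)]
        x = [(y[i] + rho * v[i]) / (1.0 + rho) if mask[i] else v[i] for i in range(n)]
        # z-update: the plugged-in denoiser replaces the regularizer's prox
        z = denoise([x[i] + u[i] for i in range(n)])
        # dual (scaled multiplier) update
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```

Swapping `moving_average` for a nonlocal or deep denoiser changes only the z-update, which is precisely the flexibility the plug-and-play framework exploits.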
In this article, we study the properties of quantitative steganography detectors (estimators of the payload size) for content-adaptive steganography. In contrast to non-adaptive embedding, the estimator's bias as well as variance strongly depend on the true payload size. Initially, and depending on the image content, the estimator may not react to embedding. With increased payload size, it starts responding as the embedding changes begin to ``spill'' into regions where their detection is more reliable. We quantify this behavior with the concepts of reactive and estimable payloads. To better understand how the payload estimate and its bias depend on image content, we study a maximum likelihood estimator derived for the MiPOD model of the cover image. This model correctly predicts trends observed in outputs of a state-of-the-art deep learning payload regressor. Moreover, we use the model to demonstrate that the cover bias can be caused by a small number of ``outlier'' pixels in the cover image. This is also confirmed for the deep learning regressor on a dataset of artificial images via attribution maps.
The first paper investigating the use of machine learning to learn the relationship between an image of a scene and the color of the scene illuminant was published by Funt et al. in 1996. Specifically, they investigated whether such a relationship could be learned by a neural network. During the last 30 years we have witnessed a remarkable series of advancements in machine learning, and in particular in deep learning approaches based on artificial neural networks. In this paper we update the method of Funt et al. with recent techniques introduced for training deep neural networks. Experimental results on a standard dataset show that the updated version improves the median angular error in illuminant estimation by almost 51% with respect to the original formulation, even outperforming recent illuminant estimation methods.
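The median angular error reported above is computed from the angle between each estimated illuminant and its ground truth, both expressed as RGB vectors. A minimal sketch (function names are illustrative):

```python
import math

def angular_error_deg(est, gt):
    """Angle, in degrees, between estimated and ground-truth illuminant RGB
    vectors (scale-invariant, so overall brightness cancels out)."""
    dot = sum(e * g for e, g in zip(est, gt))
    norm_e = math.sqrt(sum(e * e for e in est))
    norm_g = math.sqrt(sum(g * g for g in gt))
    cos = max(-1.0, min(1.0, dot / (norm_e * norm_g)))  # clamp for float safety
    return math.degrees(math.acos(cos))

def median_angular_error(estimates, ground_truths):
    """Median of the per-image angular errors over a dataset."""
    errors = sorted(angular_error_deg(e, g) for e, g in zip(estimates, ground_truths))
    mid = len(errors) // 2
    return errors[mid] if len(errors) % 2 else 0.5 * (errors[mid - 1] + errors[mid])
```

The median (rather than the mean) is the standard summary statistic in illuminant estimation because angular error distributions are heavily skewed by a few hard scenes.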
Advancements in sensing, computing, image processing, and computer vision technologies are enabling unprecedented growth and interest in autonomous vehicles and intelligent machines, from self-driving cars to unmanned drones to personal service robots. These new capabilities have the potential to fundamentally change the way people live, work, commute, and connect with each other, and will undoubtedly provoke entirely new applications and commercial opportunities for generations to come. The main focus of AVM is perception, which begins with sensing. While imaging continues to be an essential emphasis in all EI conferences, AVM also embraces other sensing modalities important to autonomous navigation, including radar, LiDAR, and time-of-flight. Realization of autonomous systems also requires purpose-built processors, e.g., ISPs, vision processors, and DNN accelerators, as well as core image processing and computer vision algorithms, system design and architecture, simulation, and image/video quality. AVM topics lie at the intersection of these multi-disciplinary areas. AVM is the Perception Conference that bridges the imaging and vision communities, connecting the dots for the entire software and hardware stack for perception and helping people design globally optimized algorithms, processors, and systems for intelligent “eyes” for vehicles and machines.