Solder ball defect detection in ball grid array (BGA) packaged chips currently suffers from slow detection speed, low efficiency, and poor accuracy. To address these issues, we design a solder ball defect detection algorithm that exploits the specific characteristics of these defects and the advantages of deep learning. Building upon the YOLOv8 network model, we make three adaptive improvements. First, we introduce an adaptive weighted downsampling method to boost detection accuracy and make the model more lightweight. Second, to improve the extraction of image features, we propose an efficient multi-scale convolution method. Finally, to improve convergence speed and regression accuracy, we replace the traditional Complete Intersection over Union (CIoU) loss function with the Minimum Points Distance Intersection over Union (MPDIoU) loss. In a series of controlled experiments, the enhanced model shows significant improvements over the original network: a 1.7% increase in mean average precision, a 1.5% increase in precision, a 0.9% increase in recall, a reduction of 4.3 M parameters, and a decrease of 0.4 G floating-point operations (GFLOPs). In comparative experiments, our algorithm demonstrates superior overall performance to other networks, thereby effectively achieving the goal of solder ball defect detection.
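A minimal sketch of the MPDIoU bounding-box loss mentioned above, assuming axis-aligned boxes in (x1, y1, x2, y2) corner format and the usual definition in which the squared distances between matching corners are normalized by the squared image diagonal; names and the epsilon value are illustrative, not the paper's implementation.

```python
import numpy as np

def mpdiou_loss(pred, gt, img_w, img_h, eps=1e-7):
    """MPDIoU loss = 1 - (IoU - d1^2/(w^2+h^2) - d2^2/(w^2+h^2)).

    d1, d2 are squared distances between the top-left and bottom-right
    corners of the predicted and ground-truth boxes; (w, h) is the input
    image size. Boxes are (x1, y1, x2, y2).
    """
    # Intersection rectangle and IoU
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + eps)

    # Squared corner distances, normalized by the squared image diagonal
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    diag = img_w ** 2 + img_h ** 2

    return 1.0 - (iou - d1 / diag - d2 / diag)
```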
Archeological textiles can provide invaluable insight into the past. However, they are often highly fragmented, and a puzzle has to be solved to re-assemble the object and recover the original motifs. Unlike common jigsaw puzzles, archeological fragments are highly damaged, and no correct solution to the puzzle is known. Although automatic puzzle solving has fascinated computer scientists for a long time, this work is one of the first attempts to apply modern machine learning solutions to archeological textile re-assembly. First and foremost, it is important to know which fragments belong to the same object. Therefore, features are extracted from digital images of textile fragments using color statistics, classical texture descriptors, and deep learning methods. These features are used to conduct clustering and identify similar fragments. Four different case studies with increasing complexity are discussed in this article: from well-preserved textiles with available ground truth to an actual open problem of Oseberg archeological tapestry with unknown solution. This work reveals significant knowledge gaps in current machine learning, which helps us to outline a future avenue toward more specialized application-specific models.
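As a rough illustration of the feature-extraction-plus-clustering pipeline described above, the sketch below combines simple color statistics with a local binary pattern texture histogram and clusters fragments with k-means; it is a stand-in under assumed inputs (RGB fragment images), not the article's actual feature set, which also includes deep learning descriptors.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import local_binary_pattern
from sklearn.cluster import KMeans

def fragment_features(img):
    """Concatenate per-channel color statistics with an LBP texture histogram."""
    pixels = img.reshape(-1, 3)
    color_stats = np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])
    lbp = local_binary_pattern(rgb2gray(img), P=8, R=1.0, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([color_stats, hist])

def cluster_fragments(fragments, n_clusters=4):
    """Group fragment images so that pieces of the same object fall together."""
    X = np.stack([fragment_features(f) for f in fragments])
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
```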
The purpose of this work is to present a new dataset of hyperspectral images of historical documents, consisting of 66 historical family tree samples from the 16th and 17th centuries captured in two spectral ranges: VNIR (400-1000 nm) and SWIR (900-1700 nm). In addition, we evaluate several binarization algorithms, applied both to single spectral bands and to false-RGB images generated from the hyperspectral cube.
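A small sketch of the two evaluation setups mentioned above, assuming a hyperspectral cube stored as an H x W x bands array; the band indices are placeholders and Otsu thresholding stands in for the various binarization algorithms the paper compares.

```python
import numpy as np
from skimage.filters import threshold_otsu

def binarize_band(cube, band):
    """Binarize a single spectral band of an (H, W, bands) hyperspectral cube."""
    img = cube[:, :, band].astype(float)
    return img > threshold_otsu(img)

def false_rgb(cube, bands=(60, 30, 10)):
    """Build a false-RGB image from three chosen bands, rescaled to [0, 1]."""
    rgb = cube[:, :, list(bands)].astype(float)
    rgb -= rgb.min()
    return rgb / (rgb.max() + 1e-8)
```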
In this paper, we investigate the challenge of image restoration from severely incomplete data, encompassing compressive sensing image restoration and image inpainting. We propose a versatile implementation framework for plug-and-play ADMM image reconstruction, leveraging several readily available denoisers, including model-based nonlocal denoisers and deep learning-based denoisers. We conduct a comprehensive comparative analysis against state-of-the-art methods, showing superior performance in both qualitative and quantitative aspects, including image quality and implementation complexity.
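A minimal sketch of a plug-and-play ADMM iteration for the inpainting case, where the forward operator is a binary mask and the data-fidelity proximal step has a closed form; a total-variation denoiser is plugged in here purely as an example, whereas the paper considers model-based nonlocal and deep learning-based denoisers.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle  # example plug-in denoiser

def pnp_admm_inpaint(y, mask, rho=1.0, iters=50):
    """Plug-and-play ADMM for inpainting: y = mask * x + noise (mask binary)."""
    x, z, u = y.copy(), y.copy(), np.zeros_like(y)
    for _ in range(iters):
        # x-update: closed-form prox of (1/2)||mask*x - y||^2 + (rho/2)||x - (z - u)||^2
        x = (mask * y + rho * (z - u)) / (mask + rho)
        # z-update: the plugged-in denoiser plays the role of the prior's proximal operator
        z = denoise_tv_chambolle(x + u, weight=0.1)
        # dual update
        u = u + x - z
    return z
```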
In this article, we study the properties of quantitative steganography detectors (estimators of the payload size) for content-adaptive steganography. In contrast to non-adaptive embedding, the estimator's bias as well as variance strongly depend on the true payload size. Initially, and depending on the image content, the estimator may not react to embedding. With increased payload size, it starts responding as the embedding changes begin to ``spill'' into regions where their detection is more reliable. We quantify this behavior with the concepts of reactive and estimable payloads. To better understand how the payload estimate and its bias depend on image content, we study a maximum likelihood estimator derived for the MiPOD model of the cover image. This model correctly predicts trends observed in outputs of a state-of-the-art deep learning payload regressor. Moreover, we use the model to demonstrate that the cover bias can be caused by a small number of ``outlier'' pixels in the cover image. This is also confirmed for the deep learning regressor on a dataset of artificial images via attribution maps.
The first paper investigating the use of machine learning to learn the relationship between an image of a scene and the color of the scene illuminant was published by Funt et al. in 1996. Specifically, they investigated whether such a relationship could be learned by a neural network. During the last 30 years we have witnessed a remarkable series of advancements in machine learning, and in particular deep learning approaches based on artificial neural networks. In this paper we update the method by Funt et al. by including recent techniques introduced to train deep neural networks. Experimental results on a standard dataset show that the updated version can reduce the median angular error in illuminant estimation by almost 51% with respect to its original formulation, even outperforming recent illuminant estimation methods.
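For reference, the angular error quoted above is the standard recovery angular error between the estimated and ground-truth illuminant RGB vectors; a small sketch of how it is typically computed follows, with variable names purely illustrative.

```python
import numpy as np

def angular_error_deg(est, gt):
    """Angular error (degrees) between estimated and ground-truth illuminant RGB vectors."""
    cos = np.dot(est, gt) / (np.linalg.norm(est) * np.linalg.norm(gt) + 1e-12)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# The reported statistic is the median of this error over a test dataset, e.g.:
# median_err = np.median([angular_error_deg(e, g) for e, g in zip(estimates, ground_truths)])
```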
Advancements in sensing, computing, image processing, and computer vision technologies are enabling unprecedented growth and interest in autonomous vehicles and intelligent machines, from self-driving cars to unmanned drones, to personal service robots. These new capabilities have the potential to fundamentally change the way people live, work, commute, and connect with each other, and will undoubtedly give rise to entirely new applications and commercial opportunities for generations to come. The main focus of AVM is perception. This begins with sensing. While imaging continues to be an essential emphasis in all EI conferences, AVM also embraces other sensing modalities important to autonomous navigation, including radar, LiDAR, and time of flight. Realization of autonomous systems also includes purpose-built processors, e.g., ISPs, vision processors, DNN accelerators, as well as core image processing and computer vision algorithms, system design and architecture, simulation, and image/video quality. AVM topics are at the intersection of these multi-disciplinary areas. AVM is the Perception Conference that bridges the imaging and vision communities, connecting the dots for the entire software and hardware stack for perception, helping people design globally optimized algorithms, processors, and systems for intelligent “eyes” for vehicles and machines.
Phase retrieval (PR) consists of recovering complex-valued objects from their oversampled Fourier magnitudes and takes a central place in scientific imaging. A critical issue around PR is the typical nonconvexity in natural formulations and the associated bad local minimizers. The issue is exacerbated when the support of the object is not precisely known and hence must be overspecified in practice. Practical methods for PR hence involve convolved algorithms, e.g., multiple cycles of hybrid input-output (HIO) + error reduction (ER), to avoid the bad local minimizers and attain reasonable speed, and heuristics to refine the support of the object, e.g., the famous shrinkwrap trick. Overall, the convolved algorithms and the support-refinement heuristics induce multiple algorithm hyperparameters, to which the recovery quality is often sensitive. In this work, we propose a novel PR method by parameterizing the object as the output of a learnable neural network, i.e., deep image prior (DIP). For complex-valued objects in PR, we can flexibly parametrize the magnitude and phase, or the real and imaginary parts, separately by two DIPs. We show that this simple idea, free from multi-hyperparameter tuning and support-refinement heuristics, can achieve superior performance to gold-standard PR methods. For the session: Computational Imaging using Fourier Ptychography and Phase Retrieval.
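A minimal sketch of one optimization step for the double-DIP parameterization described above, under the assumption that two untrained CNNs map a fixed random input to the real and imaginary parts of the object and that the loss fits the predicted Fourier magnitudes to the measurements; oversampling/padding and the network architectures are left unspecified.

```python
import torch

def dip_pr_step(g_real, g_imag, z, measured_mag, optimizer):
    """One step of fitting |FFT(object)| to the measured Fourier magnitudes.

    g_real, g_imag: untrained networks (deep image priors) producing the real
    and imaginary parts of the object estimate from a fixed input z.
    """
    optimizer.zero_grad()
    obj = torch.complex(g_real(z), g_imag(z))   # complex-valued object estimate
    pred_mag = torch.fft.fft2(obj).abs()        # predicted Fourier magnitudes
    loss = torch.nn.functional.mse_loss(pred_mag, measured_mag)
    loss.backward()
    optimizer.step()
    return loss.item()
```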
Lightness perception is a long-standing topic in research on human vision, but very few image-computable models of lightness have been formulated. Recent work in computer vision has used artificial neural networks and deep learning to estimate surface reflectance and other intrinsic image properties. Here we investigate whether such networks are useful as models of human lightness perception. We train a standard deep learning architecture on a novel image set that consists of simple geometric objects with a few different surface reflectance patterns. We find that the model performs well on this image set, generalizes well across small variations, and outperforms three other computational models. The network has partial lightness constancy, much like human observers, in that illumination changes have a systematic but moderate effect on its reflectance estimates. However, the network generalizes poorly beyond the type of images in its training set: it fails on a lightness matching task with unfamiliar stimuli, and does not account for several lightness illusions experienced by human observers.
Deep learning, which has been very successful in recent years, requires a large amount of data. Active learning has been widely studied and used for decades to reduce annotation costs and now attracts considerable attention in deep learning. Many real-world deep learning applications use active learning to select the informative data to be annotated. In this paper, we first investigate laboratory settings for active learning. We show significant gaps between the results from different laboratory settings and describe our practical laboratory setting that reasonably reflects the active learning use cases in real-world applications. Then, we introduce a problem setting of blind imbalanced domains. Any dataset includes multiple domains; for example, in handwritten character recognition, individual writers with different social attributes form different domains. Major domains have many samples, and minor domains have few samples in the training set. However, we must accurately infer both major and minor domains in the test phase. We experimentally compare different active learning methods for blind imbalanced domains in our practical laboratory setting. We show that a simple active learning method using the softmax margin, together with a model training method using distance-based sampling with center loss, both working in the deep feature space, perform well.
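A small sketch of the softmax-margin acquisition rule mentioned above, assuming class probabilities from the current model as input; the budget parameter and function name are illustrative, and the paper's full method additionally uses distance-based sampling with center loss in the deep feature space.

```python
import numpy as np

def softmax_margin_query(probs, budget):
    """Select the `budget` unlabeled samples with the smallest softmax margin.

    probs: (N, C) array of class probabilities from the current model.
    The margin is the gap between the top-1 and top-2 probabilities; a small
    margin indicates an uncertain, and hence informative, sample to annotate.
    """
    sorted_p = np.sort(probs, axis=1)
    margin = sorted_p[:, -1] - sorted_p[:, -2]
    return np.argsort(margin)[:budget]
```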