Recently, the use of neural networks for image classification has become widespread. Thanks to the availability of increased computational power, better-performing architectures have been designed, such as deep neural networks. In this work, we propose a novel image representation framework exploiting the Deep p-Fibonacci scattering network. The architecture is based on structured p-Fibonacci scattering over graph data. This approach provides good classification accuracy while reducing computational complexity. Experimental results demonstrate that the performance of the proposed method is comparable to state-of-the-art unsupervised methods while being computationally more efficient.
In this paper, we propose a method for automatically estimating three typical human-impression factors, "hard-soft", "flashy-sober", and "stable-unstable", which objects evoke, by analyzing their three-dimensional shapes. With this method, a designer's intent in shaping an object can be directly reflected during the design process. Here, the focus is on strongly correlating human impressions with the three-dimensional shape representation of objects. Previous work includes a method for estimating human impressions by using specially designed features and linear classifiers. However, it can be used for only the "hard-soft" impression factor because its feature was optimized for that impression. The performance of this method is relatively low, and its processing time is low. In addition, there is a serious limitation in that the method can be used for only one particular impression factor. The purpose of this research is to propose a new method that can be applied to all three typical impression factors mentioned above. First, we use a single RGB image acquired from a specific viewing direction instead of general three-dimensional mesh data from a range finder. This enables a very simple system consisting of a single camera. Second, we use a deep neural network as a nonlinear classifier. For our experiments, a large number of training images annotated with numerical human-impression factors were used. To annotate the correct impression factors as ground truth, we used the SD (semantic differential) method, which is very popular in the field of psychological statistics. We show that the success rate of the proposed method is 83% for "hard-soft", 78% for "flashy-sober", and 80% for "stable-unstable" when using test images that are not included in the training data.
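For illustration, the sketch below shows one way a deep neural network could map a single RGB image to three impression-factor scores. The architecture, layer sizes, and the name ImpressionNet are assumptions made for this example, not the network described above.

```python
# Minimal sketch (assumed architecture, not the authors' network): a small CNN
# that maps a single RGB image to three impression-factor scores
# ("hard-soft", "flashy-sober", "stable-unstable").
import torch
import torch.nn as nn

class ImpressionNet(nn.Module):
    def __init__(self, num_factors: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 64), nn.ReLU(),
            nn.Linear(64, num_factors),  # one score per impression factor
        )

    def forward(self, x):
        return self.regressor(self.features(x))

# Usage: scores for a batch of 128x128 RGB images; SD-method annotations would
# serve as regression targets during training.
model = ImpressionNet()
scores = model(torch.randn(8, 3, 128, 128))  # shape: (8, 3)
```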
The non-stationary nature of image characteristics calls for adaptive processing based on the local image content. We propose a simple and flexible method to learn local tuning of parameters in adaptive image processing: we extract simple local features from an image and learn the relation between these features and the optimal filtering parameters. Learning is performed by optimizing a user-defined cost function (any image quality metric) on a training set. We apply our method to three classical problems (denoising, demosaicing and deblurring) and we show the effectiveness of the learned parameter modulation strategies. We also show that these strategies are consistent with theoretical results from the literature.
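As a rough illustration of this idea, the sketch below extracts a simple local feature (local standard deviation), finds the best denoising parameter per training patch by grid search against a quality metric (MSE here, standing in for any metric), and fits a feature-to-parameter curve. The choice of feature, filter, metric and the function names are illustrative assumptions, not the exact pipeline evaluated in the paper.

```python
# Sketch: learn a mapping from a simple local feature (local std. dev.) to the
# best denoising parameter (Gaussian sigma) by grid search on training patches.
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def local_std(img, size=7):
    """Simple local feature: standard deviation in a size x size window."""
    mean = uniform_filter(img, size)
    sq_mean = uniform_filter(img * img, size)
    return np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))

def best_sigma(clean_patch, noisy_patch, sigmas=(0.5, 1.0, 1.5, 2.0, 3.0)):
    """Pick the filter parameter that minimizes MSE (any quality metric works)."""
    errors = [np.mean((gaussian_filter(noisy_patch, s) - clean_patch) ** 2)
              for s in sigmas]
    return sigmas[int(np.argmin(errors))]

def fit_modulation(clean, noisy, patch=32):
    """Collect (feature, optimal parameter) pairs and fit a simple curve."""
    feats, params = [], []
    for i in range(0, clean.shape[0] - patch, patch):
        for j in range(0, clean.shape[1] - patch, patch):
            c = clean[i:i + patch, j:j + patch]
            n = noisy[i:i + patch, j:j + patch]
            feats.append(local_std(n).mean())
            params.append(best_sigma(c, n))
    return np.polyfit(feats, params, deg=2)  # feature -> parameter curve
```

At test time, the fitted curve would modulate the filter parameter pixel-wise from the local feature map.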
We suggest a method for sharpening an image or video stream without using convolution, as in unsharp masking, or deconvolution, as in constrained least-squares filtering. Instead, our technique is based on a local analysis of phase congruency and hence focuses on perceptually important details. The image is partitioned into overlapping tiles and processed tile by tile. We perform a Fourier transform for each of the tiles, and define congruency for each of the components in such a way that it is large when the component's neighbours are aligned with it, and small otherwise. We then amplify weak components with high phase congruency and reduce strong components with low phase congruency. Following this method, we avoid strengthening the Fourier components corresponding to sharp edges, while amplifying those details that underwent a slight or moderate defocus blur. The tiles are then seamlessly stitched. As a result, the image sharpness is improved wherever perceptually important details are present.
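The simplified sketch below illustrates the per-tile step only: a phase-congruency proxy based on neighbouring spectral phases, followed by amplification of weak-but-congruent components and attenuation of strong-but-incongruent ones. The gain rule, the parameters, and the omission of tiling and stitching are assumptions for this illustration, not the exact formulation used in the paper.

```python
# Illustrative per-tile step: boost weak Fourier components whose phase agrees
# with their spectral neighbours, attenuate strong components that do not.
import numpy as np

def sharpen_tile(tile, boost=1.5, cut=0.7):
    F = np.fft.fft2(tile)
    phase = np.angle(F)
    mag = np.abs(F)

    # Phase-congruency proxy: mean phase agreement with the 4 spectral neighbours.
    congruency = np.zeros_like(mag)
    for shift in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        congruency += np.cos(phase - np.roll(phase, shift, axis=(0, 1)))
    congruency = (congruency / 4.0 + 1.0) / 2.0          # map to [0, 1]

    strength = mag / (mag.max() + 1e-12)                  # weak vs. strong components
    gain = np.where(congruency > strength, boost, cut)    # amplify or reduce
    return np.real(np.fft.ifft2(gain * mag * np.exp(1j * phase)))
```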
In this paper, we introduce several new human-visual-system-based (a) methods to visualize very small differences in intensity without large changes to the primary image information, and (b) measures that quantify the visual quality of both grayscale and color images. Several illustrative examples are also presented. The proposed concepts can be used in many image processing, computer vision and recognition system applications.
Existing full-reference metrics still do not correspond adequately to human visual perception when evaluating images with different types and levels of distortion. One reason for this is that it is difficult to incorporate the peculiarities of the human visual system into metric design. In this paper, a robust approach to full-reference metric design is proposed, based on a combination of several existing full-reference metrics. A preliminary linearization (fitting) of the dependence of MOS on the component metrics is performed in order to compensate for the shortcomings of each component. The proposed method is tested on several known databases and demonstrates better performance than existing metrics.
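As a hedged illustration of the combination principle, the sketch below fits a monotone (logistic) mapping from each component metric to MOS on a training database and then combines the linearized metrics by least squares. The logistic form, the least-squares combination and the function names are assumptions for this example, not the exact fitting procedure used in the paper.

```python
# Sketch: linearize each component metric against MOS, then combine.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, a, b, c, d):
    return a / (1.0 + np.exp(-b * (x - c))) + d

def linearize(metric_values, mos):
    """Fit a metric -> MOS mapping to compensate each metric's nonlinearity."""
    p, _ = curve_fit(logistic, metric_values, mos,
                     p0=[np.ptp(mos), 1.0, np.median(metric_values), mos.min()],
                     maxfev=10000)
    return logistic(metric_values, *p)

def combine(metric_matrix, mos):
    """metric_matrix: (n_images, n_metrics). Returns combination weights."""
    lin = np.column_stack([linearize(metric_matrix[:, k], mos)
                           for k in range(metric_matrix.shape[1])])
    A = np.column_stack([lin, np.ones(len(mos))])      # linearized metrics + bias
    w, *_ = np.linalg.lstsq(A, mos, rcond=None)
    return w
```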
Higher-order tensor structured data arise in many imaging scenarios, including hyperspectral imaging and color video. The recovery of a tensor from an incomplete set of its entries, known as tensor completion, is crucial in applications like compression. Furthermore, in many cases observations are not only incomplete, but also highly quantized. Quantization is a critical step for high dimensional data transmission and storage in order to reduce storage requirements and power consumption, especially for energy-limited systems. In this paper, we propose a novel approach for the recovery of low-rank tensors from a small number of binary (1-bit) measurements. The proposed method, called 1-bit Tensor Completion, relies on the application of 1-bit matrix completion over different matricizations of the underlying tensor. Experimental results on hyperspectral images demonstrate that directly operating with the binary measurements, rather than treating them as real values, results in lower recovery error.
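A minimal sketch of this recipe is given below under simplifying assumptions: a logistic observation model, a known target rank, and hard SVD truncation in place of a nuclear-norm constraint. The helper names (unfold, fold, one_bit_mc) are illustrative, not the paper's implementation.

```python
# Sketch: complete each mode-k matricization of the binary tensor with a basic
# 1-bit matrix completion routine, then average the refolded estimates.
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def one_bit_mc(Y, mask, rank, steps=200, lr=0.5):
    """Y in {-1,+1} on observed entries; maximize the logistic log-likelihood
    by gradient ascent with rank-r truncation after each step."""
    M = np.zeros_like(Y, dtype=float)
    for _ in range(steps):
        grad = mask * Y / (1.0 + np.exp(Y * M))       # d/dM of log sigmoid(Y*M)
        M += lr * grad
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        M = (U[:, :rank] * s[:rank]) @ Vt[:rank]       # project to rank r
    return M

def one_bit_tensor_completion(Y, mask, rank):
    """Average the estimates recovered from every matricization."""
    est = [fold(one_bit_mc(unfold(Y, k), unfold(mask, k), rank), k, Y.shape)
           for k in range(Y.ndim)]
    return np.mean(est, axis=0)
```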
In this day and age, printed and digital images are the principal means of communication chosen by companies to convey information about their products, since visual content has a direct and effective influence on people. At the same time, imagery can be digitally enriched with additional information, imperceptible to the human eye yet still retrievable using specific software or hardware: this is the case of digital watermarking. In this work, we propose a digital watermarking pipeline that performs information embedding robust to printing operations and enables blind detection from digital acquisitions. We select a watermark from a set of orthogonal antipodal matrices and adaptively insert repeated copies in the horizontal, vertical and diagonal sub-bands of the Wavelet domain. Blind detection is achieved by denoising the digitally acquired marked image, retrieving the watermark information from the Wavelet domain, restoring the original scaling with an optimization algorithm, and computing a similarity score with each of the possible orthogonal marks. Our system is able to reconstruct the embedded information from acquisitions of both digital and printed watermarked images, and it recognizes the correct mark among the set of possible messages even when poor-quality printing systems are considered.
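A condensed, illustrative sketch of the embedding and blind scoring steps is shown below, using PyWavelets for the 2-D DWT. The fixed strength alpha, the Haar wavelet, and the omission of the adaptive insertion, denoising and scale-restoration stages are simplifications for this example, not the full pipeline described above.

```python
# Sketch: tile an antipodal (+/-1) mark over the H, V, D wavelet sub-bands,
# and score candidates by correlation with the detail sub-bands.
import numpy as np
import pywt

def _tile(mark, shape):
    reps = (shape[0] // mark.shape[0] + 1, shape[1] // mark.shape[1] + 1)
    return np.tile(mark, reps)[:shape[0], :shape[1]]

def embed(image, mark, alpha=2.0, wavelet='haar'):
    cA, (cH, cV, cD) = pywt.dwt2(image.astype(float), wavelet)
    w = _tile(mark, cH.shape)
    return pywt.idwt2((cA, (cH + alpha * w, cV + alpha * w, cD + alpha * w)),
                      wavelet)

def similarity(acquired, candidate_mark, wavelet='haar'):
    """Correlation score between the detail sub-bands and one candidate mark."""
    _, (cH, cV, cD) = pywt.dwt2(acquired.astype(float), wavelet)
    w = _tile(candidate_mark, cH.shape)
    return float(np.sum((cH + cV + cD) * w) / np.linalg.norm(w))

# Detection would pick the candidate from the orthogonal set with the highest score.
```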
The use of Multi-Function Printers (MFPs) in the office and home-office setting continues to trend upwards despite the advent and use of mobile technology in that space. One such novel use case is scanning multiple distinct objects that are either related or different from each other in terms of content. The proposed algorithm seeks to separate distinct objects placed on a flat-bed scanner and save each object as a different file. Further processing and identification can then be performed on each item for enhanced use. However, due to the memory limitations on some MFPs, a strip-based processing technique is utilized, wherein only a portion of the acquired scan is processed at a time. The technique utilizes edge detection, thresholding, morphology and a connected-component analysis scheme, along with label propagation from one strip to another, to achieve the goal of media separation. The results obtained are promising given the speed and memory constraints imposed.
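A rough sketch of strip-wise labeling with label propagation is shown below. The fixed threshold stands in for the edge-detection and morphology stages, and the handling of label conflicts at strip seams is simplified (a union-find would resolve merges), so this illustrates the idea rather than the proposed algorithm.

```python
# Sketch: label connected components one strip at a time and propagate labels
# across strip boundaries so objects spanning strips keep a single id.
import numpy as np
from scipy import ndimage

def label_strips(scan, strip_height=64, threshold=128):
    labels = np.zeros(scan.shape, dtype=int)
    next_label = 1
    prev_row = None                                        # last row of previous strip
    for top in range(0, scan.shape[0], strip_height):
        strip = scan[top:top + strip_height] < threshold   # simple foreground mask
        lab, n = ndimage.label(strip)
        lab = np.where(lab > 0, lab + next_label - 1, 0)   # make ids globally unique
        next_label += n
        if prev_row is not None:
            # Label propagation: a component touching the previous strip
            # inherits its label (simplified conflict handling).
            for col in np.nonzero((prev_row > 0) & (lab[0] > 0))[0]:
                lab[lab == lab[0, col]] = prev_row[col]
        labels[top:top + strip_height] = lab
        prev_row = lab[-1]
    return labels
```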
Noise suppression in complex-valued data is an important task for a wide class of applications, in particular concerning phase retrieval in coherent imaging. Approaches based on BM3D techniques are among the most successful in the field. In this paper, we propose and develop a new class of BM3D-style algorithms that use higher-order (3D and 4D) singular value decomposition (HOSVD) for transform design in the complex domain. This set of novel algorithms is implemented as a MATLAB toolbox. The development covers various types of complex-domain sparsity: directly in the complex domain, and in the real/imaginary and phase/amplitude parts of complex-valued variables. The group-wise transform design is combined with different kinds of thresholding, including multivariable Wiener filtering. The toolbox includes iterative and non-iterative novel complex-domain algorithms (filters). The efficiency of the developed algorithms is demonstrated on denoising problems with additive Gaussian complex-valued noise. A special set of complex-valued test images with spatially varying correlated phase and amplitude was developed, imitating data typical of optical interferometry and holography. It is shown that for this class of test images the developed algorithms demonstrate state-of-the-art performance.
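Although the toolbox itself is in MATLAB, the NumPy sketch below illustrates the group-wise HOSVD hard-thresholding step on a stack of similar complex-valued patches. Patch grouping, aggregation and the Wiener-filtering stage are omitted, and the threshold tau and function names are assumptions for this illustration.

```python
# Sketch: complex-valued HOSVD of a group of similar patches, hard thresholding
# of the core tensor, and reconstruction.
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, A, mode):
    """Multiply tensor T by matrix A along the given mode."""
    return np.moveaxis(np.tensordot(A, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd_denoise_group(group, tau):
    """group: (n_patches, p, p) complex array of mutually similar patches."""
    # Factor matrices from the SVD of each unfolding (complex-valued HOSVD).
    U = [np.linalg.svd(unfold(group, k), full_matrices=False)[0]
         for k in range(group.ndim)]
    # Core tensor: project onto the factors.
    core = group
    for k, Uk in enumerate(U):
        core = mode_product(core, Uk.conj().T, k)
    core[np.abs(core) < tau] = 0.0                     # hard thresholding
    # Reconstruct: apply the factors back.
    rec = core
    for k, Uk in enumerate(U):
        rec = mode_product(rec, Uk, k)
    return rec
```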