Grayscale images are essential in image processing and computer vision tasks. They effectively emphasize luminance and contrast, highlighting important visual features, while remaining easily compatible with other algorithms. Moreover, their simplified representation makes them efficient for storage and transmission. While preserving contrast is important for maintaining visual quality, other factors, such as preserving information relevant to the specific application or task at hand, may be more critical for achieving optimal performance. To evaluate and compare different decolorization algorithms, we designed a psychological experiment. During the experiment, participants were instructed to imagine color images in a hypothetical "colorless world" and select the grayscale image that best resembled their mental visualization. We compared two types of algorithms: (i) perception-based simple color space conversion algorithms, and (ii) spatial contrast-based algorithms, including iteration-based methods. Our experimental findings indicate that CIELAB performed best on average, providing further evidence for the effectiveness of perception-based decolorization algorithms. The spatial contrast-based algorithms performed relatively poorly, possibly due to factors such as DC offset and artificial contrast generation; however, they yielded shorter selection times. Notably, no single algorithm consistently outperformed the others across all test images. In this paper, we discuss in depth the significance of contrast and luminance in color-to-grayscale mapping, based on our experimental results and analysis.
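As a concrete sketch of the perception-based conversion family, the following maps an sRGB triple to the CIELAB lightness channel L*. This is a minimal illustration of a simple color space conversion, not the exact pipeline evaluated in the experiment; a D65 white point and Rec. 709 luminance weights are assumed.

```python
def srgb_to_linear(c):
    """Invert the sRGB transfer function for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def rgb_to_lstar(r, g, b):
    """Map an sRGB triple in [0, 1] to CIELAB lightness L* in [0, 100]."""
    rl, gl, bl = (srgb_to_linear(c) for c in (r, g, b))
    # Relative luminance Y under D65 (Rec. 709 primaries), with Yn = 1.
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    # CIE lightness function: cube root above the linear toe.
    eps = (6 / 29) ** 3
    f = y ** (1 / 3) if y > eps else y / (3 * (6 / 29) ** 2) + 4 / 29
    return 116 * f - 16
```

Applying `rgb_to_lstar` per pixel and rescaling L* from [0, 100] to [0, 255] yields the grayscale image.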
In order to train a learning-based prediction model, large datasets are typically required, and a major restriction of machine learning applications that use customized databases is the cost of human labeling. Previous papers [3, 4, 5] demonstrated experimentally that a correlation exists between thin-film nitrate sensor performance and surface texture. Those papers explored several methods for extracting texture features from sensor images, performed repeated cross-validation and automatic hyperparameter tuning, and built several machine learning models to improve prediction accuracy. In this paper, we present a new way to achieve the same accuracy with a much smaller labeled dataset by using an active learning structure.
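As a minimal sketch of one common ingredient of such an active learning structure, pool-based uncertainty sampling selects the unlabeled samples the current model is least sure about, so only those need human labels. The function name and the binary-probability interface are illustrative assumptions, not the paper's actual design.

```python
def select_most_uncertain(probs, k):
    """Return indices of the k samples whose predicted positive-class
    probability is closest to 0.5, i.e. where a binary classifier is
    most uncertain; these are sent to a human annotator for labeling."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]
```

Each active learning round would retrain the model on the enlarged labeled set and repeat the selection, which is how a small label budget can match the accuracy of a much larger randomly labeled dataset.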
We live in a visual world. The perceived quality of images is of crucial importance in industrial, medical, and entertainment application environments. Developments in camera sensors, image processing, 3D imaging, display technology, and digital printing are enabling new or enhanced possibilities for creating and conveying visual content that informs or entertains. Wireless networks and mobile devices expand the ways to share imagery, and autonomous vehicles bring image processing into new aspects of society. The power of imaging rests directly on the visual quality of the images and the performance of the systems that produce them. As the images are generally intended to be viewed by humans, a deep understanding of human visual perception is key to the effective assessment of image quality.
The COVID-19 virus induces infection in both the upper respiratory tract and the lungs. Chest X-rays are widely used to diagnose various lung diseases. Considering chest X-ray (CXR) and CT images, we explore deep-learning-based models, namely AlexNet, VGG16, VGG19, ResNet50, and ResNet101v2, to classify images representing COVID-19 infection and normal health conditions. We analyze and present the impact of transfer learning, normalization, resizing, augmentation, and shuffling on the performance of these models. We also explore the vision transformer (ViT) model to classify the CXR images. The ViT model incorporates multi-headed attention to disclose more global information, in contrast to CNN models at lower layers; this mechanism leads to quantitatively diverse features, and the model renders consolidated intermediate representations of the training data. For experimental analysis, we use two standard datasets and the performance metrics accuracy, precision, recall, and F1-score. The ViT model, driven by its self-attention mechanism and long-range context learning, outperforms the other models.
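As a minimal, single-head illustration of the scaled dot-product attention underlying the ViT's multi-headed mechanism (pure Python on small list-of-list matrices; the actual model operates on learned patch embeddings with many heads and projection weights):

```python
import math

def attention(q, k, v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
    on list-of-lists matrices."""
    d = len(q[0])
    # Raw similarity scores between each query and every key position.
    scores = [[sum(qi * ki for qi, ki in zip(qr, kr)) / math.sqrt(d)
               for kr in k] for qr in q]
    out = []
    for row in scores:
        # Numerically stable softmax over the key positions.
        m = max(row)
        e = [math.exp(s - m) for s in row]
        z = sum(e)
        w = [x / z for x in e]
        # Weighted sum of value vectors: every query attends to every
        # position, the "global" behavior contrasted with CNN receptive
        # fields at lower layers.
        out.append([sum(wi * vr[j] for wi, vr in zip(w, v))
                    for j in range(len(v[0]))])
    return out
```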
Recent advances in convolutional neural networks and vision transformers have brought about a revolution in the area of computer vision. Studies have shown that the performance of deep-learning-based models is sensitive to image quality. The human visual system is trained to infer semantic information from poor-quality images, but deep learning algorithms may find this task challenging. In this paper, we study the effect of image quality and colour parameters on deep learning models trained for the task of semantic segmentation. One of the major challenges in benchmarking robust deep-learning-based computer vision models is the lack of challenging data covering different quality and colour parameters. We have therefore generated data from a subset of the standard benchmark semantic segmentation dataset (ADE20K) with the goal of studying the effect of different quality and colour parameters on the semantic segmentation task. To the best of our knowledge, this is one of the first attempts to benchmark semantic segmentation algorithms under different colour and quality parameters, and this study will motivate further research in this direction.
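As a minimal sketch of the kind of parametric degradation such a benchmark can apply to clean images, the following shifts per-pixel contrast and brightness on values in [0, 1]; the parameterization is an illustrative assumption, not the dataset's actual generation procedure.

```python
def degrade(value, contrast=1.0, brightness=0.0):
    """Apply a contrast scale about mid-gray plus a brightness offset to
    one pixel value, clamping the result back to the valid [0, 1] range.
    Sweeping these parameters yields graded variants of each test image."""
    out = (value - 0.5) * contrast + 0.5 + brightness
    return min(1.0, max(0.0, out))
```

Running a trained segmentation model over the sweep and plotting accuracy against each parameter is what exposes the quality sensitivity the abstract describes.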
Circle detection in edge images can involve significant time and memory requirements, particularly if the circles have unknown radii over a large range. We describe an algorithm that processes an edge image in a single linear pass, compiling statistics of connected components that can be used by two distinct least-squares methods. Because the compiled statistics are all sums, these components can be merged quickly without any further examination of image pixels. Fusing multiple circle detectors then allows more powerful circle detection. The resulting algorithm has linear complexity in the number of image pixels and quadratic complexity in a much smaller number of cluster statistics.
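The key property, that all component statistics are sums and therefore merge by simple addition, can be sketched with an algebraic (Kåsa-style) least-squares circle fit; the class and method names here are illustrative assumptions, not the paper's implementation.

```python
import math

def _solve3(A, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    m = [row[:] + [v] for row, v in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x

class CircleStats:
    """Accumulates the moment sums needed for a Kåsa circle fit."""
    def __init__(self):
        self.n = 0
        self.sx = self.sy = 0.0
        self.sxx = self.sxy = self.syy = 0.0
        self.sxz = self.syz = self.sz = 0.0   # z = x^2 + y^2

    def add(self, x, y):
        """One edge pixel: update every sum in O(1)."""
        z = x * x + y * y
        self.n += 1
        self.sx += x; self.sy += y
        self.sxx += x * x; self.sxy += x * y; self.syy += y * y
        self.sxz += x * z; self.syz += y * z; self.sz += z

    def merge(self, other):
        """Merging two components is just adding their sums; no pixels
        are re-examined, which is the point made in the abstract."""
        for f in vars(other):
            setattr(self, f, getattr(self, f) + getattr(other, f))

    def fit(self):
        """Solve the normal equations of x^2 + y^2 + a*x + b*y + c = 0
        in the least-squares sense; return (cx, cy, r)."""
        A = [[self.sxx, self.sxy, self.sx],
             [self.sxy, self.syy, self.sy],
             [self.sx,  self.sy,  float(self.n)]]
        a, b, c = _solve3(A, [-self.sxz, -self.syz, -self.sz])
        cx, cy = -a / 2.0, -b / 2.0
        return cx, cy, math.sqrt(cx * cx + cy * cy - c)
```

For points lying exactly on a circle the fit recovers the center and radius; for noisy edge pixels it minimizes the algebraic residual, one of several least-squares formulations such sums can drive.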
Correspondences are prevalent in natural videos among different frames, as well as among sets of images sharing a common attribute. Dense correspondences are important for the core problem of many natural image and video reconstruction tasks: recovering texture details with high fidelity. In this paper, we discuss recent methods for learning and utilizing such correspondences in image and video reconstruction. Specifically, we decompose the network design into several switchable components with different purposes and discuss their applications to different image and video restoration tasks such as super-resolution, denoising, and video frame interpolation. In this way, we can analyze the performance and uncover generic and efficient network designs. Benefiting from the above investigations, our proposed methods achieve state-of-the-art performance on multiple tasks with fewer parameters. Our findings could inspire the network design of future image and video reconstruction tasks.
Material appearance is a perceptual phenomenon that the brain interprets from the retinal image. However, it is not easy to analyze which features of optical images are effectively related to the stimulus inside the visual cortex. For this reason, intuitive or heuristic approaches have been taken to simulate material appearance. The simulation results are expected to drive innovation not only in the traditional craft and plastic arts industries but also in more realistic picture display on 4K/8K HDTV, virtual reality, and computer graphics. The optical surface properties of a material are modeled by the BRDF (Bidirectional Reflectance Distribution Function). The specular component S and diffuse component D are responsible for "glossiness" and "texture", and the material appearance can be emphasized simply by adjusting their mixing ratio. This study introduces the following two key models to emphasize the material appearance of a given image without using measuring means such as the BRDF, and discusses how they work individually and cooperatively: (1) an α-based dehazing model to emphasize clarity, wetness, and gloss; (2) a β-based contrast model to emphasize texture and roughness.
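The specular/diffuse mixing idea can be sketched per pixel as I' = w_s·S + w_d·D. The separation into specular and diffuse components (which the study's two models avoid having to measure via the BRDF) is assumed to be given here, and the weight names are illustrative.

```python
def remix(specular, diffuse, w_s=1.0, w_d=1.0):
    """Recombine per-pixel specular and diffuse intensities with adjustable
    weights, clamped to [0, 1]; raising w_s emphasizes glossiness, raising
    w_d emphasizes texture."""
    return min(1.0, max(0.0, w_s * specular + w_d * diffuse))
```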