Papers Presented at CIC30: Color and Imaging 2022
Volume: 66 | Article ID: 050401
Exploring Effects of Colour and Image Quality in Semantic Segmentation by Deep Learning Methods
DOI: 10.2352/J.ImagingSci.Technol.2022.66.5.050401 | Published Online: September 2022
Abstract

Recent advances in convolutional neural networks and vision transformers have brought about a revolution in the area of computer vision. Studies have shown that the performance of deep learning-based models is sensitive to image quality. The human visual system is trained to infer semantic information from poor-quality images, but deep learning algorithms may find this task challenging. In this paper, we study the effect of image quality and colour parameters on deep learning models trained for the task of semantic segmentation. One of the major challenges in benchmarking robust deep learning-based computer vision models is the lack of challenging data covering different quality and colour parameters. In this paper, we have generated data using a subset of the standard benchmark semantic segmentation dataset (ADE20K) with the goal of studying the effect of different quality and colour parameters on the semantic segmentation task. To the best of our knowledge, this is one of the first attempts to benchmark semantic segmentation algorithms under different colour and quality parameters, and this study will motivate further research in this direction.

  Cite this article 

Kanjar De, "Exploring Effects of Colour and Image Quality in Semantic Segmentation by Deep Learning Methods," in Journal of Imaging Science and Technology, 2022, pp. 050401-1 – 050401-10, https://doi.org/10.2352/J.ImagingSci.Technol.2022.66.5.050401

  Copyright statement 
Copyright © 2022 Society for Imaging Science and Technology
 Open access
  Article timeline 
  • Received: May 2022
  • Accepted: July 2022
  • Published: September 2022
1. Introduction
After the success of AlexNet [1] in the ImageNet [2] Challenge 2012, deep convolutional neural networks have become an indispensable tool in computer vision. From the early successes in image classification, these networks have been applied to other computer vision tasks such as object detection, object tracking [3, 4], and image segmentation [5–8]. Image classification is the task of assigning a class label to an entire image, but a single label does not convey an understanding of the whole scene. The human visual system naturally develops such a deeper understanding simply by looking at an image. Semantic segmentation is the task of assigning a class label to every pixel and grouping similar pixels together, and it therefore provides a deeper understanding of the context of the entire image. An example of semantic segmentation is shown in Figure 1 (Ref. [9]). Several architectures based on convolutional neural networks have been developed for image classification over the last decade; some of the most popular are ResNet [10], DenseNet [11], VGGNets [12], GoogleNet [13], and EfficientNets [14], among others. Recently, next-generation ConvNeXts [15] have been proposed to compete with transformers: the ResNet design is modified to mimic patchifying and adopts depth-wise convolutions, inverted bottlenecks, larger kernel sizes that enlarge the receptive field, and micro-designs such as fewer activation and normalization layers.
Figure 1.
Example from ADE20K dataset [9] with segmentation map.
Transformers have been popular in the field of natural language processing. Recently, computer vision researchers have used them to achieve state-of-the-art results on image classification, outperforming most CNN-based architectures. Popular architectures include the vision transformer (VIT) [16] and the shifted window transformer (SWIN) [17]. The core idea of VIT is to split the input image into patches and vectorize them; the vectors are then passed through dense layers with shared parameters. Positional encoding is added to represent structural information, and the resulting vectors, together with a classification token, are passed through a series of multi-head self-attention layers followed by dense layers, which constitute the transformer encoder network. The SWIN transformer also splits the image into patches that are passed through a linear transform. It uses small patches in the first transformer layer and merges them into larger patches in the deeper layers; shifted window-based self-attention is then applied by a series of transformer blocks with limited attention, interleaved with merging layers and linear dimensionality reducers. One of the recent advances is the data-efficient image transformer (DEIT) [18], which uses distillation through the attention mechanism.
Generally, deep learning-based models are data-dependent and need a large amount of data to train robust models; as a result, the quality of the data is an important parameter when training them. One of the most critical challenges in deploying machine learning-based systems in the real world is that, if the data distribution shifts at test time, the system becomes vulnerable to failures. Recent work [19–21] has demonstrated that image quality is a very important attribute when developing a machine learning-based system involving images. Multiple studies by Hendrycks et al. [22–24] on the ImageNet database have demonstrated that perturbations and distribution shifts in images have a significant impact on the performance of deep learning-based computer vision models.
One of the less explored areas of deep learning-based computer vision is the impact of colour on the performance of these networks. Deep learning networks such as CNNs [25] and generative adversarial networks (GANs) [26, 27] have shown promising results in converting grayscale images into colour images, and some approaches have also been proposed in the area of demosaicing [28, 29]. Recent studies [30–32] have shown that colour information has a significant impact on image classification tasks. Colour parameters such as the hue angle shift [33] have a significant impact on the performance of state-of-the-art deep neural networks trained on pristine ImageNet data. Colour information has been exploited successfully in the past by image segmentation algorithms [34–36]. Kantipudi et al. [37] have shown that colour channels can be exploited to attack deep learning systems, and previous studies have shown that CNNs trained on ImageNet are biased towards texture [38]. Much current research is focused on the robustness [39–43] of deep learning methods; therefore, it is important to investigate the effect of colour and image quality on the robustness of these methods. To the best of our knowledge, there has been very little work exploring the impact of colour information on modern deep learning-based semantic segmentation networks.
In this paper, we study the robustness of deep learning-based semantic segmentation models [44]. The first challenge is to identify quality and colour parameters that are likely to have an impact on the performance of deep learning-based semantic segmentation models and to generate a dataset to benchmark that performance. The parameters used in our study include colour space information, ISO noise, gamut, hue angle shift, saturation, contrast, and brightness, among others. The proposed dataset is built from the standard ADE20K dataset [9, 45]. We tested several CNN- and transformer-based semantic segmentation methods to gain insight into how these models respond to inputs that have been perturbed away from the distribution on which they were trained. With more and more real-life systems being deployed on top of deep learning-based computer vision models, understanding the robustness of these models is of paramount importance. The rest of the paper is organized as follows: we briefly describe the methods and architectures used in this study, and then we describe in detail the dataset generation and the investigations conducted.
2. Architectures and Methods
For this work, we have studied a few state-of-the-art semantic segmentation networks and their backbones, including both CNN- and transformer-based methods. For our analysis, we have used models pre-trained on ADE20K from the MMSegmentation [46] model zoo.
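As a rough illustration of how such pre-trained models can be evaluated on perturbed images, the snippet below uses the MMSegmentation Python API; the function names follow the 0.x releases and the config/checkpoint paths are placeholders, not the exact models used in this study.

```python
# Hedged sketch: load an ADE20K-pretrained model from the MMSegmentation model zoo
# and run it on a perturbed image. Paths and API names are assumptions (MMSegmentation 0.x).
from mmseg.apis import init_segmentor, inference_segmentor

config = 'configs/upernet/upernet_r50_512x512_160k_ade20k.py'   # placeholder config
checkpoint = 'checkpoints/upernet_r50_ade20k.pth'               # placeholder weights

model = init_segmentor(config, checkpoint, device='cuda:0')
result = inference_segmentor(model, 'perturbed_image.png')      # list with one H x W label map
```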
Fully convolutional networks (FCNs) [47] were one of the first techniques to explore the use of convolutional networks for semantic segmentation. They transformed classification networks into segmentation networks by adding upsampling. For this study, we have included FCNs with ResNet-50 and ResNet-101 backbones.
Pyramid scene parsing networks (PSPNets) [48] are the next image segmentation method included in our study. PSPNets were designed to extract context information and improve the quality of segmentation. As with FCN, we tested our augmented data on PSPNets with ResNet-50 and ResNet-101 backbones for a fair comparison.
Unified perceptual parsing networks (UPerNet) [49] are designed around unified perceptual parsing, which involves learning multiple possible visual concepts from a given image. We have conducted extensive experiments using UPerNets, combining them with the latest backbones, namely ConvNeXt, VIT, DEIT, and SWIN transformers, in addition to the ResNet-50 and ResNet-101 backbones, and found some interesting results that give us an idea of the robustness of these models.
With current advances in the application of transformers to computer vision tasks, Strudel et al. [50] proposed a semantic segmentation method using transformers, extending the VIT architecture to the segmentation task. The method uses the output embeddings corresponding to the image patches, and these embeddings yield the class labels via a mask transformer decoder or a pointwise linear decoder.
3. Pixel Accuracy
In this section, we describe the performance measure used for our analysis. Common problems in semantic segmentation include mismatched class labels, inconspicuous classes, and mismatched relationships, among others. In semantic segmentation, every pixel in the image is assigned a class label. To measure performance, we compute the ratio of the number of pixels whose predicted label matches the ground-truth annotation to the total number of annotated pixels. Let GT and Pred be the ground-truth and predicted segmentation maps, respectively, let TN be the total number of pixels in the ground truth, and let Tmatched be the number of pixels where the ground-truth and predicted labels agree. Pixel accuracy (PA) is the ratio between Tmatched and TN. For this analysis, we report the mean pixel accuracy over the 2000 images in the dataset, and the values are reported as percentages.
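A minimal sketch of this measure (our own illustration, not the authors' evaluation code) is given below; it assumes the ground-truth and predicted maps are integer label arrays of the same shape.

```python
import numpy as np

def pixel_accuracy(gt: np.ndarray, pred: np.ndarray) -> float:
    """PA = Tmatched / TN: the fraction of pixels whose predicted label
    equals the ground-truth label."""
    assert gt.shape == pred.shape
    t_matched = np.sum(gt == pred)   # pixels where the two label maps agree
    t_n = gt.size                    # total number of ground-truth pixels
    return float(t_matched) / t_n

# The reported numbers are the mean PA over the 2000 images, expressed as a percentage.
```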
4. Dataset Generation and Analysis
One of the key challenges, to our knowledge, is that there are no data to support the study of how image quality and colour affect semantic segmentation. ADE20K is one of the most challenging benchmarks in semantic segmentation, so we decided to build our image quality and colour dataset on the ADE20K classes, using 2000 labeled images from its validation set and modifying only colour and quality parameters so that the segmentation boundaries are not altered. We use the pixel-level accuracy between the predicted and ground-truth maps to evaluate the different semantic segmentation techniques. Table I shows the average pixel classification accuracy for all 2000 pristine images in the dataset. The main observation is that transformer-based models have better accuracy than CNN models, although the UPerNet-ConvNeXt combination is competitive. For all methods, as expected, the ResNet-101 backbone performs better than ResNet-50.
Table I.
Average Pixel Classification Accuracy (PA) for the pristine images in the dataset.
Method      Backbone     PA (pristine)
FCN         ResNet-50    78.2
FCN         ResNet-101   80.1
PSP         ResNet-50    81.0
PSP         ResNet-101   81.8
UPerNet     ResNet-50    80.8
UPerNet     ResNet-101   81.5
UPerNet     ConvNeXt     83.4
UPerNet     VIT-B        83.1
UPerNet     DEIT         82.9
UPerNet     SWIN-B       83.0
Segmenter   VIT-B        83.8
4.1 Colour Space based Distortions
Colour information in an image can be modeled using different colour spaces. In the pre-deep learning era, colour space information was used for segmentation [51]. Some of the popular colour spaces are red-green-blue (RGB), hue-saturation-value (HSV), luminance-chrominance (YCbCr), and the CIE-Lab colour space. Generally, deep neural networks are trained on the RGB colour space. To study the impact of colour information encoded in colour spaces, we set to zero the hue (H), saturation (S), Cb, Cr, a, and b components of the HSV, YCbCr, and CIE-Lab spaces, respectively, creating six subsets of images as shown in Figure 2. The images were generated by converting the RGB image into the HSV [52], YCbCr [53], and CIE-Lab spaces and then setting the corresponding channel to 0. All the colour space-based distortions were implemented using MATLAB 2021a.
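As an illustration of this channel-zeroing procedure, the sketch below (an assumption on our part; the dataset itself was generated in MATLAB 2021a) uses scikit-image colour conversions. Channel index 0 or 1 selects H or S in HSV, 1 or 2 selects Cb or Cr in YCbCr, and 1 or 2 selects a or b in CIE-Lab.

```python
# Hedged sketch of zeroing one colour component and converting back to RGB.
import numpy as np
from skimage import color, img_as_float

CONVERT = {
    'hsv':   (color.rgb2hsv,   color.hsv2rgb),    # channel 0 = H, 1 = S
    'ycbcr': (color.rgb2ycbcr, color.ycbcr2rgb),  # channel 1 = Cb, 2 = Cr
    'lab':   (color.rgb2lab,   color.lab2rgb),    # channel 1 = a, 2 = b
}

def drop_component(rgb, space, channel):
    forward, backward = CONVERT[space]
    img = forward(img_as_float(rgb))
    img[..., channel] = 0.0                        # zero the chosen component
    return np.clip(backward(img), 0.0, 1.0)        # clip out-of-gamut values
```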
Figure 2.
Example from the generated dataset; the zeroed component is mentioned in the labels.
Table II.
Average Pixel Classification Accuracy (PA) for the colour space-modified images in the dataset.
Method      Backbone     Hue    Sat    Cb     Cr     A      B
FCN         ResNet-50    73.5   70.1   65.9   62.1   77.0   75.1
FCN         ResNet-101   77.5   73.6   72.6   65.2   79.1   78.1
PSP         ResNet-50    78.9   75.8   74.0   66.1   80.3   79.4
PSP         ResNet-101   79.5   76.7   76.2   68.0   80.1   80.4
UPerNet     ResNet-50    78.3   75.4   73.0   50.8   79.9   78.7
UPerNet     ResNet-101   79.3   76.9   75.7   65.5   80.9   80.1
UPerNet     ConvNeXt     81.5   80.7   79.4   77.3   82.3   81.8
UPerNet     VIT-B        80.9   80.7   79.4   74.9   81.8   81.3
UPerNet     DEIT         80.7   79.2   79.0   77.3   81.4   80.7
UPerNet     SWIN-B       80.5   79.1   78.1   76.1   81.5   81.2
Segmenter   VIT-B        82.1   82.0   82.0   79.8   82.7   82.7
One of the key observations from Table II is that, in the YCbCr space, the Cb and Cr components have a significant effect on the performance of deep learning-based models, and the methods with CNN backbones perform significantly worse than those with transformer backbones. Perturbing the Cb and Cr components produces the largest reduction in performance of any component across the colour spaces considered.
4.2 Hue Angle Shift
For image classification, studies have shown [33] that shifting the hue angle towards red or blue has a significant impact on the performance of deep convolutional networks. Following that study, we shift the hue angle in steps of 60 degrees to create five subsets of images (60, 120, 180, 240, and 300 degrees), as depicted in Figure 3. The aim is to study the effect of hue shift on semantic segmentation models trained on pristine images with only a few augmentations. The main observation from Table III is that changing the hue angle shifts the data distribution and causes a drop in performance. The networks with transformer backbones perform better than the earlier methods with ResNet-50 or ResNet-101 backbones, and the performance of UPerNet with the next-generation ConvNeXt backbone is also very competitive. The networks perform worst when the hue-angle shift is around 180 degrees. Fully convolutional networks (FCN) are more sensitive to hue-angle shifts than the competing PSPNets and UPerNets, and UPerNets combined with transformer backbones are more robust than those with CNN backbones.
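A minimal sketch of such a hue rotation (our own illustration; the dataset images were generated separately) is shown below, where the H channel of the HSV representation, stored in [0, 1] by scikit-image, is shifted and wrapped around.

```python
from skimage import color, img_as_float

def shift_hue(rgb, degrees):
    hsv = color.rgb2hsv(img_as_float(rgb))
    hsv[..., 0] = (hsv[..., 0] + degrees / 360.0) % 1.0   # rotate hue, wrap at 360 degrees
    return color.hsv2rgb(hsv)

# The five subsets correspond to shifts of 60, 120, 180, 240 and 300 degrees.
```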
Table III.
Average Pixel Classification Accuracy (PA) for different hue-angle shifts (in degrees) for images in the dataset.
Method      Backbone     60     120    180    240    300
FCN         ResNet-50    76.8   72.9   69.1   70.7   76.4
FCN         ResNet-101   79.1   76.2   74.3   75.8   79.1
PSP         ResNet-50    80.0   76.1   76.1   77.4   80.2
PSP         ResNet-101   80.9   78.5   76.8   78.1   81.1
UPerNet     ResNet-50    79.5   76.7   74.8   76.6   79.7
UPerNet     ResNet-101   80.6   78.1   76.1   77.8   80.7
UPerNet     ConvNeXt     82.7   80.9   79.7   80.9   82.8
UPerNet     VIT-B        82.6   80.3   79.3   80.1   82.5
UPerNet     DEIT         82.4   80.0   79.6   79.9   82.5
UPerNet     SWIN-B       82.3   80.0   78.9   79.7   82.4
Segmenter   VIT-B        83.4   81.8   81.1   81.6   83.4
Figure 3.
Example from the generated dataset with the hue shift mentioned in the labels.
4.3 Saturation
We reduced the saturation of the images to four levels of increasing distortion (see Figure 4) to study the impact of saturation on the semantic segmentation task performed by deep learning models. The four levels were created by scaling the values of the S channel in the HSV colour space; the scaling factors used were 0.8, 0.6, 0.4, and 0.2, termed Levels 1 to 4, respectively. The modification was performed using MATLAB 2021a. The observation from Table IV is that, as expected, the transformer models tend to perform better than the CNN-based models. There is a drop in performance when the saturation is reduced, but the transformer-based models show better resistance to these changes.
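The same idea can be sketched in Python (an assumption on our part; the paper's version was implemented in MATLAB 2021a) by scaling the S channel of the HSV image.

```python
import numpy as np
from skimage import color, img_as_float

def scale_saturation(rgb, factor):
    hsv = color.rgb2hsv(img_as_float(rgb))
    hsv[..., 1] = np.clip(hsv[..., 1] * factor, 0.0, 1.0)   # scale the S channel
    return color.hsv2rgb(hsv)

# Levels 1-4 in this study correspond to factors 0.8, 0.6, 0.4 and 0.2.
```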
Table IV.
Average Pixel Classification Accuracy (PA) for different saturation levels for images in the dataset.
Method      Backbone     Level 1   Level 2   Level 3   Level 4
FCN         ResNet-50    78.1      77.9      77.2      75.0
FCN         ResNet-101   80.1      80.0      79.4      77.8
PSP         ResNet-50    81.0      80.8      80.4      79.1
PSP         ResNet-101   81.8      81.7      81.2      79.9
UPerNet     ResNet-50    80.7      80.5      79.9      78.5
UPerNet     ResNet-101   81.5      81.4      81.0      79.8
UPerNet     ConvNeXt     83.4      83.2      83.1      82.6
UPerNet     VIT-B        83.1      83.0      82.8      82.2
UPerNet     DEIT         82.9      82.8      82.5      82.0
UPerNet     SWIN-B       83.0      82.8      82.7      82.0
Segmenter   VIT-B        83.7      83.7      83.6      83.3
Figure 4.
Example from the generated dataset with different saturation levels mentioned in the labels.
4.4 Brightness and Contrast
Brightness and contrast are two fundamental image quality parameters. The goal is to study how semantic segmentation behaves when the input image has poor contrast or is too dark or too bright. In real-life applications, brightness and contrast may be affected by uncontrollable conditions, and thus a deep learning-based model must be robust to such variations. For this study, we generated four levels each of contrast and brightness using the AugLy library [54], which provides data augmentations for adversarial robustness. Examples of the brightness modification are shown in Figure 5. Four levels with poor visual quality were chosen for the final reporting of results to demonstrate their effect on semantic segmentation, and images with poor visual quality showed poorer segmentation performance. Table V shows that dark images (B1) degrade the performance of methods with ResNet backbones more than that of transformer-based methods, and a similar behavior is observed for overly bright images (B4).
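A hedged sketch of how such levels can be produced with AugLy is shown below; the exact factors used for the four levels are not stated in this paper, so the values and the file name here are placeholders.

```python
from PIL import Image
import augly.image as imaugs

img = Image.open('ADE_val_00000001.jpg')                # placeholder file name

dark         = imaugs.brightness(img, factor=0.4)       # factor < 1 darkens the image
bright       = imaugs.brightness(img, factor=1.6)       # factor > 1 brightens the image
low_contrast = imaugs.contrast(img, factor=0.4)         # factor < 1 flattens the contrast
```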
Figure 5.
Example from the generated dataset with different brightness levels.
Table V.
Average Pixel Classification Accuracy (PA) for different brightness levels for images in the dataset.
Method      Backbone     B1     B2     B3     B4
FCN         ResNet-50    76.2   77.9   77.8   76.2
FCN         ResNet-101   78.8   79.9   79.8   78.6
PSP         ResNet-50    79.7   80.9   80.7   79.5
PSP         ResNet-101   80.6   81.6   81.5   80.5
UPerNet     ResNet-50    79.2   80.5   80.4   79.2
UPerNet     ResNet-101   80.2   81.4   81.2   80.1
UPerNet     ConvNeXt     82.2   83.2   83.1   82.0
UPerNet     VIT-B        82.1   82.9   82.8   81.8
UPerNet     DEIT         81.8   82.7   82.7   81.8
UPerNet     SWIN-B       82.0   82.7   82.7   81.7
Segmenter   VIT-B        83.2   83.7   83.5   82.9
Contrast is an important image quality parameter, and it is important to study its effects. We generated four levels of contrast (examples are shown in Figure 6), and the corresponding average pixel accuracy for each model is reported in Table VI. For the poorest-contrast images (C1), CNN-based methods show a clear performance drop compared with the pristine images in Table I, whereas the transformer-based models appear to be more robust.
Figure 6.
Example from the generated dataset with the different contrast mentioned in the labels.
Table VI.
Average Pixel Classification Accuracy (PA) for different contrast levels for images in the dataset.
Method      Backbone     C1     C2     C3     C4
FCN         ResNet-50    73.6   77.8   77.8   76.9
FCN         ResNet-101   76.8   79.8   79.8   79.0
PSP         ResNet-50    77.9   80.8   80.8   80.0
PSP         ResNet-101   79.1   81.5   81.5   80.8
UPerNet     ResNet-50    77.1   80.4   80.5   79.7
UPerNet     ResNet-101   78.5   81.3   81.3   80.5
UPerNet     ConvNeXt     81.8   83.1   83.2   82.6
UPerNet     VIT-B        81.7   82.9   82.9   82.3
UPerNet     DEIT         81.7   82.7   82.8   82.3
UPerNet     SWIN-B       81.3   82.6   82.9   82.3
Segmenter   VIT-B        83.0   83.6   83.7   83.2
4.5 Colour Gamut
The colour gamut [55] is the subset of colours that a display device can represent and is one of the key considerations in imaging and display technologies. We conducted a set of experiments on synthetically generated images using ICC profiles (www.color.org), which are generally used for printing, and ran the semantic segmentation models on these images. An example of the newsprint gamut is shown in Figure 7. Table VII shows that the newspaper gamut reduces the performance of the semantic segmentation networks; this modification introduces a distribution shift between the training and testing data. The FCN with ResNet-50 backbone has the greatest performance drop, and even the transformer-based models show a performance drop. Similar trends were observed in experiments carried out on other print gamuts.
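A hedged sketch of simulating such a reduced print gamut is shown below using Pillow's ImageCms module; 'newsprint.icc' stands in for a print profile obtained from www.color.org, and the round trip through the print profile is an assumption about how such images can be prepared, not the exact procedure used here.

```python
from PIL import Image, ImageCms

srgb = ImageCms.createProfile('sRGB')
newsprint = ImageCms.getOpenProfile('newsprint.icc')      # placeholder ICC profile

to_print  = ImageCms.buildTransform(srgb, newsprint, 'RGB', 'CMYK')
to_screen = ImageCms.buildTransform(newsprint, srgb, 'CMYK', 'RGB')

img = Image.open('ADE_val_00000001.jpg').convert('RGB')   # placeholder file name
reduced_gamut = ImageCms.applyTransform(ImageCms.applyTransform(img, to_print), to_screen)
```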
Figure 7.
Example from the generated dataset with the different gamut mentioned in the labels.
Table VII.
Average Pixel Classification Accuracy (PA) for images with reduced (newspaper) gamut.
Method      Backbone     PA (newspaper gamut)
FCN         ResNet-50    72.6
FCN         ResNet-101   76.2
PSP         ResNet-50    77.6
PSP         ResNet-101   78.4
UPerNet     ResNet-50    76.9
UPerNet     ResNet-101   78.3
UPerNet     ConvNeXt     81.3
UPerNet     VIT-B        81.6
UPerNet     DEIT         81.3
UPerNet     SWIN-B       80.2
Segmenter   VIT-B        82.8
4.6 ISO Noise
To study the sensitivity of semantic segmentation to image sensor noise, we created two subsets with two levels of noise, where Level 2 contains more noise than Level 1, as shown in Figure 8. The ISO noise model is based on the Poisson distribution as implemented in the Albumentations [56] library (https://albumentations.ai/docs/api_reference/augmentations/transforms/). From Table VIII we can infer that, as expected, performance degrades as the level of noise increases. Transformer-based methods are more immune to noise than competitors with ResNet-50 and ResNet-101 backbones, and the accuracy of the UPerNet with ConvNeXt backbone is on par with the transformer-based architectures. The Level 2 images are visibly more distorted than the Level 1 images, and the experimental results confirm that the models perform worse on noisier images.
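The sketch below shows how two such noise levels can be produced with Albumentations' ISONoise transform; the parameter ranges and file name are illustrative assumptions, not the exact values used to build the dataset.

```python
import cv2
import albumentations as A

level1 = A.ISONoise(color_shift=(0.01, 0.02), intensity=(0.1, 0.2), p=1.0)   # milder noise
level2 = A.ISONoise(color_shift=(0.05, 0.08), intensity=(0.5, 0.8), p=1.0)   # stronger noise

img = cv2.cvtColor(cv2.imread('ADE_val_00000001.jpg'), cv2.COLOR_BGR2RGB)    # placeholder
noisy_l2 = level2(image=img)['image']
```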
Figure 8.
Example from the generated dataset with two levels of ISO noise (Level 2 is noisier than Level 1).
Table VIII.
Average Pixel Classification Accuracy (PA) with Two Levels of ISO Noise.
Method      Backbone     Level 1   Level 2
FCN         ResNet-50    77.6      75.3
FCN         ResNet-101   79.6      77.7
PSP         ResNet-50    80.6      78.8
PSP         ResNet-101   81.3      79.4
UPerNet     ResNet-50    80.3      78.7
UPerNet     ResNet-101   81.1      79.5
UPerNet     ConvNeXt     83.1      82.0
UPerNet     VIT-B        82.8      81.9
UPerNet     DEIT         82.6      81.5
UPerNet     SWIN-B       82.6      81.0
Segmenter   VIT-B        83.6      83.1
4.7 Colour Modification
We included two colour modifications, converting the colour images to grayscale and to sepia (see Figure 9) using the Albumentations library. The main aim was to check how these semantic segmentation methods behave in the absence of colour information.
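A minimal sketch of these two modifications with Albumentations is given below; the file name is a placeholder.

```python
import cv2
import albumentations as A

img = cv2.cvtColor(cv2.imread('ADE_val_00000001.jpg'), cv2.COLOR_BGR2RGB)   # placeholder

gray  = A.ToGray(p=1.0)(image=img)['image']    # 3-channel grayscale image
sepia = A.ToSepia(p=1.0)(image=img)['image']   # sepia-toned image
```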
Figure 9.
Example from the generated dataset with grayscale and sepia mentioned in the labels.
Table IX.
Average Pixel Classification Accuracy (PA) for a subset of images with sepia and grayscale filters.
Method      Backbone     Sepia   Gray
FCN         ResNet-50    70.9    70.7
FCN         ResNet-101   75.7    74.3
PSP         ResNet-50    77.4    76.6
PSP         ResNet-101   78.1    77.4
UPerNet     ResNet-50    76.6    76.1
UPerNet     ResNet-101   78.0    77.4
UPerNet     ConvNeXt     80.2    81.0
UPerNet     VIT-B        79.8    80.9
UPerNet     DEIT         78.9    79.5
UPerNet     SWIN-B       78.8    79.4
Segmenter   VIT-B        81.6    82.3
Table IX shows that colour information plays a significant role in computer vision tasks. There is a significant decrease in accuracy for the CNN-based networks, and the transformer-based methods also show a performance dip. The patch-based nature of transformers has most likely enabled them to perform better than the CNN-based models.
5. Results and Discussion
In this paper, we have created synthetic data based on the popular ADE20K dataset to understand the impact of colour information and quality parameters on the performance of semantic segmentation networks. We have performed comparative studies ranging from basic FCNs to recent state-of-the-art transformer-based methods and estimated the robustness of these methods under different colour and image quality-based modifications. The general observation is that transformer-based methods are more robust to poor-quality data than their CNN-based counterparts. However, transformer-based methods can be difficult to train and may require significant computing resources, so there is a trade-off between robustness and model complexity. One consistent observation is that pyramid scene parsing networks (PSPNets) and unified perceptual parsing networks (UPerNets) with ResNet-50 and ResNet-101 backbones perform significantly better than FCNs with the same backbones, with ResNet-101 outperforming ResNet-50. Colour channel information has not been widely studied in the context of deep neural networks; one of the main objectives of this study is to show how these modern techniques respond to quality and colour parameters and to motivate researchers to incorporate colour information into the architecture engineering process. Real-time computer vision systems have been deployed in critical areas such as surgery and autonomous driving, where errors in scene understanding can lead to catastrophe. To the best of our knowledge, we have listed some of the key quality and colour parameters that need to be investigated in detail by both the colour imaging and deep learning communities to build safer systems.
References
1. Krizhevsky A., Sutskever I., Hinton G., "ImageNet classification with deep convolutional neural networks," Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012). DOI: 10.1145/3065386
2. Deng J., Dong W., Socher R., Li L., Li K., Fei-Fei L., "ImageNet: A large-scale hierarchical image database," 2009 IEEE Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2009), pp. 248–255. DOI: 10.1109/CVPR.2009.5206848
3. Girshick R., "Fast R-CNN," Proc. IEEE Int'l. Conf. on Computer Vision (IEEE, Piscataway, NJ, 2015), pp. 1440–1448. DOI: 10.1109/ICCV.2015.169
4. Redmon J., Divvala S., Girshick R., Farhadi A., "You only look once: Unified, real-time object detection," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2016), pp. 779–788. DOI: 10.1109/CVPR.2016.91
5. Ronneberger O., Fischer P., Brox T., "U-Net: Convolutional networks for biomedical image segmentation," Int'l. Conf. on Medical Image Computing and Computer-Assisted Intervention (Springer, Cham, 2015), pp. 234–241. DOI: 10.1007/978-3-319-24574-4_28
6. Jégou S., Drozdzal M., Vazquez D., Romero A., Bengio Y., "The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation," Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops (IEEE, Piscataway, NJ, 2017), pp. 11–19. DOI: 10.1109/CVPRW.2017.156
7. Rajpal S., Sadhya D., De K., Roy P. P., Raman B., "EAI-Net: Effective and accurate iris segmentation network," Int'l. Conf. on Pattern Recognition and Machine Intelligence (Springer, Cham, 2019), pp. 442–451. DOI: 10.1007/978-3-030-34869-4_48
8. Osadebey M., Andersen H. K., Waaler D., Fossaa K., Martinsen A., Pedersen M., "Three-stage segmentation of lung region from CT images using deep neural networks," BMC Med. Imaging 21, 1–19 (2021). DOI: 10.1186/s12880-021-00640-1
9. Zhou B., Zhao H., Puig X., Xiao T., Fidler S., Barriuso A., Torralba A., "Semantic understanding of scenes through the ADE20K dataset," Int. J. Comput. Vis. 127, 302–321 (2019). DOI: 10.1007/s11263-018-1140-0
10. He K., Zhang X., Ren S., Sun J., "Deep residual learning for image recognition," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2016), pp. 770–778. DOI: 10.1109/CVPR.2016.90
11. Huang G., Liu Z., Van Der Maaten L., Weinberger K. Q., "Densely connected convolutional networks," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2017), pp. 4700–4708. DOI: 10.1109/CVPR.2017.243
12. Simonyan K., Zisserman A., "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556 (2014).
13. Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., Rabinovich A., "Going deeper with convolutions," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2015), pp. 1–9. DOI: 10.1109/CVPR.2015.7298594
14. Tan M., Le Q., "EfficientNet: Rethinking model scaling for convolutional neural networks," Int'l. Conf. on Machine Learning (PMLR, Long Beach, CA, 2019), pp. 6105–6114. DOI: 10.48550/arXiv.1905.11946
15. Liu Z., Mao H., Wu C.-Y., Feichtenhofer C., Darrell T., Xie S., "A ConvNet for the 2020s," arXiv preprint arXiv:2201.03545 (2022).
16. Dosovitskiy A., Beyer L., Kolesnikov A., Weissenborn D., Zhai X., Unterthiner T., Dehghani M., Minderer M., Heigold G., Gelly S., Uszkoreit J., "An image is worth 16×16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929 (2020).
17. Liu Z., Lin Y., Cao Y., Hu H., Wei Y., Zhang Z., Lin S., Guo B., "Swin transformer: Hierarchical vision transformer using shifted windows," Proc. IEEE/CVF Int'l. Conf. on Computer Vision (IEEE, Piscataway, NJ, 2021), pp. 10012–10022. DOI: 10.1109/ICCV48922.2021.00986
18. Touvron H., Cord M., Douze M., Massa F., Sablayrolles A., Jégou H., "Training data-efficient image transformers & distillation through attention," Int'l. Conf. on Machine Learning (PMLR, Cambridge, MA, 2021), pp. 10347–10357.
19. Dodge S., Karam L., "Understanding how image quality affects deep neural networks," 2016 Eighth Int'l. Conf. on Quality of Multimedia Experience (QoMEX) (IEEE, Piscataway, NJ, 2016), pp. 1–6. DOI: 10.1109/QoMEX.2016.7498955
20. Dodge S., Karam L., "A study and comparison of human and deep learning recognition performance under visual distortions," 2017 26th Int'l. Conf. on Computer Communication and Networks (ICCCN) (IEEE, Piscataway, NJ, 2017), pp. 1–7. DOI: 10.1109/ICCCN.2017.8038465
21. Roy P., Ghosh S., Bhattacharya S., Pal U., "Effects of degradations on deep neural network architectures," arXiv preprint arXiv:1807.10108 (2018).
22. Hendrycks D., Dietterich T., "Benchmarking neural network robustness to common corruptions and perturbations," Proc. Int'l. Conf. on Learning Representations (2019); arXiv preprint arXiv:1903.12261.
23. Hendrycks D., Zhao K., Basart S., Steinhardt J., Song D., "Natural adversarial examples," Proc. IEEE/CVF Int'l. Conf. on Computer Vision (IEEE, Piscataway, NJ, 2021), pp. 15262–15271. DOI: 10.1109/CVPR46437.2021.01501
24. Hendrycks D., Basart S., Mu N., Kadavath S., Wang F., Dorundo E., Desai R., Zhu T., Parajuli S., Guo M., Song D., "The many faces of robustness: A critical analysis of out-of-distribution generalization," Proc. IEEE/CVF Int'l. Conf. on Computer Vision (IEEE, Piscataway, NJ, 2021), pp. 8340–8349. DOI: 10.1109/ICCV48922.2021.00823
25. Zhang R., Isola P., Efros A., "Colorful image colorization," European Conf. on Computer Vision (Springer, Cham, 2016), pp. 649–666. DOI: 10.1007/978-3-319-46487-9_40
26. Nazeri K., Ng E., Ebrahimi M., "Image colorization using generative adversarial networks," Int'l. Conf. on Articulated Motion and Deformable Objects (Springer, Cham, 2018), pp. 85–94. DOI: 10.1007/978-3-319-94544-6_9
27. Halder S., De K., Roy P., "Perceptual conditional generative adversarial networks for end-to-end image colourization," Asian Conf. on Computer Vision (Springer, Cham, 2018), pp. 269–283. DOI: 10.1007/978-3-030-20890-5_18
28. Shopovska I., Jovanov L., Philips W., "RGB-NIR demosaicing using deep residual U-Net," 2018 26th Telecommunications Forum (TELFOR) (IEEE, Piscataway, NJ, 2018), pp. 1–4. DOI: 10.1109/TELFOR.2018.8611819
29. Luo J., Wang J., "Image demosaicing based on generative adversarial network," Math. Probl. Eng. 2020 (2020). DOI: 10.1155/2020/7367608
30. Buhrmester V., Münch D., Bulatov D., Arens M., "Evaluating the impact of color information in deep neural networks," Iberian Conf. on Pattern Recognition and Image Analysis (Springer, Cham, 2019), pp. 302–316. DOI: 10.1007/978-3-030-31332-6_27
31. De K., Pedersen M., "Impact of colour on robustness of deep neural networks," Proc. IEEE/CVF Int'l. Conf. on Computer Vision (IEEE, Piscataway, NJ, 2021), pp. 21–30. DOI: 10.1109/ICCVW54120.2021.00009
32. Flachot A., Gegenfurtner K., "Color for object recognition: hue and chroma sensitivity in the deep features of convolutional neural networks," Vis. Res. 182, 89–100 (2021). DOI: 10.1016/j.visres.2020.09.010
33. De K., Pedersen M., "Effect of hue shift towards robustness of convolutional neural networks," Proc. IS&T Electronic Imaging: Color Imaging XXVII: Displaying, Processing, Hardcopy, and Applications (IS&T, Springfield, VA, 2022), pp. 156-1–156-6. DOI: 10.2352/EI.2022.34.15.COLOR-156
34. Deng Y., Manjunath B., Shin H., "Color image segmentation," Proc. 1999 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Vol. 2 (IEEE, Piscataway, NJ, 1999), pp. 446–451. DOI: 10.1109/CVPR.1999.784719
35. Chen T. Q., Lu Y., "Color image segmentation – an innovative approach," Pattern Recognit. 35, 395–405 (2002). DOI: 10.1016/S0031-3203(01)00050-4
36. Khattab D., Ebied H. M., Hussein A. S., Tolba M. F., "Color image segmentation based on different color space models using automatic GrabCut," Sci. World J. 2014 (2014). DOI: 10.1155/2014/126025
37. Kantipudi J., Dubey S. R., Chakraborty S., "Color channel perturbation attacks for fooling convolutional neural networks and a defense against such attacks," IEEE Trans. Artif. Intell. 1, 181–191 (2020). DOI: 10.1109/TAI.2020.3046167
38. Geirhos R., Rubisch P., Michaelis C., Bethge M., Wichmann F. A., Brendel W., "ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness," Proc. Int'l. Conf. on Learning Representations (ICLR, New Orleans, LA, 2019). DOI: 10.48550/arXiv.1811.12231
39. Papernot N., McDaniel P., Jha S., Fredrikson M., Celik Z. B., Swami A., "The limitations of deep learning in adversarial settings," 2016 IEEE European Symposium on Security and Privacy (EuroS&P) (IEEE, Piscataway, NJ, 2016), pp. 372–387. DOI: 10.1109/EuroSP.2016.36
40. Carlini N., Wagner D., "Towards evaluating the robustness of neural networks," 2017 IEEE Symposium on Security and Privacy (SP) (IEEE, Piscataway, NJ, 2017), pp. 39–57. DOI: 10.1109/SP.2017.49
41. Taori R., Dave A., Shankar V., Carlini N., Recht B., Schmidt L., "Measuring robustness to natural distribution shifts in image classification," Adv. Neural Inf. Process. Syst. 33, 18583–18599 (2020). DOI: 10.48550/arXiv.2007.00644
42. Rauber J., Zimmermann R., Bethge M., Brendel W., "Foolbox Native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX," J. Open Source Softw. 5, 2607 (2020). DOI: 10.21105/joss.02607
43. Kamann C., Rother C., "Benchmarking the robustness of semantic segmentation models," Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2020), pp. 8828–8838. DOI: 10.1109/CVPR42600.2020.00885
44. Minaee S., Boykov Y., Porikli F., Plaza A. J., Kehtarnavaz N., Terzopoulos D., "Image segmentation using deep learning: A survey," IEEE Trans. Pattern Anal. Mach. Intell. (2021). DOI: 10.1109/TPAMI.2021.3059968
45. Zhou B., Zhao H., Puig X., Fidler S., Barriuso A., Torralba A., "Scene parsing through ADE20K dataset," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2017), pp. 633–641. DOI: 10.1109/CVPR.2017.544
46. MMSegmentation Contributors, "MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark," https://github.com/open-mmlab/mmsegmentation (2020).
47. Long J., Shelhamer E., Darrell T., "Fully convolutional networks for semantic segmentation," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2015), pp. 3431–3440. DOI: 10.1109/CVPR.2015.7298965
48. Zhao H., Shi J., Qi X., Wang X., Jia J., "Pyramid scene parsing network," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2017), pp. 2881–2890. DOI: 10.1109/CVPR.2017.660
49. Xiao T., Liu Y., Zhou B., Jiang Y., Sun J., "Unified perceptual parsing for scene understanding," Proc. European Conf. on Computer Vision (ECCV) (Springer, Cham, 2018), pp. 418–434. DOI: 10.48550/arXiv.1807.10221
50. Strudel R., Garcia R., Laptev I., Schmid C., "Segmenter: Transformer for semantic segmentation," Proc. IEEE/CVF Int'l. Conf. on Computer Vision (IEEE, Piscataway, NJ, 2021), pp. 7262–7272. DOI: 10.1109/ICCV48922.2021.00717
51. Sural S., Qian G., Pramanik S., "Segmentation and histogram generation using the HSV color space for image retrieval," Proc. Int'l. Conf. on Image Processing, Vol. 2 (IEEE, Piscataway, NJ, 2002). DOI: 10.1109/ICIP.2002.1040019
52. Smith A. R., "Color gamut transform pairs," ACM SIGGRAPH Comput. Graph. 12, 12–19 (1978). DOI: 10.1145/965139.807361
53. Poynton C. A., A Technical Introduction to Digital Video (John Wiley & Sons, Inc., 1996), p. 175.
54. Papakipos Z., Bitton J., "AugLy: Data augmentations for adversarial robustness," Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition (IEEE, Piscataway, NJ, 2022), pp. 156–163. DOI: 10.48550/arXiv.2201.06494
55. Farup I., Gatta C., Rizzi A., "A multiscale framework for spatial gamut mapping," IEEE Trans. Image Process. 16, 2423–2435 (2007). DOI: 10.1109/TIP.2007.904946
56. Buslaev A., Iglovikov V. I., Khvedchenya E., Parinov A., Druzhinin M., Kalinin A., "Albumentations: fast and flexible image augmentations," Information 11, 125 (2020). DOI: 10.3390/info11020125