Regular Articles
Volume: 4 | Article ID: jpi0153
Color Conversion in Deep Autoencoders
DOI: 10.2352/J.Percept.Imaging.2021.4.2.020401 | Published Online: March 2021
Abstract

While RGB is the status quo in machine vision, other color spaces offer higher utility in distinct visual tasks. Here, the authors investigate the impact of color spaces on the encoding capacity of a visual system that is subject to information compression, specifically variational autoencoders (VAEs) with a bottleneck constraint. To this end, they propose a framework, color conversion, that allows a fair comparison of color spaces. They systematically investigated several ColourConvNets, i.e. VAEs with different input–output color spaces, e.g. from RGB to CIE Lab (in total, five color spaces were examined). Their evaluations demonstrate that, in comparison to the baseline network (whose input and output are RGB), ColourConvNets with a color-opponent output space produce higher-quality images. This is also evident quantitatively: (i) in pixel-wise low-level metrics such as color difference (ΔE), peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM); and (ii) in high-level visual tasks such as image classification (on the ImageNet dataset) and scene segmentation (on the COCO dataset), where the global content of the reconstruction matters. These findings offer a promising line of investigation for other applications of VAEs. Furthermore, they provide empirical evidence on the benefits of a color-opponent representation in a complex visual system and why it might have emerged in the human brain.

  Cite this article 

Arash Akbarinia, Raquel Gil-Rodríguez, "Color Conversion in Deep Autoencoders," in Journal of Perceptual Imaging, 2021, pp. 020401-1 - 020401-10, https://doi.org/10.2352/J.Percept.Imaging.2021.4.2.020401

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2021
  Article timeline 
  • Received May 2021
  • Accepted October 2021
  • Published March 2021

1. Introduction
Color is an inseparable component of our conscious visual perception, with an objective utility spanning a large set of tasks such as object recognition and scene segmentation [8]. Consequently, color has become a ubiquitous feature in machine vision and image processing. Currently, the state of the art and practice in these fields is dominated by deep learning methods. Thus, progress along these lines requires a better understanding of the networks' underlying mechanisms [3] and the color representation learned by them.
Human color vision is the result of three types of cone photoreceptors present in the retina [6]. Thus, models of color perception are defined in a three-dimensional space. In theory, an infinite number of color spaces could be formulated, and indeed several of them exist in the literature and industry [55]. RGB color sensors are the standard in off-the-shelf commercial cameras, which makes the RGB color space widely used in computer vision and deep learning applications. We are interested in whether the choice of color representation influences the capacity of deep networks in visual information processing. This is a generic endeavor, not targeted toward a specific application. A physical restriction common to all real-world applications is the bottleneck in information transmission. Hence, autoencoders are a perfect tool to study this question, given that their objective is simply efficient coding under a similar constraint [50].
To this end, we propose the color conversion framework, in which the input–output color spaces are explicitly imposed on deep autoencoders (referred to as ColourConvNets). ColourConvNets learn to compress the visual information in their bottleneck while transforming the input to the output. Essentially, the output y for input image x is generated on the fly by a transformation y = T(x), where T maps the input color space to the output color space. Color conversion offers a framework to fairly compare the effect of color spaces in a complex visual system that is driven by optimization. Here, we study the effect of the choice of color conversion on the quality of the reconstructed images, which indicates whether the representation of the input–output color spaces impacts the network's encoding power.
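As an illustration of the on-the-fly target generation y = T(x), the following Python sketch wraps an RGB dataset so that each sample returns the original RGB image as input and its converted version as target. It assumes PyTorch and scikit-image; the class name ColourConversionDataset and the default transform are illustrative choices, not taken from the authors' released code.

```python
import numpy as np
import torch
from torch.utils.data import Dataset
from skimage import color


def to_tensor(array):
    """HxWx3 float array -> 3xHxW float tensor."""
    return torch.from_numpy(np.ascontiguousarray(array)).permute(2, 0, 1).float()


class ColourConversionDataset(Dataset):
    """Wraps an RGB dataset; the target is the same image in another color space."""

    def __init__(self, base_dataset, transform_fn=color.rgb2lab):
        self.base = base_dataset          # yields HxWx3 float RGB arrays in [0, 1]
        self.transform_fn = transform_fn  # the mapping T in y = T(x)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        rgb = self.base[idx]                  # network input x
        target = self.transform_fn(rgb)       # target y = T(x), computed on the fly
        return to_tensor(rgb), to_tensor(target)
```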
In this work, we focused on Vector Quantized Variational Autoencoder (VQ-VAE) [52] due to the discrete nature of its latent space. We thoroughly studied five commonly used color spaces by training ColourConvNets for all combinations of input–output spaces. First, we show that ColourConvNets with a decorrelated output color space (e.g. CIE Lab) convey information more efficiently in their compressing bottleneck, in line with the presence of color opponency in the human visual system [5]. This is evident qualitatively (Figure 7) and quantitatively (evaluated with three low-level and two high-level metrics). We further discuss a potential explanation at the level of embedding vectors linking it to the histogram equalization technique [41] and the efficient coding theory [4].
2. Related Work
Various color spaces have been explored in classical computer vision to boost the performance of algorithms. Color-opponent spaces (e.g. CIE Lab) have been extensively used in applications such as image retrieval [42], color constancy [1], color stabilization [19], color transfer [43], color naming [40], texture classification [7], and edge detection [2], to name a few. Combinations of intensity, saturation and hue (e.g. HSV) have also proven effective in applications such as object recognition [18], skin classification [21], and object tracking [39]. In general, the fusion of color spaces is reported to create an optimal feature detector [49].
Figure 1.
Left: Exemplary conversions across different color spaces. Right: The schematic view of VQ-VAE ColourConvNets.
In comparison to the classical approaches, the utility of color spaces in deep neural networks (DNNs) is understudied. Initial work suggested that non-RGB color spaces do not boost performance on the ImageNet dataset [36]. Contrary to this, a fusion of three color spaces (RGB, HSV and CIE Lab) has improved retinal medical imaging [16]. Similarly, a multi-channel architecture combining three color spaces (RGB, HSV and YCbCr) has been proposed for face identification [29]. Integrating six networks of different color spaces has also been successfully applied to traffic light recognition [23]. In addition, color spaces in which luminance and chromatic information occupy separate channels (e.g. YUV) are particularly helpful in applications such as image colorization [30] and style transfer [35]. Last but not least, the prediction of luminance from chromatic planes and vice versa has been explored in unsupervised learning [57].
Color spaces have also been a topic of research in the efficient coding literature. The choice of color space influences the degree of image compression and the efficiency of the representation [48]. This has made color conversion a standard technique in image compression. In certain on-board systems (e.g. the Mars Exploration Rover) the extra computational cost of finding an optimal space for a set of images is justified [56]. Consequently, modern image file formats allow color-space information to be stored in their metadata [44]. In the case of the commonly used JPEG image compression, it has been specifically shown that RGB is the least and CIE Lab the most suitable color space [37]. Correspondingly, classical learning-based methods of image compression also use opponent color spaces (i.e. one luminance and two chromatic channels) [9]. To the best of our knowledge, this finding has not been thoroughly examined in modern deep autoencoders [24]. As opposed to classical approaches, current compression studies rely on the encoder's capabilities [51], without applying any prior color transformation. In this article, we aim to bridge this gap by systematically comparing color spaces in the context of deep autoencoders.
3. Color Converting Autoencoders
In this article, we propose a novel unsupervised task of color conversion: the network’s output color space is independent of its input (see Figure 1). This is inspired by the human visual system, in which the sensory and perceptual systems work in different color spaces. The input to our visual system is triggered by photoreceptors in the back of the retina. Hence, the sensory system is defined in the LMS color space [17]. Before reaching the cortex, this signal is transformed into a cone-opponent space by the opponent cells present in the retina and the lateral geniculate nucleus (LGN) [12]. Behavioral studies suggest that yet another color-opponent space shapes our perceptual system [54]. Last but not least, it has been argued that the current color spaces cannot fully explain the dimension of hue in which colors and objects are associated [26]. This collection of studies in the literature suggests that our visual system functions with different color spaces for distinct goals. A similar observation can be made for machine vision. While the sensory system is in the RGB color space (the input to the system), alternative spaces might be more efficient for other purposes.
A color space is an arbitrary definition of the organization of colors in space [27]. Thus, the choice of the transformation T in ColourConvNets is flexible enough to model any desired space,
(1)
$C_{\mathrm{in}} \xrightarrow{\;T\;} C_{\mathrm{out}},$
where Cin and Cout are the input and output color spaces. This framework offers a controlled environment to compare color spaces within a complex visual system. Here, we compared them in an information encoding network that is constrained to a bottleneck. This loosely corresponds to the need for signal compression in the human visual system due to present physical constraints. An extension of the proposed framework can encompass other constraints (such as entropy, energy, wiring, etc.) relevant to understanding color representation in complex visual systems. This structure can be further used to compare the autoencoder’s latent space across color spaces aiming to decipher the intermediate color representation within these networks [14]. The proposed framework can also be employed in applications, e.g., as an add-on optimization capsule to any computer vision application [38], or as a proxy task for visual understanding [30].
3.1 Networks
One fundamental property of neural activity in biological brains is “all-or-none” [22]. This, in turn, strengthens the argument of discrete representation [34]. Hence, we studied a particular class of VAEs—Vector Quantized Variational Autoencoder (VQ-VAE) [52]—due to the discrete nature of its latent embedding space, which distinguishes it from other regimes [24].
Figure 2.
Evolution of losses for VQ-VAEs of K = 8 and D = 128. In each panel, the ColourConvNets have the same output space. Across panels, curves of the same color have the same input space.
VQ-VAE consists of three main components (see the right panel in Fig. 1):
1. An encoder f(x) that processes the input data x into z_e(x) through non-linear operations;
2. An embedding space {e} ∈ ℝ^{K×D}, with K vectors of dimensionality D, which maps the continuous z_e(x) onto a sequence of discrete latent variables z_q(x) by selecting the nearest vector e_i to z_e(x);
3. A decoder g(e) that reconstructs the final output x′ with a distribution p(x | z_q(x)) over the input data.
The loss function L is defined as follows,
(2)
$\mathcal{L} = \lVert y - g(e) \rVert_2^2 + \lVert \mathrm{sg}[f(x)] - e \rVert_2^2 + \beta \lVert f(x) - \mathrm{sg}[e] \rVert_2^2,$
where y is the target image (i.e. x in the output color space), and sg denotes the stop-gradient operator, defined as the identity during forward propagation and as having zero partial derivatives during backpropagation, so that its argument is not updated.
The first term in Eq. (2) measures the quality of the reconstructed image and jointly updates the encoder and decoder. The other two terms align the embedding vectors with the encoder output: the second term only updates the latent variables (embedding vectors), and the third term only updates the encoder. The hyperparameter β ∈ ℝ regulates the degree of change for the encoder output. Without a hyperparameter search, we set β = 0.5 in all conducted experiments.
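To make the interplay between the quantization step and the three loss terms concrete, the following PyTorch sketch computes the objective of Eq. (2), with the stop gradients implemented via .detach() and the straight-through estimator used in VQ-VAE. It is a minimal illustration rather than the authors' implementation; the function and variable names are ours.

```python
import torch
import torch.nn.functional as F


def vq_vae_loss(z_e, codebook, decoder, y, beta=0.5):
    """Illustrative VQ-VAE objective for color conversion (Eq. 2).

    z_e:      encoder output f(x), shape (B, D, H, W)
    codebook: embedding vectors {e}, shape (K, D)
    y:        target image, i.e. x converted to the output color space
    """
    B, D, H, W = z_e.shape
    flat = z_e.permute(0, 2, 3, 1).reshape(-1, D)              # (B*H*W, D)
    # Nearest embedding vector for every spatial position.
    dist = torch.cdist(flat, codebook)                          # (B*H*W, K)
    idx = dist.argmin(dim=1)
    z_q = codebook[idx].reshape(B, H, W, D).permute(0, 3, 1, 2)

    # Straight-through estimator: the reconstruction gradient reaches the
    # encoder as if quantization were the identity.
    z_q_st = z_e + (z_q - z_e).detach()
    x_hat = decoder(z_q_st)

    recon = F.mse_loss(x_hat, y)                    # ||y - g(e)||^2
    codebook_term = F.mse_loss(z_q, z_e.detach())   # ||sg[f(x)] - e||^2
    commit_term = F.mse_loss(z_e, z_q.detach())     # beta * ||f(x) - sg[e]||^2
    return recon + codebook_term + beta * commit_term
```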
3.2 Color Spaces
We explored five color spaces: RGB, LMS, CIE Lab, DKL and HSV. RGB, the standard space in digital imaging, represents colors by three additive primaries in a cubic shape. The LMS color space corresponds to the sensitivity functions of the cones in the human eye (long, middle, and short wavelengths) [17]. The CIE Lab color space (luminance, red-green and yellow-blue axes) is designed to be perceptually uniform [10]. The DKL color space (Derrington–Krauskopf–Lennie) models the opponent responses of rhesus monkeys in the early visual system [12]. The HSV color space (hue, saturation, and value) is a cylindrical representation of the RGB cube designed for computer graphics.
The input and output of our networks can be in any combination of these color spaces. Effectively, our VQ-VAE models, in addition to learning an efficient representation, must learn the transformation function from their input to their output color space. It is worth noting that the original images in the explored datasets are in RGB format. Therefore, one might expect a slight positive bias toward this color space, given that its gamut defines the limits of the other color spaces.
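As an example of how the transformation T of Eq. (1) can be realized in code, the sketch below maps RGB images to a requested output space. It relies on scikit-image for CIE Lab and HSV; LMS and DKL would require the dedicated transformation matrices from the vision-science literature ([17], [12]) and are therefore only indicated by a comment. The helper names are illustrative, not from the authors' code.

```python
import numpy as np
from skimage import color

# Illustrative conversion table for the T in Eq. (1). LMS and DKL are omitted
# here because they require transformation matrices from the vision literature
# ([17], [12]) rather than a standard library call.
COLOUR_TRANSFORMS = {
    "rgb": lambda rgb: rgb,     # identity (the rgb2rgb baseline)
    "lab": color.rgb2lab,       # CIE Lab, perceptually uniform
    "hsv": color.rgb2hsv,       # cylindrical representation of RGB
}


def convert(rgb_image: np.ndarray, out_space: str) -> np.ndarray:
    """Map an HxWx3 float RGB image in [0, 1] to the requested output space."""
    return COLOUR_TRANSFORMS[out_space](rgb_image)
```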
4. Experiments
4.1 Training Procedure
We trained several instances of VQ-VAEs with distinct sizes of the embedding space {e} ∈ ℝ^{K×D}. The training procedure was identical for all networks: trained with the Adam optimizer (lr = 2 × 10⁻⁴) for 90 epochs. To isolate the influence of random variables, all networks were initialized with the same set of weights, and an identical random seed was used throughout all experiments. We used the ImageNet dataset [13] for training. This is a visual database for object recognition in real-world images, divided into one thousand categories. The training set contains 1.3M images. At each epoch, a subset of 100K samples was shown to the networks. Input images were of size 224 × 224 in three color channels. Figure 2 reports the progress of the loss function for all ColourConvNets with an embedding space of size {e} ∈ ℝ^{8×128}. A similar pattern of convergence can be observed in all trained networks, suggesting that the optimization provides a fair comparison across different input–output color spaces.
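A minimal training skeleton consistent with the reported setup (Adam, lr = 2 × 10⁻⁴, 90 epochs, a random 100K-image subset per epoch) might look as follows. The batch size, device handling, and the model.loss method are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler


def train(model, dataset, epochs=90, subset_size=100_000, batch_size=64, device="cuda"):
    """Illustrative training skeleton for a ColourConvNet (hypothetical interface)."""
    optimiser = torch.optim.Adam(model.parameters(), lr=2e-4)
    for epoch in range(epochs):
        # A fresh random 100K subset of the 1.3M ImageNet training images per epoch.
        indices = np.random.choice(len(dataset), size=subset_size, replace=False)
        loader = DataLoader(dataset, batch_size=batch_size,
                            sampler=SubsetRandomSampler(indices.tolist()))
        for x, y in loader:                 # x: RGB input, y: converted target
            x, y = x.to(device), y.to(device)
            loss = model.loss(x, y)         # e.g. Eq. (2); hypothetical method name
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
```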
4.2 Evaluation Protocol
To increase the generalization power of our findings, we evaluated all networks (without any fine-tuning) on the validation sets of three benchmark datasets: ImageNet (50K images), COCO (5K images), and CelebA (∼20K images). COCO is a large-scale dataset for object detection and scene segmentation in natural images [32]. CelebA contains facial attributes of celebrities [33]. The types of images in the CelebA dataset (close-up faces) rarely appear in the training set of our networks (i.e. ImageNet). We relied on two classes of evaluation: low-level, capturing the local statistics of an image; and high-level, assessing the global content of an image (for reproduction, the source code and experimental data are available at https://github.com/ArashAkbarinia/DecomposeNet).
4.2.1 Low-level Evaluation
We computed three commonly used metrics to measure the pixel-wise performance of networks: (i) the color difference CIEDE2000 (ΔE) [47], (ii) peak signal-to-noise ratio (PSNR), and (iii) structural similarity index measure (SSIM) [53]. These metrics are often used in the literature of image quality assessment. Lower values of ΔE and higher values of PSNR and SSIM indicate better performance.
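These three measures can be computed, for instance, with scikit-image as sketched below; the exact settings (e.g. data range, averaging of ΔE over pixels) are our assumptions rather than the authors' reported configuration.

```python
import numpy as np
from skimage import color
from skimage.metrics import peak_signal_noise_ratio, structural_similarity


def low_level_metrics(reference_rgb, reconstruction_rgb):
    """Pixel-wise quality measures; inputs are HxWx3 float RGB images in [0, 1]."""
    # CIEDE2000 color difference is computed in CIE Lab space, then averaged.
    delta_e = color.deltaE_ciede2000(color.rgb2lab(reference_rgb),
                                     color.rgb2lab(reconstruction_rgb)).mean()
    psnr = peak_signal_noise_ratio(reference_rgb, reconstruction_rgb, data_range=1.0)
    ssim = structural_similarity(reference_rgb, reconstruction_rgb,
                                 channel_axis=-1, data_range=1.0)
    return {"delta_e": delta_e, "psnr": psnr, "ssim": ssim}
```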
4.2.2 High-level Evaluation
Pixel-wise measures are unable to capture the global content of an image and whether semantic information remains perceptually intact. To account for this limitation, we performed a procedure similar to the standard Inception Score [46]: feeding the reconstructed images into two pretrained networks (without fine-tuning) that perform object classification (ResNet50 [20]) and scene segmentation (Feature Pyramid Network, FPN [25]). ResNet50 and FPN expect RGB inputs, thus non-RGB reconstructed images were converted to RGB. The evaluation for ResNet50 is the classification accuracy on the ImageNet dataset; the evaluation for FPN is the intersection over union (IoU) on the COCO dataset.
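A sketch of the classification part of this protocol, using a pretrained torchvision ResNet50 (recent torchvision API assumed), is given below. The normalisation constants are the standard ImageNet statistics; the batching and device handling are illustrative assumptions.

```python
import torch
from torchvision import models, transforms

# Standard ImageNet normalisation expected by the pretrained classifier.
normalise = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])


@torch.no_grad()
def top1_accuracy(reconstructions_rgb, labels, device="cuda"):
    """reconstructions_rgb: (N, 3, H, W) float tensor in [0, 1], already back in RGB."""
    classifier = models.resnet50(weights="IMAGENET1K_V1").to(device).eval()
    logits = classifier(normalise(reconstructions_rgb.to(device)))
    predictions = logits.argmax(dim=1)
    return (predictions.cpu() == labels).float().mean().item()
```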
Figure 3.
Low-level evaluation for embedding spaces of different sizes. Lower values of color difference and higher values of PSNR and SSIM indicate higher quality of the reconstruction.
4.3 Embedding Size
We first evaluated the influence of the embedding size for four regimes of ColourConvNets whose input is the original RGB images. The low-level evaluation for the ImageNet and COCO datasets is reported in Figure 3. The most noticeable data point (in all three metrics) is the poor performance of rgb2hsv with embedding space {e} ∈ ℝ^{8×8}. This might be due to the circular nature of the hue information, which cannot be adequately encoded with low-dimensional vectors (i.e. D = 8). For the smallest and the largest embedding spaces, we observe no significant differences between the four networks. However, for intermediate embedding spaces (i.e. 8 × 8 and 8 × 128) an advantage appears for networks whose outputs are opponent color spaces (DKL and CIE Lab).
The corresponding high-level evaluation is reported in Figure 4. The overall trend is similar for both tasks. The lowest performance occurs for rgb2hsv across all embedding spaces. ColourConvNets with an opponent output color space systematically perform better than rgb2rgb, with the exception of the largest embedding space (128 × 128), where they are on a par with each other (despite the substantial compression, 70% top-1 accuracy on ImageNet and 60% IoU on COCO). The comparison of the low- and high-level evaluations for the smallest embedding space (4 × 128) demonstrates the importance of high-level evaluation. Although the four networks perform similarly in the low-level metrics, a large difference appears among them in the high-level metrics (compare Fig. 4 with Fig. 3). The classification and segmentation performance is substantially influenced by the choice of color space. Overall, the results of the embedding-size experiment suggest that when physical constraints demand heavy compression (i.e. a narrow bottleneck), rgb2lab and rgb2dkl autoencoders better preserve the semantic content of images.
Figure 4.
High-level visual task evaluation. ResNet50's classification accuracy on reconstructed images of ImageNet and FPN's segmentation IoU on reconstructed images of COCO.
Noise reduction is a primary application of autoencoders: the imposed bottleneck forces the system to discard insignificant information. Correspondingly, we tested all networks after adding different degrees of salt-and-pepper noise to the input images. The ColourConvNets with an opponent output space systematically outperformed the baseline in this experiment as well. While this does not explicitly indicate better noise reduction in these networks, it demonstrates that their efficiency generalizes to out-of-distribution conditions.
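For reference, salt-and-pepper corruption of the input can be generated as in the following sketch; the noise amount and implementation details are illustrative, since the paper only states that several degrees of noise were tested.

```python
import numpy as np


def add_salt_and_pepper(rgb_image: np.ndarray, amount: float = 0.05) -> np.ndarray:
    """Corrupt a float RGB image in [0, 1] by setting a fraction of pixels to 0 or 1."""
    noisy = rgb_image.copy()
    h, w = noisy.shape[:2]
    n_corrupt = int(amount * h * w)
    ys = np.random.randint(0, h, 2 * n_corrupt)
    xs = np.random.randint(0, w, 2 * n_corrupt)
    noisy[ys[:n_corrupt], xs[:n_corrupt]] = 1.0   # salt
    noisy[ys[n_corrupt:], xs[n_corrupt:]] = 0.0   # pepper
    return noisy
```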
4.4 Pairwise Comparison
For the two embedding spaces 8 × 8 and 8 × 128 we conducted an exhaustive pairwise comparison across two regimes of color spaces: sensory (RGB and LMS) versus opponency (DKL and CIE Lab). The HSV color space was excluded due to the aforementioned reason. Figure 5 presents the low-level evaluation. ColourConvNets with an opponent output space clearly perform better across all measures and datasets. Specifically, in comparison to the baseline (the rgb2rgb network) both rgb2lab and rgb2dkl obtain substantially lower color differences, and higher PSNRs and SSIMs.
The comparison of rows and columns in Fig. 5 suggests that the quality of compression is more influenced by the output color space in comparison to the input. Specifically, the poor performance of networks whose output space is LMS is noticeable. This can be explained by the high correlation between different channels of the LMS. Essentially, ColourConvNets struggle to accurately decode each of those channels separately. This problem does not occur when LMS is the input color space.
The pairwise high-level evaluation is reported in Figure 6. In agreement with the previous findings, the rgb2lab network performs best across both datasets and embedding spaces. Overall, ColourConvNets with an opponent output space show a clear advantage: rgb2lab and rgb2dkl obtain 5–7% higher accuracy and IoU with respect to the baseline (the rgb2rgb network).
Figure 5.
Low-level pairwise comparison in four color spaces. Figures are averaged over two embedding spaces 8 × 8 and 8 × 128. Lower values of color difference and higher values of PSNR and SSIM indicate higher quality of the reconstruction. The cells are color-coded accordingly.
Figure 6.
High-level pairwise comparison in four color spaces: sensory (RGB and LMS) and opponency (DKL and CIE Lab).
4.5 Qualitative Comparison
In addition to the quantitative evaluations reported in the previous section, the advantage of utilizing a decorrelated output color space can be appreciated qualitatively. In Figure 7, we illustrate five representative examples: four images from the natural-scene datasets and one from the face dataset. The Jupyter Notebook scripts in our GitHub repository provide more examples and can be executed on user-provided images. Overall, the perceptual quality of the image reconstruction in ColourConvNets with an opponent output space (rgb2dkl and rgb2lab) is visibly higher than that of the baseline rgb2rgb. For instance, in the first row of Fig. 7, the rgb2rgb output contains a large number of artifacts on the walls and ceiling. In contrast, the outputs of rgb2dkl and rgb2lab are sharper. This qualitative difference can also be appreciated on the cabinets of the second row and the glasses of the third row (best seen in the digital format at full resolution). It is challenging to quantify the types of natural scenes with the greatest advantage for color opponency. This might be better addressed in a more controlled dataset where images are generated from a set of predefined reflectance spectra. Nevertheless, we observed a more prominent effect under two conditions. First, the rgb2rgb network often suffers greatly in uniform regions; for instance, this is evident from the blue sky in the fourth row of Fig. 7. Second, in many instances rgb2rgb fails to faithfully reproduce the color of an object (see the red cloth in the last row).
Figure 7.
Qualitative comparison of three ColourConvNets (VQ-VAE of K = 8 and D = 128). The first column is the networks’ input and the other columns their corresponding outputs. The output images of rgb2dkl and rgb2lab have been converted to the RGB color space for visualization purposes. The artifacts in rgb2rgb are clearly more visible in comparison to the other ColourConvNets.
5. Performance Advantage
The main difference between the two regimes of color spaces (sensory versus opponent) is the correlation between their axes, in other words, the extent to which the information in each of the three channels is independent. The correlation between axes for LMS and RGB is very high, hence they are referred to as correlated color spaces. On the contrary, the correlations between axes for CIE Lab and DKL are very low, hence they are referred to as decorrelated color spaces. We computed these correlations r in all images of the ImageNet dataset (100 random pixels per image). RGB: r_RG ≈ 0.90, r_RB ≈ 0.77, r_GB ≈ 0.89; LMS: r_LM ≈ 1.00, r_LS ≈ 0.93, r_MS ≈ 0.93; Lab: r_La ≈ −0.14, r_Lb ≈ 0.13, r_ab ≈ −0.34; DKL: r_DK ≈ 0.01, r_DL ≈ 0.14, r_KL ≈ 0.61. In biological visual systems, the retinal signal is transformed to an opponent representation before being transmitted to the visual cortex, passing through the physical bottleneck of the optic nerve and the LGN. This transformation has been argued to optimize the efficiency of color-signal transmission in the visual system by reducing redundant information [5].
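The reported inter-channel correlations can be estimated as in the following sketch, which samples 100 random pixels per image and computes Pearson correlations between the three channels; the function name and the aggregation over images are our assumptions.

```python
import numpy as np


def channel_correlations(images, pixels_per_image=100):
    """Pearson correlations between color channels, sampled over random pixels.

    images: iterable of HxWx3 arrays (already converted to the space of interest).
    """
    samples = []
    for img in images:
        h, w, _ = img.shape
        ys = np.random.randint(0, h, pixels_per_image)
        xs = np.random.randint(0, w, pixels_per_image)
        samples.append(img[ys, xs, :])
    samples = np.concatenate(samples, axis=0)   # (N, 3)
    corr = np.corrcoef(samples.T)               # 3x3 correlation matrix
    return {"r01": corr[0, 1], "r02": corr[0, 2], "r12": corr[1, 2]}
```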
Interestingly, some works have suggested that deep networks trained to perform high-level visual tasks learn to decorrelate their inputs [45]. Here, our results show a similar phenomenon in deep autoencoders: information compression is more efficient when a network decorrelates the input signal. Contrary to this, the ImageNet classification performance was reported to be unaltered when input images were explicitly converted from RGB to CIE Lab [36]. This might be explained by the lack of a bottleneck constraint in their examined architecture, in which case decorrelating the color representation offers no extra advantage. This matches the results we obtained with ColourConvNets of the largest embedding space (128 × 128), suggesting that decorrelation of the color signal becomes beneficial when the system is constrained in its information flow.
Previous works in the literature [15] have measured the decorrelation characteristics of color-opponent spaces in an information-theoretic analysis and demonstrated their effectiveness in encoding natural images. How a complex visual system, driven by an error minimization strategy [28], might utilize these properties at the system level is of great interest. We hypothesized that an efficient system distributes its representation across all resources instead of heavily relying on a few components [31]. To measure this, we computed the histogram of embedding vectors across all images of the validation sets of the ImageNet (50K) and COCO (5K) datasets. A zero standard deviation in the frequency of selected vectors means that the embedding vectors are equally used by the network. This can be interpreted as an indication of a well-distributed feature representation in the system.
Figure 8 reports the error rate as a function of this measure. A significant correlation emerges in both datasets, suggesting that a more uniform contribution of embedding vectors enhances visual encoding in VQ-VAEs. To ensure the obtained correlation is robust, we analyzed its sensitivity by means of two methods. (i) To determine highly influential points, we computed Cook's distance [11]; no points surpass the standard outlier threshold (i.e. 4∕n). (ii) We performed jackknife resampling and systematically computed the correlations after leaving out each ColourConvNet. The obtained correlations are in the range [0.59, 0.72] with an average of 0.67 ± 0.02. Overall, these analyses suggest that there is a correlation between the distribution of features among the embedding vectors and the encoding capacity of the network.
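The two ingredients of this analysis, the spread of embedding-vector usage and the leave-one-network-out (jackknife) correlations, can be computed as sketched below; this is an illustrative reimplementation with assumed function names, not the authors' analysis code.

```python
import numpy as np
from scipy.stats import pearsonr


def embedding_usage_std(vector_indices, num_vectors):
    """Std of the normalised usage frequency of the K embedding vectors;
    zero means all vectors are used equally often."""
    counts = np.bincount(vector_indices, minlength=num_vectors).astype(float)
    return (counts / counts.sum()).std()


def jackknife_correlations(usage_stds, error_rates):
    """Leave-one-network-out Pearson correlations between usage spread and error rate."""
    usage_stds, error_rates = np.asarray(usage_stds), np.asarray(error_rates)
    corrs = []
    for i in range(len(usage_stds)):
        keep = np.arange(len(usage_stds)) != i
        corrs.append(pearsonr(usage_stds[keep], error_rates[keep])[0])
    return np.array(corrs)
```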
Our findings can be linked to two frameworks: histogram equalization and efficient coding. The neural model of histogram equalization follows a similar line of reasoning, namely the materialization of all intensity values [41]; this is achieved by explicitly minimizing a corresponding term in an objective function. It is also consistent with the efficient coding theory for biological organisms [4], in which the system distributes its encoded representation across all response levels with equal frequency. Here, we observe a similar phenomenon in VQ-VAEs: ColourConvNets that better materialize all their embedding vectors obtain higher-quality image compression.
Figure 8.
Error rate as a function of the distribution of features in the embedding space. A value of zero in the x-axis indicates all embedding vectors are equally used by the model. Higher values of x indicate that the model relies heavily on certain vectors.
6. Conclusion
We proposed the unsupervised color conversion task to investigate the efficiency of color representation in deep networks. By means of this framework, we studied the impact of color spaces on the encoding capacity of autoencoders, specifically VQ-VAEs, whose feature representation is constrained by a discrete bottleneck. The comparison of several ColourConvNets exhibits an advantage for a decorrelated output color space. This is evident qualitatively and measured quantitatively with five metrics. Our analysis suggests that this advantage stems from a more uniform distribution of feature representation in the networks' embedding space, which is reminiscent of efficient coding and histogram equalization in biological systems.
We propose two lines of investigation for future work. First, integrating the choice of color space into the optimization problem, essentially driving the network to explicitly find the optimal color space for the visual task it is learning. This formulation allows a flexible add-on optimization capsule for any computer vision application. Second, our findings might contribute to the understanding of why the brain's neural network has naturally evolved a particular type of color vision and perception. To better investigate this, we propose to include further biologically motivated constraints (e.g. entropy) on the network. Such configurations might result in the emergence of color categories when a visual scene is being efficiently encoded.
Acknowledgment
This study was funded by Deutsche Forschungsgemeinschaft SFB/TRR 135 (grant number 222641018) TP C2. We would like to thank Matteo Toscani and Alban Flachot for their valuable feedback.
References
1AkbariniaA.ParragaC. A.2017Colour constancy beyond the classical receptive fieldIEEE Trans. Pattern Anal. Mach. Intell.40208120942081–9410.1109/TPAMI.2017.2753239
2AkbariniaA.ParragaC. A.2018Feedback and surround modulated boundary detectionInt. J. Comput. Vis.126136713801367–8010.1007/s11263-017-1035-5
3AkbariniaA.Gil RodríguezR.2020Deciphering image contrast in object classification deep networksVis. Res.173617661–7610.1016/j.visres.2020.04.015
4BarlowH. B.Possible principles underlying the transformation of sensory messagesSensory Communication1961MIT PressCambridge, UK217234217–34
5BuchsbaumG.AllanG.1983Trichromacy, opponent colours coding and optimum colour information transmission in the retinaProc. R. Soc. Lond. B2208911389–11310.1098/rspb.1983.0090
6BurnsM. E.LambT. D.16. visual transduction by rod and cone photoreceptorsVisual Neuroscience2003MIT Press215233215–33Citeseer
7CernadasE.Fernández-DelgadoM.González-RufinoE.CarriónP.2017Influence of normalization and color space to color texture classificationPattern Recognit.61120138120–3810.1016/j.patcog.2016.07.002
8ChirimuutaM.KingdomF. A. A.2015The uses of colour vision: Ornamental, practical, and theoreticalMinds Mach.25213229213–2910.1007/s11023-015-9364-z
9ClausenC.WechslerH.2000Color image compression using pca and backpropagation learningPattern Recognit.33155515601555–6010.1016/S0031-3203(99)00126-0
10CIE, “Recommendations on uniform color spaces, color-difference equations, psychometric color terms,” Paris: CIE, (1978)
11CookR. D.1977Detection of influential observation in linear regressionTechnometrics19151815–8
12DerringtonA. M.KrauskopfJ.LennieP.1984Chromatic mechanisms in lateral geniculate nucleus of macaqueThe J. Physiol.357241265241–6510.1113/jphysiol.1984.sp015499
13DengJ.DongW.SocherR.LiL.-J.LiK.Fei-FeiL.Imagenet: A large-scale hierarchical image databaseProc. IEEE Int’l. Conf. on Computer Vision and Pattern Recognition2009IEEEPiscataway, NJ248255248–55
14EngilbergeM.CollinsE.SüsstrunkS.Color representation in deep neural networksProc. IEEE Int’l. Conf. on Image Processing2017IEEEPiscataway, NJ278627902786–90
15FosterD. H.Marín-FranchI.NascimentoS.AmanoK.Coding efficiency of cie color spacesProc. IS&T/SID CIC16: Sixteenth Color Imaging Conference2008IS&TSpringfield, VA285288285–8
16FuH.WangB.ShenJ.CuiS.XuY.LiuJ.ShaoL.Evaluation of retinal image quality assessment networks in different color-spacesInt’l. Conf. on Medical Image Computing and Computer-Assisted Intervention2019485648–56
17GegenfurtnerK. R.SharpeL. A.Color Vision1999Cambridge University PressCambridge, UK
18GeversT.SmeuldersA. W.1999Color-based object recognitionPattern Recognit.32453464453–6410.1016/S0031-3203(98)00036-3
19Gil RodríguezR.Vazquez-CorralJ.BertalmíoM.2020Color matching images with unknown non-linear encodingsIEEE Trans. Image Process.29443544444435–4410.1109/TIP.2020.2968766
20HeK.ZhangX.RenS.SunJ.Deep residual learning for image recognitionProc. IEEE Int’l. Conf. on Computer Vision and Pattern Recognition2016IEEEPiscataway, NJ770778770–8
21KakumanuP.MakrogiannisS.BourbakisN.2007A survey of skin-color modeling and detection methodsPattern Recognit.40110611221106–2210.1016/j.patcog.2006.06.010
22KalatJ. W.Biological psychologyNelson Education2015Cengage Learning, MA
23KimH.-K.ParkJ. H.JungH.-Y.2018An efficient color space for deep-learning based traffic light recognitionJ. Adv. Transportation201810.1155/2018/2365414
24KingmaD. P.WellingM.2019An introduction to variational autoencodersFoundations and Trends in Machine Learning12307392307–9210.1561/2200000056
25KirillovA.HeK.GirshickR.RotherC.DollárP.Panoptic segmentationProc. IEEE Conf. on Computer Vision and Pattern Recognition2019IEEEPiscataway, NJ940494139404–13
26KoenderinkJ.DoornA. v.GegenfurtnerK.“Colors and things,”i-Perception 11, 1–43 (2020)
27KoenderinkJ.van DoornA. J.Perspectives on Colour Space2003Oxford UniversityNew York
28LaparraV.JiménezS.Camps-VallsG.MaloJ.2012Nonlinearities and adaptation of color vision from sequential principal curves analysisNeural Computation24275127882751–8810.1162/NECO˙a˙00342
29LarbiK.OuardaW.DriraH.AmorB. B.AmarC. B.Deepcolorfasd: Face anti spoofing solution using a multi channeled color spaces cnnProc. IEEE Int’l. Conf. on Systems, Man, and Cybernetics2018IEEEPiscataway, NJ401140164011–6
30LarssonG.MaireM.ShakhnarovichG.Colorization as a proxy task for visual understandingProc. IEEE Int’l. Conf. on Computer Vision and Pattern Recognition2017IEEEPiscataway, NJ687468836874–83
31LaughlinS.1981A simple coding procedure enhances a neuron’s information capacityZ. Natforsch. c36910912910–210.1515/znc-1981-9-1040
32LinT.-Y.MaireM.BelongieS.HaysJ.PeronaP.RamananD.DollárP.ZitnickC. L.Microsoft coco: Common objects in contextProc. European Conf. on Computer Vision2014SpringerCham740755740–55
33LiuZ.LuoP.WangX.TangX.Deep learning face attributes in the wildProc. IEEE Int’l. Conf. on Computer Vision and Pattern Recognition2015IEEEPiscataway, NJ373037383730–8
34McCullochW. S.PittsW.1943A logical calculus of the ideas immanent in nervous activityThe Bulletin of Mathematical Biophysics5115133115–3310.1007/BF02478259
35MechrezR.ShechtmanE.Zelnik-ManorL.Photorealistic style transfer with screened poisson equationProc. The British Machine Vision Conf.2017BMVA PressUK153.1153.12153.1–153.12
36MishkinD.SergievskiyN.MatasJ.2017Systematic evaluation of convolution neural network advances on the imagenetComput. Vis. Image Underst.161111911–910.1016/j.cviu.2017.05.007
37MoroneyN.FairchildM. D.1995Color space selection for jpeg image compressionJ. Electronic Imaging4373382373–8210.1117/12.217266
38MoslehA.SharmaA.OnzonE.MannanF.RobidouxN.HeideF.Hardware-in-the-loop end-to-end optimization of camera image processing pipelinesProc. IEEE Conf. on Computer Vision and Pattern Recognition2020IEEEPiscataway, NJ752975387529–38
39Muñoz-SalinasR.2008A Bayesian plan-view map-based approach for multiple-person detection and trackingPattern Recognit.41366536763665–7610.1016/j.patcog.2008.06.013
40ParragaC. A.AkbariniaA.2016Nice: A computational solution to close the gap from colour perception to colour categorizationPloS one11e014953810.1371/journal.pone.0149538
41PrattW.Digital Image Processing20074th ed.John Wiley & SonsHoboken, NJ
42QiuG.2002Indexing chromatic and achromatic patterns for content-based colour image retrievalPattern Recognit.35167516861675–8610.1016/S0031-3203(01)00162-5
43ReinhardE.AdhikhminM.GoochB.ShirleyP.2001Color transfer between imagesIEEE Comput. Graph. Appl.21344134–4110.1109/38.946629
44RabbaniM.2002Jpeg2000: Image compression fundamentals, standards and practiceJ. Electronic Imaging1128610.1117/1.1469618
45RafegasI.VanrellM.2018Color encoding in biologically-inspired convolutional neural networksVis. Res.1517177–1710.1016/j.visres.2018.03.010
46SalimansT.GoodfellowI.ZarembaW.CheungV.RadfordA.ChenX.Improved techniques for training gansAdvances in Neural Information Processing Systems2016Curran Associates Inc.NY223422422234–42
47SharmaG.WuW.DalalE. N.2005The CIEDE2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observationsColor Res. Appl.30213021–3010.1002/col.20070
48StarosolskiR.2014New simple and efficient color space transformations for lossless image compressionJ. Vis. Commun. Image Represent.25105610631056–6310.1016/j.jvcir.2014.03.003
49StokmanH.GeversT.2007Selection and fusion of color models for image feature detectionIEEE Trans. on Pattern Anal. Machine Intell.29371381371–8110.1109/TPAMI.2007.58
50TishbyN.ZaslavskyN.Deep learning and the information bottleneck principleIEEE Information Theory Workshop2015IEEEPiscataway, NJ151–5
51TheisL.ShiW.CunninghamA.HuszárF.Lossy image compression with compressive autoencodersInt’l. Conf. on Learning2017ICLR, CA
52van den OordA.VinyalsO.KavukcuogluK.Neural discrete representation learningAdvances in Neural Information Processing Systems2017Curran Associates Inc.NY630663156306–15
53WangZ.BovikA. C.SheikhH. R.SimoncelliE. P.2004Image quality assessment: from error visibility to structural similarityIEEE Trans. Image Process.13600612600–1210.1109/TIP.2003.819861
54WuergerS.XiaoK.Color Vision, Opponent Theory2016SpringerBerlin413418413–8
55WyszeckiG.StilesW. S.Color Science1982Vol. 8WileyNew York
56YuG.VladimirovaT.SweetingM. N.2009Image compression systems on board satellitesActa Astronaut.649881005988–100510.1016/j.actaastro.2008.12.006
57ZhangR.IsolaP.EfrosA. A.Split-brain autoencoders: Unsupervised learning by cross-channel predictionProc. IEEE Conf. on Computer Vision and Pattern Recognition2017IEEEPiscataway, NJ105810671058–67