
Advances in high dynamic range (HDR) lighting estimation from a single image have opened new possibilities for augmented reality (AR) applications. Predicting complex lighting environments from a single input image allows for the realistic rendering and compositing of virtual objects. In this work, we investigate the color robustness of such methods, an often overlooked yet critical factor for achieving visual realism. While most evaluations conflate color with other lighting attributes (e.g., intensity, direction), we isolate color as the primary variable of interest. Rather than introducing a new lighting estimation algorithm, we explore whether simple adaptation techniques can enhance the color accuracy of existing models. Using a novel HDR dataset featuring diverse lighting colors, we systematically evaluate several adaptation strategies. Our results show that preprocessing the input image with a pre-trained white balance network improves color robustness, outperforming other strategies across all tested scenarios. Notably, this approach requires no retraining of the lighting estimation model. We further validate the generality of this finding by applying the technique to three state-of-the-art lighting estimation methods from recent literature. Our project webpage is available at: https://lvsn.github.io/coloraccuracy.
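The preprocessing idea can be illustrated with a minimal sketch: a white balance step is placed in front of a frozen lighting estimator, which is left untouched. Everything named here is an assumption for illustration. `gray_world_white_balance` is a classical gray-world correction standing in for the pre-trained white balance network, `mean_color_estimator` is a hypothetical placeholder for a lighting estimation model, and images are flat lists of linear RGB tuples rather than real HDR data.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[float, float, float]
Image = List[Pixel]  # flat list of linear RGB pixels, for illustration only


def gray_world_white_balance(image: Image) -> Image:
    """Stand-in (assumption) for the pre-trained white balance network.

    Scales each channel so that all three channel means become equal.
    """
    n = len(image)
    means = [sum(p[c] for p in image) / n for c in range(3)]
    gray = sum(means) / 3.0
    # Guard against an all-zero channel to avoid division by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [(p[0] * gains[0], p[1] * gains[1], p[2] * gains[2]) for p in image]


def mean_color_estimator(image: Image) -> Pixel:
    """Hypothetical placeholder for a frozen lighting estimator: mean RGB."""
    n = len(image)
    return (
        sum(p[0] for p in image) / n,
        sum(p[1] for p in image) / n,
        sum(p[2] for p in image) / n,
    )


def with_white_balance(
    estimator: Callable[[Image], Pixel],
    white_balance: Callable[[Image], Image],
) -> Callable[[Image], Pixel]:
    """Wrap an estimator so its input is white balanced first.

    The estimator itself is not modified or retrained.
    """
    def wrapped(image: Image) -> Pixel:
        return estimator(white_balance(image))

    return wrapped


# A tiny image with a warm color cast.
warm_image: Image = [(0.8, 0.4, 0.2), (0.4, 0.2, 0.1)]
raw_estimate = mean_color_estimator(warm_image)
balanced_estimate = with_white_balance(
    mean_color_estimator, gray_world_white_balance
)(warm_image)
```

In this toy setting the raw estimate inherits the warm cast of the input, while the wrapped estimator sees an input with equal channel means. The wrapper pattern is the point of the sketch: the adaptation lives entirely outside the estimator.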

Uniform color spaces and color appearance models were developed mainly for characterizing stimuli under low dynamic range conditions. Real scenes in daily life, however, are commonly high dynamic range (HDR), containing highlights with luminance beyond the diffuse white, whose color appearance has not previously been characterized. This study was designed to investigate the color appearance of highlights in HDR scenes, covering extremely wide ranges of diffuse white luminance (up to 11000 cd/m<sup>2</sup>), stimulus luminance (up to 49000 cd/m<sup>2</sup>), stimulus chromaticities (reaching the Rec. 2020 gamut), and scene luminance contrast (up to 72045). The observers viewed two stimuli, one highlight and one dark stimulus, in a viewing booth, and were asked to adjust the color appearance of a third stimulus so that its color differences from each of the other two stimuli appeared the same. The results showed that none of the existing models, including IC<sub>t</sub>C<sub>p</sub>, which was recently designed for HDR scenes, performed well. The models that use a power function to characterize the non-linear compressive response of the human visual system (i.e., CIELAB and IPT) performed slightly better. The findings provide some guidance for tone mapping and chroma/saturation adjustments, and clearly suggest the need for further work to develop a better model for HDR scenes.
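To make the "power function" remark concrete, the sketch below computes CIELAB lightness L* from luminance using the standard CIE definition, a cube-root compression with a linear segment near black. Applying it to a highlight whose luminance exceeds the diffuse white is an extrapolation beyond the range the formula was designed for, which is the situation the study examines. The function names and the pairing of example luminance values are assumptions for illustration and do not reproduce the experimental conditions.

```python
def cielab_f(t: float) -> float:
    """CIELAB compressive non-linearity: cube root above a small threshold,
    linear segment below it."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def cielab_lightness(luminance: float, white_luminance: float) -> float:
    """CIELAB L* of a stimulus relative to the diffuse white luminance.

    Both arguments must use the same unit (e.g. cd/m^2).
    """
    return 116.0 * cielab_f(luminance / white_luminance) - 16.0


# The diffuse white itself maps to L* = 100 by construction.
white_lightness = cielab_lightness(11000.0, 11000.0)

# A highlight brighter than the diffuse white extrapolates to L* above 100.
# The pairing of these two luminance values is illustrative only.
highlight_lightness = cielab_lightness(49000.0, 11000.0)
```

Because the cube root grows slowly, a highlight several times brighter than the diffuse white lands only moderately above L* = 100, which illustrates the compressive behavior attributed to CIELAB and IPT.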