It is well understood that the color values recorded by a digital camera are functions of the camera's spectral sensitivities, the reflectances of the objects in the scene, the illumination, and any filter placed between the object and the sensor. Selecting the correct illumination is vital for optimizing a color reproduction pipeline, but in practice the choice of illumination is limited to the spectra of available light sources.
In this paper, we optimize a camera's colorimetric performance by theoretically mounting a filter on the lens. An ideal filter spectrum is obtained using the Luther optimization condition. Using variational calculus, we reduce the optimization problem to a system of nonlinear equations on a Lie group. We solve this system by applying Newton's method on a Lie group with a left-invariant Riemannian structure. As expected from the literature, our experiments show quadratic convergence.
A second approach redesigns the set-up, yielding a quadratic optimization problem that is easier to solve. Constraints on this optimization problem give us control over the transparency of the filter.
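As a rough illustration of the second, quadratic formulation, the sketch below alternates between fitting a 3x3 mixing matrix and a per-wavelength filter transmittance so that the filtered camera sensitivities approximate the CIE colour matching functions. The array names, the alternating least-squares scheme, and the clipping used to enforce a minimum transparency are assumptions for illustration, not the authors' exact algorithm.

```python
# Hypothetical sketch: alternating least squares for a Luther-condition filter.
# cmf: 3 x N CIE colour matching functions, cam: 3 x N camera sensitivities,
# both sampled at the same N wavelengths (names are illustrative, not from the paper).
import numpy as np

def fit_filter(cmf, cam, t_min=0.2, n_iter=100):
    n = cam.shape[1]
    f = np.ones(n)                      # filter transmittance per wavelength
    for _ in range(n_iter):
        filtered = cam * f              # apply the current filter to each channel
        # Best 3x3 mixing matrix M for the current filter (least squares).
        M, *_ = np.linalg.lstsq(filtered.T, cmf.T, rcond=None)
        M = M.T
        target = M @ cam                # 3 x N, per-wavelength predictions scale with f
        # Best per-wavelength transmittance given M (1-D least squares per wavelength),
        # clipped to enforce a minimum transparency.
        num = np.sum(cmf * target, axis=0)
        den = np.sum(target * target, axis=0) + 1e-12
        f = np.clip(num / den, t_min, 1.0)
    return f, M
```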
Visually induced motion sickness (VIMS) is evoked by conflicting motion sensory signals within the brain. Using the simulator sickness questionnaire (SSQ) or postural stability measures to quantify one's VIMS experience only captures changes between pre- and post-experiment. The motion sickness susceptibility questionnaire (MSSQ) is widely used to measure an individual's sensitivity to motion sickness, but its applicability to VIMS has not been established. We introduce a novel VIMS susceptibility measure that combines measures of a subject's "sensitivity" and "endurance" to VIMS. We tested the proposed measure under various VIMS-inducing conditions and demonstrated its effectiveness through both between-subjects and within-subject comparisons across different VIMS conditions.
Recent advances in computational models in vision science have considerably furthered our understanding of human visual perception. At the same time, rapid advances in convolutional deep neural networks (DNNs) have resulted in computer vision models of object recognition which, for the first time, rival human object recognition. Furthermore, it has been suggested that DNNs may not only be successful models for computer vision, but may also be good computational models of the monkey and human visual systems. The advances in computational models in both vision science and computer vision pose challenges in two distinct domains. First, because the latest computational models have much higher predictive accuracy, and competing models may make similar predictions, we require more human data to statistically distinguish between different models. Thus, we need methods to acquire trustworthy human behavioural data quickly and easily. Second, we need challenging experiments to ascertain whether models show similar input-output behaviour only near "ceiling" performance, or whether their performance degrades similarly to human performance: only then do we have strong evidence that models and human observers may be using similar features and processing strategies. In this paper we address both challenges.
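One way to probe the second challenge (whether model performance degrades the way human performance does away from ceiling) is to measure accuracy while progressively degrading the stimuli. The sketch below is a minimal, hypothetical version of such a test for an image classifier; the Gaussian noise degradation, the noise levels, and the `model`/`loader` objects are illustrative assumptions, not the experiments reported in the paper.

```python
# Hypothetical sketch: probe whether a classifier's accuracy degrades gracefully
# (as human performance does) by adding increasing Gaussian noise to the inputs.
# `model` and `loader` are assumed to be a trained classifier and a labelled
# image DataLoader; neither comes from the paper.
import torch

@torch.no_grad()
def accuracy_under_noise(model, loader, noise_levels=(0.0, 0.05, 0.1, 0.2, 0.4)):
    model.eval()
    results = {}
    for sigma in noise_levels:
        correct, total = 0, 0
        for images, labels in loader:
            noisy = images + sigma * torch.randn_like(images)
            preds = model(noisy).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
        results[sigma] = correct / total
    return results  # accuracy per noise level, to compare against human curves
```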
Providing natural 3D visualization is a major challenge in 3D display technologies. Although 3D displays with light-ray reconstruction have been demonstrated, the 3D scenes they can display are limited because their depth-reconstruction range is restricted. Here, we attempt to expand this range virtually by introducing "depth-compressed expressions," in which the depth of a 3D scene is compressed or modified along the axial direction so that the appearance of the depth-compressed scene remains natural to viewers. With a simulated system of an autostereoscopic 3D display with light-ray reconstruction, we investigated how large the depth range needed to be to show depth-compressed scenes without inducing unnaturalness in viewers. Using a linear depth-compression method, the simplest form of depth compression, we found that viewers did not perceive unnaturalness in depth-compressed scenes expressed within at most half the depth range of the originals. These results provide a design goal for developing 3D displays with high-quality 3D visualization.
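A minimal sketch of a linear depth-compression such as the one referred to above might simply rescale all scene depths into the display's reconstructable range; the function and variable names below are illustrative, not taken from the paper.

```python
# Hypothetical sketch of linear depth compression: scene depths are rescaled so the
# whole scene fits inside the display's reconstructable depth range (names illustrative).
import numpy as np

def compress_depth_linear(z, display_near, display_far):
    """Map scene depths z (same sign convention as the display axis) linearly
    into [display_near, display_far]."""
    z = np.asarray(z, dtype=float)
    z_min, z_max = z.min(), z.max()
    if z_max == z_min:                       # flat scene: place it on the screen plane
        return np.full_like(z, 0.5 * (display_near + display_far))
    scale = (display_far - display_near) / (z_max - z_min)
    return display_near + (z - z_min) * scale
```

Under such a mapping, the reported result corresponds roughly to compression factors down to about one half still appearing natural to viewers.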
Humans resolve the spatial alignment between two visual stimuli at a resolution that is substantially finer than the spacing between the foveal cones. In this paper, we analyze the factors that limit the information at the cone photoreceptors that is available to make these acuity judgments (Vernier acuity). We use the open-source software ISETBIO to quantify the stimulus and encoding stages in the front end of the human visual system, starting with a description of the stimulus spectral radiance and a computational model that includes the physiological optics, inert ocular pigments, eye movements, photoreceptor sampling, and absorptions. The simulations suggest that the visual system extracts the information available within the spatiotemporal pattern of photoreceptor absorptions within a small spatial (0.12 deg) and temporal (200 ms) regime. At typical display luminance levels, the variance arising from Poisson absorptions and from small eye movements (tremors and microsaccades) both appear to be critical limiting factors for Vernier acuity.
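ISETBIO itself is a MATLAB toolbox; purely to illustrate how Poisson variability in cone absorptions bounds performance, the hypothetical sketch below estimates the percent correct of an ideal observer discriminating two mean absorption maps (for example, offset versus aligned Vernier stimuli). The maps, the log-likelihood weighting, and the criterion are illustrative assumptions, not the paper's simulation.

```python
# Hypothetical sketch (not ISETBIO): how Poisson variability in cone absorptions
# limits discrimination of a tiny Vernier offset. `mean_a` and `mean_b` are assumed
# mean-absorption maps for the two stimulus alternatives.
import numpy as np

def poisson_ideal_observer_pc(mean_a, mean_b, n_trials=2000, rng=None):
    """Percent correct of an ideal observer deciding which alternative produced
    a Poisson-sampled absorption map."""
    rng = np.random.default_rng(rng)
    # Log-likelihood-ratio weights for Poisson counts: log(mean_a / mean_b).
    w = np.log((mean_a + 1e-9) / (mean_b + 1e-9))
    crit = 0.5 * (np.sum(mean_a * w) + np.sum(mean_b * w))   # midpoint criterion
    correct = 0
    for _ in range(n_trials):
        sample_a = rng.poisson(mean_a)
        sample_b = rng.poisson(mean_b)
        correct += int(np.sum(sample_a * w) > crit)          # alternative A trial
        correct += int(np.sum(sample_b * w) <= crit)         # alternative B trial
    return correct / (2 * n_trials)
```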
In recent years, Convolutional Neural Networks (CNNs) have gained huge popularity among computer vision researchers. In this paper, we investigate how features learned by these networks in a supervised manner can be used to define a measure of self-similarity, an image feature that characterizes many images of natural scenes and patterns, and is also associated with images of artworks. Compared to a previously proposed method for measuring self-similarity based on oriented luminance gradients, our approach has two advantages. Firstly, we fully take color into account, an image feature which is crucial for vision. Secondly, by using higher-layer CNN features, we define a measure of self-similarity that relies more on image content than on basic local image features, such as luminance gradients.
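A rough sketch of the general idea, comparing higher-layer CNN features pooled over subregions with those pooled over the whole image, is given below; the choice of layer, the grid size, the average pooling, and the cosine similarity are illustrative assumptions and not necessarily the measure defined in the paper.

```python
# Hypothetical sketch: a self-similarity score from higher-layer CNN features,
# comparing pooled features of image subregions with those of the whole image.
import torch
import torch.nn.functional as F

def cnn_self_similarity(feature_map, grid=4):
    """feature_map: (C, H, W) activations from a higher CNN layer."""
    c, h, w = feature_map.shape
    whole = feature_map.mean(dim=(1, 2))                       # global pooled descriptor
    sims = []
    for i in range(grid):
        for j in range(grid):
            patch = feature_map[:, i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]
            local = patch.mean(dim=(1, 2))                     # pooled patch descriptor
            sims.append(F.cosine_similarity(local, whole, dim=0))
    return torch.stack(sims).mean()                            # higher = more self-similar
```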
We investigated gaze-contingent fusion of infrared imagery during visual search. Eye movements were monitored while subjects searched for and identified human targets in images captured simultaneously in the short-wave (SWIR) and long-wave (LWIR) infrared bands. Based on the subject's gaze position, the search display was updated such that imagery from one sensor was continuously presented to the subject's central visual field ("center") and imagery from the other sensor was presented to the subject's non-central visual field ("surround"). Analysis of the performance data indicated that, compared to the other combinations, the scheme featuring SWIR imagery in the center region and LWIR imagery in the surround region constituted an optimal combination of the SWIR and LWIR information: it inherited the superior target detection performance of LWIR imagery and the superior target identification performance of SWIR imagery. This demonstrates a novel method for efficiently combining imagery from two infrared sources as an alternative to conventional image fusion. © 2017 Her Majesty the Queen in Right of Canada.
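The gaze-contingent centre/surround presentation can be illustrated with a simple composite of the two co-registered images around the current gaze point; the window radius, the feathered blending, and the function names below are assumptions for illustration, not the display parameters used in the study.

```python
# Hypothetical sketch of a gaze-contingent centre/surround composite:
# SWIR imagery inside a circular window around the current gaze point,
# LWIR imagery everywhere else.
import numpy as np

def gaze_contingent_composite(swir, lwir, gaze_xy, radius=80, feather=20):
    """swir, lwir: co-registered HxW (or HxWxC) images; gaze_xy: (x, y) in pixels."""
    h, w = swir.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(xx - gaze_xy[0], yy - gaze_xy[1])
    # 1 inside the central window, 0 in the surround, soft edge over `feather` pixels.
    alpha = np.clip((radius + feather - dist) / feather, 0.0, 1.0)
    if swir.ndim == 3:
        alpha = alpha[..., None]
    return alpha * swir + (1.0 - alpha) * lwir
```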
Recently, the movie industry has been advocating the use of frame rates significantly higher than the traditional 24 frames per second. This higher frame rate theoretically improves the quality of motion portrayed in movies and helps avoid motion blur, judder, and other undesirable artifacts. Previously we reported that young adult audiences showed a clear preference for higher frame rates, particularly when contrasting 24 fps with 48 or 60 fps, and we found little impact of shutter angle (frame exposure time) on viewers' choices. In the current study we replicated this experiment with an audience of imaging professionals who work in the film and display industries and assess image quality as part of their everyday occupation. These viewers were also, on average, older and could thus be expected to have an attachment to the "film look" through both experience and training. We used stereoscopic 3D content, filmed and projected at multiple frame rates (24, 48 and 60 fps), with shutter angles ranging from 90° to 358°, to evaluate viewer preferences. In paired-comparison experiments we assessed preferences along a set of five attributes (realism, motion smoothness, blur/clarity, quality of depth and overall preference). As with the young adults in the earlier study, the expert viewers showed a clear preference for higher frame rates, particularly when contrasting 24 fps with 48 or 60 fps. We again found little impact of shutter angle on viewers' choices, with the exception of one clip at 48 fps for which there was a preference for the larger shutter angle. However, this preference occurred for the most dynamic "warrior" clip among the experts but for the slower-moving "picnic" clip among the naïve viewers. These data confirm the advantages afforded by high-frame-rate capture and presentation in a cinema context for both naïve audiences and experienced film professionals. © 2016 Society for Imaging Science and Technology.
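For readers unfamiliar with paired-comparison data, the sketch below shows the basic tallying step of turning individual choices into per-condition preference scores; the data format and the simple win-proportion score are illustrative, and the paper's own analysis may use a different scaling model.

```python
# Hypothetical sketch: turning paired-comparison choices into simple preference
# scores (fraction of comparisons won per condition).
from collections import defaultdict

def preference_scores(choices):
    """choices: iterable of (condition_a, condition_b, winner) tuples,
    where winner is condition_a or condition_b."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for a, b, winner in choices:
        appearances[a] += 1
        appearances[b] += 1
        wins[winner] += 1
    return {c: wins[c] / appearances[c] for c in appearances}

# Example: three illustrative trials comparing frame-rate conditions.
scores = preference_scores([("24fps", "60fps", "60fps"),
                            ("24fps", "60fps", "60fps"),
                            ("24fps", "48fps", "24fps")])
```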