Task requirements for image acquisition systems vary substantially between applications: requirements for consumer photography may be irrelevant to, or may even conflict with, the requirements for automotive, medical, and other applications. The imaging industry's remarkable capability to create lens and sensor designs for specific applications has been demonstrated in the mobile computing market. We might expect that the industry can innovate further if we specify the requirements for other markets. This paper explains an approach to developing image system designs that meet the task requirements for autonomous vehicle applications. It is impractical to build a large number of image acquisition systems and evaluate each of them with real driving data; therefore, we assembled a simulation environment to provide guidance at an early stage. The open-source and freely available software (isetcam, iset3d, and isetauto) uses ray tracing to compute quantitatively how scene radiance propagates through a multi-element lens to form the sensor irradiance. The software then transforms the irradiance into the sensor pixel responses, accounting for a large number of sensor parameters. This enables the user to apply different types of image processing pipelines to generate images that are used to train and test the convolutional networks used in autonomous driving. We use the simulation environment to assess performance for different cameras and networks.
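To make the scene-to-sensor computation concrete, the following Python sketch walks through a drastically simplified version of the pipeline described above: scene radiance is converted to sensor-plane irradiance, the irradiance becomes noisy, quantized pixel values, and a toy processing step produces a display image. The function names, the paraxial lens relation, and all numeric parameters are illustrative assumptions; they are not the isetcam/iset3d/isetauto API, which performs these steps with full ray tracing and far more detailed sensor models.

```python
# Minimal sketch of a scene-to-sensor simulation pipeline in the spirit of the
# abstract above. All names and parameters are hypothetical illustrations.
import numpy as np

def lens_to_irradiance(scene_radiance, f_number=2.0, transmission=0.9):
    """Approximate sensor-plane irradiance from scene radiance.
    A full simulation would ray-trace through a multi-element lens; here we use
    the paraxial relation E = pi * L * T / (4 * N^2)."""
    return np.pi * scene_radiance * transmission / (4.0 * f_number**2)

def sensor_response(irradiance, exposure_s=0.02, gain_e_per_unit=1e5,
                    read_noise_e=2.0, well_capacity_e=9000, bits=10, rng=None):
    """Convert irradiance to quantized pixel values with shot and read noise."""
    rng = np.random.default_rng() if rng is None else rng
    electrons = irradiance * exposure_s * gain_e_per_unit          # mean signal
    electrons = rng.poisson(electrons).astype(float)               # shot noise
    electrons += rng.normal(0.0, read_noise_e, electrons.shape)    # read noise
    electrons = np.clip(electrons, 0, well_capacity_e)             # full well
    dn = np.round(electrons / well_capacity_e * (2**bits - 1))     # quantize
    return dn.astype(np.uint16)

def simple_isp(dn, bits=10, gamma=2.2):
    """Toy processing pipeline: normalize and gamma-encode."""
    x = dn.astype(float) / (2**bits - 1)
    return np.power(x, 1.0 / gamma)

# Example: a synthetic 64x64 radiance map stands in for a rendered driving scene.
scene = np.random.uniform(0.0, 1.0, size=(64, 64))   # arbitrary radiance units
irradiance = lens_to_irradiance(scene)
raw = sensor_response(irradiance)
image = simple_isp(raw)
print(image.shape, image.min(), image.max())
```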
Due to fast-evolving technology and the increasing importance of social media, the camera is one of the most important components of today's mobile phones. Nowadays, smartphones are taking over a large share of the compact camera market. A simple reason for this might be revealed by the famous quote: "The best camera is the one that's with you". But with the vast choice of devices and the great promises made by manufacturers, there is a demand to characterize image quality and performance in very simple terms in order to provide information that helps consumers choose the best-suited device. The existing evaluation systems are either not entirely objective or are still under development and have not yet reached a useful level. Therefore, the industry itself has come together and created a new objective quality evaluation system named Valued Camera eXperience (VCX). It is designed to reflect the user experience regarding the image quality and the performance of a camera in a mobile device. Members of the initiative so far are: Apple, Huawei, Image Engineering, LG, Mediatec, Nomicam, Oppo, TCL, Vivo, and Vodafone.
A multicamera, array camera, cluster camera, or "supercamera" incorporates two or more component cameras in a single system that functions as a camera with superior performance or special capabilities. Although many organizations have built camera arrays, creating an effective multicamera has not become significantly easier. This paper attempts to provide some useful insights toward simplifying the design, construction, and use of multicameras. Nine multicameras our group built for diverse purposes between 1999 and 2017 are described in some detail, including four built during Summer 2017 using some of the proposed simplifications.
High-quality 360° capture for Cinematic VR is a relatively new and rapidly evolving technology. The field demands very high-quality, distortion-free 360° capture, which is not possible with cameras that depend on fisheye lenses to cover a 360° field of view. The Facebook Surround 360 Camera, one of the few "players" in this space, is an open-source licensed design that Facebook has released for anyone who chooses to build it from off-the-shelf components and generate 8K stereo output using open-source licensed rendering software. However, the components are expensive and the system itself is extremely demanding in terms of computer hardware and software. Because of this, there have been very few implementations of this design and virtually no real deployment in the field. We have implemented the system, based on Facebook's design, and have been testing and deploying it in various situations, even generating short video clips. We have discovered in our recent experience that high-quality 360° capture comes with its own set of new challenges. As an example, even the most fundamental tools of photography, such as exposure, become difficult because one is always faced with ultra-high dynamic range scenes (one camera may be pointing directly at the sun while others point into dark shadow). The conventional imaging pipeline is further complicated by the fact that the stitching software has different effects on various aspects of the calibration and pipeline optimization. Most of our focus to date has been on optimizing the imaging pipeline and improving the quality of the output for viewing in an Oculus Rift headset. We designed a controlled experiment to study five key parameters in the rendering pipeline: black level, neutral balance, color correction matrix (CCM), geometric calibration, and vignetting. By varying all of these parameters in a combinatorial manner, we were able to assess their relative impact on the perceived image quality of the output. Our results thus far indicate that the output image quality is greatly influenced by the black level of the individual cameras (the Facebook camera comprises 17 cameras whose outputs must be stitched to obtain a 360° view). Image quality is least sensitive to neutral balance. The most puzzling results come from accurately calculating and applying the CCM for each individual camera: we obtained better results by using the average of the matrices across all cameras. Future work includes evaluating the effects of geometric calibration and vignetting on quality.
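The combinatorial study of the five rendering parameters can be organized as a full-factorial sweep. The sketch below illustrates that bookkeeping in Python; the factor levels and the scoring function are hypothetical placeholders, not the settings or ratings used in the actual experiment.

```python
# Sketch of a full-factorial sweep over the five rendering-pipeline parameters.
from itertools import product

factors = {
    "black_level":      ["per_camera", "nominal"],
    "neutral_balance":  ["per_camera", "global"],
    "ccm":              ["per_camera", "averaged"],
    "geometric_calib":  ["refined", "factory"],
    "vignetting":       ["corrected", "uncorrected"],
}

def render_and_score(condition):
    """Placeholder: render a stitched panorama under `condition` and return a
    perceived-quality score (e.g., from a subjective rating experiment)."""
    return 0.0  # stand-in value

results = {}
for combo in product(*factors.values()):
    condition = dict(zip(factors.keys(), combo))
    results[combo] = render_and_score(condition)

print(f"{len(results)} conditions evaluated")  # 2^5 = 32 combinations
```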
The goal of autofocus is to enable a digital camera to capture sharp images as accurately and quickly as possible in any lighting condition without human intervention. Recent developments in mobile imaging seek to embed phase-detection pixels into the image sensor itself, because these pixels provide information about both the amount and the direction of lens offset and thereby expedite the autofocus process. In contrast to conventional contrast-detection autofocus algorithms, however, phase detection is easily affected by noise, by a lack of contrast in the image, and by the spatial offset between the left and right phase-sensing pixels. In this paper, we propose to address these issues by characterizing the relation between phase shift and lens movement for various object depths with a statistical model. Experiments show that the proposed method is indeed able to improve the reliability of phase-detection autofocus.
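As an illustration of the kind of characterization the abstract describes, the following Python sketch fits a simple statistical model (an ordinary least-squares line with a residual-based uncertainty) to synthetic phase-shift versus lens-movement calibration data; the actual model and measurements used in the paper are not reproduced here.

```python
# Illustrative characterization of the phase-shift -> lens-movement relation
# using synthetic calibration data and a linear least-squares fit.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration data: measured phase shift (pixels) vs. the lens movement
# (motor steps) needed to reach focus, collected at several object depths.
phase_shift = rng.uniform(-20, 20, size=200)
lens_move   = 12.5 * phase_shift + rng.normal(0, 8.0, size=200)  # noisy linear relation

# Least-squares fit: lens_move ~= a * phase_shift + b
A = np.column_stack([phase_shift, np.ones_like(phase_shift)])
(a, b), residuals, *_ = np.linalg.lstsq(A, lens_move, rcond=None)
sigma = np.sqrt(residuals[0] / (len(lens_move) - 2))  # residual std: model confidence

def predict_lens_move(measured_shift):
    """Predicted lens movement and an uncertainty estimate for one AF iteration."""
    return a * measured_shift + b, sigma

move, unc = predict_lens_move(6.3)
print(f"move {move:.1f} steps (+/- {unc:.1f})")
```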
Light fields can be captured by plenoptic cameras and observed on integral displays supporting auto-stereoscopic viewing and full parallax. However, due to the limited aperture of plenoptic cameras and the resampling process needed to overcome the resolution mismatch between the capturing and displaying devices, the perceived parallax of the resampled light fields is considerably reduced, limiting the viewing experience. We propose a light field retargeting technique that enhances the perceived parallax by first slicing the captured light fields according to their disparity, and then translating these slices with appropriate magnitudes and directions prior to the resampling stage. The resampled light field retains enough parallax to be perceived on the target display. The developed technique gives users control over the depth of field and the axial location of the rendered objects as seen through the integral display.
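The sketch below illustrates the slicing-and-translation idea on a single view of a light field: pixels are grouped into disparity layers and each layer is shifted before resampling. The layer boundaries, the per-layer shift rule, and the parallax gain are assumptions made for illustration, not the paper's retargeting parameters.

```python
# Illustrative disparity-based slicing and shifting of one light-field view.
import numpy as np

def retarget_by_disparity(view, disparity, edges, gain=2.0):
    """Slice `view` into disparity layers and translate each layer horizontally.

    view      : HxW image (grayscale for simplicity)
    disparity : HxW per-pixel disparity map
    edges     : layer boundaries in disparity units, e.g. [-4, -1, 1, 4]
    gain      : how much the original parallax is amplified
    """
    out = np.zeros_like(view)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (disparity >= lo) & (disparity < hi)
        if not mask.any():
            continue
        shift = int(round(gain * disparity[mask].mean()))   # per-layer translation
        layer = np.where(mask, view, 0.0)
        out += np.roll(layer, shift, axis=1)                # translate the slice
    return out

# Toy example: a 32x32 view with two depth layers.
view = np.random.rand(32, 32)
disparity = np.zeros((32, 32)); disparity[:, 16:] = 3.0
print(retarget_by_disparity(view, disparity, edges=[-1.0, 1.0, 5.0]).shape)
```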
Modern digital cameras have very limited dynamic range, which makes them unable to capture the full range of illumination in natural scenes. Since this prevents them from accurately recording visible detail, researchers have spent the last two decades developing algorithms for high-dynamic-range (HDR) imaging, which capture a wider range of illumination and therefore allow us to reconstruct richer images of natural scenes. The most practical of these methods are stack-based approaches, which take a set of images at different exposure levels and then merge them to form the final HDR result. However, these algorithms produce ghost-like artifacts when the scene contains motion or the camera is not perfectly static. In this paper, we present an overview of state-of-the-art deghosting algorithms for stack-based HDR imaging and discuss some of the tradeoffs of each.
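For context, the following Python sketch shows the basic stack-based merge that deghosting algorithms extend: registered, linear exposures are weighted and combined in the radiance domain. The triangular weighting and the assumption of a perfectly static scene are simplifications; handling motion is precisely what the surveyed deghosting methods add to this step.

```python
# Minimal stack-based HDR merge for linear, pre-registered exposures.
import numpy as np

def merge_exposure_stack(images, exposure_times):
    """images: list of HxW(x3) arrays in [0,1], linear; exposure_times: seconds."""
    acc = np.zeros_like(images[0], dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, t in zip(images, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # trust mid-tones, downweight clipped pixels
        acc += w * (img / t)                # estimate scene radiance from each exposure
        wsum += w
    return acc / np.maximum(wsum, 1e-6)

# Toy example: three synthetic exposures of the same static scene.
rng = np.random.default_rng(1)
radiance = rng.uniform(0.0, 4.0, size=(48, 48))
times = [0.1, 0.4, 1.6]
stack = [np.clip(radiance * t, 0.0, 1.0) for t in times]
hdr = merge_exposure_stack(stack, times)
print(hdr.shape, hdr.max())
```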
Allowing viewers to explore virtual reality in a head-mounted display with six degrees of freedom (6-DoF) greatly enhances the associated immersion and comfort, making the experience more compelling than the fixed-viewpoint 2-DoF rendering produced by conventional algorithms from data captured by a stationary camera rig. In this work, we use subjective testing to study the relative importance of, and the interaction between, motion parallax and binocular disparity as depth cues that shape the perception of 3D environments by human viewers. Additionally, we use the recorded head trajectories to estimate the distribution of the head movements of a sedentary viewer exploring a virtual environment with 6-DoF. Finally, we demonstrate a real-time virtual reality rendering system that uses a Stacked OmniStereo intermediary representation to provide a 6-DoF viewing experience from data captured by a stationary camera rig. We outline the challenges involved in developing such a system and discuss the limitations of our approach.
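The trajectory-analysis step mentioned above can be sketched as follows in Python: recorded head positions are reduced to per-axis histograms that approximate the movement distribution of a seated viewer. The data here are synthetic stand-ins, and the binning is an arbitrary choice rather than the study's actual analysis.

```python
# Estimating the distribution of a sedentary viewer's head movements from
# recorded 6-DoF trajectories (synthetic data used as a stand-in).
import numpy as np

rng = np.random.default_rng(2)
# Synthetic head-position samples (meters, relative to the seated rest position).
trajectory = rng.normal(0.0, [0.08, 0.05, 0.10], size=(5000, 3))  # x, y, z

# Per-axis histograms approximate the marginal distributions of head movement.
bins = np.linspace(-0.5, 0.5, 41)
hist_x, _ = np.histogram(trajectory[:, 0], bins=bins, density=True)
hist_y, _ = np.histogram(trajectory[:, 1], bins=bins, density=True)
hist_z, _ = np.histogram(trajectory[:, 2], bins=bins, density=True)

radius = np.linalg.norm(trajectory, axis=1)
print(f"95% of samples within {np.percentile(radius, 95):.2f} m of the rest position")
```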
Camera arrays are used to acquire the 360° surround video data presented on 3D immersive displays. The design of these arrays involves a large number of decisions, ranging from the placement and orientation of the cameras to the choice of lenses and sensors. We implemented an open-source software environment (iset360) to support engineers designing and evaluating camera arrays for virtual and augmented reality applications. The software uses physically based ray tracing to simulate light rays from a 3D virtual spectral scene and traces these rays through multi-element spherical lenses to calculate the irradiance at the imaging sensor. The software then simulates the imaging sensors to predict the captured images. The sensor data can be processed to produce the stereo and monoscopic 360° panoramas commonly used in virtual reality applications. By simulating the entire capture pipeline, we can visualize how changes in the system components influence the overall performance. We demonstrate the use of the software by simulating a variety of camera rigs, including the Facebook Surround360, the GoPro Odyssey, the GoPro Omni, and the Samsung Gear 360.
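As a complement to the sensor-simulation sketch given earlier, the snippet below illustrates the final stage mentioned above: assembling per-camera images from a ring rig into a monoscopic 360° equirectangular panorama. Nearest-camera selection by yaw and the simple tangent mapping are simplifying assumptions; real stitching software blends overlapping views and models the actual rig geometry.

```python
# Illustrative mono 360-degree equirectangular assembly from a ring of cameras.
import numpy as np

def equirect_panorama(cam_images, cam_yaws_deg, out_w=720, out_h=360):
    """cam_images: list of HxW arrays; cam_yaws_deg: each camera's facing direction (deg)."""
    yaws = np.deg2rad(np.asarray(cam_yaws_deg, dtype=float))
    pano = np.zeros((out_h, out_w))
    lon = (np.arange(out_w) / out_w) * 2 * np.pi - np.pi   # longitude per output column
    lat = (0.5 - np.arange(out_h) / out_h) * np.pi         # latitude per output row
    for col, lo in enumerate(lon):
        offsets = np.angle(np.exp(1j * (lo - yaws)))       # wrapped yaw offsets
        cam = int(np.argmin(np.abs(offsets)))              # nearest camera by yaw
        h, w = cam_images[cam].shape
        # map the yaw offset and latitude into that camera's image
        # (simple tangent mapping, clamped at the image edges)
        u = int(np.clip((np.tan(offsets[cam]) * 0.5 + 0.5) * (w - 1), 0, w - 1))
        v = np.clip((np.tan(lat) * 0.5 + 0.5) * (h - 1), 0, h - 1).astype(int)
        pano[:, col] = cam_images[cam][v, u]
    return pano

# Toy example: four synthetic cameras facing 0, 90, 180 and 270 degrees.
imgs = [np.full((120, 160), fill) for fill in (0.2, 0.4, 0.6, 0.8)]
print(equirect_panorama(imgs, [0, 90, 180, 270]).shape)
```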