Efficient compression plays a significant role in light field imaging technology because of the huge amount of data needed to represent light fields. Video encoders using different strategies are commonly used for light field image compression, with the light field images compressed as pseudo-video sequences. In this paper, different video encoder implementations, including HM, VTM, x265, xvc, VP9, and AV1, are analysed and compared in terms of coding efficiency and encoder/decoder time complexity.
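As a minimal sketch of the pseudo-video approach mentioned above (the scan order and array layout are illustrative assumptions, not the paper's specific configuration), the snippet below reorders the sub-aperture views of a light field into a serpentine frame sequence that a standard video encoder could then consume.

import numpy as np

def lightfield_to_pseudo_video(views):
    """Reorder a U x V grid of sub-aperture views into a serpentine-scan
    frame sequence, so neighbouring frames are similar views (helps
    inter-frame prediction in a standard video encoder)."""
    U, V = views.shape[:2]  # angular grid dimensions
    frames = []
    for u in range(U):
        # boustrophedon scan: alternate left-to-right and right-to-left
        cols = range(V) if u % 2 == 0 else range(V - 1, -1, -1)
        for v in cols:
            frames.append(views[u, v])
    return np.stack(frames)  # (U*V, H, W, 3) frame sequence

# Example with a synthetic 9x9 light field of 64x64 RGB views
lf = np.random.randint(0, 256, size=(9, 9, 64, 64, 3), dtype=np.uint8)
video = lightfield_to_pseudo_video(lf)
print(video.shape)  # (81, 64, 64, 3)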
Modern computing and imaging technologies have allowed for many recent advances in the field of 3D range imaging: range data can now be acquired at speeds much faster than real time, with sub-millimeter precision. However, these benefits come at the cost of an increased quantity of data generated by 3D range imaging systems, potentially limiting the number of applications that can take advantage of this technology. One common approach to the compression of 3D range data is to encode it within the three color channels of a traditional 24-bit RGB image. This paper presents a novel method for the modification and compression of 3D range data such that the original depth information can be stored within, and recovered from, only two channels of a traditional 2D RGB image. Storage within a traditional image format allows further compression to be realized via lossless or lossy image compression techniques. For example, when JPEG 80 was used to store the encoded output image, this method achieved an 18.2% reduction in file size compared to a similar three-channel, image-based compression method, with only a corresponding 0.17% reduction in global reconstruction accuracy.
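The abstract does not spell out its encoding, but one well-known way to pack depth into two image channels, sketched below purely for illustration, stores the sine and cosine of a scaled depth phase and recovers depth with arctan2. The single-period assumption (the whole depth range mapped to one phase cycle, so no unwrapping is needed) is mine, not the authors'.

import numpy as np

def encode_depth_two_channel(depth, d_min, d_max):
    """Map depth onto one phase cycle and store sin/cos in two 8-bit
    channels (single-period mapping is an illustrative assumption)."""
    phase = 2 * np.pi * (depth - d_min) / (d_max - d_min)
    ch1 = np.round(255 * (np.sin(phase) + 1) / 2).astype(np.uint8)
    ch2 = np.round(255 * (np.cos(phase) + 1) / 2).astype(np.uint8)
    return ch1, ch2

def decode_depth_two_channel(ch1, ch2, d_min, d_max):
    s = ch1.astype(np.float64) / 255 * 2 - 1
    c = ch2.astype(np.float64) / 255 * 2 - 1
    phase = np.mod(np.arctan2(s, c), 2 * np.pi)  # wrap back to [0, 2*pi)
    return d_min + phase / (2 * np.pi) * (d_max - d_min)

depth = np.random.uniform(500.0, 1500.0, size=(480, 640))  # synthetic range map (mm)
ch1, ch2 = encode_depth_two_channel(depth, 500.0, 1500.0)
recovered = decode_depth_two_channel(ch1, ch2, 500.0, 1500.0)
print(np.abs(recovered - depth).max())  # residual is 8-bit quantization error, ~1-2 mm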
In this paper, we propose an automated adaptive-focus pipeline for creating synthetic extended depth of field images using a reflectance transformation imaging (RTI) system. The proposed pipeline detects object regions at different depth levels relative to the camera's depth of field and collects a most-focused image for each. These images are then run through a focus stacking algorithm to create an image in which the focus of each pixel has been maximized for the given camera parameters, lighting conditions, and glare. As RTI is used for many cultural heritage imaging projects, automating this process provides high-quality data by removing the need for many separate images focused on different regions of interest on the object. It also lowers the skill floor for this image collection process by reducing the number of manual focus adjustments that need to be made. Furthermore, it can help minimize the time that a sensitive cultural heritage object spends outside its ideal preservation environment.
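The abstract does not detail the stacking algorithm itself; a common baseline, shown below under that assumption, selects each output pixel from the stack frame with the strongest local Laplacian response, a standard sharpness measure.

import cv2
import numpy as np

def focus_stack(images):
    """Merge a focal stack: pick, per pixel, the frame with the strongest
    local Laplacian response (a standard sharpness proxy)."""
    sharpness = []
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        lap = np.abs(cv2.Laplacian(gray, cv2.CV_64F, ksize=3))
        # Smooth the measure so per-pixel choices are locally consistent
        sharpness.append(cv2.GaussianBlur(lap, (9, 9), 0))
    best = np.argmax(np.stack(sharpness), axis=0)  # sharpest frame index per pixel
    stack = np.stack(images)                       # (N, H, W, 3)
    rows, cols = np.indices(best.shape)
    return stack[best, rows, cols]                 # (H, W, 3) fused image

# Usage (hypothetical file layout):
#   images = [cv2.imread(p) for p in sorted(glob.glob("stack/*.png"))]
#   cv2.imwrite("fused.png", focus_stack(images))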
Applying optical metrology techniques to the collection of surface data and its 3D representation can improve the digital documentation of the conservation and restoration of artworks. Tracking the change induced on cultural heritage (CH) surfaces by the restoration process requires a computational analysis of surface geometry. In this analysis, the conservation scientists were interested in the impact that the filling of holes has on the nearby surrounding surface during reconstruction. In theory, a loss compensation method for stone should allow conservators to make a filling that exists only in the place of the void, but in practice it is highly unlikely that a filling will adhere to the substrate only at the void site and not protrude elsewhere. In consultation with the conservation scientists, we propose an approach based on local geometry changes to identify and visualize such changes, and we present the outcome through a local neighborhood distance histogram. This analysis yields the overall surface change by considering each surface point together with its neighborhood points and the impact the reconstruction process had on them. The work also focuses on developing a representation of each type of loss compensation method that is more objective from a restorer's point of view and simplifies their visual assessment.
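As a hedged sketch of the kind of analysis described (not the authors' exact method), the code below compares pre- and post-restoration point clouds: for each point it averages the distances to its nearest neighbours in the other cloud and accumulates the per-point values into a histogram, giving a local neighborhood distance summary of surface change.

import numpy as np
from scipy.spatial import cKDTree

def neighborhood_distance_histogram(before, after, k=8, bins=50):
    """For each pre-restoration point, average the distances to its k
    nearest neighbours in the post-restoration cloud, then histogram
    the per-point values to summarize local surface change."""
    tree = cKDTree(after)
    dists, _ = tree.query(before, k=k)   # (N, k) neighbour distances
    local_change = dists.mean(axis=1)    # mean neighbourhood distance per point
    return np.histogram(local_change, bins=bins)

# Synthetic example: the "after" cloud equals "before" plus a small bump
before = np.random.rand(10000, 3)
after = before + np.where(before[:, :1] > 0.9, 0.02, 0.0)
hist, edges = neighborhood_distance_histogram(before, after)
print(hist[:5], edges[:5])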
One of the most common sources of damage in cultural heritage objects (CHOs), such as parchment, oil paintings, and historical textiles, is relative humidity (RH) changes, which cause in-plane and out-of-plane displacements of their surfaces. The best-suited method enabling full-field, non-contact displacement measurement with high resolution and sufficient range is 3D Digital Image Correlation (3D DIC). However, the standard version of DIC requires applying a random, high-contrast speckle pattern to the surface of the object under investigation, a requirement that is not acceptable for CHOs. In this paper, we analyze the possibility of applying the 3D DIC method to monitor displacements in selected groups of CHOs without modifying their surfaces, i.e., based on their natural texture. Proper selection of the data capturing conditions and analysis parameters can lead to successful non-invasive monitoring of CHOs' behavior. The samples studied herein are historical parchments subjected to controlled RH changes that induce in-plane and out-of-plane displacement variations over time.
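At its core, DIC reduces to matching small image subsets between deformed states; the sketch below shows the zero-normalized cross-correlation (ZNCC) criterion that standard DIC implementations build on. This is a generic integer-pixel illustration, not the authors' code, and real DIC adds subpixel refinement.

import numpy as np

def zncc(subset_a, subset_b):
    """Zero-normalized cross-correlation between two image subsets;
    1.0 means a perfect match, values near 0 mean no correlation."""
    a = subset_a - subset_a.mean()
    b = subset_b - subset_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_subset(ref, cur, center, half=15, search=10):
    """Find the integer-pixel displacement of a reference subset by
    maximizing ZNCC over a small search window in the current image."""
    y, x = center
    tpl = ref[y - half:y + half + 1, x - half:x + half + 1]
    best, best_dv = -1.0, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            win = cur[y + dy - half:y + dy + half + 1,
                      x + dx - half:x + dx + half + 1]
            score = zncc(tpl, win)
            if score > best:
                best, best_dv = score, (dy, dx)
    return best_dv, best

# Synthetic natural-texture image shifted by (3, -2) pixels
ref = np.random.rand(200, 200)
cur = np.roll(ref, shift=(3, -2), axis=(0, 1))
print(match_subset(ref, cur, center=(100, 100)))  # ((3, -2), ~1.0)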
This paper describes a low-cost, single-camera, double-mirror system that can be built into a desktop nail printer. The system captures an image of a fingernail and generates the 3D shape of the nail, from which the nail's depth map is estimated. The paper describes the camera calibration process, explains the calibration theory for the proposed system, and then introduces a 3D reconstruction method. Experimental results illustrating the accuracy with which the system handles the rendering task are presented.
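For readers unfamiliar with the calibration step mentioned above, the sketch below shows a standard checkerboard-based intrinsic calibration with OpenCV. It is a generic illustration of the procedure only; the paper's double-mirror geometry requires an extended model on top of this, and the image folder is hypothetical.

import cv2
import glob
import numpy as np

pattern = (9, 6)  # inner-corner grid of the checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):  # hypothetical calibration images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Estimate intrinsics K and distortion coefficients from all detections
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS re-projection error:", rms)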
Robust multi-camera calibration is a fundamental task for all multi-view camera systems, typically relying on fitting discrete camera models to sparse target observations. Stereo systems, photogrammetry, and light-field arrays have all demonstrated the need for geometrically consistent calibrations to achieve higher levels of sub-pixel localization accuracy for improved depth estimation. This work presents a calibration target that leverages multi-directional features to achieve improved dense calibrations of camera systems. We begin by presenting a 2D target that uses an encoded feature set, each feature carrying 12 bits of uniqueness for flexible patterning and easy identification. These features combine orthogonal sets of straight and circular binary edges, along with Gaussian peaks. Our proposed feature extraction algorithm uses steerable filters for edge localization and ellipsoidal peak fitting for circle center estimation. Feature uniqueness is used for associativity across views, which is combined into a 3D pose graph for nonlinear optimization. Existing camera models are leveraged for intrinsic and extrinsic estimates, demonstrating a reduction in mean re-projection error for stereo calibration from 0.2 pixels with a traditional checkerboard to 0.01 pixels with the proposed target.
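As a rough illustration of the circle-center step (the authors' detector combines steerable filters with peak fitting; the snippet below shows only a generic contour-plus-ellipse-fit stage with OpenCV), a perspective-projected circle appears as an ellipse, so the fitted ellipse center approximates the feature center.

import cv2
import numpy as np

def estimate_circle_centers(gray, min_area=30.0):
    """Generic circle-center estimation: threshold, find contours, and
    fit an ellipse to each sufficiently large contour."""
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(bw, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    centers = []
    for c in contours:
        if len(c) >= 5 and cv2.contourArea(c) > min_area:  # fitEllipse needs >= 5 points
            (cx, cy), axes, angle = cv2.fitEllipse(c)
            centers.append((cx, cy))
    return np.array(centers)

# Synthetic test: one filled circle on a dark background
img = np.zeros((200, 200), np.uint8)
cv2.circle(img, (120, 80), 25, 255, -1)
print(estimate_circle_centers(img))  # approx [[120., 80.]]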
Immersive video enables interactive, natural consumption of visual content by empowering a user to navigate through six degrees of freedom, with motion parallax and wide-angle rotation. Supporting immersive experiences requires content captured by multiple cameras and efficient video coding to meet bandwidth and decoder complexity constraints while delivering high-quality video to end users. The Moving Picture Experts Group (MPEG) is developing an immersive video (MIV) standard to support data access and delivery of such content. One of the MIV operating modes is object-based immersive video coding, which enables innovative use cases where streaming bandwidth can be better allocated to objects of interest and users can personalize the rendered streamed content. In this paper, we describe a software implementation of the object-based solution on top of the MPEG Test Model for Immersive Video (TMIV). We demonstrate how encoding foreground objects can lead to a significant saving in pixel rate and bitrate while still delivering better subjective and objective results compared to the generic MIV operating mode without the object-based solution.
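To make the pixel-rate argument concrete, the toy calculation below (my illustration, not TMIV code) compares the pixels that must be transmitted when coding full source views against coding only the foreground-object pixels of each view; it deliberately ignores the patch-packing overhead of a real TMIV atlas.

import numpy as np

def pixel_rate_saving(masks):
    """Compare the full-view pixel rate with foreground-only coding,
    given one binary object mask per source view (toy model that
    ignores patch packing overhead in a real atlas)."""
    full = sum(m.size for m in masks)       # pixels if full views are coded
    fg = sum(int(m.sum()) for m in masks)   # pixels if only foreground is coded
    return 1.0 - fg / full

# Three synthetic 1080p views where objects cover ~20% of each frame
masks = [np.random.rand(1080, 1920) < 0.2 for _ in range(3)]
print(f"pixel-rate saving: {pixel_rate_saving(masks):.1%}")  # ~80%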
On-the-fly reconstruction of 3D indoor environments has recently become an important research field for providing situational awareness to first responders, such as police and defence officers. Operating protocols do not allow the deployment of active sensors (LiDAR, ToF, IR cameras), to prevent the danger of being exposed. Therefore, passive sensors, such as stereo cameras or moving mono sensors, are the only viable options for 3D reconstruction. At present, even the best portable stereo cameras provide an inaccurate estimation of depth images, caused by the small camera baseline. Reconstructing a complete scene from inaccurate depth images then becomes a challenging task. In this paper, we present a real-time ROS-based system for first responders that performs semantic 3D indoor reconstruction based purely on stereo camera imaging. The major components of the ROS system are depth estimation, semantic segmentation, SLAM, and 3D point-cloud filtering. First, we improve the semantic segmentation by training the DeepLab V3+ model [9] with a filtered combination of several publicly available semantic segmentation datasets. Second, we propose and experiment with several noise filtering techniques on both depth images and the generated point clouds. Finally, we embed semantic information into the mapping procedure to achieve an accurate 3D floor plan. The resulting semantic reconstruction provides important clues about the inside structure of an unseen building, which can be used for navigation.
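The abstract does not name the specific filters used; a typical point-cloud noise filter in such pipelines, shown here as an assumption-laden sketch with Open3D, is statistical outlier removal, which discards points whose mean neighbour distance is far above the cloud-wide average.

import numpy as np
import open3d as o3d

# Statistical outlier removal: drop points whose mean distance to their
# nearest neighbours deviates strongly from the global average.
points = np.random.rand(5000, 3)   # stand-in for a stereo-derived cloud
points[:50] += 5.0                 # inject gross depth outliers
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)

filtered, kept_idx = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
print(len(pcd.points), "->", len(filtered.points))  # outliers removed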