Adaptive bit rate (ABR) streaming is one enabling technology for video streaming over modern throughput-varying communication networks. A widely used ABR streaming method is to adapt the video bit rate to channel throughput by dynamically changing the video resolution. Since videos have different rate-quality performances at different resolutions, such an ABR strategy can achieve a better rate-quality trade-off than single-resolution ABR streaming. The key problem for resolution-switched ABR is to determine the bit rate at which each resolution is appropriate. In this paper, we investigate optimal strategies to estimate this bit rate using both quantitative and subjective quality assessment. We use the design of bit rates for 2K and 4K resolutions as an example of the performance of this strategy. We introduce strategies for selecting an appropriate corpus for subjective assessment and find that at these high resolutions there is good agreement between quantitative and subjective analysis. The optimal switching bit rate between 2K and 4K resolutions is found to be 4 Mbps.
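As a minimal sketch of the resolution-switching rule described above: the 4 Mbps crossover is taken from the abstract, while the bit rate ladder and rung values below are hypothetical placeholders, not the encodes studied in the paper.

```python
# Minimal sketch of resolution-switched ABR selection (illustrative only).
# SWITCH_BITRATE_MBPS = 4 is the 2K/4K crossover reported in the abstract;
# the ladder of encode bit rates below is a hypothetical example.

SWITCH_BITRATE_MBPS = 4.0  # 2K/4K crossover from the rate-quality analysis

LADDER = {
    "2K": [1.5, 2.5, 3.5],        # hypothetical 2K rungs (Mbps)
    "4K": [4.5, 6.0, 8.0, 12.0],  # hypothetical 4K rungs (Mbps)
}

def choose_rung(estimated_throughput_mbps):
    """Pick the resolution by comparing throughput to the switching bit rate,
    then the highest rung of that resolution that fits the throughput."""
    res = "4K" if estimated_throughput_mbps >= SWITCH_BITRATE_MBPS else "2K"
    fitting = [b for b in LADDER[res] if b <= estimated_throughput_mbps]
    return res, (max(fitting) if fitting else min(LADDER[res]))
```

In practice the switching bit rate itself would come from comparing the measured rate-quality curves of the two resolutions, as the paper does.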
We present and analyse schemes for reducing the computational complexity of the current HEVC (High Efficiency Video Coding) standard by subsampling the block-matching distortion cost functions used in the encoding process. HEVC improves considerably on prior standards in coding (compression) efficiency, but at the cost of a large increase in the time complexity of the inter and intra prediction processes and mode decisions. We alleviate this by reducing the number of calculations per decision in all prediction modes, through pixel decimation in the SAD and SSE distortion cost functions. Experimentation with different decimation patterns shows significant encoding time reduction with these schemes when used in tandem with the built-in fast-encoding optimizations of the HEVC reference implementation.
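The following is an illustrative sketch of pixel decimation in a SAD cost function: only a subsampled pattern of pixels contributes to the distortion, cutting the per-comparison cost. The 2:1 row pattern is one example pattern assumed here; the paper experiments with several.

```python
import numpy as np

def sad_full(block, candidate):
    # Full sum of absolute differences over the whole block.
    return np.abs(block.astype(np.int32) - candidate.astype(np.int32)).sum()

def sad_decimated(block, candidate, row_step=2):
    # Sum absolute differences over every `row_step`-th row only,
    # roughly dividing the arithmetic cost by `row_step`.
    diff = block[::row_step].astype(np.int32) - candidate[::row_step].astype(np.int32)
    return np.abs(diff).sum() * row_step  # rescale to approximate the full SAD
```

The same decimation idea applies to the SSE cost by squaring the differences instead of taking absolute values.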
We recently introduced a spectral filter array (SFA) design for single-shot multispectral imaging that is based on Fourier transform spectroscopy. In this article, we investigate the feasibility of guided filter demosaicking for our SFA design.
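For reference, a minimal single-channel guided filter (in the style of He et al.) as it might be used to demosaic a sparsely sampled SFA channel with a fully sampled guide image; the radius and regularization values are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=4, eps=1e-3):
    """Edge-preserving filtering of `src` guided by `guide` (both float, HxW)."""
    size = 2 * radius + 1
    mean_g = uniform_filter(guide, size)
    mean_s = uniform_filter(src, size)
    corr_gs = uniform_filter(guide * src, size)
    corr_gg = uniform_filter(guide * guide, size)

    var_g = corr_gg - mean_g * mean_g
    cov_gs = corr_gs - mean_g * mean_s

    a = cov_gs / (var_g + eps)   # local linear coefficients
    b = mean_s - a * mean_g
    return uniform_filter(a, size) * guide + uniform_filter(b, size)
```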
Most real-time video applications demand low end-to-end latency and faithful reconstruction of the video sequence. Many popular video coding standards (e.g. VP8, VP9, H.264 and HEVC) achieve high compression efficiency by exploiting spatial and temporal redundancies, which makes the encoded bitstream vulnerable to errors. As a result, applications on mobile phones, tablet PCs and other portable devices that use WiFi or 3G/4G/LTE networks often suffer from low quality of service, characterized by frequent delays, jitter, frozen pictures, partial or missing pictures and total loss of connection. Similar problems are often observed in live streaming, with service interruptions and a blank screen. Our approach is to investigate error-resilient coding control for the VPx encoder, making the bitstream more robust for streaming applications under lossy channel conditions. In this paper, we describe an error-resilient coding system based on duplication of frame prediction information. Our “error resilience packet” consists of the prediction information of several frames, which can be used for error concealment in the case of packet loss.
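A minimal sketch of an "error resilience packet" that duplicates the prediction side information of the last few frames follows; the field names and the four-frame window are assumptions for illustration, not the VPx bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class FramePredictionInfo:
    frame_index: int
    reference_frame: int                  # index of the reference frame used
    motion_vectors: List[Tuple[int, int]] # coarse (dx, dy) per block

@dataclass
class ErrorResiliencePacket:
    frames: List[FramePredictionInfo] = field(default_factory=list)

    def add(self, info: FramePredictionInfo, window: int = 4):
        # Keep duplicated prediction info for the most recent `window` frames.
        self.frames.append(info)
        self.frames = self.frames[-window:]

    def conceal(self, lost_frame_index: int) -> Optional[FramePredictionInfo]:
        # On packet loss, look up the duplicated info to drive concealment.
        for info in self.frames:
            if info.frame_index == lost_frame_index:
                return info
        return None
```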
VP9 is an open-source video codec released by Google. It introduces superblocks (SBs) of size 64 × 64 and uses a recursive decomposition scheme to break them all the way down to 4 × 4 blocks. This provides a large coding efficiency gain for VP9, but it also brings large computational complexity to the encoder because of the rate-distortion (RD) optimization over prediction blocks. This paper proposes a method that can early-terminate the block partitioning process based on information from the current block. We first model the early termination decision as a binary classification problem. Second, to solve this classification problem, we train a weighted linear Support Vector Machine (SVM) whose sample weights are determined by the RD cost increase caused by misclassification. Finally, we model the parameter selection of the SVM as an optimization problem, which enables us to control the trade-off between time saving and RD cost increase. Experimental results on standard HD data show that the proposed method can reduce the complexity of partitioning prediction blocks while maintaining comparable coding performance: the Bjøntegaard delta bit rate is ∼1.2% for ∼30% encoding time reduction.
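As a sketch of the weighted-SVM early-termination idea: each training block is weighted by the RD-cost increase its misclassification would cause, so the classifier is biased away from expensive mistakes. The feature choice, file names and the value of C below are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import LinearSVC

# X: per-block features (e.g. current-block RD cost, variance, QP);
# y: 1 = stop partitioning here, 0 = keep splitting;
# rd_cost_increase: penalty of misclassifying each training sample.
X = np.load("block_features.npy")          # hypothetical training dump
y = np.load("split_labels.npy")
rd_cost_increase = np.load("rd_penalty.npy")

clf = LinearSVC(C=1.0)                     # C controls the time/RD trade-off
clf.fit(X, y, sample_weight=rd_cost_increase)

def early_terminate(features):
    """Return True to skip further partitioning of the current block."""
    return clf.predict(features.reshape(1, -1))[0] == 1
```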
We propose a slice-level SAO on/off control method that can be applied in a parallel HEVC encoding scheme. To be applicable in a parallel encoding scheme, our method does not use any information from previously encoded frames; it uses only the GOP level and the slice quantization parameter, which are available before encoding of the current frame starts. Our experimental results show that our method can control SAO on/off at the slice level with only a small loss compared with a method that can hardly be employed in a parallel encoding scheme.
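An illustrative sketch of a slice-level SAO on/off decision that, as in the abstract, depends only on values known before the current frame is encoded (GOP temporal level and slice QP), so slices can be processed in parallel; the threshold table is a made-up placeholder, not the paper's model.

```python
# Hypothetical per-GOP-level QP thresholds (illustration only).
QP_THRESHOLD_BY_GOP_LEVEL = {0: 51, 1: 45, 2: 40, 3: 36}

def sao_enabled_for_slice(gop_level: int, slice_qp: int) -> bool:
    """Enable SAO only when the slice QP is below the per-level threshold,
    so heavily quantized slices at high temporal levels skip SAO."""
    threshold = QP_THRESHOLD_BY_GOP_LEVEL.get(gop_level, 36)
    return slice_qp < threshold
```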
Distinguishing a spoofed trait from a live one is a critical problem in biometric authentication. In this paper, we present a novel method to detect fake-fingerprint attacks based on an ensemble of image quality assessments (IQAs). The key idea of the proposed method is to combine quality scores obtained from multiple local regions, which are fed into a linear SVM classifier to determine whether the given fingerprint is fake or not. One important advantage of the proposed method is that, in contrast to previous approaches, it accurately identifies fake fingerprints even with small partial distortions. Moreover, the proposed method does not require any additional device. Experimental results on a mobile device show that the proposed method is effective for fingerprint liveness detection in real-world scenarios.
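The following sketches the region-wise IQA ensemble idea: split the fingerprint image into local patches, compute a few no-reference quality measures per patch, and feed the concatenated scores to a linear SVM. The two toy measures used below (local variance and gradient energy) are stand-ins, not the IQAs used in the paper.

```python
import numpy as np
from sklearn.svm import SVC

def patch_quality_scores(patch):
    # Toy no-reference quality measures for one local region.
    gy, gx = np.gradient(patch.astype(np.float64))
    return [patch.var(), (gx ** 2 + gy ** 2).mean()]

def image_feature_vector(img, grid=4):
    # Concatenate the quality scores of a grid of local regions.
    h, w = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            patch = img[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            feats.extend(patch_quality_scores(patch))
    return np.array(feats)

# Training (hypothetical data): 1 = live, 0 = fake.
# clf = SVC(kernel="linear").fit(train_features, train_labels)
```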
Today, with the increased computing power of CPUs and GPUs, many different neural network architectures have been proposed for object detection in images. However, these networks are often not optimized to exploit color information. In this paper, we propose a new method based on an SVM network that efficiently extracts this color information. We describe different network architectures and compare them across several color models (CIELAB, HSV, RGB, ...). The results obtained on real data show that our network is more efficient and robust than a single SVM network, with an average precision gain ranging from 1.5% to 6% depending on the complexity of the test image database. We further optimized the network architecture to better exploit the color data, increasing the average precision by up to 10%.
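A small sketch of comparing color models as SVM inputs follows; the per-channel histogram feature and the specific conversions are assumptions for illustration, and the paper's full SVM network (combining several classifiers) is not reproduced here.

```python
import cv2
import numpy as np
from sklearn.svm import LinearSVC

COLOR_SPACES = {
    "RGB": None,                     # keep the decoded image as-is
    "HSV": cv2.COLOR_BGR2HSV,
    "CIELAB": cv2.COLOR_BGR2LAB,
}

def color_histogram(img_bgr, conversion, bins=16):
    # Per-channel histogram in the chosen color model, L1-normalized.
    img = img_bgr if conversion is None else cv2.cvtColor(img_bgr, conversion)
    hist = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
            for c in range(3)]
    feat = np.concatenate(hist).astype(np.float64)
    return feat / (feat.sum() + 1e-9)

def train_per_color_space(images_bgr, labels):
    # One linear SVM per color model, to compare their usefulness.
    classifiers = {}
    for name, conv in COLOR_SPACES.items():
        X = np.stack([color_histogram(im, conv) for im in images_bgr])
        classifiers[name] = LinearSVC(C=1.0).fit(X, labels)
    return classifiers
```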
Camera motion blur generally varies across the image plane. In addition to camera rotation, scene depth is an important factor contributing to blur variation. This paper addresses the problem of estimating the latent image of a depth-varying scene from an image blurred by camera in-plane motion. To make this depth-dependent deblurring problem tractable, we acquire a small sequence of images with different exposure settings, along with inertial sensor readings, using a smartphone. The motion trajectory can be roughly estimated from the noisy inertial measurements. The short/long exposure settings are arranged in a special order such that the structure information preserved in the short-exposed images can compensate for the trajectory drift introduced by measurement noise. Meanwhile, the short-exposed images can be regarded as a stereo pair, which provides the constraints necessary for depth map inference. However, even with ground-truth motion parameters and depth map, the deblurred image may still suffer from ringing artifacts due to depth-value ambiguity along object boundaries caused by the camera motion. We propose a modified deconvolution algorithm that searches for the “optimal” depth value in a neighborhood of each boundary pixel to resolve this ambiguity. Experiments on real images validate that our deblurring approach achieves better performance than existing state-of-the-art methods on depth-varying scenes.
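A conceptual sketch of the boundary-pixel depth search: for each pixel near a depth edge, compare the deblurred results obtained under the neighbouring candidate depth values and keep the one with the least local ringing, measured here by local total variation. The non-blind deconvolution per depth and the ringing measure are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def local_tv(img, y, x, r=3):
    # Local total variation around (y, x), used as a crude ringing measure.
    patch = img[max(y - r, 0):y + r + 1, max(x - r, 0):x + r + 1]
    return np.abs(np.diff(patch, axis=0)).sum() + np.abs(np.diff(patch, axis=1)).sum()

def resolve_boundary_depths(deblurred_by_depth, depth_map, boundary_mask, radius=2):
    """`deblurred_by_depth` maps each quantized depth label to the image
    deblurred with that depth's kernel; `depth_map` holds those labels."""
    result = depth_map.copy()
    ys, xs = np.nonzero(boundary_mask)
    for y, x in zip(ys, xs):
        # Candidate depths observed in the pixel's neighbourhood.
        nb = depth_map[max(y - radius, 0):y + radius + 1,
                       max(x - radius, 0):x + radius + 1]
        candidates = np.unique(nb)
        # Keep the depth whose deblurred image shows the least local ringing.
        best = min(candidates, key=lambda d: local_tv(deblurred_by_depth[d], y, x))
        result[y, x] = best
    return result
```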