According to Cisco, most Internet traffic currently consists of video. Therefore, developing a quality assessment method to ensure that these videos are received and displayed with adequate quality at the user side is an important and challenging task. As a consequence, over the last decades, several no-reference video quality metrics have been proposed with the goal of blindly predicting (with no access to the original signal) the quality of videos in streaming applications. One such metric is NAVE, whose architecture includes an auto-encoder module that produces a compact set of visual features with a higher descriptive capacity. Nevertheless, the visual features in NAVE do not include descriptive temporal features that are sensitive to temporal degradation. In this work, we analyze the effect on prediction accuracy of adding a new type of temporal features based on natural scene statistics. This approach aims to make the tested video quality metric more generic, i.e., sensitive to both spatial and temporal distortions and therefore adequate for video streaming applications.
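As an illustration of the kind of temporal natural-scene-statistics features discussed above, the sketch below computes mean-subtracted contrast-normalized (MSCN) coefficients of frame differences and summarizes them with generalized Gaussian distribution (GGD) parameters, in the spirit of frame-difference NSS models. The window size, the moment-matching GGD estimator, and the aggregation by averaging are illustrative assumptions, not the exact features used by the metric.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from scipy.special import gamma

def mscn(img, win=7, eps=1e-8):
    """Mean-subtracted contrast-normalized (MSCN) coefficients with a local box window."""
    mu = uniform_filter(img, size=win)
    sigma = np.sqrt(np.abs(uniform_filter(img * img, size=win) - mu * mu))
    return (img - mu) / (sigma + eps)

def ggd_fit(x):
    """Moment-matching estimate of generalized Gaussian shape and scale parameters."""
    x = x.ravel()
    rho = np.mean(np.abs(x)) ** 2 / (np.mean(x * x) + 1e-12)
    alphas = np.arange(0.2, 10.0, 0.001)
    r = gamma(2.0 / alphas) ** 2 / (gamma(1.0 / alphas) * gamma(3.0 / alphas))
    shape = alphas[np.argmin((r - rho) ** 2)]
    scale = np.sqrt(np.mean(x * x))
    return shape, scale

def temporal_nss_features(frames):
    """Average GGD (shape, scale) of MSCN frame differences over a grayscale video.

    frames: sequence of 2-D float arrays (consecutive luminance frames).
    """
    params = [ggd_fit(mscn(frames[t + 1] - frames[t]))
              for t in range(len(frames) - 1)]
    return np.mean(params, axis=0)  # 2-D temporal descriptor for the whole video
```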
Video Quality Assessment (VQA) is an essential topic in several industries, ranging from video streaming to camera manufacturing. In this paper, we present a novel method for no-reference VQA. The framework is fast and does not require the extraction of hand-crafted features. We extract convolutional features from a 3-D Convolutional Neural Network (C3D) and feed them to a trained Support Vector Regressor (SVR) to obtain a VQA score. We apply transformations to different color spaces to generate more discriminative deep features. We extract features from several layers, with and without overlap, to find the configuration that best improves the VQA score. We tested the proposed approach on the LIVE-Qualcomm dataset. We extensively evaluated the perceptual quality prediction model, obtaining a final Pearson correlation of 0.7749 ± 0.0884 with Mean Opinion Scores, and showed that it achieves good video quality prediction, outperforming other leading state-of-the-art VQA models.
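A minimal sketch of the deep-features-plus-regressor pipeline described above follows. Since a pretrained C3D network is not bundled with common libraries, the 3-D backbone here is torchvision's r3d_18, used purely as a stand-in; the clip size, the averaging over clips, and the SVR hyperparameters are assumptions rather than the paper's configuration.

```python
import numpy as np
import torch
import torchvision
from sklearn.svm import SVR
from scipy.stats import pearsonr

# Stand-in 3-D CNN backbone (the paper uses C3D; r3d_18 is only for illustration).
backbone = torchvision.models.video.r3d_18(weights="DEFAULT")
backbone.fc = torch.nn.Identity()  # keep the pooled convolutional features, drop the classifier
backbone.eval()

@torch.no_grad()
def clip_features(clip):
    """clip: float tensor of shape (3, T, H, W), e.g. T=16, H=W=112, values roughly in [0, 1]."""
    return backbone(clip.unsqueeze(0)).squeeze(0).numpy()

def video_feature(clips):
    """Average per-clip deep features over all (possibly overlapping) clips of one video."""
    return np.mean([clip_features(c) for c in clips], axis=0)

# X_train / X_test: one averaged feature vector per video; y_*: the corresponding MOS values.
# regressor = SVR(kernel="rbf", C=1.0).fit(X_train, y_train)
# plcc, _ = pearsonr(regressor.predict(X_test), y_test)  # Pearson correlation with MOS
```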
Video object tracking (VOT) aims to determine the location of a target over a sequence of frames. The existing body of work has studied various image factors that affect VOT performance. For instance, factors such as occlusion, clutter, object shape, unstable speed, and zooming, which influence video quality, do affect tracking performance. Nonetheless, there is no clear distinction between scene-dependent challenges such as occlusion and clutter and the challenges imposed by traditional notions of “quality impairments” inherited from capture, compression, processing, and transmission. In this study, we are concerned with the latter interpretation of quality as it affects video tracking performance. We propose the design and implementation of a quality-aware feature selection method for VOT. First, we divide each frame of the video into patches of the same size and extract Histogram of Oriented Gradients (HOG) and natural scene statistics (NSS) features from these patches. Then, we synthetically degrade the videos with different levels of post-capture distortions such as MPEG-4 compression, AWGN, salt-and-pepper noise, and blur. Finally, we identify the set of HOG and NSS features that generates the largest area under the curve in the success plots, yielding an improvement in tracker performance on videos affected by post-capture distortions.
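The sketch below illustrates the patch-based feature extraction and the synthetic post-capture degradations described above. The patch size, the HOG parameters, and the simple whole-patch NSS statistics are illustrative assumptions; MPEG-4 compression is omitted because it would typically be applied with an external encoder such as ffmpeg.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import kurtosis, skew
from skimage.feature import hog
from skimage.util import random_noise

def distort(frame, kind, level):
    """Apply a synthetic post-capture distortion to a grayscale frame with values in [0, 1]."""
    if kind == "awgn":
        return random_noise(frame, mode="gaussian", var=level)
    if kind == "salt_pepper":
        return random_noise(frame, mode="s&p", amount=level)
    if kind == "blur":
        return gaussian_filter(frame, sigma=level)
    raise ValueError(f"unknown distortion: {kind}")

def patch_features(frame, patch=32):
    """HOG + simple NSS statistics for each non-overlapping patch of a grayscale frame."""
    feats = []
    h, w = frame.shape
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            p = frame[i:i + patch, j:j + patch]
            hog_f = hog(p, orientations=9, pixels_per_cell=(8, 8),
                        cells_per_block=(2, 2))
            z = (p - p.mean()) / (p.std() + 1e-8)  # crude whole-patch stand-in for MSCN
            nss_f = [np.mean(np.abs(z)), skew(z.ravel()), kurtosis(z.ravel())]
            feats.append(np.concatenate([hog_f, nss_f]))
    return np.asarray(feats)

# Example: features of a blurred 128x128 frame -> (16 patches, HOG + 3 NSS values each)
# frame = np.random.rand(128, 128)
# X = patch_features(distort(frame, "blur", level=1.5))
```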