There has been growing interest in recent years in different approaches to improving the coding efficiency of modern video codecs, as demand for web-based video consumption increases. In this paper, we propose a model-based approach that uses texture analysis/synthesis to reconstruct blocks in the texture regions of a video, with the aim of achieving coding gains in the AV1 codec developed by the Alliance for Open Media (AOM). The proposed method uses convolutional neural networks to extract texture regions in a frame, which are then reconstructed using a global motion model. Our preliminary results show an increase in coding efficiency while maintaining satisfactory visual quality.
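As a rough illustration of the reconstruction step, the following Python sketch warps a reference frame with a six-parameter affine global motion model and writes the result into blocks flagged as texture. It assumes the texture mask (one flag per block) has already been produced, standing in for the CNN output. The names `synthesize_block` and `reconstruct_texture_blocks`, the frame layout, and the use of bilinear interpolation (rather than AV1's own warp filter) are hypothetical choices made for illustration, not the method's actual implementation.

```python
from typing import List, Sequence

Frame = List[List[float]]  # luma samples, indexed as frame[y][x]


def sample_bilinear(ref: Frame, x: float, y: float) -> float:
    """Bilinearly interpolate ref at (x, y), clamping to the frame border."""
    h, w = len(ref), len(ref[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = ref[y0][x0] * (1 - fx) + ref[y0][x1] * fx
    bottom = ref[y1][x0] * (1 - fx) + ref[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def synthesize_block(ref: Frame, params: Sequence[float],
                     bx: int, by: int, size: int) -> Frame:
    """Warp ref with the affine model (a, b, c, d, e, f), where
    x_ref = a*x + b*y + c and y_ref = d*x + e*y + f, for one size x size block."""
    a, b, c, d, e, f = params
    return [[sample_bilinear(ref, a * x + b * y + c, d * x + e * y + f)
             for x in range(bx, bx + size)]
            for y in range(by, by + size)]


def reconstruct_texture_blocks(cur: Frame, ref: Frame,
                               texture_mask: List[List[bool]],
                               params: Sequence[float], size: int) -> Frame:
    """Replace every block flagged in texture_mask with synthesized content.
    Unflagged blocks are copied unchanged (they would be coded normally)."""
    out = [row[:] for row in cur]
    for j, mask_row in enumerate(texture_mask):
        for i, is_texture in enumerate(mask_row):
            if is_texture:
                block = synthesize_block(ref, params, i * size, j * size, size)
                for dy, row in enumerate(block):
                    out[j * size + dy][i * size:(i + 1) * size] = row
    return out
```

With a pure translation such as `params = (1, 0, tx, 0, 1, ty)`, the warp reduces to copying a displaced block; the affine form additionally covers zoom, rotation and shear, which is what makes a single global model usable across a whole textured region.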
The AOM/AV1 encoder treats an input video sequence as a succession of frames grouped into Golden-Frame (GF) groups. The coding structure of a GF group is fixed for a given GF group size. In the current AOM/AV1 encoder, the frames within a GF group are coded using a hierarchical, multilayer coding structure. It has been observed that this multilayer structure may result in worse coding performance when the frames of a GF group are consistently still. This paper proposes a new approach that adaptively designs the GF group coding structure through stillness detection. The approach uses an automatic stillness detection scheme based on three metrics extracted from each GF group. It then distinguishes still GF groups from non-still ones and applies a different GF coding structure to each accordingly. Experimental results demonstrate a consistent coding gain with the new approach.
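The decision step can be sketched in Python as follows. The abstract does not name the three metrics, so the per-frame statistics below (zero-motion-vector ratio, mean absolute frame difference, inter-prediction error), the `GFStructure` labels, and all thresholds are hypothetical placeholders; only the overall shape (aggregate three metrics per GF group, classify the group, pick a structure) follows the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


@dataclass
class FrameStats:
    """Hypothetical per-frame statistics, e.g. gathered in a first-pass encode."""
    zero_mv_ratio: float  # fraction of blocks whose best motion vector is (0, 0)
    mean_abs_diff: float  # mean absolute luma difference to the previous frame
    inter_error: float    # average inter-prediction error per pixel


class GFStructure(Enum):
    MULTILAYER = "multilayer"  # default hierarchical, multilayer structure
    STILL = "still"            # alternative structure reserved for still groups


# Hypothetical thresholds; in practice they would have to be tuned.
MIN_ZERO_MV_RATIO = 0.95
MAX_MEAN_ABS_DIFF = 1.0
MAX_INTER_ERROR = 2.0


def is_still(gf_group: List[FrameStats]) -> bool:
    """Average the three metrics over the GF group and test all thresholds."""
    if not gf_group:
        raise ValueError("empty GF group")
    n = len(gf_group)
    zero_mv = sum(s.zero_mv_ratio for s in gf_group) / n
    abs_diff = sum(s.mean_abs_diff for s in gf_group) / n
    error = sum(s.inter_error for s in gf_group) / n
    return (zero_mv >= MIN_ZERO_MV_RATIO
            and abs_diff <= MAX_MEAN_ABS_DIFF
            and error <= MAX_INTER_ERROR)


def choose_gf_structure(gf_group: List[FrameStats]) -> GFStructure:
    """Pick the coding structure for one GF group."""
    return GFStructure.STILL if is_still(gf_group) else GFStructure.MULTILAYER
```

Requiring all three metrics to agree keeps the classifier conservative: a group falls back to the default multilayer structure unless every indicator points to stillness.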
Psychovisual rate-distortion optimization (Psy-RD) has been used in industrial video coding practice as a tool to improve perceptual video quality. It has gained significant popularity through the widespread adoption of the open-source x264 video encoder, in which the Psy-RD option is enabled by default. Nevertheless, little work has been dedicated to validating the impact of Psy-RD optimization on perceptual quality, which would provide meaningful guidance on the practical usage and future development of the idea. In this work, we build a database of video sequences encoded with Psy-RD at different strengths and bitrates. A subjective user study is then conducted to evaluate and compare the quality of the Psy-RD encoded videos. We observe considerable agreement among subjects' opinions on the test video sequences. Unfortunately, the impact of Psy-RD optimization on video quality does not appear to be encouraging: somewhat surprisingly, the perceptual quality gain of Psy-RD ON versus Psy-RD OFF is negative on average. Our results suggest that Psy-RD optimization should be used with caution. Further investigation shows that most state-of-the-art full-reference objective quality models correlate well with the subjective results overall. However, in paired comparisons between the Psy-RD ON and OFF cases, their false alarm rates are moderately high.
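The two evaluations described above (overall correlation and paired ON/OFF comparison) can be sketched as below. Pearson's linear correlation coefficient is a standard way to measure agreement between objective scores and mean opinion scores (MOS). The definition of a false alarm used here (the model prefers the opposite version of a pair to the one subjects prefer) is an assumption, since the abstract does not define it, and the function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation between objective scores x and MOS y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# One (mos_on, mos_off, model_on, model_off) tuple per Psy-RD ON/OFF pair,
# i.e. the same source and bitrate encoded with and without Psy-RD.
Pair = Tuple[float, float, float, float]


def false_alarm_rate(pairs: List[Pair]) -> float:
    """Fraction of pairs where the model prefers the opposite version to the
    subjects (assumed definition). Ties on either side never count as alarms."""
    alarms = sum(1 for mos_on, mos_off, model_on, model_off in pairs
                 if (mos_on - mos_off) * (model_on - model_off) < 0)
    return alarms / len(pairs)
```

The two numbers answer different questions: a model can rank videos of very different quality correctly (high overall correlation) while still misjudging the small ON/OFF differences within a pair, which is the situation the abstract reports.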
The paper presents an efficient algorithm that reduces the time complexity of video coding in the H.265/HEVC encoder, working towards an implementation suitable for real-time video coding and transmission applications. The optimization targets the motion estimation search procedure, which accounts for a large part of the computation time per Coding Unit. Experimental results demonstrate substantial savings in processing time while maintaining compression quality and bit rate similar to those of the standard encoder.
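The abstract does not describe the specific search optimization, so the sketch below only illustrates the general idea of fast motion estimation: a classic small diamond search, which descends the SAD surface from the zero vector, compared with an exhaustive full search over the same window. It is a stand-in technique, not the paper's algorithm; the block size, search range, and function names are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples, indexed as frame[y][x]
MV = Tuple[int, int]     # (dx, dy) displacement into the reference frame
SMALL_DIAMOND = ((1, 0), (-1, 0), (0, 1), (0, -1))


def sad(cur: Frame, ref: Frame, bx: int, by: int, mv: MV, size: int) -> int:
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by mv."""
    dx, dy = mv
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def valid(ref: Frame, bx: int, by: int, mv: MV, size: int, rng: int) -> bool:
    """True if mv lies in the search window and the block stays inside ref."""
    dx, dy = mv
    return (max(abs(dx), abs(dy)) <= rng
            and 0 <= bx + dx and bx + dx + size <= len(ref[0])
            and 0 <= by + dy and by + dy + size <= len(ref))


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                size: int, rng: int) -> Tuple[MV, int]:
    """Exhaustive search: evaluates every valid candidate in the window."""
    best: Optional[Tuple[int, MV]] = None
    evals = 0
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            if valid(ref, bx, by, (dx, dy), size, rng):
                cost = sad(cur, ref, bx, by, (dx, dy), size)
                evals += 1
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best[1], evals


def diamond_search(cur: Frame, ref: Frame, bx: int, by: int,
                   size: int, rng: int) -> Tuple[MV, int]:
    """Small diamond search: move to the cheapest neighbour until the centre
    is no worse than all four neighbours (may stop at a local minimum).
    Assumes the zero vector is valid, i.e. the block lies inside the frame."""
    centre: MV = (0, 0)
    best_cost = sad(cur, ref, bx, by, centre, size)
    evals = 1
    while True:
        step: Optional[MV] = None
        for ddx, ddy in SMALL_DIAMOND:
            cand = (centre[0] + ddx, centre[1] + ddy)
            if not valid(ref, bx, by, cand, size, rng):
                continue
            cost = sad(cur, ref, bx, by, cand, size)
            evals += 1
            if cost < best_cost:
                best_cost, step = cost, cand
        if step is None:
            return centre, evals
        centre = step
```

Each diamond step costs at most four SAD evaluations, so the work grows with the distance travelled rather than with the square of the search range; the trade-off is that on noisy or repetitive content the descent can settle in a local minimum that the full search would avoid.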
Google started the WebM Project in 2010 to develop open-source, royalty-free video codecs designed specifically for media on the Web. Subsequently, Google co-founded the Alliance for Open Media (AOM), a consortium of major tech companies, to develop a new codec, AV1, aiming at a next-generation codec that achieves at least a generational improvement in coding efficiency over VP9. This paper proposes a new coding tool as one of the many efforts devoted to AOM/AV1. In particular, we propose a second ALTREF_FRAME in the AV1 syntax which, building on the work presented in [11], brings the total number of reference frames to seven. An ALTREF_FRAME is a constructed, no-show reference frame obtained by temporally filtering a look-ahead frame. The use of two ALTREF_FRAMEs adds further flexibility to the multi-layer, multi-reference symmetric framework and offers great potential for improving overall Rate-Distortion (RD) performance. Experimental results collected over several video test sets of various resolutions and texture and motion characteristics demonstrate that the proposed approach achieves a consistent coding gain compared against both the AV1 baseline and the results in [11]. For instance, using overall-PSNR as the distortion metric, average BD-rate savings of 5.880% and 4.595% are obtained for the CIF-level and VGA-level resolution sets, respectively.
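BD-rate figures such as those above are commonly computed with Bjøntegaard's method. The following pure-Python sketch uses the classic cubic-fit formulation: for each encoder, log10(bitrate) is fitted as a cubic polynomial of PSNR, and the average gap between the two fits over the overlapping PSNR range is converted into a percentage bitrate difference, where negative values indicate savings. The exact BD-rate tool behind the reported numbers is not stated, so this is an assumed, illustrative variant; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def polyfit(xs: Sequence[float], ys: Sequence[float], deg: int) -> List[float]:
    """Least-squares polynomial fit via the normal equations.
    Returns c with y ~ c[0] + c[1]*x + ... + c[deg]*x**deg."""
    n = deg + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    c = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        c[i] = (b[i] - sum(a[i][k] * c[k] for k in range(i + 1, n))) / a[i][i]
    return c


def integral_0_to_1(c: Sequence[float]) -> float:
    """Integral over [0, 1] of the polynomial with coefficients c."""
    return sum(ci / (i + 1) for i, ci in enumerate(c))


def bd_rate(rate_a: Sequence[float], psnr_a: Sequence[float],
            rate_b: Sequence[float], psnr_b: Sequence[float]) -> float:
    """Average bitrate difference (%) of encoder B relative to anchor A at
    equal PSNR. Needs at least four RD points per curve for a cubic fit."""
    lo = max(min(psnr_a), min(psnr_b))
    hi = min(max(psnr_a), max(psnr_b))
    if hi <= lo:
        raise ValueError("RD curves do not overlap in PSNR")

    def unit(ps: Sequence[float]) -> List[float]:
        # Map PSNR so the overlap becomes [0, 1]: better conditioning, and the
        # mean over [lo, hi] becomes the plain integral over [0, 1].
        return [(p - lo) / (hi - lo) for p in ps]

    fit_a = polyfit(unit(psnr_a), [math.log10(r) for r in rate_a], 3)
    fit_b = polyfit(unit(psnr_b), [math.log10(r) for r in rate_b], 3)
    avg_log_diff = integral_0_to_1(fit_b) - integral_0_to_1(fit_a)
    return (10 ** avg_log_diff - 1) * 100
```

With exactly four RD points per curve the cubic passes through every point. Some later BD-rate tools replace the single cubic with piecewise cubic interpolation, which can give slightly different numbers, so results are best compared only when produced by the same tool.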