Real-time video super-resolution (VSR) is a promising approach to improving video quality for video conferencing and media playback, applications that demand low latency and fast inference. Although state-of-the-art VSR methods with well-designed architectures have been proposed, many of them cannot be turned into real-time VSR models because of their high computational complexity and memory footprint. In this work, we propose a lightweight recurrent network for this task, in which motion compensation offsets are estimated by an optical flow estimation network, features extracted from the previous high-resolution output are aligned to the current target frame, and a hidden state is used to propagate long-term information. We show that the proposed method performs real-time video super-resolution efficiently. We also carefully study the effect of including an optical flow estimation module in a lightweight recurrent VSR model and compare two ways of training the models. We further compare four motion estimation networks that have been used in lightweight VSR approaches and demonstrate the importance of reducing information loss in motion estimation.
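To make the recurrent step concrete, the following is a minimal PyTorch sketch of one time step. The optical flow network is abstracted into a precomputed `flow` input, and all names, channel widths, and layer choices (`RecurrentVSRCell`, `warp`, `ch=32`) are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat, flow):
    """Backward-warp features toward the current frame using optical flow.

    feat: (B, C, H, W) features from the previous step.
    flow: (B, 2, H, W) flow from the current frame to the previous one.
    """
    b, _, h, w = flow.shape
    # Build a pixel-coordinate grid, offset it by the flow, then normalize
    # to [-1, 1] as required by grid_sample.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feat, grid_norm, align_corners=True)

class RecurrentVSRCell(nn.Module):
    """One step of a lightweight recurrent VSR pipeline (hypothetical sizes)."""
    def __init__(self, ch=32, scale=4):
        super().__init__()
        self.scale = scale
        self.feat_extract = nn.Conv2d(3, ch, 3, padding=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(3 + 2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.to_hidden = nn.Conv2d(ch, ch, 3, padding=1)
        self.upsample = nn.Conv2d(ch, 3 * scale * scale, 3, padding=1)

    def forward(self, lr_t, prev_hr, hidden, flow):
        # 1) Extract features from the previous HR output (resized to LR scale).
        prev_feat = self.feat_extract(
            F.interpolate(prev_hr, scale_factor=1 / self.scale,
                          mode="bilinear", align_corners=False))
        # 2) Align them to the current target frame with the estimated flow.
        aligned = warp(prev_feat, flow)
        # 3) Fuse the current frame, aligned features, and the hidden state
        #    that carries long-term information across time steps.
        feat = self.fuse(torch.cat([lr_t, aligned, hidden], dim=1))
        hidden = self.to_hidden(feat)
        # 4) Reconstruct the HR frame via sub-pixel (pixel-shuffle) upsampling.
        hr = F.pixel_shuffle(self.upsample(feat), self.scale)
        return hr, hidden
```

At the first frame, `prev_hr` and `hidden` would be initialized to zeros (or a bicubic upscale of the input frame), a standard choice in recurrent VSR; whether the paper does the same is an assumption here.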
With recent advances in video super-resolution (VSR) techniques, there has been strong demand for super-resolving real-world old analog TV series into high-definition digital content. As excellent classic TV series may receive little to no attention because of their poor video quality, restoring them would open new business opportunities for reusing old TV content. A key difficulty in restoring real-world old TV series lies in the complex artifacts introduced by interlaced scanning and compression during the digitization of old analog videos. Although recent DNN-based VSR models perform well on clean videos, the artificial nature of interlacing and compression artifacts causes them to fail at restoring old videos into high-definition counterparts free of noticeable artifacts. In this work, we propose OldVSR for restoring old real-world TV series degraded by such artifacts of artificial nature. The proposed model implements a bidirectional recurrent structure with first- and second-order propagation, where each recurrent layer implements two main functions, i.e., feature alignment (FA) and pyramid feature aggregation (PFA). The outputs of the forward and backward layers are merged and upsampled to produce a high-definition (HD) frame from the input standard-definition (SD) frame. We demonstrate through experiments that the proposed OldVSR effectively removes artifacts of artificial nature from old videos and successfully restores old TV series.
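The bidirectional, second-order propagation scheme can be sketched as below in PyTorch. The FA and PFA submodules are plain convolutional stand-ins, and every name and channel count (`PropagationLayer`, `BidirectionalVSR`, `ch=64`) is an illustrative assumption; the paper's actual FA/PFA designs are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PropagationLayer(nn.Module):
    """One recurrent layer: feature alignment (FA) then pyramid feature
    aggregation (PFA). Both stages are simplified stand-ins."""
    def __init__(self, ch=64):
        super().__init__()
        self.fa = nn.Conv2d(3 * ch, ch, 3, padding=1)
        self.pfa = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, feat_t, state_1, state_2):
        # Second-order propagation: condition on the states from the
        # previous step (state_1) and the step before it (state_2).
        aligned = self.fa(torch.cat([feat_t, state_1, state_2], dim=1))
        return feat_t + self.pfa(aligned)

class BidirectionalVSR(nn.Module):
    """Forward and backward recurrent passes over a clip, merged and
    upsampled per frame (illustrative channel counts and scale)."""
    def __init__(self, ch=64, scale=2):
        super().__init__()
        self.scale = scale
        self.embed = nn.Conv2d(3, ch, 3, padding=1)
        self.forward_layer = PropagationLayer(ch)
        self.backward_layer = PropagationLayer(ch)
        self.merge = nn.Conv2d(2 * ch, ch, 1)
        self.upsample = nn.Conv2d(ch, 3 * scale * scale, 3, padding=1)

    def forward(self, frames):                  # frames: (B, T, 3, H, W) SD clip
        feats = [self.embed(f) for f in frames.unbind(dim=1)]
        zeros = torch.zeros_like(feats[0])
        # Backward pass (future-to-past), keeping the last two states.
        bwd, s1, s2 = [], zeros, zeros
        for f in reversed(feats):
            s1, s2 = self.backward_layer(f, s1, s2), s1
            bwd.append(s1)
        bwd.reverse()
        # Forward pass (past-to-future), then merge with backward features
        # and upsample each frame to HD via pixel shuffle.
        out, s1, s2 = [], zeros, zeros
        for f, b in zip(feats, bwd):
            s1, s2 = self.forward_layer(f, s1, s2), s1
            merged = self.merge(torch.cat([s1, b], dim=1))
            out.append(F.pixel_shuffle(self.upsample(merged), self.scale))
        return torch.stack(out, dim=1)          # (B, T, 3, H*scale, W*scale)
```

Note that bidirectional propagation requires the whole clip (or a sliding window of it) to be available before any frame is output, which suits offline restoration of archived TV content rather than the real-time setting of the first abstract.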