In recent years, we have seen significant progress in advanced image upscaling techniques, often referred to as super-resolution, ML-based, or AI-based upscaling. Such algorithms are now available not only in the form of specialized software but also in drivers and SDKs supplied with modern graphics cards; the upscaling functions in the NVIDIA Maxine SDK are one recent example. However, to take advantage of this functionality in video streaming applications, one needs to (a) quantify the impact of super-resolution techniques on perceived visual quality, (b) implement video rendering that incorporates super-resolution upscaling, and (c) implement new bitrate and resolution adaptation algorithms in streaming players, enabling such players to deliver better quality of experience, better efficiency (e.g., reduced bandwidth usage), or both. Toward this end, in this paper we propose several techniques that may be helpful to the implementation community. First, we offer a model quantifying the impact of super-resolution upscaling on perceived quality. Our model is based on the Westerink-Roufs model connecting the true resolution of images/videos to perceived quality, with several additional parameters that allow tuning to specific implementations of super-resolution techniques. We verify this model using several recent datasets that include MOS scores measured for several conventional upscaling and super-resolution algorithms. We then propose an improved adaptation logic for video streaming players that considers video bitrates, encoded video resolutions, player size, and the upscaling method. This logic relies on our modified Westerink-Roufs model to predict perceived quality and selects the renditions that deliver the best quality for the given display and upscaling-method characteristics. Finally, we study the impact of the proposed techniques and show that they can deliver practically appreciable results in terms of expected QoE improvements and bandwidth savings.
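To make the rendition-selection idea concrete, the sketch below pairs a Westerink-Roufs-style quality function with a simple selection loop. The specific parametric form (a log curve that saturates once the effective resolution fills the display) and the parameter names (sr_gain, q_max, slope) are illustrative assumptions, not the calibrated model from the paper; the point is only to show how an SR-aware quality model can let a player pick a lower-bitrate rendition without a predicted quality loss.

    import math

    def perceived_quality(encoded_height, player_height, sr_gain=1.0,
                          q_max=4.5, slope=1.0):
        """Westerink-Roufs-style quality estimate (hypothetical parameterization).

        encoded_height : vertical resolution of the rendition (pixels)
        player_height  : vertical resolution of the player window (pixels)
        sr_gain        : > 1.0 models a super-resolution upscaler that makes
                         the rendition "look like" a higher-resolution source
        """
        # Effective resolution of the upscaled rendition relative to the display.
        rho = min(sr_gain * encoded_height / player_height, 1.0)
        # Log-saturating curve: quality grows with effective resolution and
        # flattens once the display is fully resolved (rho = 1).
        return q_max + slope * math.log(rho)

    def best_rendition(renditions, bandwidth_kbps, player_height, sr_gain):
        """Pick the highest predicted-quality rendition that fits the bandwidth.

        renditions : list of (bitrate_kbps, encoded_height) pairs
        """
        feasible = [r for r in renditions if r[0] <= bandwidth_kbps]
        if not feasible:
            return min(renditions)  # fall back to the cheapest rendition
        return max(feasible,
                   key=lambda r: perceived_quality(r[1], player_height, sr_gain))

    # Example: with an SR upscaler (sr_gain > 1), a 720p rendition can already
    # saturate a 1080p player, so the selector stops short of the 1080p rung.
    ladder = [(400, 360), (1200, 720), (2500, 1080), (5000, 1440)]
    print(best_rendition(ladder, bandwidth_kbps=3000,
                         player_height=1080, sr_gain=1.5))  # -> (1200, 720)

In this toy setup, the same loop with sr_gain = 1.0 would select the 1080p rendition, which illustrates where the bandwidth savings in the abstract would come from.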
Despite the advances in single-image super-resolution using deep convolutional networks, the main problem remains unsolved: recovering fine texture details. Recent work in super-resolution aims at modifying the training of neural networks to enable the recovery of these details. Among the methods proposed, wavelet decompositions are used as inputs to super-resolution networks to provide structural information about the image, and residual connections may link different network layers to help propagate high frequencies. We review and compare the use of wavelets and residuals in training super-resolution neural networks. We show that residual connections are key to improving the performance of deep super-resolution networks. We also show that there is no statistically significant performance difference between spatial and wavelet inputs. Finally, we propose a new super-resolution architecture that saves memory costs while still using residual connections and performs comparably to the current state of the art.
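As an illustration of the residual connections discussed above, the following is a minimal EDSR-style sketch in PyTorch, not the architecture proposed in the paper: short skips inside each block plus one long skip around the block stack, followed by pixel-shuffle upsampling. Channel counts and block depth are arbitrary placeholders.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """Conv-ReLU-Conv block with an identity skip connection.

        The skip lets the block learn only a residual correction, which helps
        gradients (and high-frequency detail) propagate through deep networks.
        """
        def __init__(self, channels=64):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )

        def forward(self, x):
            return x + self.body(x)  # short (local) skip connection

    class TinySR(nn.Module):
        """Minimal residual super-resolution network (EDSR-like sketch)."""
        def __init__(self, scale=2, channels=64, num_blocks=8):
            super().__init__()
            self.head = nn.Conv2d(3, channels, 3, padding=1)
            self.blocks = nn.Sequential(*[ResidualBlock(channels)
                                          for _ in range(num_blocks)])
            # PixelShuffle rearranges channels into a (scale x scale) grid,
            # producing the upsampled RGB output.
            self.tail = nn.Sequential(
                nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
                nn.PixelShuffle(scale),
            )

        def forward(self, x):
            feats = self.head(x)
            feats = feats + self.blocks(feats)  # long (global) skip connection
            return self.tail(feats)

    # A 2x upscale of a 64x64 RGB patch -> 128x128.
    sr = TinySR(scale=2)
    out = sr(torch.randn(1, 3, 64, 64))
    print(out.shape)  # torch.Size([1, 3, 128, 128])

A wavelet-input variant of this sketch would replace the 3-channel RGB input with the stacked sub-bands of a discrete wavelet transform; per the abstract's finding, that change would not be expected to yield a statistically significant performance difference, whereas removing the skip connections would.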