According to Cisco, most Internet traffic currently consists of video. Therefore, developing quality assessment methods to verify that those videos are received and displayed with adequate quality at the user side is an important and challenging task. As a consequence, over the last
decades, several no-reference video quality metrics have been proposed with the goal of blindly predicting (with no access to the original signal) the quality of videos in streaming applications. One such metric is NAVE, whose architecture includes an auto-encoder module that produces
a compact set of visual features with high descriptive capacity. Nevertheless, the visual features in NAVE do not include temporal features that are sensitive to temporal degradations. In this work, we analyze the effect on prediction accuracy of using a new type of temporal
features based on natural scene statistics. This approach aims to make the tested video quality metric more generic, i.e., sensitive to both spatial and temporal distortions, and therefore adequate for video streaming applications.