Real-time video super-resolution (VSR) is considered a promising way to improve video quality in applications such as video conferencing and media playback, which require low latency and short inference time. Although state-of-the-art VSR methods with well-designed architectures have been proposed, many of them cannot feasibly be converted into real-time VSR models because of their high computational complexity and large memory footprint. In this work, we propose a lightweight recurrent network for this task, in which motion compensation offsets are estimated by an optical flow estimation network, features extracted from the previous high-resolution output are aligned to the current target frame, and a hidden state is used to propagate long-term information. We show that the proposed method is efficient for real-time video super-resolution. We also study in detail whether an optical flow estimation module is effective in a lightweight recurrent VSR model, and we compare two strategies for training such models. We further compare four motion estimation networks that have been used in lightweight VSR approaches and demonstrate the importance of reducing information loss in motion estimation.
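To make the alignment step more concrete, here is a minimal, framework-free Python sketch of flow-based backward warping. This is the generic operation by which a previous high-resolution output can be aligned to the current frame using an estimated optical flow. It is not the authors' implementation, and it rests on several assumptions. The function names `bilinear_sample`, `backward_warp` and `upscale_flow` are hypothetical. Frames are plain 2-D lists of intensities. The flow is assumed to be estimated at low resolution and upsampled by nearest-neighbour interpolation by an integer scale factor. Out-of-range samples are clamped to the image border.

```python
import math


def bilinear_sample(img, x, y):
    """Sample a 2-D grid at fractional (x, y), clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Interpolate horizontally on the two neighbouring rows, then vertically.
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def backward_warp(prev, flow):
    """Align `prev` to the current frame: out[y][x] = prev sampled at (x + u, y + v).

    `flow[y][x]` is a (u, v) displacement pointing from the current frame into `prev`.
    """
    h, w = len(prev), len(prev[0])
    return [[bilinear_sample(prev, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(w)]
            for y in range(h)]


def upscale_flow(flow, s):
    """Assumed helper: nearest-neighbour upsample an LR flow field by integer factor s.

    The displacement vectors are multiplied by s so they are expressed in HR pixels.
    """
    return [[(flow[y // s][x // s][0] * s, flow[y // s][x // s][1] * s)
             for x in range(len(flow[0]) * s)]
            for y in range(len(flow) * s)]


if __name__ == "__main__":
    prev_hr = [[0, 1, 2, 3], [10, 11, 12, 13]]
    shift_right = [[(1, 0)] * 4 for _ in range(2)]
    print(backward_warp(prev_hr, shift_right))
```

In a trained network the same operation would normally run on batched tensors with a differentiable sampler so that the flow estimator can be trained end to end. The list-based version above only illustrates the indexing and interpolation.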