Event-based Vision Sensors (EVS) use smart pixels that detect, at the pixel level, whether a relative illumination change exceeds a predefined temporal contrast threshold. Because EVS read out these events asynchronously, they provide low latency and high temporal resolution, which makes them well suited to complementing conventional CMOS Image Sensors (CIS). Emerging hybrid CIS+EVS sensors fuse high-spatial-resolution intensity frames with low-latency event information to enhance applications such as deblurring or video frame interpolation (VFI) for slow-motion video capture. This paper employs an edge-sharpness-based metric, Blurred Edge Width (BEW), to benchmark EVS-assisted slow-motion capture against CIS-only solutions. The EVS-assisted VFI interpolates a 120 fps CIS video stream by 64x, yielding an interpolated frame rate of 7680 fps. We observe that, with the added information from EVS, the hybrid approach dramatically outperforms a 120 fps CIS-only VFI solution. Furthermore, the hybrid EVS+CIS-based VFI achieves performance comparable to that of high-speed CIS-only solutions that capture frames directly at 480 fps or 1920 fps and apply additional CIS-only VFI. The hybrid approach, however, does so at significantly lower data rates: in our study, data-rate reduction factors of 2.6 and 10.5 were observed.
While slow motion has become a standard feature in mainstream cell phones, no fast method for assessing slow-motion video quality exists that does not rely on specific training datasets. Conventionally, researchers evaluate their algorithms using the peak signal-to-noise ratio (PSNR) or the structural similarity index measure (SSIM) between ground-truth and reconstructed frames. Both, however, are global metrics and are more sensitive to noise or distortion introduced by the interpolation than to localized motion artifacts. For video interpolation, especially of fast-moving objects, motion blur and ghosting matter more to the viewer's subjective judgment. How to properly evaluate the video frame interpolation (VFI) task therefore remains an open problem.
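To illustrate why a global metric such as PSNR can disagree with perceived sharpness, the following minimal sketch compares two degraded versions of a synthetic frame containing a single vertical edge: one with the edge smeared over 8 pixels, and one with a mild alternating perturbation applied to every pixel. The frame size, intensity levels, blur width, and noise amplitude are illustrative assumptions and do not come from this study; the example is not the BEW metric.

```python
import math

def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized grayscale frames (lists of rows)."""
    squared_errors = [
        (r - t) ** 2
        for ref_row, test_row in zip(reference, test)
        for r, t in zip(ref_row, test_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical test frame: a vertical step edge in the middle of the image.
WIDTH, HEIGHT, EDGE = 256, 4, 128
LOW, HIGH = 50.0, 200.0

def sharp_row():
    return [LOW if x < EDGE else HIGH for x in range(WIDTH)]

def blurred_row(blur_width=8):
    """Replace the step with a linear ramp centred on the edge (localized error)."""
    row = sharp_row()
    start = EDGE - blur_width // 2
    for i in range(blur_width):
        row[start + i] = LOW + (HIGH - LOW) * (i + 0.5) / blur_width
    return row

def noisy_row(amplitude=8.0):
    """Add a deterministic +/- amplitude pattern to every pixel (global error)."""
    return [
        v + (amplitude if x % 2 == 0 else -amplitude)
        for x, v in enumerate(sharp_row())
    ]

reference = [sharp_row() for _ in range(HEIGHT)]
blurred = [blurred_row() for _ in range(HEIGHT)]
noisy = [noisy_row() for _ in range(HEIGHT)]

psnr_blurred = psnr(reference, blurred)
psnr_noisy = psnr(reference, noisy)
print(f"PSNR, blurred edge: {psnr_blurred:.2f} dB")
print(f"PSNR, global noise: {psnr_noisy:.2f} dB")
```

Under these assumptions, the frame with the blurred edge scores a higher PSNR than the frame with a perfectly sharp edge and small global noise, because the blur error is averaged over the whole image. A metric that measures the edge directly would rank the two frames the other way around.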