Virtual backgrounds have become an increasingly important feature of online video conferencing due to the popularity of remote work in recent years. To enable a virtual background, a segmentation mask of the participant must be extracted from the real-time video input. Most previous work has focused on image-based methods for portrait segmentation. However, portrait video segmentation poses additional challenges, including complicated backgrounds, body motion, and inter-frame consistency. In this paper, we utilize temporal guidance to improve video segmentation and propose several methods to address these challenges, including a prior mask, optical flow, and visual memory. We leverage an existing portrait segmentation model, PortraitNet, to incorporate our temporal guidance methods. Experimental results show that our methods achieve improved segmentation performance on portrait videos with minimal latency.
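To make the idea of temporal guidance concrete, the sketch below shows one plausible way a prior mask and optical flow could be wired around a segmentation backbone: the previous frame's predicted mask is (optionally) warped with optical flow and concatenated to the RGB frame as a fourth input channel. The `TemporalGuidedSegmenter` wrapper, the `warp_mask` helper, the 4-channel backbone, and the use of backward flow are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Minimal sketch of prior-mask + optical-flow temporal guidance (assumed design,
# not the paper's code). The backbone is assumed to accept 4 input channels:
# RGB plus the warped prior mask.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalGuidedSegmenter(nn.Module):
    """Wraps a backbone (e.g. a PortraitNet-style encoder-decoder) that takes
    RGB + prior-mask input."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    @staticmethod
    def warp_mask(prior_mask: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        """Warp the previous mask (N,1,H,W) to the current frame using backward
        optical flow (N,2,H,W) that maps frame-t pixels to their frame-(t-1)
        source locations."""
        n, _, h, w = prior_mask.shape
        ys, xs = torch.meshgrid(
            torch.arange(h, device=flow.device),
            torch.arange(w, device=flow.device),
            indexing="ij",
        )
        # Displace the sampling grid by the flow and normalize to [-1, 1].
        grid_x = (xs.float() + flow[:, 0]) / (w - 1) * 2 - 1
        grid_y = (ys.float() + flow[:, 1]) / (h - 1) * 2 - 1
        grid = torch.stack((grid_x, grid_y), dim=-1)  # (N, H, W, 2)
        return F.grid_sample(prior_mask, grid, align_corners=True)

    def forward(self, frame, prior_mask=None, flow=None):
        if prior_mask is None:
            # First frame: use an empty prior so the input shape stays constant.
            prior_mask = torch.zeros_like(frame[:, :1])
        elif flow is not None:
            prior_mask = self.warp_mask(prior_mask, flow)
        x = torch.cat([frame, prior_mask], dim=1)  # (N, 4, H, W)
        return self.backbone(x)
```

Under these assumptions, each new prediction would be thresholded and fed back as the prior mask for the next frame, so the per-frame overhead of the guidance is limited to one channel concatenation and, when flow is used, one warp.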
Weichen Xu, Yezhi Shen, Qian Lin, Jan P. Allebach, and Fengqing Zhu, "Efficient real-time portrait video segmentation with temporal guidance," in Electronic Imaging, 2022, pp. 263-1 - 263-7, https://doi.org/10.2352/EI.2022.34.8.IMAGE-263