Perception and Quality
Volume: 28 | Article ID: art00021
An Audiovisual Saliency Model For Conferencing and Conversation Videos
DOI: 10.2352/ISSN.2470-1173.2016.13.IQSP-217 | Published Online: February 2016

Visual attention modeling is a very active research area, and during the last decade several image and video attention models have been proposed. Unfortunately, the majority of classical video attention models do not take into account the multimodal aspect of video (visual and auditory cues), even though several studies have shown that human gaze is affected by the presence of the soundtrack. In this paper, we propose an audiovisual saliency model that predicts human gaze maps when exploring conferencing or conversation videos. The model is based on the fusion of spatial, temporal, and auditory attentional maps. Thanks to a real-time audiovisual speaker localization method, the proposed auditory maps enhance the saliency of the speaker's region relative to the other faces in the video. Classical visual attention measures have been used to compare the predicted saliency maps with eye-tracking ground truth. Results of the proposed approach, using several fusion methods, show good performance regardless of the spatial model used.
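To illustrate the idea of fusing attentional maps with a speaker-boosted auditory channel, here is a minimal NumPy sketch. It assumes a simple weighted-linear fusion with illustrative weights, and the face boxes, speaker index, and boost factor are hypothetical inputs; the paper's actual fusion methods and speaker localization are not reproduced here.

```python
import numpy as np

def auditory_map(shape, face_boxes, speaker_idx, boost=2.0):
    """Build an auditory attentional map: every detected face region gets
    unit saliency, and the active speaker's region is boosted.  The boxes
    and speaker index stand in for the outputs of a face detector and an
    audiovisual speaker localization method (hypothetical inputs)."""
    m = np.zeros(shape)
    for i, (r0, r1, c0, c1) in enumerate(face_boxes):
        m[r0:r1, c0:c1] = boost if i == speaker_idx else 1.0
    return m

def fuse_saliency(spatial, temporal, auditory, weights=(0.4, 0.3, 0.3)):
    """One possible fusion scheme: normalize each attentional map to [0, 1],
    take a weighted sum, and renormalize.  The weights are illustrative."""
    maps = [spatial, temporal, auditory]
    norm = [m / m.max() if m.max() > 0 else m for m in maps]
    fused = sum(w * m for w, m in zip(weights, norm))
    return fused / fused.max() if fused.max() > 0 else fused
```

For example, with two face boxes where the second face is the active speaker, the fused map assigns more saliency to the speaker's region than to the silent face.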

  Cite this article 

Naty Ould Sidaty, Mohamed-Chaker Larabi, Abdelhakim Saadane, "An Audiovisual Saliency Model For Conferencing and Conversation Videos," in Proc. IS&T Int'l. Symp. on Electronic Imaging: Image Quality and System Performance XIII, 2016.

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2016