<?xml version="1.0"?>
                <!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
                <article article-type="research-article" xmlns:mml="http://www.w3.org/1998/Math/MathML"
                xmlns:xlink="http://www.w3.org/1999/xlink"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                dtd-version="3.0">
                <front>
                    <journal-meta>
                    <journal-id journal-id-type="publisher-id">ei</journal-id>
                    <journal-title>Electronic Imaging</journal-title>
                    <issn pub-type="ppub">2470-1173</issn><issn pub-type="epub">2470-1173</issn>
                    <publisher>
                        <publisher-name>Society for Imaging Science and Technology</publisher-name>
                        <publisher-loc>IS&amp;T 7003 Kilworth Lane, Springfield, VA 22151 USA</publisher-loc>
                    </publisher>
                    </journal-meta>
                    <article-meta>
                    <article-id pub-id-type="doi">10.2352/EI.2024.36.4.MWSF-335</article-id>
                    <article-id pub-id-type="publisher-id">MWSF-335</article-id>
                    <article-categories>
                        <subj-group>
                        <subject>Proceedings Paper</subject>
                        </subj-group>
                    </article-categories>
                    <title-group>
                        <article-title>Efficient Temporally-aware DeepFake Detection using H.264 Motion Vectors</article-title>
                    </title-group><contrib-group content-type="all"><contrib contrib-type="author"><name>
                            <surname>Grönquist</surname>
                            <given-names>Peter</given-names>
                           </name> <xref ref-type="aff" rid="aff1author1"/></contrib><aff id="aff1author1">École Polytechnique Fédérale de Lausanne, Switzerland</aff></contrib-group><contrib-group content-type="all"><contrib contrib-type="author"><name>
                            <surname>Ren</surname>
                            <given-names>Yufan</given-names>
                           </name> <xref ref-type="aff" rid="aff1author2"/></contrib><aff id="aff1author2">École Polytechnique Fédérale de Lausanne, Switzerland</aff></contrib-group><contrib-group content-type="all"><contrib contrib-type="author"><name>
                            <surname>He</surname>
                            <given-names>Qingyi</given-names>
                           </name> <xref ref-type="aff" rid="aff1author3"/></contrib><aff id="aff1author3">École Polytechnique Fédérale de Lausanne, Switzerland</aff></contrib-group><contrib-group content-type="all"><contrib contrib-type="author"><name>
                            <surname>Verardo</surname>
                            <given-names>Alessio</given-names>
                           </name> <xref ref-type="aff" rid="aff1author4"/></contrib><aff id="aff1author4">École Polytechnique Fédérale de Lausanne, Switzerland</aff></contrib-group><contrib-group content-type="all"><contrib contrib-type="author"><name>
                            <surname>Süsstrunk</surname>
                            <given-names>Sabine</given-names>
                           </name> <xref ref-type="aff" rid="aff1author5"/></contrib><aff id="aff1author5">École Polytechnique Fédérale de Lausanne, Switzerland</aff></contrib-group><abstract>
                    <title>Abstract</title>
                    <p>Video DeepFakes are fake media created with Deep Learning (DL) that manipulate a person’s expression or identity. Most current DeepFake detection methods analyze each frame independently, ignoring inconsistencies and unnatural movements between frames. Some newer methods employ optical flow models to capture this temporal aspect, but they are computationally expensive. In contrast, we propose using the related but often-overlooked Motion Vectors (MVs) and Information Masks (IMs) from the H.264 video codec to detect temporal inconsistencies in DeepFakes. Our experiments show that this approach is effective and adds minimal computational cost compared with per-frame RGB-only methods. This could lead to new real-time, temporally-aware DeepFake detection methods for video calls and streaming.</p>
                    </abstract><pub-date>
                        <day>21</day>
                        <month>1</month>
                        <year>2024</year>
                        </pub-date><volume>36</volume>
                    <issue-acronym>MWSF</issue-acronym>
                    <issue-title>Media Watermarking, Security, and Forensics 2024</issue-title>
                    <issue seq="335">4</issue>
                    <fpage>335-1</fpage>
                    <lpage>335-9</lpage>
                    <permissions>
                         <copyright-statement>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.</copyright-statement>
                        <copyright-year>2024</copyright-year>
                    </permissions><kwd-group><kwd>Deep Learning</kwd><kwd>DeepFake</kwd><kwd>Detection</kwd><kwd>Efficiency</kwd><kwd>Face Forgery</kwd><kwd>H.264</kwd><kwd>Motion Vectors</kwd><kwd>Optical Flow</kwd><kwd>Video</kwd></kwd-group></article-meta>
                </front>
                </article>