<!DOCTYPE article PUBLIC '-//NLM//DTD Journal Publishing DTD v2.1 20050630//EN' 'http://uploads.ingentaconnect.com/docs/dtd/ingenta-journalpublishing.dtd'>
<article article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="aggregator">72010604</journal-id>
      <journal-title>Electronic Imaging</journal-title>
      <issn pub-type="ppub">2470-1173</issn><issn pub-type="epub"></issn>
      <publisher>
        <publisher-name>Society for Imaging Science and Technology</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.2352/ISSN.2470-1173.2017.16.CVAS-344</article-id>
      <article-id pub-id-type="sici">2470-1173(20170129)2017:16L.15;1-</article-id>
      <article-id pub-id-type="publisher-id">s4.phd</article-id>
      <article-id pub-id-type="other">/ist/ei/2017/00002017/00000016/art00004</article-id>
      <article-categories>
        <subj-group>
          <subject>Articles</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Goal!! Event detection in sports video</article-title>
      </title-group>
      <contrib-group>
        <contrib>
          <name>
            <surname>Tsagkatakis</surname>
            <given-names>Grigorios</given-names>
          </name>
        </contrib>
        <contrib>
          <name>
            <surname>Jaber</surname>
            <given-names>Mustafa</given-names>
          </name>
        </contrib>
        <contrib>
          <name>
            <surname>Tsakalides</surname>
            <given-names>Panagiotis</given-names>
          </name>
        </contrib>
      </contrib-group>
      <pub-date>
        <day>29</day>
        <month>01</month>
        <year>2017</year>
      </pub-date>
      <volume>2017</volume>
      <issue>16</issue>
      <fpage>15</fpage>
      <lpage>20</lpage>
      <permissions>
        <copyright-year>2017</copyright-year>
      </permissions>
      <abstract>
        <p>Understanding complex events from unstructured video, like scoring a goal in a football game, is an extremely challenging task due to the dynamics, complexity and variation of video sequences. In this work, we attack this problem by exploiting the capabilities of the recently developed framework of deep learning. We consider independently encoding spatial and temporal information via convolutional neural networks and fusing the resulting features via regularized autoencoders. To demonstrate the capacities of the proposed scheme, a new dataset is compiled, composed of goal and no-goal sequences. Experimental results demonstrate that extremely high classification accuracy can be achieved from a dramatically limited number of examples by leveraging pre-trained models with fine-tuned fusion of spatio-temporal features.</p>
      </abstract>
      <kwd-group>
        <kwd>CONVOLUTIONAL NEURAL NETWORK</kwd>
        <kwd>EVENT DETECTION</kwd>
        <kwd>SPORTS VIDEO</kwd>
        <kwd>GOAL DETECTION</kwd>
        <kwd>FOOTBALL</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
