<?xml version="1.0"?>
                <!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd">
                <article article-type="research-article">
                <front>
                    <journal-meta>
                    <journal-id journal-id-type="publisher-id">ei</journal-id>
                    <journal-title>Electronic Imaging</journal-title>
                    <issn pub-type="ppub">2470-1173</issn><issn pub-type="epub">2470-1173</issn>
                    <publisher>
                        <publisher-name>Society for Imaging Science and Technology</publisher-name>
                        <publisher-loc>IS&amp;T 7003 Kilworth Lane, Springfield, VA 22151 USA</publisher-loc>
                    </publisher>
                    </journal-meta>
                    <article-meta>
                    <article-id pub-id-type="doi">10.2352/EI.2023.35.17.3DIA-102</article-id>
                    <article-id pub-id-type="publisher-id">3DIA-102</article-id>
                    <article-categories>
                        <subj-group>
                        <subject>Article</subject>
                        </subj-group>
                    </article-categories>
                    <title-group>
                        <article-title>Learned visual localization with camera pose refinement and verification based on differentiable renderer</article-title>
                    </title-group><contrib-group content-type="all"><contrib contrib-type="author"><name>
                            <surname>Tsai</surname>
                            <given-names>Chanchang</given-names>
                           </name> <xref ref-type="aff" rid="aff1author1"/></contrib><aff id="aff1author1">Tokyo Institute of Technology, Japan</aff></contrib-group><contrib-group content-type="all"><contrib contrib-type="author"><name>
                            <surname>Taira</surname>
                            <given-names>Hajime</given-names>
                           </name> <xref ref-type="aff" rid="aff1author2"/></contrib><aff id="aff1author2">Tokyo Institute of Technology, Japan</aff></contrib-group><contrib-group content-type="all"><contrib contrib-type="author"><name>
                            <surname>Okutomi</surname>
                            <given-names>Masatoshi</given-names>
                           </name> <xref ref-type="aff" rid="aff1author3"/></contrib><aff id="aff1author3">Tokyo Institute of Technology, Japan</aff></contrib-group><abstract>
                    <title>Abstract</title>
                    <p>This manuscript presents a new CNN-based visual localization method that estimates the camera location of an input RGB image with respect to a pre-collected database of RGB-D images. To determine an accurate camera pose, we adopt a coarse-to-fine approach that first finds coarse location candidates via image retrieval, then refines them using the local 3D structure represented by each retrieved RGB-D image. For coarse prediction, we use a CNN feature extractor and a relative pose estimator that do not necessarily require scene-specific training. Furthermore, we propose a new pose refinement-verification module that simultaneously evaluates and refines camera poses using a differentiable renderer. Experimental results on public datasets show that our proposed pipeline achieves accurate localization on both trained and unknown scenes.</p>
                    </abstract><pub-date>
                        <day>16</day>
                        <month>1</month>
                        <year>2023</year>
                        </pub-date><volume>35</volume>
                    <issue-acronym>3DIA</issue-acronym>
                    <issue-title>3D Imaging and Applications 2023</issue-title>
                    <issue>17</issue>
                    <fpage>102-1</fpage>
                    <lpage>102-6</lpage>
                    <permissions>
                         <copyright-statement>© 2023, Society for Imaging Science and Technology</copyright-statement>
                        <copyright-year>2023</copyright-year>
                    </permissions><kwd-group><kwd>Visual localization</kwd><kwd>Place recognition</kwd><kwd>Deep learning</kwd><kwd>Differentiable renderer</kwd><kwd>Camera pose estimation</kwd><kwd>Pose verification</kwd></kwd-group></article-meta>
                </front>
                </article>