Self-Attention Enhanced Recognition: A Unified Model for Handwriting and Scene-text Recognition with Improved Inference

Gaurav Patel; Taewook Kim; Qian Lin; Jan P. Allebach; Qiang Qiu

doi:10.2352/EI.2024.36.8.IMAGE-241

Abstract

In this paper, we introduce a unified handwriting and scene-text recognition model designed to recognize both printed and handwritten text images. Our primary contribution is the incorporation of the self-attention mechanism, a central component of the transformer architecture. This yields two advantages: 1) a substantial improvement in recognition accuracy for both scene text and handwritten text, and 2) a notable decrease in inference time, which addresses a prevalent challenge for modern recognizers that rely on sequence-based decoding with attention.

Electronic Imaging

ISSN 2470-1173

Society for Imaging Science and Technology

IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA

IMAGE-241

Proceedings Paper

Gaurav Patel, Purdue University, US

Taewook Kim, Purdue University, US

Qian Lin, HP Inc., US

Jan P. Allebach, Purdue University, US

Qiang Qiu, Purdue University, US

21 January 2024

IMAGE

Imaging and Multimedia Analytics at the Edge 2024

Pages 241-1 to 241-6

2024

Keywords: Handwriting Recognition; Sequence Models; Text Recognition