Volume: 30 | Article ID: art00009
Approach for Machine-Printed Arabic Character Recognition: the-state-of-the-art deep-learning method
DOI: 10.2352/ISSN.2470-1173.2018.2.VIPC-176 | Published Online: January 2018
Abstract

Optical character recognition (OCR) automatically recognizes text in an image and converts it into machine-readable codes such as ASCII or Unicode. Although OCR for other languages has been studied extensively, recognizing Arabic remains a challenging problem because of character connection and segmentation issues. In this work, we propose a deep-learning framework for recognizing Arabic characters based on multi-dimensional bidirectional long short-term memory (MD-BLSTM) with connectionist temporal classification (CTC). To train this framework, we generate a dataset of over one million Arabic text-line images that contains Arabic digits, basic Arabic characters in isolated form, and connected forms. For comparison, we also measure the performance of other OCR software, namely Tesseract, developed by Hewlett-Packard and Google Inc.; both Tesseract version 3 and version 4 are used. Results show that the deep-learning method outperforms the conventional methods in terms of recognition error rate, although the Tesseract 3.0 system was faster.
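
The abstract does not say how the CTC outputs are decoded or exactly how the recognition error rate is computed. As a rough illustration only, the sketch below assumes greedy (best-path) CTC decoding and a character error rate defined as edit distance divided by reference length. The function names, the blank index `BLANK = 0`, and the integer label encoding are hypothetical choices for this example, not details taken from the paper.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (not specified in the paper)


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Best-path CTC decoding: merge repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """Edit distance normalized by reference length (one common definition)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For instance, `ctc_greedy_decode` takes the per-frame argmax labels of a recognizer and returns the label sequence, which can then be scored against the ground-truth text line with `character_error_rate`. The blank between two identical labels is what lets CTC emit a repeated character.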

  Cite this article 

Daegun Ko, Changhyung Lee, Donghyeop Han, Hyeongsu Ohk, Kimin Kang, Seongwook Han, "Approach for Machine-Printed Arabic Character Recognition: the-state-of-the-art deep-learning method" in Proc. IS&T Int'l. Symp. on Electronic Imaging: Visual Information Processing and Communication IX, 2018, pp 176-1 - 176-8, https://doi.org/10.2352/ISSN.2470-1173.2018.2.VIPC-176

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2018
Electronic Imaging, ISSN 2470-1173, Society for Imaging Science and Technology