Line segmentation is a significant stage in optical character recognition (OCR) systems: it directly affects the character segmentation stage, which in turn affects the recognition rate. This paper proposes a robust line segmentation algorithm for printed Arabic text, with and without diacritics, based on finding the global maximum peak and detecting the baseline. The algorithm was tested on different font sizes and types; tests on 5 font types, totaling 43,055 lines, yielded 99.9% accuracy for text without diacritics and 99.5% accuracy for text with diacritics.
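To illustrate the general idea of baseline-driven line segmentation, the sketch below uses a simplified stand-in. It is not the authors' algorithm, and every name and parameter in it (`horizontal_projection`, `find_baselines`, `segment_lines`, `ratio`, `min_gap`) is hypothetical. It assumes a binary page image stored as a list of rows of 0/1 values, with 1 marking ink. It also assumes that each text line's baseline is a strong peak of the horizontal projection (row ink count), which is commonly the case for printed Arabic. Baselines are selected relative to the global maximum peak. Each cut between neighbouring lines is then placed at the lowest-ink row between their baselines, nearest the midpoint. A diacritic band that is separated from its line by blank rows therefore stays attached to that line instead of becoming a spurious line of its own.

```python
from typing import List, Tuple


def horizontal_projection(img: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in img]


def find_baselines(proj: List[int], ratio: float = 0.5, min_gap: int = 3) -> List[int]:
    """Pick baseline rows as strong peaks of the projection (hypothetical heuristic).

    Rows are visited from the highest projection value downward. A row is
    accepted if it reaches `ratio` times the global maximum peak and lies at
    least `min_gap` rows away from every baseline already accepted.
    """
    if not proj or max(proj) == 0:
        return []
    # At least one row must separate two baselines so that a cut row exists.
    min_gap = max(min_gap, 2)
    threshold = ratio * max(proj)
    baselines: List[int] = []
    # Sort by descending ink count; ties are broken by row index.
    for r in sorted(range(len(proj)), key=lambda i: (-proj[i], i)):
        if proj[r] < threshold:
            break  # every remaining row is weaker still
        if all(abs(r - b) >= min_gap for b in baselines):
            baselines.append(r)
    return sorted(baselines)


def segment_lines(img: List[List[int]], ratio: float = 0.5,
                  min_gap: int = 3) -> List[Tuple[int, int]]:
    """Return half-open row ranges [start, end), one per detected text line."""
    proj = horizontal_projection(img)
    baselines = find_baselines(proj, ratio, min_gap)
    if not baselines:
        return []
    cuts = [0]
    for a, b in zip(baselines, baselines[1:]):
        mid = (a + b) / 2
        # Lowest-ink row strictly between the two baselines; among equally
        # empty rows, prefer the one closest to the midpoint.
        cut = min(range(a + 1, b), key=lambda r: (proj[r], abs(r - mid)))
        cuts.append(cut)
    cuts.append(len(proj))
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]
```

For example, consider a synthetic page whose rows hold `[0, 1, 0, 4, 10, 2, 0, 0, 1, 0, 4, 10, 2, 0]` ink pixels, where rows 1 and 8 are isolated diacritic dots. The sketch finds baselines at rows 4 and 11 and returns the two lines `(0, 7)` and `(7, 14)`. Splitting at every blank-row gap would instead produce four bands, two of them holding only diacritics. Placing the cut near the midpoint is a simple assumption. It fails when diacritics sit much closer to the neighbouring line than to their own, a case that a real system would have to handle more carefully.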
Muna Ayesh, Khader Mohammad, Aziz Qaroush, Sos Agaian, Mahdi Washha, "A Robust Line Segmentation Algorithm for Arabic Printed Text with Diacritics," in Proc. IS&T Int’l Symp. on Electronic Imaging: Image Processing: Algorithms and Systems XV, 2017, pp. 42-47, https://doi.org/10.2352/ISSN.2470-1173.2017.13.IPAS-204