Recognition of Arabic Printed Scripts by Dynamic Programming Matching Method

Mohamed FAKIR; Chuichi SODEYAMA

Recognition of Arabic Printed Scripts by Dynamic Programming Matching Method

Mohamed FAKIR, Chuichi SODEYAMA

Full Text Views

0

Cite this

Summary :

A method for the recognition of Arabic printed scripts entered from an image scanner is presented. The method uses the Hough transformation (HT) to extract features, Dynamic programming (DP) matching technique, and a topological classifier to recognize the characters. A process of characters recognition is further divided into four parts: preprocessing, segmentation of a word into characters, features extraction, and characters identification. The preprocessing consists of the following steps: smoothing to remove noise, baseline drift correction by using HT, and lines separation by making an horizontal projection profile. After preprocessing, Arabic printed words are segmented into characters by analysing the vertical and the horizontal projection profiles using a threshold. The character or stroke obtained from the segmentation process is normalized in size, then thinned to provide it skeleton from which features are extracted. As in the procedure of straight lines detection, a threshold is applied to every cell and those cells whose count is greater than the threshold are selected. The coordinates (R, θ) of the selected cells are the extracted features. Next, characters are classified in two steps: In the first one, the character main body is classified using DP matching technique, and features selected in the HT space. In the second one, simple topological features extracted from the geometry of the stress marks are used by the topological classifier to completely recognize the characters. The topological features used to classify each type of the stress mark are the width, the height, and the number of black pixels of the stress marks. Knowing both the main group of the character body and the type of the stress mark (if any), the character is completely identified.

Publication: IEICE TRANSACTIONS on Information Vol.E76-D No.2 pp.235-242

Publication Date: 1993/02/25

Publicized

Online ISSN

DOI

Type of Manuscript: PAPER

Category: Image Processing, Computer Graphics and Pattern Recognition

Cite this

Copy

Mohamed FAKIR, Chuichi SODEYAMA, "Recognition of Arabic Printed Scripts by Dynamic Programming Matching Method" in IEICE TRANSACTIONS on Information, vol. E76-D, no. 2, pp. 235-242, February 1993, doi: .
Abstract: A method for the recognition of Arabic printed scripts entered from an image scanner is presented. The method uses the Hough transformation (HT) to extract features, Dynamic programming (DP) matching technique, and a topological classifier to recognize the characters. A process of characters recognition is further divided into four parts: preprocessing, segmentation of a word into characters, features extraction, and characters identification. The preprocessing consists of the following steps: smoothing to remove noise, baseline drift correction by using HT, and lines separation by making an horizontal projection profile. After preprocessing, Arabic printed words are segmented into characters by analysing the vertical and the horizontal projection profiles using a threshold. The character or stroke obtained from the segmentation process is normalized in size, then thinned to provide it skeleton from which features are extracted. As in the procedure of straight lines detection, a threshold is applied to every cell and those cells whose count is greater than the threshold are selected. The coordinates (R, θ) of the selected cells are the extracted features. Next, characters are classified in two steps: In the first one, the character main body is classified using DP matching technique, and features selected in the HT space. In the second one, simple topological features extracted from the geometry of the stress marks are used by the topological classifier to completely recognize the characters. The topological features used to classify each type of the stress mark are the width, the height, and the number of black pixels of the stress marks. Knowing both the main group of the character body and the type of the stress mark (if any), the character is completely identified.
URL: https://global.ieice.org/en_transactions/information/10.1587/e76-d_2_235/_p

Copy

@ARTICLE{e76-d_2_235,
author={Mohamed FAKIR, Chuichi SODEYAMA, },
journal={IEICE TRANSACTIONS on Information},
title={Recognition of Arabic Printed Scripts by Dynamic Programming Matching Method},
year={1993},
volume={E76-D},
number={2},
pages={235-242},
abstract={A method for the recognition of Arabic printed scripts entered from an image scanner is presented. The method uses the Hough transformation (HT) to extract features, Dynamic programming (DP) matching technique, and a topological classifier to recognize the characters. A process of characters recognition is further divided into four parts: preprocessing, segmentation of a word into characters, features extraction, and characters identification. The preprocessing consists of the following steps: smoothing to remove noise, baseline drift correction by using HT, and lines separation by making an horizontal projection profile. After preprocessing, Arabic printed words are segmented into characters by analysing the vertical and the horizontal projection profiles using a threshold. The character or stroke obtained from the segmentation process is normalized in size, then thinned to provide it skeleton from which features are extracted. As in the procedure of straight lines detection, a threshold is applied to every cell and those cells whose count is greater than the threshold are selected. The coordinates (R, θ) of the selected cells are the extracted features. Next, characters are classified in two steps: In the first one, the character main body is classified using DP matching technique, and features selected in the HT space. In the second one, simple topological features extracted from the geometry of the stress marks are used by the topological classifier to completely recognize the characters. The topological features used to classify each type of the stress mark are the width, the height, and the number of black pixels of the stress marks. Knowing both the main group of the character body and the type of the stress mark (if any), the character is completely identified.},
keywords={},
doi={},
ISSN={},
month={February},}

Copy

TY - JOUR
TI - Recognition of Arabic Printed Scripts by Dynamic Programming Matching Method
T2 - IEICE TRANSACTIONS on Information
SP - 235
EP - 242
AU - Mohamed FAKIR
AU - Chuichi SODEYAMA
PY - 1993
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E76-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 1993
AB - A method for the recognition of Arabic printed scripts entered from an image scanner is presented. The method uses the Hough transformation (HT) to extract features, Dynamic programming (DP) matching technique, and a topological classifier to recognize the characters. A process of characters recognition is further divided into four parts: preprocessing, segmentation of a word into characters, features extraction, and characters identification. The preprocessing consists of the following steps: smoothing to remove noise, baseline drift correction by using HT, and lines separation by making an horizontal projection profile. After preprocessing, Arabic printed words are segmented into characters by analysing the vertical and the horizontal projection profiles using a threshold. The character or stroke obtained from the segmentation process is normalized in size, then thinned to provide it skeleton from which features are extracted. As in the procedure of straight lines detection, a threshold is applied to every cell and those cells whose count is greater than the threshold are selected. The coordinates (R, θ) of the selected cells are the extracted features. Next, characters are classified in two steps: In the first one, the character main body is classified using DP matching technique, and features selected in the HT space. In the second one, simple topological features extracted from the geometry of the stress marks are used by the topological classifier to completely recognize the characters. The topological features used to classify each type of the stress mark are the width, the height, and the number of black pixels of the stress marks. Knowing both the main group of the character body and the type of the stress mark (if any), the character is completely identified.
ER -