A method for the recognition of Arabic printed scripts entered from an image scanner is presented. The method uses the Hough transformation (HT) to extract features, Dynamic programming (DP) matching technique, and a topological classifier to recognize the characters. A process of characters recognition is further divided into four parts: preprocessing, segmentation of a word into characters, features extraction, and characters identification. The preprocessing consists of the following steps: smoothing to remove noise, baseline drift correction by using HT, and lines separation by making an horizontal projection profile. After preprocessing, Arabic printed words are segmented into characters by analysing the vertical and the horizontal projection profiles using a threshold. The character or stroke obtained from the segmentation process is normalized in size, then thinned to provide it skeleton from which features are extracted. As in the procedure of straight lines detection, a threshold is applied to every cell and those cells whose count is greater than the threshold are selected. The coordinates (R, θ) of the selected cells are the extracted features. Next, characters are classified in two steps: In the first one, the character main body is classified using DP matching technique, and features selected in the HT space. In the second one, simple topological features extracted from the geometry of the stress marks are used by the topological classifier to completely recognize the characters. The topological features used to classify each type of the stress mark are the width, the height, and the number of black pixels of the stress marks. Knowing both the main group of the character body and the type of the stress mark (if any), the character is completely identified.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Mohamed FAKIR, Chuichi SODEYAMA, "Recognition of Arabic Printed Scripts by Dynamic Programming Matching Method" in IEICE TRANSACTIONS on Information,
vol. E76-D, no. 2, pp. 235-242, February 1993, doi: .
Abstract: A method for the recognition of Arabic printed scripts entered from an image scanner is presented. The method uses the Hough transformation (HT) to extract features, Dynamic programming (DP) matching technique, and a topological classifier to recognize the characters. A process of characters recognition is further divided into four parts: preprocessing, segmentation of a word into characters, features extraction, and characters identification. The preprocessing consists of the following steps: smoothing to remove noise, baseline drift correction by using HT, and lines separation by making an horizontal projection profile. After preprocessing, Arabic printed words are segmented into characters by analysing the vertical and the horizontal projection profiles using a threshold. The character or stroke obtained from the segmentation process is normalized in size, then thinned to provide it skeleton from which features are extracted. As in the procedure of straight lines detection, a threshold is applied to every cell and those cells whose count is greater than the threshold are selected. The coordinates (R, θ) of the selected cells are the extracted features. Next, characters are classified in two steps: In the first one, the character main body is classified using DP matching technique, and features selected in the HT space. In the second one, simple topological features extracted from the geometry of the stress marks are used by the topological classifier to completely recognize the characters. The topological features used to classify each type of the stress mark are the width, the height, and the number of black pixels of the stress marks. Knowing both the main group of the character body and the type of the stress mark (if any), the character is completely identified.
URL: https://global.ieice.org/en_transactions/information/10.1587/e76-d_2_235/_p
Copy
@ARTICLE{e76-d_2_235,
author={Mohamed FAKIR, Chuichi SODEYAMA, },
journal={IEICE TRANSACTIONS on Information},
title={Recognition of Arabic Printed Scripts by Dynamic Programming Matching Method},
year={1993},
volume={E76-D},
number={2},
pages={235-242},
abstract={A method for the recognition of Arabic printed scripts entered from an image scanner is presented. The method uses the Hough transformation (HT) to extract features, Dynamic programming (DP) matching technique, and a topological classifier to recognize the characters. A process of characters recognition is further divided into four parts: preprocessing, segmentation of a word into characters, features extraction, and characters identification. The preprocessing consists of the following steps: smoothing to remove noise, baseline drift correction by using HT, and lines separation by making an horizontal projection profile. After preprocessing, Arabic printed words are segmented into characters by analysing the vertical and the horizontal projection profiles using a threshold. The character or stroke obtained from the segmentation process is normalized in size, then thinned to provide it skeleton from which features are extracted. As in the procedure of straight lines detection, a threshold is applied to every cell and those cells whose count is greater than the threshold are selected. The coordinates (R, θ) of the selected cells are the extracted features. Next, characters are classified in two steps: In the first one, the character main body is classified using DP matching technique, and features selected in the HT space. In the second one, simple topological features extracted from the geometry of the stress marks are used by the topological classifier to completely recognize the characters. The topological features used to classify each type of the stress mark are the width, the height, and the number of black pixels of the stress marks. Knowing both the main group of the character body and the type of the stress mark (if any), the character is completely identified.},
keywords={},
doi={},
ISSN={},
month={February},}
Copy
TY - JOUR
TI - Recognition of Arabic Printed Scripts by Dynamic Programming Matching Method
T2 - IEICE TRANSACTIONS on Information
SP - 235
EP - 242
AU - Mohamed FAKIR
AU - Chuichi SODEYAMA
PY - 1993
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E76-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 1993
AB - A method for the recognition of Arabic printed scripts entered from an image scanner is presented. The method uses the Hough transformation (HT) to extract features, Dynamic programming (DP) matching technique, and a topological classifier to recognize the characters. A process of characters recognition is further divided into four parts: preprocessing, segmentation of a word into characters, features extraction, and characters identification. The preprocessing consists of the following steps: smoothing to remove noise, baseline drift correction by using HT, and lines separation by making an horizontal projection profile. After preprocessing, Arabic printed words are segmented into characters by analysing the vertical and the horizontal projection profiles using a threshold. The character or stroke obtained from the segmentation process is normalized in size, then thinned to provide it skeleton from which features are extracted. As in the procedure of straight lines detection, a threshold is applied to every cell and those cells whose count is greater than the threshold are selected. The coordinates (R, θ) of the selected cells are the extracted features. Next, characters are classified in two steps: In the first one, the character main body is classified using DP matching technique, and features selected in the HT space. In the second one, simple topological features extracted from the geometry of the stress marks are used by the topological classifier to completely recognize the characters. The topological features used to classify each type of the stress mark are the width, the height, and the number of black pixels of the stress marks. Knowing both the main group of the character body and the type of the stress mark (if any), the character is completely identified.
ER -