Mohamed FAKIR, Chuichi SODEYAMA
A method for the recognition of printed Arabic script entered from an image scanner is presented. The method uses the Hough transformation (HT) to extract features, and a dynamic programming (DP) matching technique together with a topological classifier to recognize the characters. The recognition process is divided into four parts: preprocessing, segmentation of a word into characters, feature extraction, and character identification. Preprocessing consists of the following steps: smoothing to remove noise, baseline drift correction using the HT, and line separation using a horizontal projection profile. After preprocessing, printed Arabic words are segmented into characters by analysing the vertical and horizontal projection profiles against a threshold. Each character or stroke obtained from the segmentation process is normalized in size, then thinned to obtain its skeleton, from which features are extracted. As in the procedure for straight-line detection, a threshold is applied to every cell of the HT accumulator, and the cells whose count is greater than the threshold are selected. The coordinates (R, θ) of the selected cells are the extracted features. Next, characters are classified in two steps. In the first step, the main body of the character is classified using the DP matching technique and the features selected in the HT space. In the second step, simple topological features extracted from the geometry of the stress marks are used by the topological classifier to complete the recognition. The topological features used to classify each type of stress mark are its width, its height, and its number of black pixels. Once both the main group of the character body and the type of the stress mark (if any) are known, the character is completely identified.
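The feature extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hough_features`, the angular resolution `n_theta`, the rounding of R to the nearest integer, and the threshold values are all assumptions chosen for illustration. The sketch votes in an (R, θ) accumulator using the standard line parameterization R = x cos θ + y sin θ and keeps the cells whose count exceeds the threshold.

```python
import math
from collections import Counter


def hough_features(pixels, n_theta=36, threshold=3):
    """Return the sorted (R, theta_in_degrees) accumulator cells whose
    vote count is greater than `threshold`.

    pixels    : iterable of (x, y) coordinates of black skeleton pixels
    n_theta   : number of angle steps over [0, 180) degrees (assumed value)
    threshold : minimum vote count a cell must exceed (assumed value)
    """
    accumulator = Counter()
    step = 180 / n_theta
    for x, y in pixels:
        for k in range(n_theta):
            theta = math.radians(k * step)
            # Standard Hough parameterization of a straight line,
            # quantized to the nearest integer R (an assumption).
            r = round(x * math.cos(theta) + y * math.sin(theta))
            accumulator[(r, k * step)] += 1
    return sorted(cell for cell, count in accumulator.items() if count > threshold)


# Hypothetical input: a vertical skeleton stroke of 10 pixels at x = 5.
stroke = [(5, y) for y in range(10)]
print(hough_features(stroke, n_theta=4, threshold=5))  # [(5, 0.0)]
```

For the vertical stroke, all ten pixels vote for the single cell R = 5, θ = 0°, so that cell is the only one above the threshold. A real skeleton would produce several such cells, and it is that set of (R, θ) pairs that would be passed on to the matching stage.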