Kha Cong NGUYEN Cuong Tuan NGUYEN Masaki NAKAGAWA
This paper presents a method to segment single- and multiple-touching characters in offline handwritten Japanese text recognition at practical speed. Distortions due to handwriting, and the mix of complex Chinese characters with simple phonetic and alphanumeric characters, leave optical handwritten text recognition (OHTR) for Japanese still far from perfect. Segmentation of characters that touch their neighbors at multiple points is a serious unsolved problem. We therefore propose a segmentation method with two steps: coarse segmentation and fine segmentation. The coarse segmentation employs vertical projection and stroke-width estimation, while the fine segmentation takes a graph-based approach on thinned text images, employing a new bridge-finding process and Voronoi diagrams with two improvements. Unlike previous methods, it locates character centers and seeks segmentation candidates between them. It explicitly draws vertical lines at estimated character centers so that vertically unconnected components are not left behind during bridge finding. Multiple separation candidates are produced by removing touching points combinatorially, and an SVM discards improbable segmentation boundaries. Remaining ambiguities are resolved by the text recognizer, which employs linguistic and geometric context to recognize the segmented characters. Our experiments show that the proposed method segments not only single-touching but also multiple-touching characters, and that each component of the method contributes to improved segmentation and recognition rates.
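The coarse-segmentation step based on vertical projection can be illustrated with a minimal sketch. The function name and threshold behavior below are assumptions for illustration, not the authors' implementation: it sums ink pixels per column of a binary text-line image and reports the start of each blank run as a candidate cut between characters.

```python
import numpy as np

def coarse_segment(binary_img):
    """Minimal sketch of coarse segmentation by vertical projection.

    binary_img : 2-D array of 0/1 values, where 1 marks an ink pixel.
    Returns the column index at which each blank (ink-free) run begins,
    i.e. candidate boundaries between characters. Hypothetical helper,
    not the paper's actual code.
    """
    projection = binary_img.sum(axis=0)   # ink-pixel count per column
    blank = projection == 0               # columns containing no ink
    cuts = []
    in_gap = False
    for x, is_blank in enumerate(blank):
        if is_blank and not in_gap:
            cuts.append(x)                # a new gap starts: candidate cut
            in_gap = True
        elif not is_blank:
            in_gap = False
    return cuts
```

Touching characters produce no blank column between them, which is why the paper follows this step with fine segmentation on the thinned image.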
Nam Tuan LY Kha Cong NGUYEN Cuong Tuan NGUYEN Masaki NAKAGAWA
This paper presents recognition of anomalously deformed Kana sequences in Japanese historical documents, for which a contest was held by IEICE PRMU in 2017. The contest comprised three levels according to the number of characters to be recognized: level 1, single characters; level 2, sequences of three vertically written Kana characters; and level 3, unrestricted sets of three or more characters, possibly spanning multiple lines. This paper focuses on the methods for levels 2 and 3 that won the contest. We follow a segmentation-free approach and employ a hierarchy of a Convolutional Neural Network (CNN) for feature extraction, a Bidirectional Long Short-Term Memory (BLSTM) network for frame prediction, and Connectionist Temporal Classification (CTC) for text recognition, which we name a Deep Convolutional Recurrent Network (DCRN). For level 2, we compare the pretrained-CNN approach and the end-to-end approach with more detailed variations. For level 3, we propose a method of vertical text-line segmentation and multiple-line concatenation before applying the DCRN, and we also examine a method based on a two-dimensional BLSTM (2DBLSTM). We evaluate the best methods by cross-validation, achieving an accuracy of 89.10% for three-Kana-character sequence recognition and 87.70% for unrestricted Kana recognition without employing linguistic context. These results demonstrate the performance of the proposed models on the level 2 and 3 tasks.
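The CTC stage of the DCRN pipeline maps per-frame BLSTM predictions to a character sequence. A minimal sketch of the standard greedy CTC decoding rule (collapse repeated labels, then drop blanks) is shown below; the function name and the choice of 0 as the blank index are assumptions for illustration, not details from the paper.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: collapse consecutive repeats, drop blanks.

    frame_labels : per-frame argmax label indices from the recurrent
                   layer's output (one label per time frame).
    blank        : index of the CTC blank symbol (assumed 0 here).
    Returns the decoded label sequence. Illustrative sketch of the
    standard CTC collapsing rule, not the authors' implementation.
    """
    decoded = []
    prev = None
    for label in frame_labels:
        # emit a label only when it differs from the previous frame
        # and is not the blank symbol
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

For example, the frame sequence `[0, 1, 1, 0, 2, 2, 2, 0, 1]` decodes to `[1, 2, 1]`: repeated `1`s and `2`s collapse, blanks separate the two occurrences of label `1`.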