A Segmentation Method of Single- and Multiple-Touching Characters in Offline Handwritten Japanese Text Recognition

Kha Cong NGUYEN; Cuong Tuan NGUYEN; Masaki NAKAGAWA

doi:10.1587/transinf.2017EDP7225

Summary:

This paper presents a method to segment single- and multiple-touching characters in offline handwritten Japanese text recognition at practical speed. Distortions due to handwriting, together with a mix of complex Chinese characters and simple phonetic and alphanumeric characters, leave optical handwritten text recognition (OHTR) for Japanese far from perfect. Segmentation of characters that touch their neighbors at multiple points is a serious unsolved problem. We therefore propose a two-step segmentation method: coarse segmentation followed by fine segmentation. The coarse segmentation employs vertical projection and stroke-width estimation, while the fine segmentation takes a graph-based approach to thinned text images, employing a new bridge-finding process and Voronoi diagrams with two improvements. Unlike previous methods, it locates character centers and seeks segmentation candidates between them. It also draws vertical lines explicitly at the estimated character centers to prevent vertically unconnected components from being left behind during bridge finding. Multiple separation candidates are produced by removing touching points combinatorially. A support vector machine (SVM) is applied to discard improbable segmentation boundaries. Remaining ambiguities are finally resolved by text recognition, which employs linguistic and geometric context to recognize the segmented characters. Experimental results show that the proposed method can segment not only single-touching but also multiple-touching characters, and that each component of the method contributes to improved segmentation and recognition rates.
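
The summary names vertical projection and stroke-width estimation as the ingredients of the coarse segmentation but does not give their details. The following is a minimal sketch of those two generic techniques, not the authors' implementation. It assumes a horizontally written text line stored as a binary image (1 = ink, 0 = background), splits the line at empty columns, and takes the most frequent horizontal ink run length as the stroke width. The function names and the run-length heuristic are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple

Image = List[List[int]]  # binary image as rows of pixels: 1 = ink, 0 = background


def vertical_projection(img: Image) -> List[int]:
    """Count the ink pixels in every column of the image."""
    return [sum(column) for column in zip(*img)]


def estimate_stroke_width(img: Image) -> int:
    """Assumed heuristic: the most frequent horizontal run length of ink."""
    runs: Counter = Counter()
    for row in img:
        length = 0
        for pixel in row + [0]:  # trailing 0 closes a run that ends at the border
            if pixel:
                length += 1
            elif length:
                runs[length] += 1
                length = 0
    return runs.most_common(1)[0][0] if runs else 0


def coarse_segments(img: Image) -> List[Tuple[int, int]]:
    """Split at empty columns; return (start, end) column ranges, end exclusive."""
    profile = vertical_projection(img)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile + [0]):  # trailing 0 closes the last segment
        if count and start is None:
            start = x
        elif not count and start is not None:
            segments.append((start, x))
            start = None
    return segments


# Toy example: two blobs separated by two empty columns.
toy = [
    [1, 1, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 1],
]
print(vertical_projection(toy))    # [3, 3, 0, 0, 3, 2, 3]
print(coarse_segments(toy))        # [(0, 2), (4, 7)]
print(estimate_stroke_width(toy))  # 2
```

Projection alone cannot separate characters that touch, because a touching point leaves no empty column. That is the case the fine segmentation described in the summary is meant to handle.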

Publication: IEICE TRANSACTIONS on Information Vol.E100-D No.12 pp.2962-2972

Publication Date: 2017/12/01

Publicized: 2017/08/23

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2017EDP7225

Type of Manuscript: PAPER

Category: Pattern Recognition

Authors

Kha Cong NGUYEN
  Tokyo University of Agriculture and Technology
Cuong Tuan NGUYEN
  Tokyo University of Agriculture and Technology
Masaki NAKAGAWA
  Tokyo University of Agriculture and Technology

Keywords

handwritten text recognition, offline recognition, touching characters, stroke width, bridge separation, Voronoi diagram, Support Vector Machine

Kha Cong NGUYEN, Cuong Tuan NGUYEN, Masaki NAKAGAWA, "A Segmentation Method of Single- and Multiple-Touching Characters in Offline Handwritten Japanese Text Recognition" in IEICE TRANSACTIONS on Information, vol. E100-D, no. 12, pp. 2962-2972, December 2017, doi: 10.1587/transinf.2017EDP7225.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7225/_p

@ARTICLE{e100-d_12_2962,
author={Kha Cong NGUYEN and Cuong Tuan NGUYEN and Masaki NAKAGAWA},
journal={IEICE TRANSACTIONS on Information},
title={A Segmentation Method of Single- and Multiple-Touching Characters in Offline Handwritten Japanese Text Recognition},
year={2017},
volume={E100-D},
number={12},
pages={2962-2972},
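keywords={handwritten text recognition, offline recognition, touching characters, stroke width, bridge separation, Voronoi diagram, Support Vector Machine},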
doi={10.1587/transinf.2017EDP7225},
ISSN={1745-1361},
month={December},}

TY - JOUR
TI - A Segmentation Method of Single- and Multiple-Touching Characters in Offline Handwritten Japanese Text Recognition
T2 - IEICE TRANSACTIONS on Information
SP - 2962
EP - 2972
AU - Kha Cong NGUYEN
AU - Cuong Tuan NGUYEN
AU - Masaki NAKAGAWA
PY - 2017
DO - 10.1587/transinf.2017EDP7225
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E100-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2017
ER -