A new character recognition algorithm that can learn easily has been developed. It combines two kinds of processing: a triadic decision tree and a pattern matching method. Its main features are that (1) a user can generate a reliable recognition dictionary from only a limited number of characters, (2) the user can easily add new categories to the dictionary or modify it to match actual patterns, and (3) recognition is fast. We have also developed an OCR system for a personal computer that adopts this algorithm. The system has two distinctive functions: (1) the user does not need to prepare special sample sheets on which every category of an unknown font is printed in alphabetical order, because new dictionaries can be generated from ordinary text; and (2) it provides concurrent spelling correction to facilitate learning and recognition. We tested the system to investigate how the number of learning iterations and the quantity of sample data affect recognition accuracy. A dictionary generated from eight samples per category (the original and first- through third-generation successive copies of each of two printed sets) attained a recognition accuracy of 99.6%. In another test, modifying a dictionary built for a different font four times with data from the test font raised the recognition accuracy on the test documents from 92.3% to 99.5%. Our software achieved a recognition speed of 120 characters/s on an IBM PS/55 personal computer.
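The abstract names the two stages but does not say how they interact. The Python below is a minimal, hypothetical sketch of one plausible reading, not the authors' method. It assumes binarized, size-normalized character images flattened to lists of 0/1 pixels, and assumes each dictionary entry is a ternary template (white, black, or "varies across samples"). A three-way (triadic) tree over template pixels narrows the candidate categories, and pattern matching by pixel mismatch picks the winner. All names (`learn_template`, `add_sample`, `build_tree`, `recognize`, `max_mismatch`) are illustrative inventions.

```python
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical sketch, NOT the published algorithm.
# Assumption: images are binarized, size-normalized bitmaps flattened
# into lists of 0 (white) / 1 (black) pixels.
WHITE, BLACK, ANY = 0, 1, 2  # ANY = pixel varied across training samples


def add_sample(template: List[int], sample: Sequence[int]) -> None:
    """Merge one more sample into a ternary template in place ("learning")."""
    for i, v in enumerate(sample):
        if template[i] != v:
            template[i] = ANY


def learn_template(samples: Sequence[Sequence[int]]) -> List[int]:
    """Build a ternary template for one category from its samples."""
    template = list(samples[0])
    for s in samples[1:]:
        add_sample(template, s)
    return template


class Node:
    def __init__(self, pixel: Optional[int] = None,
                 children: Optional[Dict[int, "Node"]] = None,
                 categories: Optional[List[str]] = None) -> None:
        self.pixel = pixel            # pixel index tested here (None at a leaf)
        self.children = children      # WHITE / BLACK / ANY -> subtree
        self.categories = categories  # candidate labels at a leaf


def build_tree(templates: Dict[str, List[int]], labels: List[str],
               depth: int = 0, max_depth: int = 8) -> Node:
    """Split categories three ways on the pixel that best separates them."""
    if len(labels) <= 1 or depth == max_depth:
        return Node(categories=list(labels))
    best, best_score = None, 0
    for p in range(len(templates[labels[0]])):
        white = sum(templates[c][p] == WHITE for c in labels)
        black = sum(templates[c][p] == BLACK for c in labels)
        if min(white, black) > best_score:
            best, best_score = p, min(white, black)
    if best is None:  # no pixel separates these categories; make a leaf
        return Node(categories=list(labels))
    children = {v: build_tree(templates,
                              [c for c in labels if templates[c][best] == v],
                              depth + 1, max_depth)
                for v in (WHITE, BLACK, ANY)}
    return Node(pixel=best, children=children)


def candidates(node: Node, image: Sequence[int]) -> Set[str]:
    """Follow the branch for the observed pixel plus the ANY branch."""
    if node.pixel is None:
        return set(node.categories)
    v = image[node.pixel]
    return candidates(node.children[v], image) | candidates(node.children[ANY], image)


def mismatch(template: Sequence[int], image: Sequence[int]) -> int:
    """Pattern matching: count disagreeing pixels, ignoring ANY pixels."""
    return sum(t != ANY and t != v for t, v in zip(template, image))


def recognize(tree: Node, templates: Dict[str, List[int]],
              image: Sequence[int], max_mismatch: int = 1) -> str:
    """The tree narrows the candidates; template matching picks the winner."""
    def score(c: str) -> int:
        return mismatch(templates[c], image)

    best = min(sorted(candidates(tree, image)), key=score, default=None)
    if best is None or score(best) > max_mismatch:
        # A noisy pixel may have sent the search down the wrong branch:
        # fall back to matching against every template.
        best = min(sorted(templates), key=score)
    return best


if __name__ == "__main__":
    templates = {
        "I": learn_template([[0, 1, 0, 0, 1, 0, 0, 1, 0], [0, 1, 0, 0, 1, 0, 0, 1, 1]]),
        "-": learn_template([[0, 0, 0, 1, 1, 1, 0, 0, 0]]),
        "L": learn_template([[1, 0, 0, 1, 0, 0, 1, 1, 1]]),
    }
    tree = build_tree(templates, list(templates))
    print(recognize(tree, templates, [0, 0, 0, 1, 0, 0, 1, 1, 1]))  # noisy "L"; prints L
```

In this sketch, adding a new category only requires learning its template and rebuilding the tree (for example `templates["T"] = learn_template(...)` followed by `build_tree(templates, list(templates))`), which mirrors the "easy learning" claim in spirit. The `max_mismatch` fallback is an assumption added here so that a single noisy pixel cannot eliminate the correct category during tree descent.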
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hiroyasu TAKAHASHI, Akio YAMASHITA, Nobuyasu ITOH, Tomio AMANO, "A Hybrid Recognition Algorithm with Learning Capability and Its Application to an OCR System," in IEICE TRANSACTIONS on transactions, vol. E73-E, no. 4, pp. 577-586, April 1990.
URL: https://global.ieice.org/en_transactions/transactions/10.1587/e73-e_4_577/_p
@ARTICLE{e73-e_4_577,
author={Hiroyasu TAKAHASHI and Akio YAMASHITA and Nobuyasu ITOH and Tomio AMANO},
journal={IEICE TRANSACTIONS on transactions},
title={A Hybrid Recognition Algorithm with Learning Capability and Its Application to an {OCR} System},
year={1990},
volume={E73-E},
number={4},
pages={577-586},
abstract={A new character recognition algorithm with a capacity for easy learning has been developed. This algorithm incorporates two different kinds of processes: a triadic decision tree and a pattern matching method. Its significant features are that (1) a user can generate a reliable recognition dictionary by using only a limited number of characters, (2) the user can easily add new categories to the recognition dictionary or modify it for matching to actual patterns, and (3) the recognition speed is fast. We have also developed a unique OCR system on a personal computer, adopting this recognition algorithm. The system has several unique functions: (1) A user does not need to prepare special sample sheets on which all categories of unknown font are printed in alphabetical order. New dictionaries can be generated by using general texts. (2) It provides him with a concurrent spelling correction capability in order to facilitate learning and recognition. We conducted tests on the system to investigate the effect of the number of learning times and the quantity of sample data on the recognition accuracy. In our tests, the dictionary that was generated from eight data per category (the original and the first to third successive copies of two printed sets) attained a recognition accuracy of 99.6\%. In another test, the recognition accuracy of the test documents was improved from 92.3\% to 99.5\% by modifying a different font dictionary four times with the test font data. A recognition speed of 120 char/sec was achieved by our software program on an IBM PS/55 personal computer.},
month={April},}
TY  - JOUR
TI  - A Hybrid Recognition Algorithm with Learning Capability and Its Application to an OCR System
T2  - IEICE TRANSACTIONS on transactions
SP  - 577
EP  - 586
AU  - TAKAHASHI, Hiroyasu
AU  - YAMASHITA, Akio
AU  - ITOH, Nobuyasu
AU  - AMANO, Tomio
PY  - 1990
JO  - IEICE TRANSACTIONS on transactions
VL  - E73-E
IS  - 4
JA  - IEICE TRANSACTIONS on transactions
Y1  - 1990/04//
AB  - A new character recognition algorithm with a capacity for easy learning has been developed. This algorithm incorporates two different kinds of processes: a triadic decision tree and a pattern matching method. Its significant features are that (1) a user can generate a reliable recognition dictionary by using only a limited number of characters, (2) the user can easily add new categories to the recognition dictionary or modify it for matching to actual patterns, and (3) the recognition speed is fast. We have also developed a unique OCR system on a personal computer, adopting this recognition algorithm. The system has several unique functions: (1) A user does not need to prepare special sample sheets on which all categories of unknown font are printed in alphabetical order. New dictionaries can be generated by using general texts. (2) It provides him with a concurrent spelling correction capability in order to facilitate learning and recognition. We conducted tests on the system to investigate the effect of the number of learning times and the quantity of sample data on the recognition accuracy. In our tests, the dictionary that was generated from eight data per category (the original and the first to third successive copies of two printed sets) attained a recognition accuracy of 99.6%. In another test, the recognition accuracy of the test documents was improved from 92.3% to 99.5% by modifying a different font dictionary four times with the test font data. A recognition speed of 120 char/sec was achieved by our software program on an IBM PS/55 personal computer.
ER  - 