Table-form document structure analysis is an important problem in the document processing domain. This paper presents a new method called Box-Driven Reasoning (BDR) to robustly analyze the structure of table-form documents that include touching characters and broken lines. Real documents are copied repeatedly and overlaid with printed data, resulting in characters that touch cells and lines that are broken. Most previous methods employ a line-oriented approach, but touching characters and broken lines cause such procedures to fail at an early stage. In contrast with these previous methods, BDR deals with regions directly, and a reduced-resolution image is introduced to supplement information degraded by noise. Experimental tests show that BDR reliably recognizes cells and strings in document images with touching characters and broken lines.
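The abstract does not say how the reduced-resolution image is built. As a minimal sketch only, assuming block-wise OR downsampling of a binary image (an assumption for illustration, not necessarily the authors' procedure), the following shows why a coarser image can recover a ruled line that is broken at full resolution. The names `reduce_resolution` and `longest_run` are hypothetical.

```python
def reduce_resolution(image, factor):
    """Downsample a binary image: a reduced pixel is 1 if any pixel
    in the corresponding factor x factor block is 1 (block-wise OR).
    Assumed scheme for illustration; the paper may use another."""
    height = len(image)
    width = len(image[0]) if height else 0
    reduced = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block_has_ink = any(
                image[y][x]
                for y in range(top, min(top + factor, height))
                for x in range(left, min(left + factor, width))
            )
            row.append(1 if block_has_ink else 0)
        reduced.append(row)
    return reduced


def longest_run(row):
    """Return the length of the longest run of 1s in a row of pixels."""
    best = current = 0
    for pixel in row:
        current = current + 1 if pixel else 0
        best = max(best, current)
    return best


# A horizontal line 12 pixels wide with a 2-pixel break at columns 5-6.
broken_line = [[1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]]
reduced = reduce_resolution(broken_line, 3)

print(longest_run(broken_line[0]))  # longest unbroken segment at full resolution
print(longest_run(reduced[0]))      # longest segment in the reduced image
print(len(reduced[0]))              # width of the reduced image
```

In this sketch, any gap narrower than `factor` is closed, because no block can lie entirely inside the gap. The cost is coarser geometry, which is why a reduced image would supplement the full-resolution image and not replace it.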
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Osamu HORI, David S. DOERMANN, "Table-Form Structure Analysis Based on Box-Driven Reasoning" in IEICE TRANSACTIONS on Information,
vol. E79-D, no. 5, pp. 542-547, May 1996.
URL: https://global.ieice.org/en_transactions/information/10.1587/e79-d_5_542/_p
@ARTICLE{e79-d_5_542,
author={Osamu HORI and David S. DOERMANN},
journal={IEICE TRANSACTIONS on Information},
title={Table-Form Structure Analysis Based on Box-Driven Reasoning},
year={1996},
volume={E79-D},
number={5},
pages={542-547},
month={May},}
TY - JOUR
TI - Table-Form Structure Analysis Based on Box-Driven Reasoning
T2 - IEICE TRANSACTIONS on Information
SP - 542
EP - 547
AU - Osamu HORI
AU - David S. DOERMANN
PY - 1996
JO - IEICE TRANSACTIONS on Information
VL - E79-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 1996
ER -