
TDEM: Table Data Extraction Model Based on Cell Segmentation

Zhe WANG, Zhe-Ming LU, Hao LUO, Yang-Ming ZHENG


Summary:

To accurately extract tabular data, we propose a novel cell-based tabular data extraction model (TDEM). The key idea of TDEM is to use the grayscale projection of row separator lines, coupled with the table masks and column masks generated by a VGG-19 neural network, to segment each individual cell from the input table image. In this way, the text content of the table is extracted cell by cell, which greatly improves the accuracy of table recognition.

Publication
IEICE Transactions on Information and Systems, Vol.E107-D, No.10, pp.1376-1379
Publication Date
2024/10/01
Publicized
2024/05/30
Online ISSN
1745-1361
DOI
10.1587/transinf.2024EDL8029
Type of Manuscript
LETTER
Category
Artificial Intelligence, Data Mining

1.  Introduction

Image-based information has gradually replaced traditional paper documents and become an indispensable part of daily life. Tables in electronic document images are an important form of presenting data and key information. Extracting the content of table areas from electronic document images correctly and completely is a challenging task.

Many previous works on table structure extraction are heuristic-method-based systems that analyze PDF drawing commands [1]-[4]; they do not analyze the document image directly. Recently, deep learning has been applied to learn table structure directly from images. The DeepDeSRT model [12] uses a neural network originally designed for semantic segmentation of natural scenes and thus relies primarily on local information to classify pixels. However, direct recognition that ignores the semantic information of the table text leads to unsatisfactory results, and existing algorithms also fail to preserve a structured representation of the recognized results. To solve these issues, this letter proposes an accurate tabular data extraction model based on cell segmentation (TDEM).

For tables with complete frame lines, TDEM exploits the interdependence between the twin tasks of table detection and table structure recognition, combined with pre-trained VGG-19 features, to segment out the table and column regions. TDEM then uses the grayscale projection of the table region to determine the positions of the row separator lines, thereby cropping each cell in the table. In addition, we produce the lined table dataset ICDAR-2019_line based on the ICDAR-2019 dataset [5]; most existing table recognition models are trained and tested on the ICDAR-2013 dataset [6] and cannot achieve ideal results on ICDAR-2019. To measure the extraction accuracy of table content from images, this letter also proposes a new evaluation metric that differs from the traditional F-measure, Recall, and Precision.

In summary, the main contributions of this letter are as follows:

1) We propose TDEM: an accurate tabular data extraction model based on cell segmentation, achieving state-of-the-art performance on the ICDAR-2019_line dataset. Furthermore, our approach allows for structured storage of the results, e.g. in Excel format.

2) We propose a novel evaluation criterion \(A_{e}\) for tabular text content extraction, complementing the traditional F-measure, Recall, and Precision.

3) We select wired frame tables from the ICDAR-2019 dataset and the Marmot dataset and annotate the cells and table borders, resulting in the ICDAR-2019_line dataset for table data extraction.

2.  Related Work

Before deep learning was applied to table structure recognition, traditional table structure recognition algorithms were mainly based on heuristic rules, i.e., a hand-specified set of decision rules for identifying tables that meet specific conditions [7]-[10]. However, heuristic-rule-based table recognition methods are complicated to design, struggle to achieve high accuracy across diverse scenarios, and have relatively poor robustness.

The table recognition task is often split into two separate subtasks: table detection first locates the table area in the image, and structure recognition is then performed on the segmented table to obtain the complete table structure information. Since a single model rarely solves practical problems on its own, an end-to-end table recognition system [11] is equally important. To overcome the complexity and low generalization ability of traditional heuristic-rule-based methods, a data-driven end-to-end table recognition system, DeepDeSRT, was proposed by Schreiber et al. [12] as early as 2017; it consists of two independent parts for table detection and structure recognition. In 2019, Tensmeyer et al. [13] proposed the deep learning model SPLERGE for table structure recognition, which consists of two models, split and merge. In the same year, Paliwal et al. [14] proposed TableNet, an end-to-end image semantic segmentation model that uses the interdependence between table detection and table structure recognition to segment table and column areas, further improving recognition accuracy. More recently, Prasad et al. [15] proposed CascadeTabNet, an end-to-end convolutional neural network that uses instance segmentation to complete table recognition tasks and obtained state-of-the-art results on the corresponding datasets. However, existing table recognition algorithms only report the recognition accuracy of the table structure; they cannot combine the textual content of the table to deliver the complete table to users from the input image. Our TDEM method solves this problem by combining accurately recognized table structures with the table text content, outputting the complete table contained in the image.

3.  Methodology

3.1  Table and Column Detection Module Based on VGG-19

As shown in Fig. 1, the input image is first converted to an RGB image and resized to \(1024 \times 1024\) resolution. The fully connected layers of VGG-19 are replaced by two (\(1 \times 1\)) convolutional layers, which form two different branches of the decoder network. In each branch, additional layers are appended to filter out the respective active regions. In the table branch of the decoder network, an additional (\(1 \times 1\)) convolutional layer is used, followed by a series of fractionally strided convolutional layers that upscale the feature map until it matches the original image size. In the column detection branch, there is an additional convolutional layer with a ReLU activation function and a dropout layer with the same dropout probability between the additional convolutional layers; after this layer, the feature maps are likewise upscaled to the original image size. The outputs of the two branches of the computation graph thus yield the table and column region masks shown in Fig. 2 (b), (c).
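For concreteness, the following is a minimal PyTorch sketch of this two-branch encoder-decoder. The layer widths, the dropout probability, and the use of transposed convolutions to realize the fractionally strided upscaling are our assumptions; the letter specifies only the VGG-19 backbone, the (\(1 \times 1\)) convolutions, and the upscaling of each branch back to the input resolution.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19


class TwoBranchDecoder(nn.Module):
    """VGG-19 encoder with separate table-mask and column-mask branches."""

    def __init__(self, p_drop: float = 0.8):
        super().__init__()
        # VGG-19 convolutional backbone; the fully connected layers are dropped.
        self.encoder = vgg19(weights="IMAGENET1K_V1").features
        # Shared (1x1) convolutions replacing the fully connected layers.
        self.shared = nn.Sequential(
            nn.Conv2d(512, 512, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Dropout(p_drop),
        )
        # Table branch: one extra (1x1) conv, then transposed convolutions
        # upscale the 32x32 feature map back to 1024x1024 (factor 2^5).
        self.table_branch = self._make_branch()
        # Column branch: an extra conv + dropout before the same upscaling.
        self.column_branch = nn.Sequential(
            nn.Conv2d(512, 512, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Dropout(p_drop),
            self._make_branch(),
        )

    @staticmethod
    def _make_branch() -> nn.Sequential:
        layers = [nn.Conv2d(512, 128, kernel_size=1), nn.ReLU(inplace=True)]
        ch = 128
        for _ in range(5):  # five 2x upsamplings: 32 -> 1024
            layers += [
                nn.ConvTranspose2d(ch, ch // 2, kernel_size=2, stride=2),
                nn.ReLU(inplace=True),
            ]
            ch //= 2
        layers.append(nn.Conv2d(ch, 1, kernel_size=1))  # 1-channel mask logits
        return nn.Sequential(*layers)

    def forward(self, x: torch.Tensor):
        f = self.shared(self.encoder(x))  # (B, 512, 32, 32) for 1024x1024 input
        return self.table_branch(f), self.column_branch(f)


# Usage: predict table and column masks for one 1024x1024 RGB document image.
model = TwoBranchDecoder().eval()
with torch.no_grad():
    table_mask, column_mask = model(torch.randn(1, 3, 1024, 1024))
```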

Fig. 1  TDEM takes in a document image and two decoder branches generate separate table predictions and column predictions. The cell images are segmented by grayscale projection.

Fig. 2  (a) The raw document image. (b) Generated table mask after VGG-19. (c) Generated column mask after VGG-19. (d) The grayscale image of the extracted table region. (e) The histogram of horizontal projection distribution. (f) The specifically extracted individual cell.

3.2  Cell Data Extraction Based on Grayscale Projection

Using the masks generated by the VGG-19 network, we filter the table regions out of the images. We binarize the table region images to obtain the grayscale images of the tables shown in Fig. 2 (d). Projection is then applied to the grayscale images in the horizontal direction, and the cumulative number of black pixels in each row is counted to obtain the histogram of the horizontal projection distribution shown in Fig. 2 (e). Since the text content of the cells contains blank character gaps while the row separator lines consist of continuous black pixels, the accumulated pixel values of the row separator lines after projection are much larger than those of other areas of the table. The vertical coordinates of the row separator lines can therefore be selected based on a threshold. The procedure for determining these coordinates is as follows:

1) Let \(N\) denote the number of pixel rows of the table region image and \(A(i)\) the accumulated black-pixel count of row \(i\). For \(1 \le i \le N\), select all \(i\) that satisfy \(A(i) > \mathit{minHor}\), and store them in the array \(H[y]\). The threshold \(\mathit{minHor}\) is determined by \(\max(A(i)) \times p\), where \(p\) is a hyperparameter that we set to 0.7 in this work.

2) Further filter \(H[y]\) based on the threshold \(\mathit{lineHor}\): if several coordinates in \(H[y]\) differ from one another by less than \(\mathit{lineHor}\), they belong to the same separator line, so select their median as the final vertical coordinate and store it in the array \(\mathit{finlH}[y]\). The threshold \(\mathit{lineHor}\) can be interpreted as the maximum thickness of the row separator lines, and we set it to 4. The details of the table row separator locator are summarized in Algorithm 1; a sketch in code follows below.
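The two steps above translate directly into a short NumPy routine. The sketch below is our reading of Algorithm 1, not its verbatim implementation; the function name and the convention that black pixels are encoded as 1 in the binarized table image are assumptions.

```python
import numpy as np


def locate_row_separators(binary: np.ndarray, p: float = 0.7,
                          line_hor: int = 4) -> list[int]:
    """Return the vertical coordinates finlH[y] of the row separator lines."""
    # Step 1: horizontal projection A(i) = black pixels accumulated in row i;
    # keep the rows above the threshold minHor = max(A(i)) * p.
    proj = binary.sum(axis=1)
    min_hor = proj.max() * p
    candidates = np.flatnonzero(proj > min_hor)  # the array H[y]

    # Step 2: coordinates closer than lineHor belong to the same (thick)
    # separator line; collapse each such group to its median coordinate.
    separators, group = [], [candidates[0]]
    for y in candidates[1:]:
        if y - group[-1] < line_hor:
            group.append(y)
        else:
            separators.append(int(np.median(group)))
            group = [y]
    separators.append(int(np.median(group)))
    return separators  # the array finlH[y]
```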

After obtaining the accurate vertical coordinate values of the row separator lines, combined with the table column regions filtered out by the VGG-19 column region masks, we can accurately crop out each cell in the table as shown in Fig. 2 (f). By performing OCR text recognition on specific cells, the accuracy of table content recognition in TDEM has been significantly improved.
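As an illustration of this last step, the sketch below crops each cell between consecutive row and column separator coordinates and runs OCR on it. pytesseract is used purely as an example engine, since the letter does not name a specific OCR tool, and we assume both coordinate lists include the outer table borders.

```python
import pytesseract
from PIL import Image


def extract_cell_texts(table_img: Image.Image, row_seps: list[int],
                       col_seps: list[int]) -> list[list[str]]:
    """OCR every cell bounded by consecutive row/column separator lines."""
    rows = []
    for top, bottom in zip(row_seps, row_seps[1:]):
        row = []
        for left, right in zip(col_seps, col_seps[1:]):
            cell = table_img.crop((left, top, right, bottom))
            row.append(pytesseract.image_to_string(cell).strip())
        rows.append(row)
    return rows
```

The resulting nested list preserves the row-column structure, so it can be written to a structured file directly, e.g. with pandas via `pd.DataFrame(rows).to_excel("table.xlsx")`, matching the Excel output mentioned in Sect. 1.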

4.  Experiments

4.1  Dataset and Criterion

To effectively test the model's ability to extract complex lined-table data, we created the ICDAR-2019_line dataset, containing 3000 images and 4739 tables, by merging the ICDAR-2019 (Track B) and Marmot datasets. Annotations of table columns and rows are missing in the Marmot dataset, so we manually annotated it for table structure identification by adding labeled bounding boxes around each column and row of the tabular region. The ICDAR-2019_line dataset contains not only English tables but also Chinese tables, most of which are financial statements and ownership structures. We randomly split the dataset into 2.4k/0.3k/0.3k images for training, validation, and testing. Tables in ICDAR-2019_line exhibit greater complexity in visual appearance and structure than those in the ICDAR-2013 dataset.

Table data extraction is based on table detection and structure recognition, with an additional step of cell text recognition. To achieve optimal extraction results, higher demands should be placed on the accuracy of cell text recognition. Traditional evaluation metrics such as F-measure, Recall, and Precision are not comprehensive as they only reflect the accuracy of models in table detection and structural recognition. In order to more intuitively demonstrate the effectiveness of data extraction, we additionally incorporate a comparison of the recognition results of table text content and the ground truth, introducing a new evaluation metric: Accuracy of Table Data Extraction (\(A_{e}\)), as shown in formula (1). In the formula, \(C\) represents the number of cells whose text content and structural position are correctly identified in the table, \(T\) represents the total number of cells in the table, and \(A_{e}\) is the final accuracy of tabular data extraction.

\[\begin{equation*} A_e = \frac{C}{T} \times 100\% \tag{1} \end{equation*}\]
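For example, if a table contains \(T = 120\) cells and the model correctly recovers both the text content and the structural position of \(C = 102\) of them, then \(A_e = 102/120 \times 100\% = 85\%\).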

4.2  Results of Table Structure Recognition Algorithm

This section presents the experimental results of all models on the ICDAR-2019_line dataset. Model performance is evaluated based on traditional metrics including F-measure, recall and precision values. These measures are computed for each image and averaged over all document images.

Since TDEM combines a deep-learning preprocessing method with a traditional projection-based post-processing step, we compare it not only with deep-learning-based table structure recognition algorithms [12]-[15] but also with traditional methods [3], [4]. As shown in Table 1, our approach achieves an F-measure of 96.64% on the ICDAR-2019_line dataset, surpassing both the 79.45% obtained by the traditional method TEXUS [4] and the 94.92% of the state-of-the-art deep learning model CascadeTabNet [15]. This comparison makes it evident that deep learning methods generally outperform traditional methods in table structure recognition accuracy, which is why TDEM uses the VGG-19 neural network as its preprocessing network for table region detection and column region recognition.

Table 1  F-measure, recall value and precision value of models on ICDAR-2019_line table dataset.

4.3  \(A_{e}\) of Table Structure Recognition Algorithm

Table 2 further reveals the patterns in the \(A_{e}\) values of all models on the ICDAR-2019_line dataset. As depicted, the TDEM model, based on cell segmentation, distinctly outperforms the comparative methods. By fusing the semantic information of the table text with accurate cell segmentation based on row separator line projection, TDEM achieves the highest tabular data extraction accuracy among all models, with an \(A_{e}\) value of 89.26%. Throughout our experiments, we observed that all table recognition algorithms handle tables like the one in Fig. 3 (b) well, as it features clear boundary lines, uniform inter-row spacing, and straightforward textual content. However, when confronted with tables like the one in Fig. 3 (a), which lacks column separator lines and has irregular inter-row spacing and intricate textual content, traditional table recognition methods, along with most deep learning methods, are ineffective. This accentuates the importance of the semantic information within table text and validates that recognition methods based on cell segmentation can significantly outperform alternative approaches.

Table 2  Experimental results with the metric \(A_{e}\) for extraction accuracy analysis.

Fig. 3  Various table images inducing alterations in \(A_{e}\).

5.  Conclusion

In this letter, we present an accurate tabular data extraction model based on cell segmentation (TDEM), targeting the extraction of lined-table data from document images. TDEM is the first model to extract tabular data at the cell level and save the result in Excel format. On the ICDAR-2019_line dataset, TDEM achieves a far higher tabular data extraction accuracy than other models. In the future, we plan to predict the positions of the row and column separator lines directly after obtaining the table area and then divide the cell areas, to further improve TDEM's extraction results on complex tables.

Acknowledgements

This work was partially supported by Ningbo Science and Technology Innovation 2025 major project under grants 2020Z106 and 2023Z040.

References

[1] J. Liu, X. Ding, and Y. Wu, “Description and recognition of form and automated form data entry,” Proc. Third International Conference on Document Analysis and Recognition, 1995. doi: 10.1109/ICDAR.1995.601963

[2] A. Shigarov, A. Mikhailov, and A. Altaev, “Configurable table structure recognition in untagged PDF documents,” ACM Symposium on Document Engineering, pp.119-122, 2016.

[3] J. Li, K. Wang, S. Hao, and Q.R. Wang, “Location and recognition of free tables in form,” 2012 International Conference on Software Engineering, Knowledge Engineering and Information Engineering: Theory and Practice, Advances in Intelligent and Soft Computing, vol.162, pp.685-692, Springer, Berlin, Heidelberg, 2012.

[4] R. Rastan, H.-Y. Paik, and J. Shepherd, “TEXUS: A task-based approach for table extraction and understanding,” Proc. 2015 ACM Symposium on Document Engineering, pp.25-34, 2015.

[5] L. Gao, Y. Huang, H. Déjean, J.-L. Meunier, Q. Yan, Y. Fang, F. Kleber, and E. Lang, “ICDAR 2019 competition on table detection and recognition (cTDaR),” 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, pp.1510-1515, 2019.

[6] M. Göbel, T. Hassan, E. Oro, and G. Orsi, “ICDAR 2013 table competition,” 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, pp.1449-1453, 2013.

[7] A. Laurentini and P. Viada, “Identifying and understanding tabular material in compound documents,” Proc. 11th IAPR International Conference on Pattern Recognition, vol.II, Conference B: Pattern Recognition Methodology and Systems, The Hague, Netherlands, pp.405-409, 1992.

[8] S. Chandran and R. Kasturi, “Structural recognition of tabulated data,” Proc. 2nd International Conference on Document Analysis and Recognition (ICDAR’93), Tsukuba, Japan, pp.516-519, 1993.

[9] C. Akinlar and C. Topal, “Edlines: Real-time line segment detection by Edge Drawing (ed),” Proc. 18th IEEE International Conference on Image Processing, Brussels, Belgium, pp.2837-2840, 2011.

[10] E. Koci, M. Thiele, O. Romero, and W. Lehner, “A genetic-based search for adaptive table recognition in spreadsheets,” 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, pp.1274-1279, 2019.

[11] A.C. Silva, A.M. Jorge, and L. Torgo, “Design of an end-to-end method to extract information from tables,” International Journal of Document Analysis and Recognition, vol.8, pp.144-171, 2006.

[12] S. Schreiber, S. Agne, I. Wolf, A. Dengel, and S. Ahmed, “DeepDeSRT: Deep learning for detection and structure recognition of tables in document images,” 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, pp.1162-1167, 2017.

[13] C. Tensmeyer, V.I. Morariu, B. Price, S. Cohen, and T. Martinez, “Deep splitting and merging for table structure decomposition,” 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, pp.114-121, 2019.

[14] S.S. Paliwal, V. D, R. Rahul, M. Sharma, and L. Vig, “TableNet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images,” 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, NSW, Australia, pp.128-133, 2019.

[15] D. Prasad, A. Gadpal, K. Kapadni, M. Visave, and K. Sultanpure, “CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, pp.2439-2447, 2020.

Authors

Zhe WANG
  Zhejiang University
Zhe-Ming LU
  Zhejiang University
Hao LUO
  Zhejiang University
Yang-Ming ZHENG
  Zhejiang University
