Masanobu TSURUTA Hiroyuki SAKAI Shigeru MASUYAMA
We propose a method for identifying informative DOM subtrees in a Web page from an unfamiliar Web site. Our method uses layout data of DOM nodes generated by a generic Web browser. The results show that our method outperforms a baseline method and identifies informative DOM subtrees from Web pages robustly.
A new method for logical structure analysis of document images is proposed in this paper as the basis for a document reader which can extract logical information from various printed documents. The proposed system consists of five basic modules: text line classification, object recognition, object segmentation, object grouping, and object modification. Emergent computation, which is a key concept of artificial life, is adopted for the cooperative interaction among modules in the system in order to achieve effective and flexible behavior of the whole system. It has three principal advantages over other methods: adaptive system configuration for various and complex logical structures, robust document analysis tolerant of erroneous feature detection, and feedback of high-level logical information to the low-level physical process for accurate analysis. Experimental results obtained for 150 documents show that the method is adaptable, robust, and effective for various document structures.
Kazuhiro NOMURA Koji NAKAMAE Hiromu FUJIOKA
An EB tester line delay fault localization algorithm for combinational circuits is proposed, in which line delay fault probabilities are used to narrow the fault candidates down to one efficiently. Probabilities for the two main causes of line delay faults, defects of contacts/vias along interconnections and crosstalk, are estimated through layout analysis. The algorithm was applied to 8 kinds of ISCAS'85 benchmark circuits to evaluate its performance, with guided probe (GP) diagnosis used as the reference method. The proposed method cuts the number of probed lines to about 30% on average compared with the GP method. The total fault localization time was 31% of the time for the GP method and was 6% less than that of our previous method, which utilizes the fault list generated in concurrent fault simulation.
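The idea of probing the most probable fault candidates first can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the net names, the probability values, and the assumption that the two causes (contact/via defects and crosstalk) are independent are all hypothetical and chosen only for illustration.

```python
def combined_probability(p_via, p_crosstalk):
    """Probability that at least one cause is present on a line.

    Assumes the two causes are independent (an illustrative assumption).
    """
    return 1.0 - (1.0 - p_via) * (1.0 - p_crosstalk)


def probe_order(fault_probs):
    """Return candidate line names sorted by descending fault probability."""
    return sorted(fault_probs, key=lambda line: fault_probs[line], reverse=True)


def localize(fault_probs, is_faulty):
    """Probe lines in probability order until the faulty one is found.

    `is_faulty` stands in for a physical probe measurement.
    Returns (faulty_line_or_None, number_of_lines_probed).
    """
    probed = 0
    for line in probe_order(fault_probs):
        probed += 1
        if is_faulty(line):
            return line, probed
    return None, probed


# Hypothetical candidates with made-up probabilities.
candidates = {
    "n1": combined_probability(0.01, 0.02),
    "n2": combined_probability(0.10, 0.05),
    "n3": combined_probability(0.00, 0.01),
}
```

In this sketch, the number of probed lines depends on how well the estimated probabilities rank the true fault near the top of the list.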
Kei TAKIZAWA Daisaku ARITA Michihiko MINOH Katsuo IKEDA
A method for extracting and recognizing character strings from unformed document images, which contain inclined character strings and have no structure at all, is described. Previous schemes, which are intended only for documents containing nothing but horizontal or vertical strings of characters, do not work well on such unformed documents. Our method is based on the idea that the processes of recognition and extraction of character patterns should operate together, and on the characteristic that character patterns are located close to each other when they belong to the same string. The method has been implemented and applied to several images. The experimental results show the robustness of our method.
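The proximity characteristic mentioned above (character patterns in the same string lie close to each other, regardless of the string's inclination) can be sketched as a simple distance-threshold grouping. This is a hypothetical illustration, not the described method: the function name `group_by_proximity`, the use of pattern centers, and the `max_gap` threshold are assumptions, and the cooperation with recognition is not modeled.

```python
import math


def group_by_proximity(centers, max_gap):
    """Group points whose chained pairwise distance is at most max_gap.

    `centers` is a list of (x, y) centers of character patterns.
    Returns a sorted list of groups, each a list of indices into `centers`.
    """
    parent = list(range(len(centers)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(centers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three patterns on an inclined string and two on a horizontal one.
centers = [(0, 0), (1, 1), (2, 2), (10, 0), (11, 0)]
strings = group_by_proximity(centers, max_gap=1.5)
```

Because the grouping uses only distances between neighbors, it makes no assumption that strings run horizontally or vertically.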
Takashi SAITOH Toshifumi YAMAAI Michiyoshi TACHIKAWA
A system for segmenting document images and ordering text areas is described and applied to complex printed page layouts in both Japanese and English. There is no need to make any assumptions about the shape of blocks, hence the segmentation technique can handle not only skewed images without skew correction but also documents where columns are not rectangular. In this technique, which is based on a bottom-up strategy, the connected components are extracted from the reduced image and classified according to their local information. The connected components classified as characters are then merged into lines, and the lines are merged into areas. Extracted text areas are classified as body, caption, header or footer. A tree graph of the layout of the body texts is made, and the texts are ordered by preorder traversal of the graph. We introduce the concept of an influence range for each node and a procedure for handling titles, thus obtaining good results on various documents. The total system is fast and compact.
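The ordering step, preorder traversal of a layout tree, can be sketched in Python as follows. The `LayoutNode` class, the node labels, and the example tree are hypothetical; how the tree is built from text areas, the influence range, and the title handling are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class LayoutNode:
    """A node in a hypothetical layout tree; children are in layout order."""
    label: str
    children: list = field(default_factory=list)


def reading_order(root):
    """Return node labels in preorder: each node before its children."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.label)
        # Push children reversed so the first child is visited first.
        stack.extend(reversed(node.children))
    return order


# Example: a page with a title above two columns of body text.
page = LayoutNode("page", [
    LayoutNode("title"),
    LayoutNode("col1", [LayoutNode("p1"), LayoutNode("p2")]),
    LayoutNode("col2", [LayoutNode("p3")]),
])
order = reading_order(page)
```

Preorder visits a parent area before the areas beneath it, so in this example the whole first column is read before the second column begins.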