Jun ZENG Feng LI Brendan FLANAGAN Sachio HIROKAWA
Content extraction from deep Web pages has received great attention in recent years. However, the increasingly complicated HTML structure of Web documents makes it more difficult to recognize data records by analyzing the HTML source code alone. In this paper, we propose a method named LTDE to extract data records from a deep Web page. Instead of analyzing the HTML source code, LTDE utilizes the visual features of data records in deep Web pages. A Web page is considered as a finite set of visual blocks, and the data records are the visual blocks that have similar layouts. We also propose a pattern recognition method named layout tree to cluster visual blocks with similar layouts. A weight is calculated for each cluster, and the visual blocks in the cluster with the highest weight are chosen as the data records to be extracted. Experimental results show that LTDE is more effective and more robust for Web data extraction than previous methods.
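The cluster-then-select idea described above can be illustrated with a minimal sketch. The abstract does not define the layout tree or the weight function, so this example substitutes two loudly labeled assumptions: blocks count as similar when their left edge and width match within a pixel tolerance, and a cluster's weight is the total area of its blocks. The names `Block`, `layout_signature`, and `extract_records` are hypothetical and are not taken from LTDE.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A rendered visual block: top-left corner plus size, in pixels."""
    x: int
    y: int
    width: int
    height: int


def layout_signature(block, tolerance=10):
    # ASSUMPTION: a stand-in for layout similarity, not the paper's layout tree.
    # Blocks share a signature when left edge and width agree within `tolerance`.
    return (round(block.x / tolerance), round(block.width / tolerance))


def extract_records(blocks, tolerance=10):
    """Group blocks by signature and return the heaviest group, top to bottom."""
    clusters = defaultdict(list)
    for block in blocks:
        clusters[layout_signature(block, tolerance)].append(block)

    # ASSUMPTION: cluster weight is the total area covered by its blocks.
    def weight(cluster):
        return sum(b.width * b.height for b in cluster)

    best = max(clusters.values(), key=weight)
    return sorted(best, key=lambda b: b.y)


# A toy page: header, sidebar, three result records, footer.
page = [
    Block(0, 0, 1000, 80),
    Block(0, 100, 200, 600),
    Block(220, 100, 760, 180),
    Block(220, 300, 760, 180),
    Block(220, 500, 760, 180),
    Block(0, 720, 1000, 60),
]
records = extract_records(page)
```

On this toy page the three equally sized result blocks form the heaviest cluster, so they are returned as the data records. A real implementation would obtain block geometry from a rendering engine and would need a richer similarity measure than two coordinates.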
In information retrieval from printed images using mobile devices, geometrical deformation and lens distortion must be corrected, which poses a heavy computational burden. In this paper, we propose a method of reducing the computational burden of such corrections. The method consists of improved extraction to find a line segment of a frame, a reconsideration of the interpolation method for image correction, and the optimization of image resolution in the correction process. The proposed method significantly reduces the number of computations. Experimental results show the effectiveness of the proposed method.
Mitsuji MUNEYASU Hiroshi KUDO Takafumi SHONO Yoshiko HANADA
In this paper, we propose an improved data embedding and extraction method for information retrieval considering the use of mobile devices. Although the conventional method has demonstrated good results for images captured by cellular phones, some problems remain. One problem is that the code grouping method does not consider how the code groups are constructed. In this paper, a new construction method for code grouping is proposed, and it is shown that a suitable grouping of the codes can be found. Another problem is that the correction of lens distortion is time-consuming. Therefore, to improve the processing speed, the golden section search method is adopted to estimate the distortion coefficients. In addition, a new tuning algorithm for the gain coefficient in the embedding process is proposed. Experimental results show an increase in the detection rate for embedded data and a reduction in the processing time.
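Golden section search, the one-dimensional minimizer named above, can be sketched as follows. The search routine itself is the standard algorithm; everything around it is an assumption made for illustration. In particular, the single-coefficient radial model `p_u = p_d * (1 + k * |p_d|^2)` and the `straightness_error` objective are hypothetical stand-ins, since the abstract does not state the distortion model or the cost function the authors minimize.

```python
import math


def golden_section_search(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def distort(point, k, iterations=50):
    # ASSUMED toy model: p_u = p_d * (1 + k * |p_d|^2).
    # Solve for p_d by fixed-point iteration (converges for small k).
    xu, yu = point
    xd, yd = xu, yu
    for _ in range(iterations):
        scale = 1.0 + k * (xd * xd + yd * yd)
        xd, yd = xu / scale, yu / scale
    return xd, yd


def straightness_error(k, distorted_points):
    # HYPOTHETICAL objective: after correction with coefficient k, points from
    # a horizontal line should share one y value, so use the variance of y.
    ys = [y * (1.0 + k * (x * x + y * y)) for x, y in distorted_points]
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


K_TRUE = 0.1
line = [(-1.0 + 0.2 * i, 0.5) for i in range(11)]  # a horizontal frame edge
observed = [distort(p, K_TRUE) for p in line]
k_est = golden_section_search(
    lambda k: straightness_error(k, observed), -0.5, 0.5
)
```

Each iteration reuses one of the two previous function evaluations, so the bracket shrinks by a factor of about 0.618 at the cost of a single new evaluation. That property is what makes the method attractive when each evaluation involves correcting an image.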
Haruna MATSUSHITA Yoshifumi NISHIO
Because large amounts of data containing useless information can now be accumulated, it is important to investigate methods for extracting clusters from noisy data. The Self-Organizing Map (SOM) has recently attracted attention for clustering. In this study, we propose a method that uses plural SOMs (TSOM: Tentacled SOM) for effective data extraction. TSOM consists of two kinds of SOM with different features: one self-organizes the area where input data are concentrated, and the other self-organizes the whole of the input space. Each SOM of TSOM can catch the information of other SOMs existing in its neighborhood and self-organizes with competing and accommodating behaviors. We apply TSOM to data extraction from noisy input data and confirm that TSOM successfully extracts only the clusters, even when the number of clusters is not known in advance.
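The single-map building block that TSOM combines can be sketched as a plain one-dimensional SOM. This is not TSOM itself: the abstract does not specify the competing and accommodating interaction between maps, so it is not modeled here. The function names, the linear decay schedules, and the Gaussian neighborhood are assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to input x."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))


def train_som(data, n_nodes=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D chain of nodes on `data` (a list of equal-length tuples)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)  # ASSUMED linear decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood measured along the node chain.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
            step += 1
    return weights


# Toy input: one dense region plus uniformly scattered noise.
rng = random.Random(1)
dense = [(rng.uniform(0.4, 0.6), rng.uniform(0.4, 0.6)) for _ in range(80)]
noise = [(rng.random(), rng.random()) for _ in range(20)]
weights = train_som(dense + noise)
```

Every update moves a weight vector part of the way toward the current input, so the weights always stay inside the region spanned by the initial weights and the data. A single map trained this way is pulled toward both the dense region and the noise, which is the limitation that motivates combining maps with different roles.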
Haruna MATSUSHITA Yoshifumi NISHIO
The Self-Organizing Map (SOM) is an unsupervised neural network introduced in the 1980s by Teuvo Kohonen. In this paper, we propose a method of simultaneously using two kinds of SOM with different features (the nSOM method). Namely, one is distributed in the area where input data are concentrated, and the other self-organizes the whole of the input space. The competing behavior of the two kinds of SOM for nonuniform input data is investigated. Furthermore, we show its application to clustering and confirm its efficiency by comparing it with the k-means method.
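For reference, the k-means baseline named above can be sketched with Lloyd's algorithm. This is the textbook procedure, not the authors' experimental setup: the function name `kmeans`, the iteration cap, and the toy data are assumptions. The need to supply initial centroids, and therefore the number of clusters, up front is visible in the signature.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and averaging."""
    centroids = [tuple(c) for c in centroids]
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's group.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = []
        for c, g in zip(centroids, groups):
            if g:
                updated.append(tuple(sum(axis) / len(g) for axis in zip(*g)))
            else:
                updated.append(c)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments can no longer change
        centroids = updated
    return centroids, groups


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, groups = kmeans(points, centroids=[(0, 0), (10, 10)])
```

On this toy data the two groups separate cleanly and each centroid settles at the mean of its three points.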