
Keyword Search Result

[Keyword] data extraction (5 hits)

1-5 hits
  • LTDE: A Layout Tree Based Approach for Deep Page Data Extraction

    Jun ZENG  Feng LI  Brendan FLANAGAN  Sachio HIROKAWA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2017/02/21
      Vol:
    E100-D No:5
      Page(s):
    1067-1078

    Content extraction from deep Web pages has received great attention in recent years. However, the increasingly complicated HTML structure of Web documents makes it more difficult to recognize data records by analyzing the HTML source code alone. In this paper, we propose a method named LTDE to extract data records from a deep Web page. Instead of analyzing the HTML source code, LTDE uses the visual features of data records in deep Web pages. A Web page is treated as a finite set of visual blocks, and the data records are the visual blocks that share a similar layout. We also propose a pattern recognition method, named layout tree, to cluster visual blocks with similar layouts. The weight of each cluster is calculated, and the visual blocks in the cluster with the highest weight are chosen as the data records to be extracted. The experimental results show that LTDE is more effective and more robust for Web data extraction than previous works.
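    As a rough illustration of the cluster-then-weight step described above, the following Python sketch groups visual blocks by a layout signature and returns the blocks of the heaviest cluster as the data records. It is not LTDE's layout tree: the `VisualBlock` fields, the signature (child layout plus block width), and the weight (block count times total area) are hypothetical stand-ins chosen for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VisualBlock:
    # Hypothetical block model: rendered position/size plus the
    # sequence of child element kinds inside the block.
    x: int
    y: int
    width: int
    height: int
    child_layout: tuple


def layout_signature(block):
    # Assumption: two blocks "share a layout" when their child sequence
    # and width match exactly (LTDE compares layout trees instead).
    return (block.child_layout, block.width)


def cluster_weight(blocks):
    # Assumption: favor clusters that are both repeated and large.
    return len(blocks) * sum(b.width * b.height for b in blocks)


def extract_records(blocks):
    clusters = defaultdict(list)
    for block in blocks:
        clusters[layout_signature(block)].append(block)
    best = max(clusters.values(), key=cluster_weight)
    # Return records in reading order (top to bottom, then left to right).
    return sorted(best, key=lambda b: (b.y, b.x))


page = [
    VisualBlock(0, 0, 800, 60, ("img", "text")),            # page header
    VisualBlock(0, 100, 600, 120, ("img", "text", "text")),  # record
    VisualBlock(0, 230, 600, 120, ("img", "text", "text")),  # record
    VisualBlock(0, 360, 600, 120, ("img", "text", "text")),  # record
    VisualBlock(620, 100, 180, 300, ("text",)),              # sidebar
]
records = extract_records(page)
print([r.y for r in records])  # [100, 230, 360]
```

    The three repeated blocks outweigh the single header and sidebar blocks, so they are returned as records. A real weighting would need to be tuned so that one very large block cannot beat a genuine repeated pattern.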

  • Fast Information Retrieval Method from Printed Images Considering Mobile Devices

    Aya HIYAMA  Mitsuji MUNEYASU  

     
    LETTER-Image Processing

      Vol:
    E96-A No:11
      Page(s):
    2194-2197

    Information retrieval from printed images captured with mobile devices requires correcting geometric deformation and lens distortion, which imposes a heavy computational burden. In this paper, we propose a method that reduces the computational cost of these corrections. The method consists of improved extraction of the line segments of a frame, a reconsideration of the interpolation method used for image correction, and optimization of the image resolution during the correction process. The proposed method significantly reduces the number of computations. The experimental results show the effectiveness of the proposed method.
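    Geometric correction typically maps each output pixel back to a non-integer position in the captured image and interpolates a value there, so the interpolation rule directly drives the per-pixel cost. The abstract does not say which method the authors settled on; the sketch below shows two common choices, nearest-neighbor (one read per pixel) and bilinear (four reads plus a few multiplications). The function names are hypothetical.

```python
import math


def nearest_sample(img, x, y):
    """Nearest-neighbor lookup in a grayscale image given as a list of rows.

    x is the column coordinate and y the row coordinate; positions outside
    the image are clamped to the border.
    """
    h, w = len(img), len(img[0])
    col = min(max(int(round(x)), 0), w - 1)
    row = min(max(int(round(y)), 0), h - 1)
    return img[row][col]


def bilinear_sample(img, x, y):
    """Bilinear interpolation at a real-valued (x, y), clamped to the image."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two neighboring rows, then vertically.
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


img = [[0, 10],
       [20, 30]]
print(bilinear_sample(img, 0.5, 0.5))  # 15.0
print(nearest_sample(img, 0.4, 0.6))   # 20
```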

  • A Method of Data Embedding and Extracting for Information Retrieval Considering Mobile Devices

    Mitsuji MUNEYASU  Hiroshi KUDO  Takafumi SHONO  Yoshiko HANADA  

     
    PAPER

      Vol:
    E96-A No:6
      Page(s):
    1214-1221

    In this paper, we propose an improved data embedding and extraction method for information retrieval using mobile devices. Although the conventional method has demonstrated good results for images captured by cellular phones, some problems remain. One is that the code grouping method does not consider how the code groups are constructed. In this paper, a new construction method for code grouping is proposed, and it is shown that a suitable grouping of the codes can be found. Another problem is that the lens distortion correction is time-consuming. To improve the processing speed, the golden section search method is therefore adopted to estimate the distortion coefficients. In addition, a new tuning algorithm for the gain coefficient in the embedding process is proposed. Experimental results show an increase in the detection rate for embedded data and a reduction in processing time.
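    Golden section search minimizes a one-dimensional unimodal function by shrinking a bracketing interval by the factor 1/φ ≈ 0.618 per step, reusing one of the two interior evaluations each time, so only one new cost evaluation is needed per iteration. The Python sketch below shows the generic method; the `cost` function is a hypothetical stand-in for the residual of a lens-distortion correction with coefficient `k`, not the paper's actual objective.

```python
import math

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_search(cost, lo, hi, tol=1e-6):
    """Return an approximate minimizer of a unimodal function on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = cost(c), cost(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = cost(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = cost(d)
    return (a + b) / 2


def cost(k):
    # Hypothetical stand-in for "correction residual with coefficient k";
    # its minimum is placed at k = 0.12 for illustration.
    return (k - 0.12) ** 2


k_hat = golden_section_search(cost, -0.5, 0.5)
print(round(k_hat, 4))  # 0.12
```

    Compared with a grid search over the same interval at the same precision, this needs only a few dozen cost evaluations, which is consistent with the abstract's motivation of reducing processing time.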

  • Tentacled Self-Organizing Map for Effective Data Extraction

    Haruna MATSUSHITA  Yoshifumi NISHIO  

     
    PAPER-Neuron and Neural Networks

      Vol:
    E90-A No:10
      Page(s):
    2085-2092

    Because large amounts of data containing useless information can now be accumulated, it is important to investigate methods for extracting clusters from data that contain a lot of noise. The Self-Organizing Map (SOM) has recently attracted attention for clustering. In this study, we propose a method that uses multiple SOMs (TSOM: Tentacled SOM) for effective data extraction. TSOM consists of two kinds of SOM with different features: one self-organizes the area where the input data are concentrated, and the other self-organizes the whole input space. Each SOM in TSOM can capture information from the other SOMs in its neighborhood and self-organizes through competing and accommodating behaviors. We apply TSOM to data extraction from input data containing much noise and confirm that TSOM successfully extracts only the clusters, even when the number of clusters is not known in advance.
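    Both TSOM and nSOM build on the standard Kohonen SOM update: find the best-matching unit (BMU) for an input, then pull the BMU and its map neighbors toward that input with a decaying learning rate and a shrinking neighborhood. The Python sketch below trains a plain one-dimensional chain of units on two-dimensional data. It does not reproduce TSOM's two kinds of SOM or their competing and accommodating interaction; the schedule parameters (`lr0`, `sigma0`, the 0.5 neighborhood floor) are illustrative assumptions.

```python
import math
import random


def train_som(data, n_units=10, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Train a 1-D chain of SOM units on 2-D input vectors."""
    rng = random.Random(seed)
    # Initialize each unit's weight vector at a randomly chosen sample.
    units = [list(rng.choice(data)) for _ in range(n_units)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = data[:]
        rng.shuffle(samples)
        for x in samples:
            frac = step / total
            lr = lr0 * (1 - frac)                   # linearly decaying rate
            sigma = max(sigma0 * (1 - frac), 0.5)   # shrinking neighborhood
            # Best-matching unit: the weight vector closest to the input.
            bmu = min(range(n_units), key=lambda i: math.dist(units[i], x))
            for i, w in enumerate(units):
                # Gaussian neighborhood over distance along the chain.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                w[0] += lr * h * (x[0] - w[0])
                w[1] += lr * h * (x[1] - w[1])
            step += 1
    return units


rng = random.Random(1)
data = ([(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(100)]
        + [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(100)])
units = train_som(data)


def nearest_dist(center):
    # Distance from a cluster center to the closest trained unit.
    return min(math.dist(u, center) for u in units)


print(round(nearest_dist((0, 0)), 2), round(nearest_dist((5, 5)), 2))
```

    A plain SOM like this spreads units over every region that receives input, noise included, which is the behavior TSOM's second map type is meant to address.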

  • Competing Behavior of Two Kinds of Self-Organizing Maps and Its Application to Clustering

    Haruna MATSUSHITA  Yoshifumi NISHIO  

     
    PAPER-Neural Networks and Bioengineering

      Vol:
    E90-A No:4
      Page(s):
    865-871

    The Self-Organizing Map (SOM) is an unsupervised neural network introduced by Teuvo Kohonen in the 1980s. In this paper, we propose a method that simultaneously uses two kinds of SOM with different features (the nSOM method): one is distributed in the area where the input data are concentrated, and the other self-organizes the whole input space. The competing behavior of the two kinds of SOM for nonuniform input data is investigated. Furthermore, we show an application to clustering and confirm its efficiency by comparing it with the k-means method.
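    For reference, the k-means baseline mentioned above is usually computed with Lloyd's iteration: assign each point to its nearest center, move each center to the mean of its assigned points, and repeat until the centers stop moving. Unlike a SOM, it needs the number of clusters `k` fixed beforehand. The Python sketch below is a minimal version with random initialization from the data; the function name and parameters are illustrative, not taken from the paper.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: each point joins the group of its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((pi - ci) ** 2
                                      for pi, ci in zip(p, centers[c])))
            groups[j].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for j, group in enumerate(groups):
            if group:
                new_centers.append([sum(col) / len(group) for col in zip(*group)])
            else:
                new_centers.append(centers[j])  # keep an empty group's center
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = sorted(kmeans(points, 2))
print([[round(v, 3) for v in c] for c in result])
# [[0.333, 0.333], [10.333, 10.333]]
```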