The search functionality is under construction.

Author Search Result

[Author] Sachio HIROKAWA(6hit)

1-6hit
  • A Web Page Segmentation Approach Using Visual Semantics

    Jun ZENG  Brendan FLANAGAN  Sachio HIROKAWA  Eisuke ITO  

     
    PAPER-Data Engineering, Web Information Systems

      Vol:
    E97-D No:2
      Page(s):
    223-230

    Web page segmentation has a variety of benefits and potential web applications. Early techniques of web page segmentation are mainly based on machine learning algorithms and rule-based heuristics, which cannot be used for large-scale page segmentation. In this paper, we propose a formulated page segmentation method using visual semantics. Instead of analyzing the visual cues of web pages, this method utilizes three measures to formulate the visual semantics: layout tree is used to recognize the visual similar blocks; seam degree is used to describe how neatly the blocks are arranged; content similarity is used to describe the content coherent degree between blocks. A comparison experiment was done using the VIPS algorithm as a baseline. Experiment results show that the proposed method can divide a Web page into appropriate semantic segments.

  • LTDE: A Layout Tree Based Approach for Deep Page Data Extraction

    Jun ZENG  Feng LI  Brendan FLANAGAN  Sachio HIROKAWA  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2017/02/21
      Vol:
    E100-D No:5
      Page(s):
    1067-1078

    Content extraction from deep Web pages has received great attention in recent years. However, the increasingly complicated HTML structure of Web documents makes it more difficult to recognize the data records by only analyzing the HTML source code. In this paper, we propose a method named LTDE to extract data records from a deep Web page. Instead of analyzing the HTML source code, LTDE utilizes the visual features of data records in deep Web pages. A Web page is considered as a finite set of visual blocks. The data records are the visual blocks that have similar layout. We also propose a pattern recognizing method named layout tree to cluster the similar layout visual blocks. The weight of all clusters is calculated, and the visual blocks in the cluster that has the highest weight are chosen as the data records to be extracted. The experiment results show that LTDE has higher effectiveness and better robustness for Web data extraction compared to previous works.

  • Shilling Attack Detection in Recommender Systems via Selecting Patterns Analysis

    Wentao LI  Min GAO  Hua LI  Jun ZENG  Qingyu XIONG  Sachio HIROKAWA  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2016/06/27
      Vol:
    E99-D No:10
      Page(s):
    2600-2611

    Collaborative filtering (CF) has been widely used in recommender systems to generate personalized recommendations. However, recommender systems using CF are vulnerable to shilling attacks, in which attackers inject fake profiles to manipulate recommendation results. Thus, shilling attacks pose a threat to the credibility of recommender systems. Previous studies mainly derive features from characteristics of item ratings in user profiles to detect attackers, but the methods suffer from low accuracy when attackers adopt new rating patterns. To overcome this drawback, we derive features from properties of item popularity in user profiles, which are determined by users' different selecting patterns. This feature extraction method is based on the prior knowledge that attackers select items to rate with man-made rules while normal users do this according to their inner preferences. Then, machine learning classification approaches are exploited to make use of these features to detect and remove attackers. Experiment results on the MovieLens dataset and Amazon review dataset show that our proposed method improves detection performance. In addition, the results justify the practical value of features derived from selecting patterns.

  • Automated Duplicate Bug Report Detection Using Multi-Factor Analysis

    Jie ZOU  Ling XU  Mengning YANG  Xiaohong ZHANG  Jun ZENG  Sachio HIROKAWA  

     
    PAPER-Software Engineering

      Pubricized:
    2016/04/01
      Vol:
    E99-D No:7
      Page(s):
    1762-1775

    The bug reports expressed in natural language text usually suffer from vast, ambiguous and poorly written, which causes the challenge to the duplicate bug reports detection. Current automatic duplicate bug reports detection techniques have mainly focused on textual information and ignored some useful factors. To improve the detection accuracy, in this paper, we propose a new approach calls LNG (LDA and N-gram) model which takes advantages of the topic model LDA and word-based model N-gram. The LNG considers multiple factors, including textual information, semantic correlation, word order, contextual connections, and categorial information, that potentially affect the detection accuracy. Besides, the N-gram adopted in our LNG model is improved by modifying the similarity algorithm. The experiment is conducted under more than 230,000 real bug reports of the Eclipse project. In the evaluation, we propose a new evaluation metric, namely exact-accuracy (EA) rate, which can be used to enhance the understanding of the performance of duplicates detection. The evaluation results show that all the recall rate, precision rate, and EA rate of the proposed method are higher than treating them separately. Also, the recall rate is improved by 2.96%-10.53% compared to the state-of-art approach DBTM.

  • Personalized Movie Recommendation System Based on Support Vector Machine and Improved Particle Swarm Optimization

    Xibin WANG  Fengji LUO  Chunyan SANG  Jun ZENG  Sachio HIROKAWA  

     
    PAPER-Data Engineering, Web Information Systems

      Pubricized:
    2016/11/21
      Vol:
    E100-D No:2
      Page(s):
    285-293

    With the rapid development of information and Web technologies, people are facing ‘information overload’ in their daily lives. The personalized recommendation system (PRS) is an effective tool to assist users extract meaningful information from the big data. Collaborative filtering (CF) is one of the most widely used personalized recommendation techniques to recommend the personalized products for users. However, the conventional CF technique has some limitations, such as the low accuracy of of similarity calculation, cold start problem, etc. In this paper, a PRS model based on the Support Vector Machine (SVM) is proposed. The proposed model not only considers the items' content information, but also the users' demographic and behavior information to fully capture the users' interests and preferences. An improved Particle Swarm Optimization (PSO) algorithm is also proposed to improve the performance of the model. The efficiency of the proposed method is verified by multiple benchmark datasets.

  • Dynamic Macro-Based Heuristic Planning through Action Relationship Analysis

    Zhuo JIANG  Junhao WEN  Jun ZENG  Yihao ZHANG  Xibin WANG  Sachio HIROKAWA  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2014/10/23
      Vol:
    E98-D No:2
      Page(s):
    363-371

    The success of heuristic search in AI planning largely depends on the design of the heuristic. On the other hand, previous experience contains potential domain information that can assist the planning process. In this context, we have studied dynamic macro-based heuristic planning through action relationship analysis. We present an approach for analyzing the action relationship and design an algorithm that learns macros in solved cases. We then propose a dynamic macro-based heuristic that appropriately reuses the macros rather than immediately assigning them to domains. The above ideas are incorporated into a working planning system called Dynamic Macro-based Fast Forward planner. Finally, we evaluate our method in a series of experiments. Our method effectively optimizes planning since it reduces the result length by an average of 10% relative to the FF, in a time-economic manner. The efficiency is especially improved when invoking an action consumes time.