
Author Search Result

[Author] Akira SHIMAZU (2 hits)

  • Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora

    Tri-Thanh NGUYEN  Akira SHIMAZU  

     
    PAPER

    Vol: E90-D  No: 10  Page(s): 1542-1549

    Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Although some efforts have been made to expand the hierarchy of NE types, the number of NE types remains fixed. In real applications, such as question answering or semantic search systems, users may be interested in more diverse and more specific NE types. This paper proposes a method to extract categories of person named entities from text documents. Based on the Dual Iterative Pattern Relation Extraction method, we develop a model better suited to this problem and explore the generation of different pattern types. We also propose a method for checking whether an extracted category is valid, which improves performance. Experiments on the Wall Street Journal corpus give promising results.

  • Learning to Generate a Table-of-Contents with Supportive Knowledge

    Viet Cuong NGUYEN  Le Minh NGUYEN  Akira SHIMAZU  

     
    PAPER

    Vol: E94-D  No: 3  Page(s): 423-431

    In the text summarization field, a table-of-contents is a type of indicative summary that is especially suited for locating information in a long document or a set of documents. It is also a useful summary that lets a reader quickly get an overview of the entire contents. Current models for generating a table-of-contents produce relatively low-quality output, with many meaningless titles or titles whose meaning does not overlap with the corresponding contents. This problem may be due to the lack of semantic and topic information in those models. In this research, we propose integrating supportive knowledge into the learning models to improve the quality of titles in a generated table-of-contents. The supportive knowledge is derived from two sources: a hierarchical clustering of words, built from a large collection of raw text, and a topic model, estimated directly from the training data. The relatively good experimental results show that the semantic and topic information supplied by the supportive knowledge has a positive effect on title generation and therefore helps to improve the quality of the generated table-of-contents.