The search functionality is under construction.

Author Search Result

[Author] Shunxiang WU(1hit)

1-1hit
  • The Biterm Author Topic in the Sentences Model for E-Mail Analysis

    Xiuze ZHOU  Shunxiang WU  

     
    PAPER-Artificial Intelligence, Data Mining

      Pubricized:
    2017/04/25
      Vol:
    E100-D No:8
      Page(s):
    1852-1859

    E-mails, which vary in length, are a special form of text. The difference in the lengths of e-mails increases the difficulty of text analysis. To better analyze e-mail, our models must analyze not only long e-mails but also short e-mails. Unlike normal documents, short texts have some unique characteristics, such as data sparsity and ambiguity problems, making it difficult to obtain useful information from them. However, long text and short text cannot be analyzed in the same manner. Therefore, we have to analyze the characteristics of both. We present the Biterm Author Topic in the Sentences Model (BATS) model; it can discover relevant topics of corpus and accurately capture the relationship between the topics and authors of e-mails. The Author Topic (AT) model learns from a single word in a document, while the BATS is modeled on word co-occurrence in the entire corpus. We assume that all words in a single sentence are generated from the same topic. Accordingly, our method uses only word co-occurrence patterns at the sentence level, rather than the document or corpus level. Experiments on the Enron data set indicate that our proposed method achieves better performance on e-mails than the baseline methods. What's more, our method analyzes long texts effectively and solves the data sparsity problems of short texts.