Modeling Content Structures of Domain-Specific Texts with RUP-HDP-HSMM and Its Applications

Youwei LU; Shogo OKADA; Katsumi NITTA

doi:10.1587/transinf.2017EDP7043

IEICE TRANSACTIONS on Information

Summary:

We propose a novel method, built upon the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM), to reveal the content structures of unstructured domain-specific texts. The content structures of texts consisting of sequential local contexts are useful for tasks such as text retrieval, classification, and text mining. The prominent feature of our model is the use of recursive uniform partitioning (RUP), a stochastic process that models state duration differently from existing HSMMs. We show that recursive uniform partitioning plays an important role in avoiding rapid switching between hidden states. In our text retrieval experiments, our method greatly outperforms others in terms of ranking performance, and in our text classification experiments it provides more accurate features that allow an SVM to achieve higher F1 scores. These experimental results suggest that our method can yield improved representations of domain-specific texts. Furthermore, we present a method for automatically discovering the local contexts that explain why a text is classified as a positive instance in supervised learning settings.

Publication: IEICE TRANSACTIONS on Information Vol.E100-D No.9 pp.2126-2137

Publication Date: 2017/09/01

Publicized: 2017/06/09

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2017EDP7043

Type of Manuscript: PAPER

Category: Artificial Intelligence, Data Mining

Authors

Youwei LU
  Tokyo Institute of Technology
Shogo OKADA
  Japan Advanced Institute of Science and Technology
Katsumi NITTA
  Tokyo Institute of Technology

Keywords

hidden semi-Markov models, content structure, local features, text mining, rapid switching

Cite this

Youwei LU, Shogo OKADA, Katsumi NITTA, "Modeling Content Structures of Domain-Specific Texts with RUP-HDP-HSMM and Its Applications" in IEICE TRANSACTIONS on Information, vol. E100-D, no. 9, pp. 2126-2137, September 2017, doi: 10.1587/transinf.2017EDP7043.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7043/_p

@ARTICLE{e100-d_9_2126,
author={Youwei LU and Shogo OKADA and Katsumi NITTA},
journal={IEICE TRANSACTIONS on Information},
title={Modeling Content Structures of Domain-Specific Texts with RUP-HDP-HSMM and Its Applications},
year={2017},
volume={E100-D},
number={9},
pages={2126--2137},
abstract={We propose a novel method, built upon the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM), to reveal the content structures of unstructured domain-specific texts. The content structures of texts consisting of sequential local contexts are useful for tasks such as text retrieval, classification, and text mining. The prominent feature of our model is the use of recursive uniform partitioning (RUP), a stochastic process that models state duration differently from existing HSMMs. We show that recursive uniform partitioning plays an important role in avoiding rapid switching between hidden states. In our text retrieval experiments, our method greatly outperforms others in terms of ranking performance, and in our text classification experiments it provides more accurate features that allow an SVM to achieve higher F1 scores. These experimental results suggest that our method can yield improved representations of domain-specific texts. Furthermore, we present a method for automatically discovering the local contexts that explain why a text is classified as a positive instance in supervised learning settings.},
keywords={hidden semi-Markov models, content structure, local features, text mining, rapid switching},
doi={10.1587/transinf.2017EDP7043},
ISSN={1745-1361},
month={September},}

TY - JOUR
TI - Modeling Content Structures of Domain-Specific Texts with RUP-HDP-HSMM and Its Applications
T2 - IEICE TRANSACTIONS on Information
SP - 2126
EP - 2137
AU - Youwei LU
AU - Shogo OKADA
AU - Katsumi NITTA
PY - 2017
DO - 10.1587/transinf.2017EDP7043
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E100-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2017
AB - We propose a novel method, built upon the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM), to reveal the content structures of unstructured domain-specific texts. The content structures of texts consisting of sequential local contexts are useful for tasks such as text retrieval, classification, and text mining. The prominent feature of our model is the use of recursive uniform partitioning (RUP), a stochastic process that models state duration differently from existing HSMMs. We show that recursive uniform partitioning plays an important role in avoiding rapid switching between hidden states. In our text retrieval experiments, our method greatly outperforms others in terms of ranking performance, and in our text classification experiments it provides more accurate features that allow an SVM to achieve higher F1 scores. These experimental results suggest that our method can yield improved representations of domain-specific texts. Furthermore, we present a method for automatically discovering the local contexts that explain why a text is classified as a positive instance in supervised learning settings.
ER -
