Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora

Tri-Thanh NGUYEN; Akira SHIMAZU

doi:10.1093/ietisy/e90-d.10.1542

Summary:

Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Although some efforts have been made to expand the hierarchy of NE types, the number of NE types remains fixed. In real applications, such as question answering or semantic search systems, users may be interested in more diverse and specific NE types. This paper proposes a method to extract categories of person named entities from text documents. Based on the Dual Iterative Pattern Relation Extraction (DIPRE) method, we develop a model better suited to our problem and explore the generation of different pattern types. We also propose a method for validating candidate categories to improve performance; experiments on the Wall Street Journal corpus give promising results.
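The summary says the model builds on DIPRE but gives no details of its pattern types or of the category validation step. As a rough illustration only, the Python sketch below shows a generic DIPRE-style bootstrapping loop over (category, person) pairs: known pairs yield patterns (here, just the text between the two mentions plus their order), and the patterns yield new pairs, until nothing new is found. This is not the authors' model. The name and category regexes, the `MAX_MIDDLE` cap, the "middle must contain a letter" filter and the toy corpus are all hypothetical assumptions, and category validation is not modelled.

```python
import re
from typing import Sequence, Set, Tuple

Pair = Tuple[str, str]     # (category, person)
Pattern = Tuple[str, str]  # (order, middle); "PC" = person first, "CP" = category first

# Hypothetical surface forms: a person is two capitalised words,
# a category is a single lower-case word.
NAME = r"[A-Z][a-z]+ [A-Z][a-z]+"
CAT = r"[a-z]+"
MAX_MIDDLE = 20  # hypothetical cap on the text allowed between the two mentions


def find_patterns(corpus: Sequence[str], pairs: Set[Pair]) -> Set[Pattern]:
    """Turn every co-occurrence of a known pair into an (order, middle) pattern."""
    patterns: Set[Pattern] = set()
    for sentence in corpus:
        for category, person in pairs:
            i_c, i_p = sentence.find(category), sentence.find(person)
            if i_c < 0 or i_p < 0:
                continue
            if i_p < i_c:
                order, middle = "PC", sentence[i_p + len(person):i_c]
            else:
                order, middle = "CP", sentence[i_c + len(category):i_p]
            # Keep short middles that contain a letter, so bare punctuation
            # or whitespace does not become an over-general pattern.
            if 0 < len(middle) <= MAX_MIDDLE and re.search(r"[a-z]", middle):
                patterns.add((order, middle))
    return patterns


def match_patterns(corpus: Sequence[str], patterns: Set[Pattern]) -> Set[Pair]:
    """Apply each pattern to the corpus and collect the (category, person) pairs it matches."""
    found: Set[Pair] = set()
    for order, middle in patterns:
        if order == "PC":
            regex = re.compile(f"({NAME}){re.escape(middle)}({CAT})")
        else:
            regex = re.compile(rf"\b({CAT}){re.escape(middle)}({NAME})")
        for sentence in corpus:
            for m in regex.finditer(sentence):
                if order == "PC":
                    found.add((m.group(2), m.group(1)))
                else:
                    found.add((m.group(1), m.group(2)))
    return found


def bootstrap(corpus: Sequence[str], seeds: Set[Pair], max_iter: int = 5) -> Set[Pair]:
    """Alternate pairs -> patterns -> pairs until no new pair is found."""
    pairs = set(seeds)
    for _ in range(max_iter):
        new_pairs = match_patterns(corpus, find_patterns(corpus, pairs)) - pairs
        if not new_pairs:
            break
        pairs |= new_pairs
    return pairs


# Toy corpus (hypothetical sentences, not taken from the Wall Street Journal data).
corpus = [
    "John Smith, the chairman of Acme, will retire.",
    "Mary Jones, the economist at Beta, expects growth.",
    "Beta hired a veteran economist named Mary Jones.",
    "Acme hired a young lawyer named Alan Brown.",
]

if __name__ == "__main__":
    print(sorted(bootstrap(corpus, {("chairman", "John Smith")})))
    # → [('chairman', 'John Smith'), ('economist', 'Mary Jones'), ('lawyer', 'Alan Brown')]
```

Starting from the single seed ('chairman', 'John Smith'), the first iteration learns the person-first pattern ", the " and picks up ('economist', 'Mary Jones'); that pair then yields the category-first pattern " named ", which finds ('lawyer', 'Alan Brown'). Practical DIPRE-style systems add left and right context and score patterns to limit semantic drift; the letter filter here is only a crude stand-in for that.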

Publication: IEICE TRANSACTIONS on Information Vol.E90-D No.10 pp.1542-1549

Publication Date: 2007/10/01

Online ISSN: 1745-1361

Type of Manuscript: Special Section PAPER (Special Section on Knowledge, Information and Creativity Support System)

Cite this

Tri-Thanh NGUYEN, Akira SHIMAZU, "Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora" in IEICE TRANSACTIONS on Information, vol. E90-D, no. 10, pp. 1542-1549, October 2007, doi: 10.1093/ietisy/e90-d.10.1542.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e90-d.10.1542/_p

@ARTICLE{e90-d_10_1542,
author={Tri-Thanh NGUYEN and Akira SHIMAZU},
journal={IEICE TRANSACTIONS on Information},
title={Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora},
year={2007},
volume={E90-D},
number={10},
pages={1542-1549},
keywords={},
doi={10.1093/ietisy/e90-d.10.1542},
ISSN={1745-1361},
month=oct}

TY  - JOUR
TI  - Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora
T2  - IEICE TRANSACTIONS on Information
SP  - 1542
EP  - 1549
AU  - Tri-Thanh NGUYEN
AU  - Akira SHIMAZU
PY  - 2007
DO  - 10.1093/ietisy/e90-d.10.1542
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E90-D
IS  - 10
JA  - IEICE TRANSACTIONS on Information
Y1  - 2007/10/01/
ER  -