Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Although some efforts have been made to expand the hierarchy of NE types, the number of NE types is still fixed. In real applications, such as question answering or semantic search systems, users may be interested in more diverse and specific NE types. This paper proposes a method to extract categories of person named entities from text documents. Based on the Dual Iterative Pattern Relation Extraction (DIPRE) method, we develop a model better suited to our problem and explore the generation of different pattern types. A method for checking whether an extracted category is valid is proposed to improve performance, and experiments on the Wall Street Journal corpus give promising results.
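The abstract names Dual Iterative Pattern Relation Extraction (DIPRE) as its starting point but does not spell out the loop. The block below is a minimal, simplified sketch of generic DIPRE-style bootstrapping applied to (person, category) pairs; it is not the authors' model. The function name `dipre_bootstrap`, the toy sentences, the seed pair, and the pattern definition (only the literal text between a capitalized name and a one-word lowercase category) are hypothetical assumptions. The original DIPRE also uses context before and after the match, and the paper's category-validation step is not reproduced here.

```python
import re


def dipre_bootstrap(corpus, seeds, iterations=2, max_gap=20):
    """Simplified DIPRE-style bootstrapping over (person, category) pairs.

    Hypothetical sketch: a pattern is only the literal text found between a
    known person name and its category; surrounding context is ignored.
    """
    pairs = set(seeds)
    patterns = set()
    for _ in range(iterations):
        # Step 1: locate known pairs in the corpus and keep the short
        # connecting text (e.g. ", a ") as a pattern.
        for sentence in corpus:
            for person, category in pairs:
                start = sentence.find(person)
                if start < 0:
                    continue
                end = start + len(person)
                cat_pos = sentence.find(category, end)
                if cat_pos < 0:
                    continue
                middle = sentence[end:cat_pos]
                if 0 < len(middle) <= max_gap:
                    patterns.add(middle)
        # Step 2: apply every pattern to harvest new pairs:
        # a capitalized name, then the pattern text, then a lowercase word.
        new_pairs = set()
        for middle in patterns:
            regex = re.compile(
                r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)" + re.escape(middle) + r"([a-z]+)"
            )
            for sentence in corpus:
                for match in regex.finditer(sentence):
                    new_pairs.add((match.group(1), match.group(2)))
        if new_pairs <= pairs:
            break  # no new pairs were found, so the loop has converged
        pairs |= new_pairs
    return pairs, patterns


# Hypothetical toy corpus and a single seed pair.
corpus = [
    "John Smith, a lawyer, said the deal was fair.",
    "Mary Jones, a banker, declined to comment.",
    "Peter Brown, a senator, voted against it.",
]
pairs, patterns = dipre_bootstrap(corpus, {("John Smith", "lawyer")})
print(sorted(pairs))
# [('John Smith', 'lawyer'), ('Mary Jones', 'banker'), ('Peter Brown', 'senator')]
```

One seed yields the pattern ", a ", which in turn yields two new pairs; the second iteration finds nothing new and stops. A real system would need a filter for wrong pairs, which is the role a category-validation step plays.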
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Tri-Thanh NGUYEN, Akira SHIMAZU, "Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora" in IEICE TRANSACTIONS on Information and Systems,
vol. E90-D, no. 10, pp. 1542-1549, October 2007, doi: 10.1093/ietisy/e90-d.10.1542.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e90-d.10.1542/_p
@ARTICLE{e90-d_10_1542,
author={Tri-Thanh NGUYEN and Akira SHIMAZU},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora},
year={2007},
volume={E90-D},
number={10},
pages={1542-1549},
abstract={Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Although some efforts have been made to expand the hierarchy of NE types, the number of NE types is still fixed. In real applications, such as question answering or semantic search systems, users may be interested in more diverse and specific NE types. This paper proposes a method to extract categories of person named entities from text documents. Based on the Dual Iterative Pattern Relation Extraction (DIPRE) method, we develop a model better suited to our problem and explore the generation of different pattern types. A method for checking whether an extracted category is valid is proposed to improve performance, and experiments on the Wall Street Journal corpus give promising results.},
keywords={},
doi={10.1093/ietisy/e90-d.10.1542},
ISSN={1745-1361},
month={October},
}
TY  - JOUR
TI  - Automatic Extraction of the Fine Category of Person Named Entities from Text Corpora
T2  - IEICE TRANSACTIONS on Information and Systems
SP  - 1542
EP  - 1549
AU  - Tri-Thanh NGUYEN
AU  - Akira SHIMAZU
PY  - 2007
DO  - 10.1093/ietisy/e90-d.10.1542
JO  - IEICE TRANSACTIONS on Information and Systems
SN  - 1745-1361
VL  - E90-D
IS  - 10
JA  - IEICE TRANSACTIONS on Information and Systems
Y1  - 2007/10//
AB  - Named entities play an important role in many Natural Language Processing applications. Currently, most named entity recognition systems rely on a small set of general named entity (NE) types. Although some efforts have been made to expand the hierarchy of NE types, the number of NE types is still fixed. In real applications, such as question answering or semantic search systems, users may be interested in more diverse and specific NE types. This paper proposes a method to extract categories of person named entities from text documents. Based on the Dual Iterative Pattern Relation Extraction (DIPRE) method, we develop a model better suited to our problem and explore the generation of different pattern types. A method for checking whether an extracted category is valid is proposed to improve performance, and experiments on the Wall Street Journal corpus give promising results.
ER  - 