Extraction of Semantic Text Portion Related to Anchor Link

Bui Quang HUNG; Masanori OTSUBO; Yoshinori HIJIKATA; Shogo NISHIDA

doi:10.1093/ietisy/e89-d.6.1834

Summary:

Semantic text portions (STPs) have recently attracted attention in the field of Web mining. An STP is a text portion in the original page that is semantically related to the anchor pointing to the target page. STPs may include facts and people's opinions about the target pages, and they can be used in various upper-level applications such as automatic summarization and document categorization. In this paper, we concentrate on extracting STPs. We conduct a survey of STPs to determine their positions in original pages and to identify the HTML tags that separate STPs from the other text portions of those pages. We then develop an extraction method based on the results of the survey. The experimental results show that our method achieves high performance.

Publication: IEICE TRANSACTIONS on Information Vol.E89-D No.6 pp.1834-1847

Publication Date: 2006/06/01

Online ISSN: 1745-1361

DOI: 10.1093/ietisy/e89-d.6.1834

Type of Manuscript: Special Section PAPER (Special Section on Human Communication II)

Category: Language

Cite this

Bui Quang HUNG, Masanori OTSUBO, Yoshinori HIJIKATA, Shogo NISHIDA, "Extraction of Semantic Text Portion Related to Anchor Link" in IEICE TRANSACTIONS on Information, vol. E89-D, no. 6, pp. 1834-1847, June 2006, doi: 10.1093/ietisy/e89-d.6.1834.
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e89-d.6.1834/_p

@ARTICLE{e89-d_6_1834,
author={Bui Quang HUNG and Masanori OTSUBO and Yoshinori HIJIKATA and Shogo NISHIDA},
journal={IEICE TRANSACTIONS on Information},
title={Extraction of Semantic Text Portion Related to Anchor Link},
year={2006},
volume={E89-D},
number={6},
pages={1834--1847},
keywords={},
doi={10.1093/ietisy/e89-d.6.1834},
ISSN={1745-1361},
month={June},}

TY - JOUR
TI - Extraction of Semantic Text Portion Related to Anchor Link
T2 - IEICE TRANSACTIONS on Information
SP - 1834
EP - 1847
AU - Bui Quang HUNG
AU - Masanori OTSUBO
AU - Yoshinori HIJIKATA
AU - Shogo NISHIDA
PY - 2006
DO - 10.1093/ietisy/e89-d.6.1834
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E89-D
IS - 6
JA - IEICE TRANSACTIONS on Information
Y1 - 2006/06/01
ER -