Web page segmentation has a variety of benefits and potential web applications. Early web page segmentation techniques are mainly based on machine learning algorithms and rule-based heuristics, which cannot be used for large-scale page segmentation. In this paper, we propose a formulated page segmentation method that uses visual semantics. Instead of analyzing the visual cues of web pages, this method uses three measures to formulate the visual semantics: the layout tree is used to recognize visually similar blocks; the seam degree describes how neatly the blocks are arranged; and content similarity describes the degree of content coherence between blocks. A comparison experiment was conducted using the VIPS algorithm as a baseline. Experimental results show that the proposed method can divide a web page into appropriate semantic segments.
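As a rough illustration of the third measure, content coherence between two blocks could be scored as the cosine similarity of their term-frequency vectors. This is only a sketch under that assumption: the abstract does not give the paper's actual formula, and the names `term_frequencies` and `content_similarity` are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def content_similarity(block_a: str, block_b: str) -> float:
    """Cosine similarity of two blocks' term-frequency vectors.

    Assumption: cosine similarity stands in for the paper's measure,
    which the abstract does not define.
    """
    tf_a = term_frequencies(block_a)
    tf_b = term_frequencies(block_b)
    # Dot product over the terms of block_a; Counter returns 0 for
    # terms that are missing from block_b.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a block with no tokens is treated as unrelated
    return dot / (norm_a * norm_b)
```

Under this sketch, two blocks with the same words in any order score close to 1, and blocks with no words in common score 0, so a threshold on the score could decide whether adjacent blocks belong to the same segment.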
Jun ZENG
Kyushu University
Brendan FLANAGAN
Kyushu University
Sachio HIROKAWA
Kyushu University
Eisuke ITO
Kyushu University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jun ZENG, Brendan FLANAGAN, Sachio HIROKAWA, Eisuke ITO, "A Web Page Segmentation Approach Using Visual Semantics" in IEICE TRANSACTIONS on Information, vol. E97-D, no. 2, pp. 223-230, February 2014, doi: 10.1587/transinf.E97.D.223.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E97.D.223/_p
@ARTICLE{e97-d_2_223,
author={Jun ZENG and Brendan FLANAGAN and Sachio HIROKAWA and Eisuke ITO},
journal={IEICE TRANSACTIONS on Information},
title={A Web Page Segmentation Approach Using Visual Semantics},
year={2014},
volume={E97-D},
number={2},
pages={223-230},
keywords={},
doi={10.1587/transinf.E97.D.223},
ISSN={1745-1361},
month={February},}
TY - JOUR
TI - A Web Page Segmentation Approach Using Visual Semantics
T2 - IEICE TRANSACTIONS on Information
SP - 223
EP - 230
AU - Jun ZENG
AU - Brendan FLANAGAN
AU - Sachio HIROKAWA
AU - Eisuke ITO
PY - 2014
DO - 10.1587/transinf.E97.D.223
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E97-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2014
ER -