A web community is a set of web pages that provide resources on a specific topic. Various link-analysis methods for finding web communities have been proposed in the literature. The method proposed in this paper builds on an earlier method that uses the maximum flow algorithm. We use the maximum flow algorithm to extract a subgraph that can be recognized as a good web community in terms of both quantity and quality. This paper first discusses the features of the maximum-flow-based method. The earlier approach has the problem that a certain graph structure containing noise (i.e., irrelevant pages) is always extracted. This problem is caused mainly by assigning a constant value to every edge capacity. This paper proposes assigning variable edge capacities based on the hub and authority scores obtained from a HITS calculation. To examine the effects of the proposed method, we performed experiments using a Japanese archive crawled in February 2002. The experimental results demonstrate that the proposed method removes the noise pages caused by constant edge capacities and improves the quality of the extracted web communities.
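The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the capacity formula or the graph construction, so the capacity rule `scale * (hub[u] + auth[v])`, the virtual `SOURCE`/`SINK` nodes, the unit-capacity edges to the sink, and all function names below are assumptions chosen only to show how HITS scores could feed a max-flow cut.

```python
from collections import defaultdict, deque

SOURCE, SINK = "__source__", "__sink__"  # hypothetical virtual nodes


def hits(edges, iterations=50):
    """Plain HITS power iteration; returns (hub, authority) score dicts."""
    nodes = {n for edge in edges for n in edge}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority of v: sum of hub scores of pages linking to v.
        auth = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            auth[v] += hub[u]
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub of u: sum of authority scores of pages u links to.
        hub = dict.fromkeys(nodes, 0.0)
        for u, v in edges:
            hub[u] += auth[v]
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


def max_flow_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; returns (flow, nodes on the source side of the cut)."""
    residual = defaultdict(lambda: defaultdict(float))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0.0  # make sure the reverse edge exists
    flow = 0.0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left: reachable nodes form the source side.
            return flow, set(parent)
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push


def extract_community(edges, seeds, scale=10.0):
    """Cut the link graph around the seed pages using score-based capacities."""
    hub, auth = hits(edges)
    capacity = {}
    for u, v in edges:
        c = scale * (hub[u] + auth[v])  # ASSUMED capacity rule, not from the paper
        capacity[(u, v)] = capacity.get((u, v), 0.0) + c
        capacity[(v, u)] = capacity.get((v, u), 0.0) + c
    nodes = {n for edge in edges for n in edge}
    for s in seeds:
        capacity[(SOURCE, s)] = float("inf")
    for n in nodes - set(seeds):
        capacity[(n, SINK)] = 1.0  # ASSUMED unit capacity toward the sink
    _, side = max_flow_source_side(capacity, SOURCE, SINK)
    return side - {SOURCE}
```

Replacing the capacity line with a constant (for example `c = scale`) gives the constant-capacity variant that the abstract identifies as the source of noise pages, so the two assignments can be compared on the same graph.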
Noriko IMAFUJI, Masaru KITSUREGAWA, "Finding Web Communities by Maximum Flow Algorithm Using Well-Assigned Edge Capacities" in IEICE TRANSACTIONS on Information,
vol. E87-D, no. 2, pp. 407-415, February 2004.
URL: https://global.ieice.org/en_transactions/information/10.1587/e87-d_2_407/_p
@ARTICLE{e87-d_2_407,
author={Noriko IMAFUJI and Masaru KITSUREGAWA},
journal={IEICE TRANSACTIONS on Information},
title={Finding Web Communities by Maximum Flow Algorithm Using Well-Assigned Edge Capacities},
year={2004},
volume={E87-D},
number={2},
pages={407-415},
month={February},
}
TY - JOUR
TI - Finding Web Communities by Maximum Flow Algorithm Using Well-Assigned Edge Capacities
T2 - IEICE TRANSACTIONS on Information
SP - 407
EP - 415
AU - Noriko IMAFUJI
AU - Masaru KITSUREGAWA
PY - 2004
JO - IEICE TRANSACTIONS on Information
VL - E87-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2004
ER -