Although a distributed index on a distributed hash table (DHT) enables efficient document query processing in Peer-to-Peer information retrieval (P2P IR), the index costs a lot to construct and it tends to be an unfair management because of the unbalanced term frequency distribution. We devised a new distributed index, named Huffman-DHT, for P2P IR. The new index uses an algorithm similar to Huffman coding with a modification to the DHT structure based on the term distribution. In a Huffman-DHT, a frequent term is assigned to a short ID and allocated a large space in the node ID space in DHT. Throuth ID management, the Huffman-DHT balances the index registration accesses among peers and reduces load concentrations. Huffman-DHT is the first approach to adapt concepts of coding theory and term frequency distribution to load balancing. We evaluated this approach in experiments using a document collection and assessed its load balancing capabilities in P2P IR. The experimental results indicated that it is most effective when the P2P system consists of about 30,000 nodes and contains many documents. Moreover, we proved that we can construct a Huffman-DHT easily by estimating the probability distribution of the term occurrence from a small number of sample documents.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Hisashi KURASAWA, Atsuhiro TAKASU, Jun ADACHI, "Load Balancing Scheme on the Basis of Huffman Coding for P2P Information Retrieval" in IEICE TRANSACTIONS on Information,
vol. E92-D, no. 10, pp. 2064-2072, October 2009, doi: 10.1587/transinf.E92.D.2064.
Abstract: Although a distributed index on a distributed hash table (DHT) enables efficient document query processing in Peer-to-Peer information retrieval (P2P IR), the index costs a lot to construct and it tends to be an unfair management because of the unbalanced term frequency distribution. We devised a new distributed index, named Huffman-DHT, for P2P IR. The new index uses an algorithm similar to Huffman coding with a modification to the DHT structure based on the term distribution. In a Huffman-DHT, a frequent term is assigned to a short ID and allocated a large space in the node ID space in DHT. Throuth ID management, the Huffman-DHT balances the index registration accesses among peers and reduces load concentrations. Huffman-DHT is the first approach to adapt concepts of coding theory and term frequency distribution to load balancing. We evaluated this approach in experiments using a document collection and assessed its load balancing capabilities in P2P IR. The experimental results indicated that it is most effective when the P2P system consists of about 30,000 nodes and contains many documents. Moreover, we proved that we can construct a Huffman-DHT easily by estimating the probability distribution of the term occurrence from a small number of sample documents.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.2064/_p
Copy
@ARTICLE{e92-d_10_2064,
author={Hisashi KURASAWA, Atsuhiro TAKASU, Jun ADACHI, },
journal={IEICE TRANSACTIONS on Information},
title={Load Balancing Scheme on the Basis of Huffman Coding for P2P Information Retrieval},
year={2009},
volume={E92-D},
number={10},
pages={2064-2072},
abstract={Although a distributed index on a distributed hash table (DHT) enables efficient document query processing in Peer-to-Peer information retrieval (P2P IR), the index costs a lot to construct and it tends to be an unfair management because of the unbalanced term frequency distribution. We devised a new distributed index, named Huffman-DHT, for P2P IR. The new index uses an algorithm similar to Huffman coding with a modification to the DHT structure based on the term distribution. In a Huffman-DHT, a frequent term is assigned to a short ID and allocated a large space in the node ID space in DHT. Throuth ID management, the Huffman-DHT balances the index registration accesses among peers and reduces load concentrations. Huffman-DHT is the first approach to adapt concepts of coding theory and term frequency distribution to load balancing. We evaluated this approach in experiments using a document collection and assessed its load balancing capabilities in P2P IR. The experimental results indicated that it is most effective when the P2P system consists of about 30,000 nodes and contains many documents. Moreover, we proved that we can construct a Huffman-DHT easily by estimating the probability distribution of the term occurrence from a small number of sample documents.},
keywords={},
doi={10.1587/transinf.E92.D.2064},
ISSN={1745-1361},
month={October},}
Copy
TY - JOUR
TI - Load Balancing Scheme on the Basis of Huffman Coding for P2P Information Retrieval
T2 - IEICE TRANSACTIONS on Information
SP - 2064
EP - 2072
AU - Hisashi KURASAWA
AU - Atsuhiro TAKASU
AU - Jun ADACHI
PY - 2009
DO - 10.1587/transinf.E92.D.2064
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E92-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2009
AB - Although a distributed index on a distributed hash table (DHT) enables efficient document query processing in Peer-to-Peer information retrieval (P2P IR), the index costs a lot to construct and it tends to be an unfair management because of the unbalanced term frequency distribution. We devised a new distributed index, named Huffman-DHT, for P2P IR. The new index uses an algorithm similar to Huffman coding with a modification to the DHT structure based on the term distribution. In a Huffman-DHT, a frequent term is assigned to a short ID and allocated a large space in the node ID space in DHT. Throuth ID management, the Huffman-DHT balances the index registration accesses among peers and reduces load concentrations. Huffman-DHT is the first approach to adapt concepts of coding theory and term frequency distribution to load balancing. We evaluated this approach in experiments using a document collection and assessed its load balancing capabilities in P2P IR. The experimental results indicated that it is most effective when the P2P system consists of about 30,000 nodes and contains many documents. Moreover, we proved that we can construct a Huffman-DHT easily by estimating the probability distribution of the term occurrence from a small number of sample documents.
ER -