IEICE global.ieice.org Site

Keyword Search Result

[Keyword] join algorithm (2 hits)

Grid-Based Parallel Algorithms of Join Queries for Analyzing Multi-Dimensional Data on MapReduce
Miyoung JANG Jae-Woo CHANG

PAPER-Technologies for Knowledge Support Platform

Publicized: 2018/01/19
Vol: E101-D No:4
Page(s): 964-976
The join processing of large-scale datasets in MapReduce environments has recently become an important issue. However, existing MapReduce-based join algorithms incur excessive overhead for constructing and updating the data index. Moreover, their similarity computation cost is high because they partition data without considering the data distribution. In this paper, we propose two grid-based join algorithms for MapReduce. First, we propose a similarity join algorithm that evenly distributes join candidates using a dynamic grid index, which partitions data by considering both the data density and the similarity threshold. We use a bottom-up approach that merges initial grid cells into partitions and assigns them to MapReduce jobs. Second, we propose a k-NN join query processing algorithm for MapReduce. To reduce the data transmission cost, we determine an optimal grid cell size by considering the data distribution of randomly selected samples. We then perform the k-NN join by assigning only the related join data to a reducer. Our performance analysis shows that both the similarity join algorithm and the k-NN join algorithm outperform existing algorithms by up to 10 times in terms of query processing time.
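The general idea of a grid-based similarity join can be illustrated with a minimal single-machine sketch. This is not the authors' algorithm: it assumes a fixed grid whose cell side equals the similarity threshold, whereas the paper describes a dynamic grid that merges cells according to data density. The function names (`cell_of`, `map_phase`, `reduce_phase`, `grid_similarity_join`) and the use of Euclidean distance are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def cell_of(point, eps):
    """Return the grid cell (a tuple of integer indices) containing the point."""
    return tuple(floor(x / eps) for x in point)


def map_phase(R, S, eps):
    """Simulate the map step: group points by grid cell.

    Each R point goes to its home cell only. Each S point is replicated to
    its home cell and every adjacent cell, so that any S point within eps
    of an R point is present in that R point's home cell.
    """
    buckets = defaultdict(lambda: ([], []))
    for r in R:
        buckets[cell_of(r, eps)][0].append(r)
    for s in S:
        home = cell_of(s, eps)
        for offset in product((-1, 0, 1), repeat=len(s)):
            neighbor = tuple(c + o for c, o in zip(home, offset))
            buckets[neighbor][1].append(s)
    return buckets


def reduce_phase(buckets, eps):
    """Simulate the reduce step: verify candidate pairs inside each cell."""
    result = []
    for r_points, s_points in buckets.values():
        for r in r_points:
            for s in s_points:
                if dist(r, s) <= eps:
                    result.append((r, s))
    return result


def grid_similarity_join(R, S, eps):
    """Return all pairs (r, s) with Euclidean distance at most eps."""
    return reduce_phase(map_phase(R, S, eps), eps)


R = [(0.0, 0.0), (5.0, 5.0)]
S = [(0.5, 0.5), (5.9, 5.0), (9.0, 9.0)]
pairs = grid_similarity_join(R, S, eps=1.0)
```

Because each R point lives in exactly one cell, every qualifying pair is reported once. The cost of this simple scheme is the replication of S points to neighboring cells, which is the kind of overhead that a density-aware partitioning would aim to balance.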
Tag-Partitioned Join
Jeong Uk KIM Jae Moon LEE Myunghwan KIM

PAPER-Databases

Vol: E75-D No:3
Page(s): 291-297
A tag-partitioned join algorithm is described. The algorithm partitions only one relation, whereas other partition-based algorithms partition both relations. It works by rearranging the joinable tuples of one relation, duplicating some of them, according to the original sequence of the join attribute values of the other relation. To do this, the algorithm first finds, for each tuple of one relation, the positions of all the tuples of the other relation that are joinable with it, and then partitions the joinable tuples of the first relation into buckets using the positions found. The final join is performed on the partitioned relation and the other relation. We analyze the performance of the algorithm and compare it with that of other partition-based join algorithms. The comparison shows that our method performs better than other partition-based methods under practical values of the analysis parameters.
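The steps in this abstract can be sketched in memory as follows. This is a loose illustration, not the published algorithm: the abstract does not say how positions are found or how buckets are laid out, so the hash lookup of positions, the fixed-size position ranges used as buckets, and the names `tag_partitioned_join`, `r_key`, `s_key`, and `bucket_size` are all assumptions.

```python
from collections import defaultdict


def tag_partitioned_join(R, S, r_key, s_key, bucket_size=2):
    """Join R and S on R[r_key] == S[s_key], partitioning only R.

    Each joinable R tuple is tagged with the position of a matching S tuple
    and copied once per match, so S is read in its original order.
    """
    # Step 1: record the positions in S of every join attribute value.
    positions = defaultdict(list)
    for pos, s in enumerate(S):
        positions[s[s_key]].append(pos)

    # Step 2: partition only R. A bucket covers a contiguous range of
    # S positions; tuples with no match in S are dropped here.
    buckets = defaultdict(list)
    for r in R:
        for pos in positions.get(r[r_key], ()):
            buckets[pos // bucket_size].append((pos, r))

    # Step 3: final join, scanning S sequentially one bucket at a time.
    result = []
    for b in sorted(buckets):
        tagged = defaultdict(list)
        for pos, r in buckets[b]:
            tagged[pos].append(r)
        start = b * bucket_size
        for pos in range(start, min(start + bucket_size, len(S))):
            for r in tagged.get(pos, ()):
                result.append((r, S[pos]))
    return result


R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (1, "y"), (2, "z"), (4, "w")]
joined = tag_partitioned_join(R, S, r_key=0, s_key=0)
```

In this sketch the tuple `(2, "b")` is duplicated because it matches two positions in S, while `(3, "c")` is never written to a bucket. The output follows the original order of S, which mirrors the abstract's point that the other relation is left unpartitioned.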
