A family F of min-wise independent permutations is known to be a useful tool for indexing replicated documents on the Web. We say that a family F of permutations on $\{0,1,\ldots,n-1\}$ is min-wise independent if for any $X \subseteq \{0,1,\ldots,n-1\}$ and any $x \in X$, $\Pr[\min\{\pi(X)\} = \pi(x)] = \|X\|^{-1}$ when $\pi$ is chosen uniformly at random from F, where $\|A\|$ is the cardinality of a finite set A. We also say that a family F of permutations on $\{0,1,\ldots,n-1\}$ is d-wise independent if for any distinct $x_1,x_2,\ldots,x_d \in \{0,1,\ldots,n-1\}$ and any distinct $y_1,y_2,\ldots,y_d \in \{0,1,\ldots,n-1\}$, $\Pr[\bigwedge_{i=1}^{d} \pi(x_i) = y_i] = 1/\{n(n-1)$
A min-wise independent permutation family is known to be an efficient tool for estimating the similarity of documents. Toward a better understanding of min-wise independence, we present a characterization of exactly min-wise independent permutation families by size uniformity, which represents a certain symmetry of the string representation of a family. Using this characterization, we also present a general construction strategy that produces any exactly min-wise independent permutation family.
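The similarity estimate mentioned above rests on a standard fact: if π is drawn uniformly from a min-wise independent family, then two sets A and B have the same minimum image under π with probability ||A ∩ B|| / ||A ∪ B||. The following is a minimal sketch of that estimator, not the construction from the abstract. It assumes the full symmetric group (sampled by shuffling) as a stand-in min-wise independent family, and the function name `estimate_resemblance` and its parameters are hypothetical.

```python
import random


def estimate_resemblance(A, B, n, trials=2000, seed=0):
    """Estimate |A & B| / |A | B| for nonempty subsets A, B of {0, ..., n-1}.

    Assumption: permutations are drawn uniformly from the full symmetric
    group, which is min-wise independent but far larger than necessary.
    """
    rng = random.Random(seed)
    universe = list(range(n))
    agree = 0
    for _ in range(trials):
        image = universe[:]
        rng.shuffle(image)  # image[x] plays the role of pi(x)
        # The two minima coincide exactly when the smallest image over
        # A | B belongs to an element of A & B.
        if min(image[x] for x in A) == min(image[x] for x in B):
            agree += 1
    return agree / trials


if __name__ == "__main__":
    A = {0, 1, 2, 3}
    B = {2, 3, 4, 5}
    # The true resemblance is 2/6; the printed value is a random estimate.
    print(estimate_resemblance(A, B, n=10))
```

In practice one would store only the minima for each document and compare those, which is why a small, efficiently samplable family matters.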
Yoshinori TAKEI Toshiya ITOH Takahiro SHINOZAKI
A family C of min-wise independent permutations is known to be a useful tool for indexing replicated documents on the Web. For any integer n>0, a family C of permutations on $[n]=\{1,2,\ldots,n\}$ is said to be min-wise independent if for any (nonempty) $X \subseteq [n]$ and any $x \in X$, $\Pr(\min\{\pi(X)\} = \pi(x)) = \|X\|^{-1}$ when $\pi$ is chosen uniformly at random from C, where $\|A\|$ is the cardinality of a finite set A. For any integer n>0, it is known that (1) $\|C\| \geq \mathrm{lcm}(n,n-1,\ldots,2,1) = e^{n-o(n)}$ for any family C of min-wise independent permutations on [n]; (2) there exists a polynomial-time samplable family C of min-wise independent permutations on [n] such that $\|C\| \leq 4^n$. However, it has been unclear whether a min-wise independent family C with $\|C\| = \mathrm{lcm}(n,n-1,\ldots,2,1)$ exists for each integer n>0 and, if it does, how to construct it. In this paper, we construct a family $F_n$ of permutations for each integer n>0 and show that $F_n$ is min-wise independent and $\|F_n\| = \mathrm{lcm}(n,n-1,\ldots,2,1)$. Moreover, we present a polynomial-time sampling algorithm for the family. Thus the family $F_n$ of min-wise independent permutations is optimal in family size and is easy to implement because it is polynomial-time samplable.
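For small n, the definition and the lcm lower bound can be checked directly by brute force. The sketch below is illustrative only and does not implement the family constructed in the paper. The helper names `is_min_wise_independent` and `lcm_lower_bound` are hypothetical, and a permutation π on [n] is assumed to be encoded as a tuple p with p[i-1] = π(i).

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations, permutations
from math import gcd


def is_min_wise_independent(family, n):
    """Check the definition for every nonempty X in [n] and every x in X."""
    size = len(family)
    for k in range(1, n + 1):
        for X in combinations(range(1, n + 1), k):
            for x in X:
                # Count permutations under which x attains the minimum of pi(X).
                hits = sum(
                    1 for p in family
                    if min(p[i - 1] for i in X) == p[x - 1]
                )
                if Fraction(hits, size) != Fraction(1, k):
                    return False
    return True


def lcm_lower_bound(n):
    """Return lcm(n, n-1, ..., 2, 1), the minimum possible family size."""
    return reduce(lambda a, b: a * b // gcd(a, b), range(1, n + 1), 1)


if __name__ == "__main__":
    n = 3
    full_group = list(permutations(range(1, n + 1)))
    cyclic_shifts = [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
    print(is_min_wise_independent(full_group, n))
    print(is_min_wise_independent(cyclic_shifts, n))
    print(lcm_lower_bound(n))
```

The three cyclic shifts fail the check, as the bound requires: a family of size 3 is smaller than lcm(3,2,1) = 6. The check enumerates all subsets, so it is only practical for very small n.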