1-4hit |
Yoichi SASAKI Tetsuo SHIBUYA Kimihito ITO Hiroki ARIMURA
In this paper, we study the approximate point set matching (APSM) problem with minimum RMSD score under translation, rotation, and one-to-one correspondence in d-dimension. Since most of the previous works about APSM problems use similality scores that do not especially care about one-to-one correspondence between points, such as Hausdorff distance, we cannot easily apply previously proposed methods to our APSM problem. So, we focus on speed-up of exhaustive search algorithms that can find all approximate matches. First, we present an efficient branch-and-bound algorithm using a novel lower bound function of the minimum RMSD score for the enumeration version of APSM problem. Then, we modify this algorithm for the optimization version. Next, we present another algorithm that runs fast with high probability when a set of parameters are fixed. Experimental results on both synthetic datasets and real 3-D molecular datasets showed that our branch-and-bound algorithm achieved significant speed-up over the naive algorithm still keeping the advantage of generating all answers.
Kazuya HARAGUCHI Mutsunori YAGIURA Endre BOROS Toshihide IBARAKI
We consider a data set in which each example is an n-dimensional Boolean vector labeled as true or false. A pattern is a co-occurrence of a particular value combination of a given subset of the variables. If a pattern appears frequently in the true examples and infrequently in the false examples, we consider it a good pattern. In this paper, we discuss the problem of determining the data size needed for removing "deceptive" good patterns; in a data set of a small size, many good patterns may appear superficially, simply by chance, independently of the underlying structure. Our hypothesis is that, in order to remove such deceptive good patterns, the data set should contain a greater number of examples than that at which a random data set contains few good patterns. We justify this hypothesis by computational studies. We also derive a theoretical upper bound on the needed data size in view of our hypothesis.
Daijiroh SUGIYAMA Jun-ichi IMURA
This paper proposes a notion of a controllability measure of discrete-time piecewise affine systems, which is a natural extension of the controllability gramian of linear systems. Although this measure is calculated in a probabilistic way, it may be applied to control of biological systems for providing a policy to experiments for pharmaceutical developments. Thus an application to gene regulatory control of luminescence in the marine bacterium modeled by the piecewise affine system is discussed in this paper.
Ayad SOUFIANE Tsuyoshi ITOKAWA Ryozo NAKAMURA
The linear hashing is a well-known dynamic hashing algorithm designed for internal main memory as well as external secondary memory. Traditional analysis of this search algorithm has been proposed under the assumption that all keys are uniformly accessed. In this paper, we present a discrete analysis of the average search cost of the linear dynamic hashing algorithm for internal main memory in consideration of the frequency of access on each key. In the proposed discrete analysis, the number of probes itself is regarded as a random variable and its probability distribution is derived concretely. Furthermore, the evaluate formula derived from the proposed analysis can exactly evaluate the average search cost in conformity with any probability distribution of the frequency of access. The proposed analysis is compared to the traditional one provided that the frequency of access on each key is uniform, and the differences are discussed.