1-11hit |
This paper addresses the problem of learning Bayesian belief networks (BBN) based on the minimum description length (MDL) principle. First, we give a formula of description length based on which the MDL-based procedure learns a BBN. Secondly, we point out that the difference between the MDL-based and Cooper and Herskovits procedures is essentially in the priors rather than in the approaches (MDL and Bayesian), and recommend a class of priors from which the formula is obtained. Finally, we show as a merit of using the formula that a modified version of the Chow and Liu algorithm is obtained. The modified algorithm finds a set of trees rather than a spanning tree based on the MDL principle.
This paper explores a relationship between parameters for the context tree weighting and weights for a general model weighting technique. In particular, an algorithm is proposed that approximately computes the parameters from the weights, and a condition under which no error for the approximation occurs is derived.
This paper presents some tighter bounds on universal noiseless coding, in particular, the lowerbound tighter than Davisson et al.'s for finite sequence and the upperbound for some typical universal data compression. We find that Davisson et al.'s bound satisfies some optimization in the case of using the Jeffreys prior and also that the derived upperbound in this paper is within O(1/n) from the Clarke and Barron asymptotics in the case of some restricted typical universal data compression defined in the paper.
In this paper we consider the two-class classification problem with high-dimensional data. It is important to find a class of distributions such that we cannot expect good performance in classification for any classifier. In this paper, when two population variance-covariance matrices are different, we give a reasonable sufficient condition for distributions such that the misclassification rate converges to the worst value as the dimension of data tends to infinity for any classifier. Our results can give guidelines to decide whether or not an experiment is worth performing in many fields such as bioinformatics.
In this paper, the computational issue in the problem of learning Bayesian belief networks (BBNs) based on the minimum description length (MDL) principle is addressed. Based on an asymptotic formula of description length, we apply the branch and bound technique to finding true network structures. The resulting algorithm searches considerably saves the computation yet successfully searches the network structure with the minimum value of the formula. Thus far, there has been no search algorithm that finds the optimal solution for examples of practical size and a set of network structures in the sense of the maximum posterior probability, and heuristic searches such as K2 and K3 trap in local optima due to the greedy nature even when the sample size is large. The proposed algorithm, since it minimizes the description length, eventually selects the true network structure as the sample size goes to infinity.
This paper's main objective is to analyze several procedures which select the model g among a set G of stochastic models to minimize the value of an information criterion in the form of L(g)H[g](zn)+(k(g)/2)c(n), where zn is the n observed data emitted by an information source θ which consists of the model gθ∈G and k(gθ) mutually independent stochastic parameters in the model gθ∈G, H[g](zn) is (-1) (the maximum log likelihood value of the data zn with respect to a model g∈G), and c(n) is a predetermined function (penalty function) of n which controls the amount of penalty for increasing the model size. The result is focused on specific performances when the information criteria are applied to the framework of so-called state decomposition. Especially, upper bounds are derived of the following two performance measures for each penalty function c(n): the error probability of the model selection, and the average Kullback-Leibler information between the true information source and the estimated information source.
This paper's main objective is to clearly describe the construction of a universal code for minimizing Davisson's minimax redundancy in a range where the true model and stochastic parameters are unknown. Minimax redundancy is defined as the maximum difference between the expected persymbol code length and the per-symbol source entropy in the source range. A universal coding scheme is here formulated in terms of the weight function, i.e., a method is presented for determining a weight function which minimizes the minimax redundancy even when the true model is unknown. It is subsequently shown that the minimax redundancy achieved through the presented coding method is upper-bounded by the minimax redundancy of Rissanen's semi-predictive coding method.
Junji SHIKATA Yuliang ZHENG Joe SUZUKI Hideki IMAI
The problem we consider in this paper is whether the Menezes-Okamoto-Vanstone (MOV) reduction for attacking elliptic curve cryptosystems can be realized for genera elliptic curves. In realizing the MOV reduction, the base field Fq is extended so that the reduction to the discrete logarithm problem in a finite field is possible. Recent results by Balasubramanian and Koblitz suggest that, if l q-1, such a minimum extension degree is the minimum k such that l|qk-1, which is equivalent to the condition under which the Frey-Ruck (FR) reduction can be applied, where l is the order of the group in the elliptic curve discrete logarithm problem. Our point is that the problem of finding an l-torsion point required in evaluating the Weil pairing should be considered as well from an algorithmic point of view. In this paper, we actually propose a method which leads to a solution of the problem. In addition, our contribution allows us to draw the conclusion that the MOV reduction is indeed as powerful as the FR reduction under l q-1 not only from the viewpoint of the minimum extension degrees but also from that of the effectiveness of algorithms.
Toshiyasu MATSUSHIMA Joe SUZUKI Hiroshige INAZUMI Shigeichi HIRASAWA
In this paper, the criterion for evaluating inductive inference is considered from the information theoretical concept, especially from the view points of source coding and decision theory. Although the finite behavior of the inductive inference methods is important in the application field, there is no theoretical criterion for evaluating hypotheses at a finite stage of process. In the previous paper, we have defined an amount of semantic information included in well-formed formulas (wff) and denoted an analogy between inference and source coding. Inductive inference is regarded as source encoding, assuming that observed facts are compressed into a hypothesis similar to the way a source sequence is compressed into a codeword. By using the above definition, we can apply the criterion of source coding especially Minimum Description Length (MDL) to inductive inference. Therefore, the description length for representing a hypothesis is suitable for the criterion for evaluating a hypothesis inference at the finite stage of the inductive inference process. Moreover, by using the relation between the MDL criterion and Bayes risk, inductive inference can be interpreted with Bayes decision theory. The algorithm for the inductive inference of Horn clause is shown as an application of the proposed criterion. In the algorithm, the search space of hypotheses is restricted by using a refinement relation for increasing efficiency.
The goal of this paper is to describe a practical and efficient algorithm for computing in the Jacobian of a large class of algebraic curves over a finite field. For elliptic and hyperelliptic curves, there exists an algorithm for performing Jacobian group arithmetic in O(g2) operations in the base field, where g is the genus of a curve. The main problem in this paper is whether there exists a method to perform the arithmetic in more general curves. Galbraith, Paulus, and Smart proposed an algorithm to complete the arithmetic in O(g2) operations in the base field for the so-called superelliptic curves. We generalize the algorithm to the class of Cab curves, which includes superelliptic curves as a special case. Furthermore, in the case of Cab curves, we show that the proposed algorithm is not just general but more efficient than the previous algorithm as a parameter a in Cab curves grows large.
Kouki HOJO Boris Ya. RYABKO Joe SUZUKI
Currently, the most popular model in data compression theory is that of stationary ergodic sources. But there do exist sequences each of which is not emitted from any stationary ergodic source but can be compressed sufficiently by a certain algorithm. We estimate the size of the set of such sequences in terms of Hausdorff dimension.