Ken-ichiro MORIDOMI Kohei HATANO Eiji TAKIMOTO
We prove generalization error bounds for classes of low-rank matrices with norm constraints for collaborative filtering tasks. Our bounds are tighter than known bounds that use only the rank or related quantities, because they take the additional L1 and L∞ constraints into account. We also show that our bounds on the Rademacher complexity of these classes are optimal.
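As a rough illustration of the objects involved (not the exact class, norms, or constants used in the paper), the hypothesis class is a set of rating matrices constrained in rank and in elementwise L∞ and L1 norms, and the complexity measure is the empirical Rademacher complexity over the observed entries:

    \mathcal{F} = \{ X \in \mathbb{R}^{m \times n} : \operatorname{rank}(X) \le k,\ \|X\|_{\infty} \le b_{\infty},\ \|X\|_{1} \le b_{1} \},
    \hat{\mathfrak{R}}_S(\mathcal{F}) = \mathbb{E}_{\sigma}\Big[ \sup_{X \in \mathcal{F}} \frac{1}{T} \sum_{t=1}^{T} \sigma_t X_{i_t j_t} \Big],

where (i_1, j_1), ..., (i_T, j_T) are the observed index pairs, the σ_t are i.i.d. Rademacher (±1) variables, and k, b_∞, b_1 are placeholder constants; an upper bound on the Rademacher complexity yields a generalization error bound by standard uniform convergence arguments.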
The solution of the standard 2-norm-based multiple kernel regression problem and the theoretical limit of the considered model space are discussed in this paper. We prove two results: 1) the solution of the 2-norm-based multiple kernel regressor constructed from a given training data set does not, in general, attain the theoretical limit of the considered model space in terms of generalization error, even if the training data set is noise-free; 2) the solution of the 2-norm-based multiple kernel regressor is identical to that of the single kernel regressor whose kernel is the sum of the kernels used in the multiple kernel regressor, both in the noise-free setting and in a noisy setting with the 2-norm-based regularizer. The first result motivates us to develop a novel framework for multiple kernel regression that yields a better solution, closer to the theoretical limit; the second result implies that, as long as the 2-norm-based criterion is used, it suffices to use a single kernel regressor with the sum of the given kernels instead of a multiple kernel regressor.
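For the second claim, the following is a minimal numerical sketch (not the paper's derivation) of the equivalence under the 2-norm criterion: a 2-norm-regularized multiple kernel regressor, obtained by solving its stationarity conditions, produces the same fit as kernel ridge regression with the sum kernel. The data, kernel widths, and regularization constant below are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(30, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(30)

    def rbf(A, B, gamma):
        # Gaussian (RBF) kernel matrix between row sets A and B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    K1, K2 = rbf(X, X, 1.0), rbf(X, X, 10.0)
    lam, n = 0.1, len(y)

    # 2-norm multiple kernel regressor:
    #   min_{a1,a2} ||y - K1 a1 - K2 a2||^2 + lam (a1' K1 a1 + a2' K2 a2),
    # solved via its (possibly singular) stationarity system.
    A = np.block([[K1 @ K1 + lam * K1, K1 @ K2],
                  [K2 @ K1,            K2 @ K2 + lam * K2]])
    b = np.concatenate([K1 @ y, K2 @ y])
    a = np.linalg.lstsq(A, b, rcond=None)[0]
    f_mkl = K1 @ a[:n] + K2 @ a[n:]

    # Single kernel ridge regressor with the sum kernel K1 + K2.
    alpha = np.linalg.solve(K1 + K2 + lam * np.eye(n), y)
    f_sum = (K1 + K2) @ alpha

    print(np.max(np.abs(f_mkl - f_sum)))  # close to zero: the two fits coincide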
Gentle AdaBoost is widely used in object detection and pattern recognition because of its efficiency and stability. To focus on instances with small margins, Gentle AdaBoost assigns larger weights to these instances during training. However, misclassification of small-margin instances can still occur, which causes their weights to grow ever larger. Eventually, a few large-weight instances may dominate the whole data distribution, encouraging Gentle AdaBoost to choose weak hypotheses that fit only these instances in the late training phase. This phenomenon, known as “classifier distortion”, degrades generalization performance and can easily lead to overfitting, since the deviation of all selected weak hypotheses is increased by the late-selected ones. To solve this problem, we propose a new variant that we call “Penalized AdaBoost”. In each iteration, our approach not only penalizes the misclassification of instances with small margins but also restrains the weight increase for instances with minimal margins. Our method performs better than Gentle AdaBoost because it avoids “classifier distortion” effectively. Experiments show that our method achieves far lower generalization errors than Gentle AdaBoost at a similar training speed.
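To make the weight dynamics concrete, here is a minimal sketch of standard Gentle AdaBoost with regression stumps (labels assumed to be ±1, and a deliberately naive stump search); the penalization and weight-restraining steps of the proposed Penalized AdaBoost are specific to the paper and are not reproduced here.

    import numpy as np

    def fit_stump(X, y, w):
        """Weighted-least-squares regression stump: best (feature, threshold, values)."""
        n, d = X.shape
        best = None
        for j in range(d):
            xs = np.sort(X[:, j])
            for i in range(1, n):
                if xs[i - 1] == xs[i]:
                    continue
                thr = 0.5 * (xs[i - 1] + xs[i])
                left = X[:, j] <= thr
                cl = np.average(y[left], weights=w[left])    # weighted means minimize
                cr = np.average(y[~left], weights=w[~left])   # weighted squared error
                pred = np.where(left, cl, cr)
                err = np.sum(w * (y - pred) ** 2)
                if best is None or err < best[0]:
                    best = (err, j, thr, cl, cr)
        return best[1:]

    def gentle_adaboost(X, y, T=20):
        n = len(y)
        w = np.full(n, 1.0 / n)   # instance weights
        F = np.zeros(n)           # accumulated real-valued score
        ensemble = []
        for _ in range(T):
            j, thr, cl, cr = fit_stump(X, y, w)
            f = np.where(X[:, j] <= thr, cl, cr)
            F += f
            ensemble.append((j, thr, cl, cr))
            # Gentle AdaBoost weight update: instances with small or negative margin
            # y_i * f_t(x_i) receive larger weight, so persistently misclassified
            # instances can come to dominate the distribution.
            w *= np.exp(-y * f)
            w /= w.sum()
        return ensemble, np.sign(F)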
Akira TANAKA Hirofumi TAKEBAYASHI Ichigaku TAKIGAWA Hideyuki IMAI Mineichi KUDO
For the last few decades, learning with multiple kernels, represented by the ensemble kernel regressor and the multiple kernel regressor, has attracted much attention in the field of kernel-based machine learning. Although their efficacy has been investigated numerically in many works, their theoretical grounding has not been examined sufficiently, since no theoretical framework for evaluating them has been available. In this paper, we introduce a unified framework for evaluating kernel regressors with multiple kernels. On the basis of this framework, we analyze the generalization errors of the ensemble kernel regressor and the multiple kernel regressor, and give a sufficient condition for the ensemble kernel regressor to outperform the multiple kernel regressor in terms of generalization error in the noise-free case. We also show by examples that, when the sufficient condition does not hold, either regressor can be better than the other, which underscores the importance of the condition.
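As a rough numerical illustration, under the common definitions (which may differ in detail from the framework of the paper): the ensemble kernel regressor averages single-kernel regressors trained separately, while the 2-norm multiple kernel regressor is represented here by ridge regression with the sum kernel. On a given toy problem either one can attain the smaller generalization error; the data, kernels, and regularization constant are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)

    def rbf(A, B, gamma):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def krr_predict(Ktr, Kte, y, lam):
        # Kernel ridge regression: fit on the training kernel, predict with the test-train kernel.
        return Kte @ np.linalg.solve(Ktr + lam * np.eye(len(y)), y)

    # Toy target and data.
    f_true = lambda x: np.sin(4 * x[:, 0])
    Xtr = rng.uniform(-1, 1, size=(25, 1)); ytr = f_true(Xtr)
    Xte = rng.uniform(-1, 1, size=(500, 1)); yte = f_true(Xte)

    gammas, lam = [0.5, 5.0, 50.0], 1e-3

    # Ensemble kernel regressor: average of separately trained single-kernel regressors.
    f_ens = np.mean([krr_predict(rbf(Xtr, Xtr, g), rbf(Xte, Xtr, g), ytr, lam)
                     for g in gammas], axis=0)

    # Multiple kernel regressor (2-norm criterion), represented by the sum kernel.
    Ksum_tr = sum(rbf(Xtr, Xtr, g) for g in gammas)
    Ksum_te = sum(rbf(Xte, Xtr, g) for g in gammas)
    f_mkl = krr_predict(Ksum_tr, Ksum_te, ytr, lam)

    print("ensemble MSE:", np.mean((f_ens - yte) ** 2))
    print("multiple MSE:", np.mean((f_mkl - yte) ** 2))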
Many learning machines, such as normal mixtures and layered neural networks, are not regular but singular statistical models, because the map from a parameter to a probability distribution is not one-to-one. The conventional statistical asymptotic theory cannot be applied to such learning machines because the likelihood function cannot be approximated by any normal distribution. Recently, a new statistical theory has been established based on algebraic geometry, clarifying that the generalization and training errors are determined by two birational invariants: the real log canonical threshold and the singular fluctuation. However, their concrete values remain unknown in general. In the present paper, we propose a new concept, the quasi-regular case, in statistical learning theory. A quasi-regular case is not a regular case but a singular one; nevertheless, it shares a key property with regular cases. In fact, we prove that in a quasi-regular case the two birational invariants are equal to each other, so that the symmetry between the generalization and training errors holds. Moreover, the concrete values of the two birational invariants are obtained explicitly, hence the quasi-regular case is useful for studying statistical learning theory.
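For reference, the asymptotic forms from singular learning theory that the abstract refers to, stated informally and under the usual conventions (the paper's notation may differ): with real log canonical threshold λ and singular fluctuation ν, the expected Bayes generalization error G_n and training error T_n behave as

    \mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right), \qquad
    \mathbb{E}[T_n] = \frac{\lambda - 2\nu}{n} + o\!\left(\frac{1}{n}\right),

so that in a quasi-regular case with λ = ν the two become symmetric, E[T_n] ≈ -E[G_n], just as in the regular case, where λ = ν = d/2 for a d-dimensional parameter.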
Kento NAKAO Yuta NARUKAWA Seiji MIYOSHI
We consider a model composed of nonlinear perceptrons and analytically investigate its generalization performance using correlated examples in the framework of on-line learning, by a statistical mechanical method. In Hebbian and AdaTron learning, the larger the number of examples used in an update, the slower the learning. In contrast, Perceptron learning does not exhibit this behavior, and learning actually becomes faster in some time regions.
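For context, the three on-line rules named in the abstract are sketched below in their textbook single-example form for a student perceptron w learning a teacher's labels y = sign(B·x); the block update using several correlated examples at once, which the paper analyzes, is not reproduced, and the step-size conventions are one common choice.

    import numpy as np

    def hebbian_update(w, x, y, eta=1.0):
        # Hebbian learning: always move along the labeled example.
        return w + eta * y * x

    def perceptron_update(w, x, y, eta=1.0):
        # Perceptron learning: update only when the example is misclassified.
        return w + eta * y * x if y * np.dot(w, x) <= 0 else w

    def adatron_update(w, x, y, eta=1.0):
        # AdaTron learning: on a mistake, the step is proportional to |w.x|
        # (with eta = 1 and ||x|| = 1 this zeroes the local field on x).
        u = np.dot(w, x)
        return w - eta * u * x if y * u <= 0 else w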
Ryosuke MIYOSHI Yutaka MAEDA Seiji MIYOSHI
Weight perturbation learning was proposed as a learning rule in which perturbation is added to the variable parameters of learning machines. The generalization performance of weight perturbation learning has been analyzed by statistical mechanical methods and found to have the same asymptotic generalization property as perceptron learning. In this paper we consider the difference between perceptron learning and AdaTron learning, both of which are well-known learning rules. By applying this difference to weight perturbation learning, we propose adaptive weight perturbation learning. The generalization performance of the proposed rule is analyzed by statistical mechanical methods, and it is shown that the proposed learning rule has an excellent asymptotic property, equivalent to that of AdaTron learning.
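The following is a minimal sketch of the basic weight perturbation idea for on-line learning, using one common finite-difference (SPSA-style) estimate; the adaptive step-size rule proposed in the abstract is specific to the paper and is not reproduced. The perturbation size, learning rate, and toy perceptron-type loss are arbitrary choices.

    import numpy as np

    def weight_perturbation_step(w, loss, eta=0.05, c=1e-3, rng=np.random.default_rng()):
        """One weight-perturbation update: estimate the descent direction from a
        single random perturbation of the weights (no analytic gradient needed)."""
        delta = rng.choice([-1.0, 1.0], size=w.shape)        # random sign perturbation
        g_hat = (loss(w + c * delta) - loss(w)) / c * delta   # finite-difference estimate
        return w - eta * g_hat

    # Toy usage: on-line learning of a teacher perceptron, one example at a time.
    rng = np.random.default_rng(0)
    B = rng.standard_normal(20)                  # teacher weights
    w = np.zeros(20)                             # student weights
    for _ in range(2000):
        x = rng.standard_normal(20) / np.sqrt(20)
        y = np.sign(B @ x)
        loss = lambda v: max(0.0, -y * (v @ x))  # perceptron-type loss on this example
        w = weight_perturbation_step(w, loss, rng=rng)
    print("teacher-student overlap:", (w @ B) / (np.linalg.norm(w) * np.linalg.norm(B)))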
Seiji MIYOSHI Hiroomi HIKAWA Yutaka MAEDA
We show that simultaneous perturbation can be used as an algorithm for on-line learning, and we report a theoretical investigation of its generalization performance using a statistical mechanical method. The asymptotic behavior of the generalization error under this algorithm is of order t^{-1/3}, where t is the learning time or the number of learning examples. This order is the same as that of the well-known perceptron learning.
Chihiro SEKI Shingo SAKURAI Masafumi MATSUNO Seiji MIYOSHI
In this paper we analytically investigate the generalization performance of learning from correlated inputs in the framework of on-line learning, using a statistical mechanical method. We consider a model composed of linear perceptrons with Gaussian noise. First, we analyze the gradient method. We analytically clarify that the larger the correlation among inputs, or the larger the number of inputs, the stricter the condition the learning rate must satisfy and the slower the learning. Second, we treat block orthogonal projection learning as an alternative learning rule and derive the corresponding theory. In the noiseless case, the learning speed does not depend on the correlation and is proportional to the number of inputs used in an update; it is identical to that of the gradient method with uncorrelated inputs. On the other hand, when there is noise, the larger the correlation among inputs, the slower the learning and the larger the residual generalization error.
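A minimal sketch of the two update rules, under the usual reading of the terms (which may differ in detail from the paper's definitions): the gradient method takes a small step along the block's error gradient, while block orthogonal projection learning applies the minimum-norm correction that makes the student reproduce the teacher outputs on the current block exactly. The toy correlated-input construction and all constants are illustrative assumptions.

    import numpy as np

    def gradient_step(w, Xb, yb, eta=0.1):
        # Gradient-method update on a block of inputs Xb (rows) with targets yb.
        return w + eta * Xb.T @ (yb - Xb @ w) / len(yb)

    def block_projection_step(w, Xb, yb):
        # Block orthogonal projection: minimum-norm correction so that the
        # student reproduces the teacher outputs on the current block exactly.
        return w + np.linalg.pinv(Xb) @ (yb - Xb @ w)

    # Toy usage with correlated inputs: rows of each block share a common component.
    rng = np.random.default_rng(0)
    N, K, rho = 100, 5, 0.6
    B = rng.standard_normal(N)      # teacher weights
    w_grad = np.zeros(N)            # student trained by the gradient method
    w_proj = np.zeros(N)            # student trained by block projection learning
    for _ in range(300):
        common = rng.standard_normal(N)
        Xb = (np.sqrt(rho) * common + np.sqrt(1 - rho) * rng.standard_normal((K, N))) / np.sqrt(N)
        yb = Xb @ B                 # noiseless teacher outputs
        w_grad = gradient_step(w_grad, Xb, yb)
        w_proj = block_projection_step(w_proj, Xb, yb)
    print("squared weight error, gradient:  ", np.mean((w_grad - B) ** 2))
    print("squared weight error, projection:", np.mean((w_proj - B) ** 2))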
Shun GOKITA Masashi SUGIYAMA Keisuke SAKURAI
In order to obtain better learning results in supervised learning, it is important to choose model parameters appropriately. Model selection is usually carried out by preparing a finite set of model candidates, estimating the generalization error of each candidate, and choosing the best one. Increasing the number of candidates in this procedure may improve the optimization quality, but it also increases the computational cost. In this paper, we focus on a generalization error estimator called the regularized subspace information criterion and derive an analytic form of the optimal model parameter over a set of infinitely many model candidates. This allows us to maximize the optimization quality while keeping the computational cost moderate.
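For contrast with the analytic approach of the paper, here is a minimal sketch of the finite-candidate procedure described above, applied to choosing the ridge parameter of kernel ridge regression; k-fold cross-validation stands in for the regularized subspace information criterion, whose formula and analytic optimization are not reproduced here. The data, kernel, and candidate grid are arbitrary choices.

    import numpy as np

    def cv_error(K, y, lam, folds=5):
        """Generic plug-in estimate of the generalization error of kernel ridge
        regression with ridge parameter lam (cross-validation as a stand-in
        for the regularized subspace information criterion)."""
        n = len(y)
        idx = np.array_split(np.random.default_rng(0).permutation(n), folds)
        errs = []
        for te in idx:
            tr = np.setdiff1d(np.arange(n), te)
            alpha = np.linalg.solve(K[np.ix_(tr, tr)] + lam * np.eye(len(tr)), y[tr])
            errs.append(np.mean((K[np.ix_(te, tr)] @ alpha - y[te]) ** 2))
        return np.mean(errs)

    # Finite-grid model selection: estimate the error of each candidate lam and keep
    # the best; the paper's contribution is to replace this grid search by an
    # analytic optimum over a continuum of candidates.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(40, 1))
    y = np.sin(3 * X[:, 0]) + 0.2 * rng.standard_normal(40)
    K = np.exp(-(X - X.T) ** 2)     # RBF kernel matrix (X has one column)
    grid = np.logspace(-4, 1, 20)
    best = min(grid, key=lambda lam: cv_error(K, y, lam))
    print("selected ridge parameter:", best)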