
Keyword Search Result

[Keyword] model selection (20 hits)

1-20 of 20 hits
  • A Novel Approach to Address External Validity Issues in Fault Prediction Using Bandit Algorithms

    Teruki HAYAKAWA  Masateru TSUNODA  Koji TODA  Keitaro NAKASAI  Amjed TAHIR  Kwabena Ebo BENNIN  Akito MONDEN  Kenichi MATSUMOTO  

     
    LETTER-Software Engineering

  Publicized:
    2020/10/30
      Vol:
    E104-D No:2
      Page(s):
    327-331

    Various software fault prediction models have been proposed in the past twenty years. Many studies have compared and evaluated existing prediction approaches in order to identify the most effective ones. However, in most cases, such models and techniques provide varying results, and no single approach yields the best performance across different datasets. This is mainly due to the diverse nature of software development projects, so there is a risk that a selected model leads to inconsistent results across multiple datasets. In this work, we propose the use of bandit algorithms in cases where the accuracy of the models is inconsistent across multiple datasets. In the experiment discussed in this work, we used four conventional prediction models, tested them on three different datasets, and then selected the best possible model dynamically by applying bandit algorithms. We then compared our results with those obtained using majority voting. Epsilon-greedy with ϵ=0.3 showed the best or second-best prediction performance compared with using only one prediction model and with majority voting. Our results show that bandit algorithms can provide promising outcomes when used in fault prediction.
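
    The paper itself does not list an implementation; as a rough illustration of the idea, a minimal ϵ-greedy sketch in which each fitted prediction model is one bandit arm and a correct prediction earns reward 1 might look as follows (the model interface and function names are hypothetical):

```python
# Illustrative sketch only: epsilon-greedy selection among fault prediction
# models, with each fitted model treated as a bandit arm.
import random

def epsilon_greedy_select(models, modules, labels, epsilon=0.3, seed=0):
    """Sequentially pick a prediction model per software module.

    `models` is a list of fitted classifiers exposing .predict(x) -> 0/1
    (a hypothetical interface); a correct prediction is reward 1.
    """
    rng = random.Random(seed)
    counts = [0] * len(models)    # times each arm was pulled
    values = [0.0] * len(models)  # running mean reward per arm
    correct = 0.0
    for x, y in zip(modules, labels):
        if rng.random() < epsilon:               # explore
            arm = rng.randrange(len(models))
        else:                                    # exploit the best arm so far
            arm = max(range(len(models)), key=lambda i: values[i])
        reward = 1.0 if models[arm].predict(x) == y else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        correct += reward
    return correct / len(labels)
```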

  • A Fast Cross-Validation Algorithm for Kernel Ridge Regression by Eigenvalue Decomposition

    Akira TANAKA  Hideyuki IMAI  

     
    LETTER-Numerical Analysis and Optimization

      Vol:
    E102-A No:9
      Page(s):
    1317-1320

    A fast cross-validation algorithm for model selection in kernel ridge regression problems is proposed. It aims to further reduce the computational cost of the algorithm proposed by An et al. by exploiting the eigenvalue decomposition of the Gram matrix.
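
    The usual route to such a speed-up is to reuse a single eigendecomposition of the Gram matrix across all candidate ridge parameters, combined with the closed-form leave-one-out residual for ridge-type estimators. A sketch along those lines (not necessarily the paper's exact algorithm):

```python
import numpy as np

def krr_loocv_errors(K, y, lambdas):
    """Leave-one-out CV error of kernel ridge regression for many lambdas.

    One O(n^3) eigendecomposition of the Gram matrix K is reused for every
    candidate lambda (O(n^2) each) instead of refitting n times, using the
    identity  e_loo_i = alpha_i / [(K + lambda I)^{-1}]_{ii}.
    """
    d, V = np.linalg.eigh(K)                   # K = V diag(d) V^T
    Vty = V.T @ y
    errors = []
    for lam in lambdas:
        inv_eigs = 1.0 / (d + lam)
        alpha = V @ (inv_eigs * Vty)           # (K + lam I)^{-1} y
        diag_inv = (V ** 2) @ inv_eigs         # diagonal of (K + lam I)^{-1}
        errors.append(np.mean((alpha / diag_inv) ** 2))
    return np.array(errors)
```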

  • An Online Self-Constructive Normalized Gaussian Network with Localized Forgetting

    Jana BACKHUS  Ichigaku TAKIGAWA  Hideyuki IMAI  Mineichi KUDO  Masanori SUGIMOTO  

     
    PAPER-Neural Networks and Bioengineering

      Vol:
    E100-A No:3
      Page(s):
    865-876

    In this paper, we introduce a self-constructive Normalized Gaussian Network (NGnet) for online learning tasks. In online tasks, data samples are received sequentially and domain knowledge is often limited, so we need learning methods for the NGnet that perform robustly and dynamically select an accurate model size. We revise a previously proposed localized forgetting approach for the NGnet and adapt several unit manipulation mechanisms to it for dynamic model selection. The mechanisms are made more robust in environments prone to negative interference, and a new merge manipulation is introduced to deal with model redundancies. The effectiveness of the proposed method is compared with the previous localized forgetting approach and an established learning method for the NGnet. Several experiments are conducted on a function approximation task and a chaotic time series forecasting task. The proposed approach shows robust and favorable performance in different learning situations across all testbeds.
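
    For orientation, a minimal sketch of an NGnet prediction, assuming units whose Gaussian activations softly gate local linear models (the paper's unit manipulation and localized forgetting mechanisms are not shown):

```python
import numpy as np

def ngnet_predict(x, centers, covs, weights):
    """Output of a Normalized Gaussian Network at input vector x.

    Unit i has a Gaussian activation N(x; mu_i, Sigma_i) and a local linear
    model W_i [x; 1]; activations are normalized so the local models are
    softly gated over the input space.
    """
    acts = []
    for mu, cov in zip(centers, covs):
        diff = x - mu
        quad = diff @ np.linalg.solve(cov, diff)
        acts.append(np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(2 * np.pi * cov)))
    gates = np.array(acts) / np.sum(acts)      # normalized activations
    x1 = np.append(x, 1.0)                     # augmented input [x; 1]
    return sum(g * (W @ x1) for g, W in zip(gates, weights))
```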

  • Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model

    Takafumi KOSHINAKA  Kentaro NAGATOMO  Koichi SHINODA  

     
    PAPER-Speech and Hearing

      Vol:
    E95-D No:10
      Page(s):
    2469-2478

    A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer classification errors than does a conventional online method. They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.
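
    The paper's incremental variational Bayes over an ergodic HMM is considerably more involved; as a deliberately crude caricature of probabilistic online assignment, one can use spherical Gaussian clusters over utterance feature vectors and a fixed likelihood reserved for opening a new cluster (all names and constants below are illustrative):

```python
import numpy as np

def online_cluster(utterances, new_cluster_lik=1e-3):
    """Toy online probabilistic clustering of utterance feature vectors."""
    means, counts, labels = [], [], []
    for x in utterances:
        liks = [np.exp(-0.5 * np.sum((x - m) ** 2)) for m in means]
        liks.append(new_cluster_lik)           # mass reserved for a new speaker
        post = np.array(liks) / np.sum(liks)   # posterior over clusters
        k = int(np.argmax(post))
        if k == len(means):                    # open a new cluster
            means.append(np.asarray(x, dtype=float).copy())
            counts.append(1)
        else:                                  # incremental mean update
            counts[k] += 1
            means[k] += (x - means[k]) / counts[k]
        labels.append(k)
    return labels
```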

  • Nonparametric Regression Method Based on Orthogonalization and Thresholding

    Katsuyuki HAGIWARA  

     
    PAPER-Artificial Intelligence, Data Mining

      Vol:
    E94-D No:8
      Page(s):
    1610-1619

    In this paper, we consider a nonparametric regression problem using a learning machine defined by a weighted sum of fixed basis functions, where the number of basis functions, or equivalently, the number of weights, is equal to the number of training data. For the learning machine, we propose a training scheme that is based on orthogonalization and thresholding. Under the scheme, vectors of basis function outputs are orthogonalized and coefficients of the orthogonalized vectors are estimated instead of the weights. A coefficient is set to zero if it is less than a predetermined threshold level assigned component-wise to each coefficient. We then obtain the resulting weight vector by transforming the thresholded coefficients. In this training scheme, we propose asymptotically reasonable threshold levels to distinguish contributing components from unnecessary ones. To see how this works in a simple case, we derive an upper bound for the generalization error of the training scheme with the given threshold levels. It tells us that the increase in the generalization error is of O(log n/n) when there is a sparse representation of a target function in an orthogonal domain. In implementing the training scheme, eigen-decomposition or the Gram–Schmidt procedure is employed for orthogonalization, and the corresponding training methods are referred to as OHTED and OHTGS. Furthermore, modified versions of OHTED and OHTGS, called OHTED2 and OHTGS2 respectively, are proposed to reduce estimation bias. On real benchmark datasets, OHTED2 and OHTGS2 are found to exhibit relatively good generalization performance. In addition, OHTGS2 is found to obtain a sparse representation of a target function in terms of the basis functions.
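
    A minimal sketch of the Gram–Schmidt variant (OHTGS-style), with the threshold levels supplied by the caller rather than derived asymptotically as in the paper:

```python
import numpy as np

def oht_fit(Phi, y, thresholds):
    """Orthogonalize-and-threshold training for a square basis matrix Phi.

    Columns of Phi are orthogonalized by a QR (Gram-Schmidt) factorization,
    coefficients of the orthogonalized vectors are estimated, coefficients
    below their component-wise threshold are zeroed, and the surviving
    coefficients are transformed back into basis weights.
    """
    Q, R = np.linalg.qr(Phi)          # Phi = Q R, Q has orthonormal columns
    c = Q.T @ y                       # coefficients in the orthogonal domain
    c[np.abs(c) < thresholds] = 0.0   # component-wise hard thresholding
    return np.linalg.solve(R, c)      # resulting weight vector
```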

  • Optimal Gaussian Kernel Parameter Selection for SVM Classifier

    Xu YANG  HuiLin XIONG  Xin YANG  

     
    PAPER-Pattern Recognition

      Vol:
    E93-D No:12
      Page(s):
    3352-3358

    The performance of kernel-based learning algorithms such as SVM depends heavily on the proper choice of the kernel parameter. It is desirable for kernel machines to work with the optimal kernel parameter, one that adapts well to the input data and the learning task. In this paper, we present a novel method for selecting the Gaussian kernel parameter by maximizing a class separability criterion, which measures the data distribution in the kernel-induced feature space and is invariant under any non-singular linear transformation. The experimental results show that both the class separability of the data in the kernel-induced feature space and the classification performance of the SVM classifier are improved by using the optimal kernel parameter.
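
    One common way to score class separability directly from the Gram matrix is the trace ratio tr(S_b)/tr(S_w) of between-class to within-class scatter in the kernel-induced feature space. A grid-search sketch over candidate Gaussian widths (the paper's own criterion and optimization procedure may differ in detail):

```python
import numpy as np

def separability(K, labels):
    """tr(S_b) / tr(S_w) in the feature space, computed from the Gram matrix."""
    labels = np.asarray(labels)
    n = len(labels)
    total = np.trace(K) - K.sum() / n            # tr(S_t), total scatter
    within = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        Kc = K[np.ix_(idx, idx)]
        within += np.trace(Kc) - Kc.sum() / len(idx)
    return (total - within) / within             # tr(S_b) / tr(S_w)

def best_gaussian_width(X, labels, sigmas):
    """Pick the Gaussian kernel width that maximizes class separability."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    scores = [separability(np.exp(-sq / (2 * s ** 2)), labels) for s in sigmas]
    return sigmas[int(np.argmax(scores))]
```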

  • Geometric BIC

    Kenichi KANATANI  

     
    PAPER-Image Recognition, Computer Vision

      Vol:
    E93-D No:1
      Page(s):
    144-151

    The "geometric AIC" and the "geometric MDL" have been proposed as model selection criteria for geometric fitting problems. These correspond to Akaike's "AIC" and Rissanen's "BIC" well known in the statistical estimation framework. Another well known criterion is Schwarz' "BIC", but its counterpart for geometric fitting has not been known. This paper introduces the corresponding criterion, which we call the "geometric BIC", and shows that it is of the same form as the geometric MDL. Our result gives a justification to the geometric MDL from the Bayesian principle.

  • A New Meta-Criterion for Regularized Subspace Information Criterion

    Yasushi HIDAKA  Masashi SUGIYAMA  

     
    PAPER-Pattern Recognition

      Vol:
    E90-D No:11
      Page(s):
    1779-1786

    In order to obtain better generalization performance in supervised learning, model parameters should be determined appropriately, i.e., so that the generalization error is minimized. However, since the generalization error is inaccessible in practice, the model parameters are usually determined so that an estimator of the generalization error is minimized. The regularized subspace information criterion (RSIC) is such a generalization error estimator for model selection. RSIC includes an additional regularization parameter, which should be determined appropriately for better model selection. A meta-criterion for determining the regularization parameter has also been proposed and shown to be useful in practice. In this paper, we show that the existing meta-criterion has several drawbacks and give an alternative meta-criterion that solves these problems. Through simulations, we show that the new meta-criterion further improves the model selection performance.

  • Analytic Optimization of Adaptive Ridge Parameters Based on Regularized Subspace Information Criterion

    Shun GOKITA  Masashi SUGIYAMA  Keisuke SAKURAI  

     
    PAPER-Neural Networks and Bioengineering

      Vol:
    E90-A No:11
      Page(s):
    2584-2592

    In order to obtain better learning results in supervised learning, it is important to choose model parameters appropriately. Model selection is usually carried out by preparing a finite set of model candidates, estimating a generalization error for each candidate, and choosing the best one from the candidates. If the number of candidates is increased in this procedure, the optimization quality may be improved. However, this in turn increases the computational cost. In this paper, we focus on a generalization error estimator called the regularized subspace information criterion and derive an analytic form of the optimal model parameter over a set of infinitely many model candidates. This allows us to maximize the optimization quality while the computational cost is kept moderate.
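
    The computational point is easy to illustrate: if the error estimator were, say, quadratic in the model parameter, its minimizer over a continuum of candidates would be available in closed form, whereas a grid search trades cost against resolution. A toy sketch (RSIC itself is more involved):

```python
def argmin_quadratic(a, b, c, lo, hi):
    """Closed-form minimizer of J(t) = a t^2 + b t + c over [lo, hi]."""
    candidates = [lo, hi]
    if a > 0:
        candidates.append(min(max(-b / (2 * a), lo), hi))  # stationary point
    return min(candidates, key=lambda t: a * t ** 2 + b * t + c)

def argmin_grid(a, b, c, grid):
    """Grid-search counterpart: accuracy limited by the grid resolution."""
    return min(grid, key=lambda t: a * t ** 2 + b * t + c)
```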

  • Generalization Error Estimation for Non-linear Learning Methods

    Masashi SUGIYAMA  

     
    LETTER-Neural Networks and Bioengineering

      Vol:
    E90-A No:7
      Page(s):
    1496-1499

    Estimating the generalization error is one of the key ingredients of supervised learning, since a good generalization error estimator can be used for model selection. An unbiased generalization error estimator called the subspace information criterion (SIC) has been shown to be useful for model selection, but its range of application is limited to linear learning methods. In this paper, we extend SIC so that it is applicable to non-linear learning methods.

  • Incremental Learning and Model Selection for Radial Basis Function Network through Sleep

    Koichiro YAMAUCHI  Jiro HAYAMI  

     
    PAPER-Algorithm Theory

      Vol:
    E90-D No:4
      Page(s):
    722-735

    Model selection for neural networks is an essential procedure for obtaining not only a high level of generalization but also a compact data model. Especially in terms of obtaining a compact model, neural networks usually outperform other kinds of machine learning methods. Generally, models are selected by trial-and-error testing using the whole set of learning samples given in advance. In many cases, however, it is difficult and time-consuming to prepare all learning samples in advance. To overcome these inconveniences, we propose a hybrid online learning system for a radial basis function (RBF) network that repeats quick rote learning of novel instances during online periods (awake phases) and repeats pseudo-rehearsal for model selection during out-of-service periods (sleep phases). We call this system Incremental Learning with Sleep (ILS). During sleep phases, the system basically stops learning novel instances, and during awake phases, the system responds quickly. We also extended the system so as to shorten the periodic sleep phases. Experimental results showed that the system selects more compact data models than those selected by other machine learning systems.
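
    A crude caricature of the awake/sleep split for a one-dimensional RBF network: the awake phase memorizes each novel sample as its own unit, and the sleep phase rehearses pseudo-samples drawn from the rote model to refit a smaller network (the paper's actual model selection during sleep is far more careful):

```python
import numpy as np

def rbf_predict(x, centers, weights, width=1.0):
    """RBF network output at a scalar input x."""
    phi = np.exp(-((x - centers) ** 2) / (2 * width ** 2))
    return phi @ weights

def awake_learn(centers, weights, x, y):
    """Awake phase: rote learning, one new unit per novel sample."""
    return np.append(centers, x), np.append(weights, y)

def sleep_compress(centers, weights, n_units, width=1.0, n_pseudo=200):
    """Sleep phase: pseudo-rehearsal, refitting a smaller network."""
    xs = np.random.uniform(centers.min(), centers.max(), n_pseudo)
    ys = np.array([rbf_predict(x, centers, weights, width) for x in xs])
    new_centers = np.linspace(centers.min(), centers.max(), n_units)
    Phi = np.exp(-((xs[:, None] - new_centers[None, :]) ** 2) / (2 * width ** 2))
    new_weights, *_ = np.linalg.lstsq(Phi, ys, rcond=None)
    return new_centers, new_weights
```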

  • Analytic Optimization of Shrinkage Parameters Based on Regularized Subspace Information Criterion

    Masashi SUGIYAMA  Keisuke SAKURAI  

     
    PAPER-Neural Networks and Bioengineering

      Vol:
    E89-A No:8
      Page(s):
    2216-2225

    For obtaining a higher level of generalization capability in supervised learning, model parameters should be optimized, i.e., they should be determined in such a way that the generalization error is minimized. However, since the generalization error is inaccessible in practice, model parameters are usually determined in such a way that an estimate of the generalization error is minimized. A standard procedure for model parameter optimization is to first prepare a finite set of candidates of model parameter values, estimate the generalization error for each candidate, and then choose the best one from the candidates. If the number of candidates is increased in this procedure, the optimization quality may be improved. However, this in turn increases the computational cost. In this paper, we give methods for analytically finding the optimal model parameter value from a set of infinitely many candidates. This maximally enhances the optimization quality while the computational cost is kept reasonable.

  • Active Learning with Model Selection -- Simultaneous Optimization of Sample Points and Models for Trigonometric Polynomial Models

    Masashi SUGIYAMA  Hidemitsu OGAWA  

     
    PAPER-Pattern Recognition

      Vol:
    E86-D No:12
      Page(s):
    2753-2763

    In supervised learning, the selection of sample points and models is crucial for acquiring a higher level of generalization capability. So far, the problems of active learning and model selection have been studied independently. If sample points and models are optimized simultaneously, a higher level of generalization capability can be expected. We call this problem active learning with model selection. However, it cannot generally be solved by simply combining existing active learning and model selection techniques, because of the active learning/model selection dilemma: the model should be fixed for selecting sample points, and conversely the sample points should be fixed for selecting models. In this paper, we show that the dilemma can be dissolved if there is a set of sample points that is optimal for all models under consideration. Based on this idea, we give a practical procedure for active learning with model selection in trigonometric polynomial models. The effectiveness of the proposed procedure is demonstrated through computer simulations.
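
    For trigonometric polynomial models, equally spaced sample points are a natural common design for every candidate order, which is exactly the kind of sample set that dissolves the dilemma. A toy sketch, with an AIC-style score standing in for the paper's model selection criterion:

```python
import numpy as np

def trig_design(x, order):
    """Design matrix [1, cos x, sin x, ..., cos(order*x), sin(order*x)]."""
    cols = [np.ones_like(x)]
    for k in range(1, order + 1):
        cols += [np.cos(k * x), np.sin(k * x)]
    return np.column_stack(cols)

n, max_order = 64, 5
x = np.linspace(-np.pi, np.pi, n, endpoint=False)  # common design for all orders
y = np.sin(2 * x) + 0.1 * np.random.randn(n)       # toy target

scores = []
for order in range(1, max_order + 1):              # model selection afterwards
    Phi = trig_design(x, order)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    rss = np.sum((y - Phi @ w) ** 2)
    scores.append(n * np.log(rss / n) + 2 * Phi.shape[1])  # AIC-style score
best_order = int(np.argmin(scores)) + 1
```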

  • Model Selection with Componentwise Shrinkage in Orthogonal Regression

    Katsuyuki HAGIWARA  

     
    PAPER-Digital Signal Processing

      Vol:
    E86-A No:7
      Page(s):
    1749-1758

    For the problem of determining the major frequency components of a signal disturbed by noise, a model selection criterion has previously been proposed. In this paper, the criterion is extended to cover a penalized cost function that yields a componentwise shrinkage estimator, and the extended criterion is shown to achieve consistent model selection. A simple numerical simulation was then conducted, and it was found that the proposed criterion with an empirically estimated componentwise shrinkage estimator outperforms the original criterion.
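
    A generic illustration of componentwise shrinkage in an orthogonal (here, DFT) domain, scaling each coefficient by the empirical factor max(0, 1 - σ²/|c_k|²); the paper's estimator and selection criterion are defined precisely there:

```python
import numpy as np

def componentwise_shrinkage(y, sigma2):
    """Shrink frequency components of a noisy signal component-wise.

    Noise-dominated coefficients are suppressed toward zero while major
    frequency components are kept nearly intact.
    """
    n = len(y)
    c = np.fft.rfft(y) / np.sqrt(n)                # orthogonal-domain coefficients
    power = np.abs(c) ** 2
    factor = np.clip(1.0 - sigma2 / np.maximum(power, 1e-12), 0.0, 1.0)
    return np.fft.irfft(c * factor, n) * np.sqrt(n)
```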

  • Subspace Information Criterion for Image Restoration--Optimizing Parameters in Linear Filters

    Masashi SUGIYAMA  Daisuke IMAIZUMI  Hidemitsu OGAWA  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E84-D No:9
      Page(s):
    1249-1256

    Most of the image restoration filters proposed so far include parameters that control the restoration properties. To bring out the optimal restoration performance, these parameters should be determined so as to minimize a certain error measure, such as the mean squared error (MSE) between the restored image and the original image. However, this is not generally possible, since the unknown original image itself is required for evaluating MSE. In this paper, we derive an estimator of MSE called the subspace information criterion (SIC) and propose determining the parameter values so that SIC is minimized. For any linear filter, SIC gives an unbiased estimate of the expected MSE over the noise; therefore, the proposed method is valid for any linear filter. Computer simulations with the moving-average filter demonstrate that SIC gives a very accurate estimate of MSE in various situations, and that the proposed procedure actually gives the optimal parameter values that minimize MSE.
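
    SIC itself is derived in the paper; as a flavor of how the unknown original image can be removed from an MSE estimate for a linear filter, here is the classical Stein/Mallows-style identity (related to, but not the same as, SIC):

```python
import numpy as np

def unbiased_mse_estimate(A, y, sigma2):
    """Unbiased estimate of E||A y - x||^2 / n for a linear filter A.

    With y = x + noise (i.i.d. Gaussian, variance sigma2),
    E||A y - x||^2 = E||A y - y||^2 + 2*sigma2*tr(A) - n*sigma2,
    so the right-hand side is computable without the original image x.
    """
    n = len(y)
    residual = A @ y - y
    return (residual @ residual + 2 * sigma2 * np.trace(A) - n * sigma2) / n
```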

  • The Problem of the Fading Model Selection

    Marcelo Agustin TANEDA  Jun-ichi TAKADA  Kiyomichi ARAKI  

     
    PAPER-Sensing

      Vol:
    E84-B No:3
      Page(s):
    660-666

    Many experimentally and theoretically based models have been proposed to predict, quantitatively evaluate, and combat the fading phenomenon in mobile communication systems. However, to the best of the authors' knowledge, there has so far been no objective method to determine which distribution is the most suitable for modeling the fading phenomenon based on experimental data. In this work, the Minimum Description Length (MDL) criterion for model selection is proposed for that purpose. Furthermore, the MDL analysis is performed for some of the most widely used fading models based on measurements taken in a suburban environment.
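
    A minimal two-part MDL sketch using SciPy's maximum-likelihood fits for common fading amplitude distributions (the paper's MDL formulation may be more refined than this simple code length):

```python
import numpy as np
from scipy import stats

def mdl_select(amplitudes, candidates=("rayleigh", "rice", "nakagami")):
    """Rank fading distributions by MDL = -max log-likelihood + (k/2) log n."""
    n = len(amplitudes)
    scores = {}
    for name in candidates:
        dist = getattr(stats, name)
        params = dist.fit(amplitudes, floc=0)     # ML fit, location fixed at 0
        nll = -np.sum(dist.logpdf(amplitudes, *params))
        k = len(params) - 1                       # do not count the fixed loc
        scores[name] = nll + 0.5 * k * np.log(n)
    return min(scores, key=scores.get), scores
```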

  • Moving Object Detection from Optical Flow without Empirical Thresholds

    Naoya OHTA  Kenichi KANATANI  Kazuhiro KIMURA  

     
    LETTER-Image Processing, Computer Graphics and Pattern Recognition

      Vol:
    E81-D No:2
      Page(s):
    243-245

    We show that moving objects can be detected from optical flow without using any knowledge about the magnitude of the noise in the flow or any thresholds to be adjusted empirically. The underlying principle is viewing a particular interpretation about the flow as a geometric model and comparing the relative "goodness" of candidate models measured by the geometric AIC.

  • Automatic Recognition of Regular Figures by Geometric AIC

    Iman TRIONO  Naoya OHTA  Kenichi KANATANI  

     
    LETTER-Image Processing, Computer Graphics and Pattern Recognition

      Vol:
    E81-D No:2
      Page(s):
    246-248

    We implement a graphical interface that automatically transforms a figure drawn with a mouse into the regular figure that the system infers is closest to the input. The difficulty lies in the fact that the classes into which the input is to be classified have inclusion relations, which prohibits the use of a simple distance criterion. In this letter, we show that this problem can be resolved by introducing the geometric AIC.

  • Infinity and Planarity Test for Stereo Vision

    Yasushi KANAZAWA  Kenichi KANATANI  

     
    PAPER-Image Processing, Computer Graphics and Pattern Recognition

      Vol:
    E80-D No:8
      Page(s):
    774-779

    Introducing a mathematical model of noise in stereo images, we propose a new criterion for intelligent statistical inference about the scene being viewed, based on the geometric information criterion (geometric AIC). Using synthetic and real-image experiments, we demonstrate that a robot can test whether an object is located very far away or is a planar surface without using any knowledge about the noise magnitude or any empirically adjustable thresholds.

  • Evaluations for Estimation of an Information Source Based on State Decomposition

    Joe SUZUKI  

     
    PAPER-Information Theory and Coding Theory

      Vol:
    E76-A No:7
      Page(s):
    1240-1251

    This paper's main objective is to analyze several procedures which select the model g among a set G of stochastic models so as to minimize the value of an information criterion of the form L(g) = H[g](z^n) + (k(g)/2)c(n), where z^n is the sequence of n observed data emitted by an information source θ, which consists of a model gθ∈G and k(gθ) mutually independent stochastic parameters in that model; H[g](z^n) is (-1)×(the maximum log-likelihood of the data z^n with respect to a model g∈G); and c(n) is a predetermined function (penalty function) of n which controls the amount of penalty for increasing the model size. The results focus on the specific performance obtained when the information criteria are applied to the framework of so-called state decomposition. In particular, upper bounds are derived for the following two performance measures for each penalty function c(n): the error probability of the model selection, and the average Kullback-Leibler information between the true information source and the estimated information source.
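
    As a concrete instance of the criterion, here is a sketch that evaluates L(g) for binary Markov models of increasing order, where k(g) = 2^order conditional probabilities are the free parameters (the paper's state-decomposition framework is more general than this example):

```python
import math
from collections import defaultdict

def markov_criterion(z, order, c_n):
    """L(g) = H[g](z^n) + (k(g)/2) c(n) for a binary (0/1) Markov model."""
    ctx_counts = defaultdict(lambda: [0, 0])
    for i in range(order, len(z)):
        ctx_counts[tuple(z[i - order:i])][z[i]] += 1
    loglik = 0.0
    for n0, n1 in ctx_counts.values():           # maximum-likelihood plug-in
        for n_sym in (n0, n1):
            if n_sym:
                loglik += n_sym * math.log(n_sym / (n0 + n1))
    return -loglik + 0.5 * (2 ** order) * c_n    # H[g](z^n) + (k(g)/2) c(n)

# Example: compare orders 0..3 under the BIC-like penalty c(n) = log n.
# best = min(range(4), key=lambda g: markov_criterion(z, g, math.log(len(z))))
```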