
Author Search Result

[Author] Katsuyuki HAGIWARA (7 hits)

1-7 of 7 hits
  • Nonparametric Regression Method Based on Orthogonalization and Thresholding

    Katsuyuki HAGIWARA  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E94-D No:8
    Page(s): 1610-1619

    In this paper, we consider a nonparametric regression problem using a learning machine defined by a weighted sum of fixed basis functions, where the number of basis functions, or equivalently the number of weights, is equal to the number of training data. For this learning machine, we propose a training scheme based on orthogonalization and thresholding. Under the scheme, the vectors of basis function outputs are orthogonalized and the coefficients of the orthogonalized vectors are estimated instead of the weights. A coefficient is set to zero if it is less than a predetermined threshold level assigned component-wise to each coefficient. We then obtain the resulting weight vector by transforming the thresholded coefficients. In this training scheme, we propose asymptotically reasonable threshold levels to distinguish contributing components from unnecessary ones. To see how this works in a simple case, we derive an upper bound for the generalization error of the training scheme with the given threshold levels. It tells us that the increase in the generalization error is of O(log n/n) when there is a sparse representation of the target function in an orthogonal domain. In implementing the training scheme, eigen-decomposition or the Gram–Schmidt procedure is employed for orthogonalization, and the corresponding training methods are referred to as OHTED and OHTGS. Furthermore, modified versions of OHTED and OHTGS, called OHTED2 and OHTGS2 respectively, are proposed to reduce estimation bias. On real benchmark datasets, OHTED2 and OHTGS2 are found to exhibit relatively good generalization performance. In addition, OHTGS2 is found to obtain a sparse representation of the target function in terms of the basis functions.
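
    As a rough illustration of the scheme, here is a minimal Python sketch assuming a Gaussian-kernel basis centred at the training inputs and placeholder universal-style threshold levels; the paper's asymptotically derived levels are not reproduced, and the QR factorization below simply stands in for the Gram–Schmidt step of OHTGS.

        import numpy as np

        def oht_gs(Phi, y, tau):
            # Phi: (n, n) matrix of basis-function outputs (one column per basis
            # function); y: (n,) targets; tau: (n,) componentwise threshold levels.
            Q, R = np.linalg.qr(Phi)                   # orthogonalize basis-output vectors
            c = Q.T @ y                                # coefficients of the orthogonalized vectors
            c = np.where(np.abs(c) >= tau, c, 0.0)     # componentwise thresholding
            return np.linalg.solve(R, c)               # transform back to basis-function weights

        # toy usage with a hypothetical Gaussian-kernel basis
        rng = np.random.default_rng(0)
        n = 50
        x = np.sort(rng.uniform(-1.0, 1.0, n))
        y = np.sinc(3.0 * x) + 0.1 * rng.standard_normal(n)
        Phi = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.1 ** 2))
        tau = 0.1 * np.sqrt(2.0 * np.log(n)) * np.ones(n)   # placeholder levels, not the paper's
        y_hat = Phi @ oht_gs(Phi, y, tau)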

  • On the Expected Prediction Error of Orthogonal Regression with Variable Components

    Katsuyuki HAGIWARA  Hiroshi ISHITANI  

     
    PAPER-Algorithms and Data Structures

    Vol: E89-A No:12
    Page(s): 3699-3709

    In this article, we considered the asymptotic expectations of the prediction error and the fitting error of a regression model in which the component functions are chosen from a finite set of orthogonal functions. Under least squares estimation, we showed that the asymptotic bias in estimating the prediction error from the fitting error involves the true number of components, which is essentially unknown in practical applications. On the other hand, under a suitable shrinkage method, we showed that an asymptotically unbiased estimate of the prediction error is given by the fitting error plus a known term, apart from the noise variance.
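
    As a hedged numerical illustration only, the Monte-Carlo snippet below checks the classical fixed-design gap of 2kσ²/n between the expected prediction error and the expected fitting error for least squares on an orthonormal design with k components; this is the textbook case underlying the analysis, not the paper's variable-component result.

        import numpy as np

        rng = np.random.default_rng(1)
        n, k, sigma = 200, 5, 0.5
        U, _ = np.linalg.qr(rng.standard_normal((n, k)))   # n x k orthonormal design
        f = U @ rng.standard_normal(k)                     # true regression function

        fit_err, pred_err = [], []
        for _ in range(2000):
            y = f + sigma * rng.standard_normal(n)
            f_hat = U @ (U.T @ y)                          # least squares fit
            fit_err.append(np.mean((y - f_hat) ** 2))
            pred_err.append(np.mean((f - f_hat) ** 2) + sigma ** 2)   # error on fresh data

        # empirical gap versus the classical bias term 2*k*sigma^2/n
        print(np.mean(pred_err) - np.mean(fit_err), 2 * k * sigma ** 2 / n)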

  • On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies

    Katsuyuki HAGIWARA  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2023/06/12
    Vol: E106-D No:9
    Page(s): 1537-1545

    In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent for linear regression under DA using noisy copies of datasets, in which noise is injected into the inputs. We analyze the situation where noisy copies are newly generated and injected at each epoch, i.e., the case of using on-line noisy copies. This article can therefore also be viewed as an analysis of a method that injects noise into the training process through DA. We considered three training situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training for DA with on-line copies is approximately equivalent to l2 regularization training, for which the variance of the injected noise is important whereas the number of copies is not. Moreover, we showed that DA with on-line copies apparently leads to an increase of the learning rate in the full-batch condition under the sum of squared errors and in the mini-batch condition under the mean squared error. The apparent increase in learning rate and the regularization effect can be attributed to the original input and the additive noise in the noisy copies, respectively. These results were confirmed in a numerical experiment, in which we found that our result applies to usual off-line DA in an under-parameterization scenario but not in an over-parameterization scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis of linear regression can be qualitatively applied to neural networks.
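
    The snippet below is a small sketch of the on-line noisy-copy setting for full-batch gradient descent on linear regression under the mean squared error, compared with the closed-form l2 (ridge) solution whose regularization strength equals the injected noise variance; the learning rate, noise level and number of copies are illustrative choices rather than values from the paper.

        import numpy as np

        rng = np.random.default_rng(2)
        n, d, nu = 200, 10, 0.3                     # nu: std of noise injected into inputs
        X = rng.standard_normal((n, d))
        y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

        copies, lr, w = 3, 0.05, np.zeros(d)
        for epoch in range(5000):
            # freshly generated noisy copies of the inputs at every epoch (on-line copies)
            Xc = np.vstack([X + nu * rng.standard_normal(X.shape) for _ in range(copies)])
            yc = np.tile(y, copies)
            w -= lr * 2 * Xc.T @ (Xc @ w - yc) / len(yc)   # full-batch MSE gradient step

        # ridge solution with regularization strength nu**2 (variance of the injected noise)
        w_ridge = np.linalg.solve(X.T @ X / n + nu ** 2 * np.eye(d), X.T @ y / n)
        print(np.max(np.abs(w - w_ridge)))          # expected to be small (approximate equivalence)

    Increasing copies leaves the stationary point essentially unchanged, in line with the observation above that only the variance of the injected noise matters.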

  • A Sparse Modeling Method Based on Reduction of Cost Function in Regularized Forward Selection

    Katsuyuki HAGIWARA  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E97-D No:1
    Page(s): 98-106

    Regularized forward selection is viewed as a method for obtaining a sparse representation in a nonparametric regression problem. In regularized forward selection, the regression output is represented by a weighted sum of several significant basis functions that are selected from a large number of candidates by a greedy training procedure in terms of a regularized cost function, combined with an appropriate model selection method. In this paper, we propose a model selection method for regularized forward selection. For this purpose, we focus on the reduction of the cost function brought about by appending a new basis function in the greedy training procedure. We first clarify a bias and variance decomposition of the cost reduction and then derive a probabilistic upper bound for the variance of the cost reduction under some conditions. The derived upper bound reflects an essential feature of the greedy training procedure, namely that it selects the basis function which maximally reduces the cost function. We then propose a thresholding method for determining the significant basis functions by applying the derived upper bound as a threshold level and effectively combining it with the leave-one-out cross validation method. Several numerical experiments show that the generalization performance of the proposed method is comparable to that of the other methods, while the number of basis functions selected by the proposed method is much smaller than that selected by the other methods. We can therefore say that the proposed method yields a sparse representation while keeping relatively good generalization performance. Moreover, our method has the advantage of being free from the selection of a regularization parameter.
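
    As a hedged sketch of the greedy procedure being analyzed, the Python function below performs ridge-regularized forward selection and stops when the best achievable cost reduction falls below a supplied threshold; this stopping threshold is a placeholder where the paper instead uses its derived probabilistic upper bound combined with leave-one-out cross validation.

        import numpy as np

        def regularized_forward_selection(Phi, y, lam, thresh, max_terms=None):
            # Phi: (n, K) candidate basis outputs; lam: ridge penalty; thresh: placeholder
            # stopping level for the cost reduction (not the paper's probabilistic bound).
            n, K = Phi.shape
            selected, cost = [], float(y @ y)
            max_terms = max_terms or K
            while len(selected) < max_terms:
                best_j, best_cost = None, cost
                for j in range(K):
                    if j in selected:
                        continue
                    A = Phi[:, selected + [j]]
                    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
                    c = float(np.sum((y - A @ w) ** 2) + lam * np.sum(w ** 2))
                    if c < best_cost:
                        best_j, best_cost = j, c
                # stop when appending any basis function reduces the cost by less than thresh
                if best_j is None or cost - best_cost < thresh:
                    break
                selected.append(best_j)
                cost = best_cost
            return selected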

  • Bridging between Soft and Hard Thresholding by Scaling

    Katsuyuki HAGIWARA  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2022/06/09
    Vol: E105-D No:9
    Page(s): 1529-1536

    This study considered an extension of a sparse regularization method with scaling, especially for thresholding methods, which are simple and typical examples of sparse modeling. In the setting of a non-parametric orthogonal regression problem, we developed and analyzed a thresholding method in which soft thresholding estimators are independently expanded by empirical scaling values. The scaling values have a common hyper-parameter that is the order of expansion of the ideal scaling value that achieves hard thresholding. We simply refer to this estimator as a scaled soft thresholding estimator. The scaled soft thresholding method is a bridge between the soft and hard thresholding methods. This new estimator coincides with the adaptive LASSO estimator in the orthogonal case; it thus provides another derivation of the adaptive LASSO estimator. It is a general method that includes soft thresholding and the non-negative garrote as special cases. We subsequently derived the degrees of freedom of the scaled soft thresholding for calculating Stein's unbiased risk estimate. We found that it decomposes into the degrees of freedom of soft thresholding and a remainder term connecting it to hard thresholding. Since the degrees of freedom reflect the degree of over-fitting, this implies that the scaled soft thresholding has another source of over-fitting in addition to the number of un-removed components. The theoretical result was verified by a simple numerical example. In this process, we also focused on the non-monotonicity of the above remainder term of the degrees of freedom and found that, in a sparse and large-sample setting, it is mainly caused by useless components that are not related to the target function.
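
    As a hedged sketch of the thresholding family involved: since the abstract identifies the estimator with the adaptive LASSO in the orthogonal case, the function below uses that closed form, in which gamma = 0 recovers plain soft thresholding, gamma = 1 corresponds (up to reparameterizing the threshold) to the non-negative garrote, and larger gamma approaches hard thresholding; the exact scaling parameterization used in the paper may differ.

        import numpy as np

        def scaled_soft_threshold(z, lam, gamma):
            # Adaptive-LASSO-type thresholding of orthogonal coefficients z:
            # the shrinkage amount lam / |z|**gamma decreases for large coefficients,
            # which is what "expanding" soft-thresholding estimators achieves.
            z = np.asarray(z, dtype=float)
            shrink = lam / np.maximum(np.abs(z), 1e-12) ** gamma
            return np.sign(z) * np.maximum(np.abs(z) - shrink, 0.0)

        z = np.linspace(-3.0, 3.0, 7)
        for g in (0.0, 1.0, 4.0):                      # soft, garrote-like, near-hard
            print(g, scaled_soft_threshold(z, 1.0, g))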

  • Model Selection with Componentwise Shrinkage in Orthogonal Regression

    Katsuyuki HAGIWARA  

     
    PAPER-Digital Signal Processing

    Vol: E86-A No:7
    Page(s): 1749-1758

    A model selection criterion has previously been proposed for the problem of determining the major frequency components of a signal disturbed by noise. In this paper, the criterion is extended to a penalized cost function that yields a componentwise shrinkage estimator, and the extended criterion is shown to achieve consistent model selection. A simple numerical simulation was then conducted, and it was found that the proposed criterion with an empirically estimated componentwise shrinkage estimator outperforms the original criterion.
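
    Purely as an illustrative sketch (the shrinkage factor and the selection rule below are generic placeholders, not the proposed criterion), the snippet shrinks empirical Fourier coefficients componentwise and keeps the frequencies whose power clearly exceeds the noise level.

        import numpy as np

        rng = np.random.default_rng(3)
        n, sigma = 256, 0.5
        t = np.arange(n) / n
        y = np.sin(2 * np.pi * 5 * t) + 0.6 * np.cos(2 * np.pi * 12 * t) \
            + sigma * rng.standard_normal(n)

        coef = np.fft.rfft(y) / n                       # empirical frequency components
        power = np.abs(coef) ** 2
        noise_level = sigma ** 2 / n                    # approximate per-coefficient noise power
        shrink = np.maximum(1.0 - noise_level / np.maximum(power, 1e-12), 0.0)
        keep = power > 2 * np.log(n) * noise_level      # placeholder selection rule
        denoised = np.fft.irfft(n * shrink * keep * coef, n)
        print(np.nonzero(keep)[0])                      # expected to recover frequencies 5 and 12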

  • A Scaling and Non-Negative Garrote in Soft-Thresholding

    Katsuyuki HAGIWARA  

     
    PAPER-Artificial Intelligence, Data Mining

    Publicized: 2017/07/27
    Vol: E100-D No:11
    Page(s): 2702-2710

    Soft-thresholding is a sparse modeling method typically applied to wavelet denoising in statistical signal processing. It is also important in machine learning, since it is an essential ingredient of the well-known LASSO (Least Absolute Shrinkage and Selection Operator). It is known that soft-thresholding, and thus LASSO, suffers from a dilemma between sparsity and generalization, caused by excessive shrinkage at a sparse representation. Several methods have been proposed to alleviate this problem in signal processing and machine learning. In this paper, we extended and analyzed a method of scaling soft-thresholding estimators. In the setting of a non-parametric orthogonal regression problem, which includes the discrete wavelet transform, we introduced a component-wise, data-dependent scaling that is identical to the non-negative garrote. We considered the case where the parameter value of soft-thresholding is chosen from the absolute values of the least squares estimates, by which the model selection problem reduces to determining the number of non-zero coefficient estimates. In this case, we first derived the risk and constructed SURE (Stein's unbiased risk estimate), which can be used for determining the number of non-zero coefficient estimates. We also analyzed some properties of the risk curve and found that our scaling method with the derived SURE can yield a model with lower risk and higher sparsity than a naive soft-thresholding method with SURE. This theoretical finding was verified by a simple numerical experiment on wavelet denoising.
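
    The sketch below illustrates the selection step in Python under stated assumptions: a non-negative-garrote rule is applied to orthogonal (e.g. wavelet) coefficients, the threshold is taken from the sorted absolute values of the least squares estimates, and the number of surviving coefficients is chosen by minimizing a Stein-type unbiased risk estimate; the risk formula is the generic Stein identity for the garrote rule and may differ in detail from the SURE derived in the paper.

        import numpy as np

        def garrote_sure_select(z, sigma):
            # z: least squares estimates in an orthogonal setting; sigma: noise std.
            # For threshold lam, the garrote keeps z_i*(1 - lam**2/z_i**2) when |z_i| > lam.
            n = len(z)
            a = np.sort(np.abs(z))[::-1]                # |z| in decreasing order
            best_k, best_sure = 0, float(np.sum(z ** 2) - n * sigma ** 2)   # k = 0: all removed
            for k in range(1, n):
                lam = a[k]                              # the k largest coefficients survive
                kept = np.abs(z) > lam
                sure = (-n * sigma ** 2
                        + np.sum(z[~kept] ** 2)                               # removed components
                        + np.sum(lam ** 4 / z[kept] ** 2)                     # shrinkage of kept ones
                        + 2 * sigma ** 2 * np.sum(1 + lam ** 2 / z[kept] ** 2))   # 2*sigma^2 * df
                if sure < best_sure:
                    best_k, best_sure = k, float(sure)
            return best_k

        rng = np.random.default_rng(4)
        theta = np.concatenate([rng.normal(0.0, 3.0, 10), np.zeros(90)])   # sparse truth
        z = theta + rng.standard_normal(100)                               # sigma = 1
        print(garrote_sure_select(z, 1.0))              # expected to be roughly 10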