On Gradient Descent Training Under Data Augmentation with On-Line Noisy Copies

Katsuyuki HAGIWARA

Summary:

In machine learning, data augmentation (DA) is a technique for improving the generalization performance of models. In this paper, we mainly consider gradient descent training of linear regression under DA using noisy copies of datasets, in which noise is injected into the inputs. We analyze the situation where noisy copies are newly generated and injected at each epoch, i.e., the case of using on-line noisy copies. This article can therefore also be viewed as an analysis of a method that injects noise into the training process through DA. We considered three training situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training under DA with on-line copies is approximately equivalent to l2 regularization training, for which the variance of the injected noise is important whereas the number of copies is not. Moreover, we showed that DA with on-line copies leads to an apparent increase of the learning rate in the full-batch condition under the sum of squared errors and in the mini-batch condition under the mean squared error. The apparent increase in learning rate and the regularization effect can be attributed to the original inputs and to the additive noise in the noisy copies, respectively. These results were confirmed in a numerical experiment, in which we found that our result applies to usual off-line DA in an under-parameterized scenario but not in an over-parameterized scenario. Moreover, we experimentally investigated the training process of neural networks under DA with off-line noisy copies and found that our analysis of linear regression applies qualitatively to neural networks.
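
The mechanism summarized above is easy to illustrate. Below is a minimal sketch (not the authors' code; all names, sizes, and hyperparameters are illustrative assumptions) of on-line noisy-copy DA for linear regression: a fresh noisy copy of the inputs is drawn at every epoch, and the resulting weights are compared with gradient descent on a ridge (l2-regularized) objective whose penalty strength equals the injected-noise variance, the strength suggested by the classical input-noise/ridge correspondence. In expectation the two gradients coincide, since the noise matrix E contributes E[E^T E] w = n sigma^2 w to the noisy-copy gradient.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 5
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    sigma = 0.3   # std. dev. of the injected input noise (assumed value)
    eta = 0.01    # learning rate (assumed value)
    epochs = 2000
    c = 3         # number of noisy copies generated per epoch

    # Full-batch gradient descent under DA with on-line noisy copies:
    # at each epoch, draw c fresh noisy copies of the inputs.
    w_da = np.zeros(d)
    for _ in range(epochs):
        Xc = np.concatenate(
            [X + sigma * rng.standard_normal((n, d)) for _ in range(c)]
        )
        yc = np.tile(y, c)
        grad = Xc.T @ (Xc @ w_da - yc) / len(yc)  # mean-squared-error gradient
        w_da -= eta * grad

    # For comparison: gradient descent on the ridge objective with
    # penalty strength sigma**2.
    w_ridge = np.zeros(d)
    for _ in range(epochs):
        grad = X.T @ (X @ w_ridge - y) / n + sigma**2 * w_ridge
        w_ridge -= eta * grad

    print(np.linalg.norm(w_da - w_ridge))  # small if the two trainings agree

Under these assumptions, the two weight vectors should agree up to the stochastic fluctuation induced by the finite number of copies per epoch; increasing c reduces that fluctuation but, consistent with the abstract, does not change the effective regularization strength.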

Publication
IEICE TRANSACTIONS on Information, Vol.E106-D, No.9, pp.1537-1545
Publication Date
2023/09/01
Publicized
2023/06/12
Online ISSN
1745-1361
DOI
10.1587/transinf.2023EDP7008
Type of Manuscript
PAPER
Category
Artificial Intelligence, Data Mining

Authors

Katsuyuki HAGIWARA
  Mie University

Keyword