The most commonly used activation function in Backpropagation learning is the sigmoid, although a linear function is sometimes used at the output layer on the view that the choice between these activation functions makes no considerable difference in a network's performance. In this letter, we show that a network with linear output units and an otherwise identical network with sigmoid output units perform distinctly differently in terms of convergence behavior and generalization ability. We experimented with two types of cost functions: the sum-squared error used in standard Backpropagation and the recently reported log-likelihood. We find that, with the sum-squared error cost function and hidden units with a nonsteep sigmoid function, using linear units at the output layer instead of sigmoid ones accelerates convergence considerably, while generalization ability is slightly degraded. A network with sigmoid output units trained with the log-likelihood cost function yields even faster convergence and better generalization, but it does not converge at all with linear output units. It is also shown that a network with linear output units needs more hidden units to converge.
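As a rough illustration of why these pairings behave differently (a minimal sketch, not the authors' code; the helper output_delta and the toy values are assumptions for illustration), the error term that Backpropagation sends backward from an output unit differs across the three workable combinations: with the sum-squared error cost, a sigmoid output contributes a y(1-y) derivative factor that vanishes as the unit saturates, whereas a linear output unit, or a sigmoid unit under the log-likelihood (cross-entropy-style) cost, propagates the plain residual y - t.

import math

def sigmoid(x):
    """Standard logistic activation used for sigmoid units."""
    return 1.0 / (1.0 + math.exp(-x))

def output_delta(y, t, activation, cost):
    """Error term dE/dnet at one output unit, for output y and target t."""
    if cost == "sse" and activation == "sigmoid":
        # Sum-squared error with a sigmoid output: the chain rule leaves
        # a y*(1-y) factor that vanishes when the unit saturates.
        return (y - t) * y * (1.0 - y)
    if cost == "sse" and activation == "linear":
        # Linear output: the activation derivative is 1, so the raw
        # residual is propagated and never saturates.
        return y - t
    if cost == "log-likelihood" and activation == "sigmoid":
        # Log-likelihood with a sigmoid output: the sigmoid derivative
        # cancels, again leaving the raw residual.
        return y - t
    # The log-likelihood cost assumes outputs in (0, 1), which a linear
    # unit cannot guarantee -- consistent with the non-convergence with
    # linear output units reported in the letter.
    raise ValueError("unsupported activation/cost pairing")

# Example: a badly saturated output (y near 1) when the target is 0.
y, t = 0.999, 0.0
print(output_delta(y, t, "sigmoid", "sse"))             # ~0.001: tiny gradient
print(output_delta(y, t, "linear", "sse"))              # 0.999: full residual
print(output_delta(y, t, "sigmoid", "log-likelihood"))  # 0.999: derivative cancels

The cancellation of the derivative factor in the third case is the standard explanation for the faster convergence the letter observes with sigmoid output units under the log-likelihood cost.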
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Joarder KAMRUZZAMAN, Yukio KUMAGAI, Hiromitsu HIKITA, "Comparison of Convergence Behavior and Generalization Ability in Backpropagation Learning with Linear and Sigmoid Output Units," IEICE TRANSACTIONS on Fundamentals, vol. E76-A, no. 6, pp. 1035-1042, June 1993.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e76-a_6_1035/_p
@ARTICLE{e76-a_6_1035,
author={Joarder KAMRUZZAMAN and Yukio KUMAGAI and Hiromitsu HIKITA},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Comparison of Convergence Behavior and Generalization Ability in Backpropagation Learning with Linear and Sigmoid Output Units},
year={1993},
volume={E76-A},
number={6},
pages={1035-1042},
month={June},}
TY - JOUR
TI - Comparison of Convergence Behavior and Generalization Ability in Backpropagation Learning with Linear and Sigmoid Output Units
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 1035
EP - 1042
AU - Joarder KAMRUZZAMAN
AU - Yukio KUMAGAI
AU - Hiromitsu HIKITA
PY - 1993
VL - E76-A
IS - 6
Y1 - 1993/06//
ER -