This paper describes a VLSI processor architecture designed for a back-propagation accelerator. Three techniques are used to accelerate the simulation. The first is a multi-processor approach where a neural network simulation is suitable for parallel processing. By constructing a ring network using several processors, the simulation speed is multiplied by the number of the processors. The second technique is internal parallel processing. Each processor contains 4 multipliers and 4 ALUs that all work in parallel. The third technique is pipelining. The connections of eight functional units change according to the current stage of the back-propagation algorithm. Intermediate data is sent from one functional unit to another without being stored in extra registers and data is processed in a pipeline manner. The data is in 24-bit floating point format (18-bit mantissa and 6-bit oxponent). The chip has about 88,000 gates, including microcode ROM for processor control, the processor is designed using 0.8-µm CMOS gate arrays, and the estimated performance at 40 MHz is 20 million connection updates per second (MCUPS). For a ring network with 4 processors, performance can be enhanced up to 90 MCUPS.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Yoshio HIROSE, Hideaki ANBUTSU, Koichi YAMASHITA, Gensuke GOTO, "A VLSI Processor Architecture for a Back-Propagation Accelerator" in IEICE TRANSACTIONS on Electronics,
vol. E75-C, no. 10, pp. 1223-1231, October 1992, doi: .
Abstract: This paper describes a VLSI processor architecture designed for a back-propagation accelerator. Three techniques are used to accelerate the simulation. The first is a multi-processor approach where a neural network simulation is suitable for parallel processing. By constructing a ring network using several processors, the simulation speed is multiplied by the number of the processors. The second technique is internal parallel processing. Each processor contains 4 multipliers and 4 ALUs that all work in parallel. The third technique is pipelining. The connections of eight functional units change according to the current stage of the back-propagation algorithm. Intermediate data is sent from one functional unit to another without being stored in extra registers and data is processed in a pipeline manner. The data is in 24-bit floating point format (18-bit mantissa and 6-bit oxponent). The chip has about 88,000 gates, including microcode ROM for processor control, the processor is designed using 0.8-µm CMOS gate arrays, and the estimated performance at 40 MHz is 20 million connection updates per second (MCUPS). For a ring network with 4 processors, performance can be enhanced up to 90 MCUPS.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/e75-c_10_1223/_p
Copy
@ARTICLE{e75-c_10_1223,
author={Yoshio HIROSE, Hideaki ANBUTSU, Koichi YAMASHITA, Gensuke GOTO, },
journal={IEICE TRANSACTIONS on Electronics},
title={A VLSI Processor Architecture for a Back-Propagation Accelerator},
year={1992},
volume={E75-C},
number={10},
pages={1223-1231},
abstract={This paper describes a VLSI processor architecture designed for a back-propagation accelerator. Three techniques are used to accelerate the simulation. The first is a multi-processor approach where a neural network simulation is suitable for parallel processing. By constructing a ring network using several processors, the simulation speed is multiplied by the number of the processors. The second technique is internal parallel processing. Each processor contains 4 multipliers and 4 ALUs that all work in parallel. The third technique is pipelining. The connections of eight functional units change according to the current stage of the back-propagation algorithm. Intermediate data is sent from one functional unit to another without being stored in extra registers and data is processed in a pipeline manner. The data is in 24-bit floating point format (18-bit mantissa and 6-bit oxponent). The chip has about 88,000 gates, including microcode ROM for processor control, the processor is designed using 0.8-µm CMOS gate arrays, and the estimated performance at 40 MHz is 20 million connection updates per second (MCUPS). For a ring network with 4 processors, performance can be enhanced up to 90 MCUPS.},
keywords={},
doi={},
ISSN={},
month={October},}
Copy
TY - JOUR
TI - A VLSI Processor Architecture for a Back-Propagation Accelerator
T2 - IEICE TRANSACTIONS on Electronics
SP - 1223
EP - 1231
AU - Yoshio HIROSE
AU - Hideaki ANBUTSU
AU - Koichi YAMASHITA
AU - Gensuke GOTO
PY - 1992
DO -
JO - IEICE TRANSACTIONS on Electronics
SN -
VL - E75-C
IS - 10
JA - IEICE TRANSACTIONS on Electronics
Y1 - October 1992
AB - This paper describes a VLSI processor architecture designed for a back-propagation accelerator. Three techniques are used to accelerate the simulation. The first is a multi-processor approach where a neural network simulation is suitable for parallel processing. By constructing a ring network using several processors, the simulation speed is multiplied by the number of the processors. The second technique is internal parallel processing. Each processor contains 4 multipliers and 4 ALUs that all work in parallel. The third technique is pipelining. The connections of eight functional units change according to the current stage of the back-propagation algorithm. Intermediate data is sent from one functional unit to another without being stored in extra registers and data is processed in a pipeline manner. The data is in 24-bit floating point format (18-bit mantissa and 6-bit oxponent). The chip has about 88,000 gates, including microcode ROM for processor control, the processor is designed using 0.8-µm CMOS gate arrays, and the estimated performance at 40 MHz is 20 million connection updates per second (MCUPS). For a ring network with 4 processors, performance can be enhanced up to 90 MCUPS.
ER -