A VLSI Processor Architecture for a Back-Propagation Accelerator

Yoshio HIROSE; Hideaki ANBUTSU; Koichi YAMASHITA; Gensuke GOTO

Summary:

This paper describes a VLSI processor architecture designed for a back-propagation accelerator. Three techniques are used to accelerate the simulation. The first is a multi-processor approach, which exploits the fact that neural network simulation is well suited to parallel processing. By constructing a ring network of several processors, the simulation speed is multiplied by the number of processors. The second technique is internal parallel processing: each processor contains 4 multipliers and 4 ALUs that all work in parallel. The third technique is pipelining. The connections among the eight functional units change according to the current stage of the back-propagation algorithm. Intermediate data is sent from one functional unit to another without being stored in extra registers, and data is processed in a pipelined manner. The data is in a 24-bit floating-point format (18-bit mantissa and 6-bit exponent). The chip has about 88,000 gates, including microcode ROM for processor control. The processor is designed using 0.8-µm CMOS gate arrays, and the estimated performance at 40 MHz is 20 million connection updates per second (MCUPS). For a ring network with 4 processors, performance can be enhanced up to 90 MCUPS.
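To give a feel for the precision of the 24-bit format mentioned in the summary, the following Python sketch rounds a value to an 18-bit mantissa and 6-bit exponent. The summary gives only the field widths. The layout used here (a signed mantissa with 17 magnitude bits, a signed exponent in the range -32 to 31, overflow raising an error, and underflow flushing to zero) is an assumption made for illustration and may differ from the actual chip. The function name `quantize_fp24` is hypothetical.

```python
import math

MANTISSA_BITS = 18  # field width stated in the summary
EXPONENT_BITS = 6   # field width stated in the summary


def quantize_fp24(x: float) -> float:
    """Round x to the nearest value in an ASSUMED 18/6-bit float layout."""
    if x == 0.0:
        return 0.0

    # Decompose x = m * 2**e with 0.5 <= |m| < 1.
    m, e = math.frexp(x)

    # Assumption: the sign lives inside the mantissa field,
    # leaving 17 bits of magnitude.
    scale = 1 << (MANTISSA_BITS - 1)
    q = round(m * scale)
    if abs(q) == scale:
        # Rounding carried the mantissa up to 1.0; renormalize to 0.5.
        q //= 2
        e += 1

    # Assumption: the exponent is a signed 6-bit integer.
    e_min = -(1 << (EXPONENT_BITS - 1))
    e_max = (1 << (EXPONENT_BITS - 1)) - 1
    if e > e_max:
        raise OverflowError("magnitude too large for the assumed format")
    if e < e_min:
        return 0.0  # assumed flush-to-zero on underflow

    return math.ldexp(q / scale, e)


if __name__ == "__main__":
    for value in (0.1, -0.75, 3.14159):
        print(value, quantize_fp24(value))
```

Under these assumptions the relative rounding error is at most 2^-17, which is the kind of bound one would check when deciding whether weight updates survive quantization.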

Publication: IEICE TRANSACTIONS on Electronics Vol.E75-C No.10 pp.1223-1231

Publication Date: 1992/10/25

Type of Manuscript: Special Section PAPER (Special Issue on Microprocessors)

Category: Application Specific Processors

Yoshio HIROSE, Hideaki ANBUTSU, Koichi YAMASHITA, Gensuke GOTO, "A VLSI Processor Architecture for a Back-Propagation Accelerator" in IEICE TRANSACTIONS on Electronics, vol. E75-C, no. 10, pp. 1223-1231, October 1992.
Abstract: This paper describes a VLSI processor architecture designed for a back-propagation accelerator. Three techniques are used to accelerate the simulation. The first is a multi-processor approach, which exploits the fact that neural network simulation is well suited to parallel processing. By constructing a ring network of several processors, the simulation speed is multiplied by the number of processors. The second technique is internal parallel processing: each processor contains 4 multipliers and 4 ALUs that all work in parallel. The third technique is pipelining. The connections among the eight functional units change according to the current stage of the back-propagation algorithm. Intermediate data is sent from one functional unit to another without being stored in extra registers, and data is processed in a pipelined manner. The data is in a 24-bit floating-point format (18-bit mantissa and 6-bit exponent). The chip has about 88,000 gates, including microcode ROM for processor control. The processor is designed using 0.8-µm CMOS gate arrays, and the estimated performance at 40 MHz is 20 million connection updates per second (MCUPS). For a ring network with 4 processors, performance can be enhanced up to 90 MCUPS.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/e75-c_10_1223/_p

@ARTICLE{e75-c_10_1223,
author={Yoshio HIROSE and Hideaki ANBUTSU and Koichi YAMASHITA and Gensuke GOTO},
journal={IEICE TRANSACTIONS on Electronics},
title={A VLSI Processor Architecture for a Back-Propagation Accelerator},
year={1992},
volume={E75-C},
number={10},
pages={1223--1231},
abstract={This paper describes a VLSI processor architecture designed for a back-propagation accelerator. Three techniques are used to accelerate the simulation. The first is a multi-processor approach, which exploits the fact that neural network simulation is well suited to parallel processing. By constructing a ring network of several processors, the simulation speed is multiplied by the number of processors. The second technique is internal parallel processing: each processor contains 4 multipliers and 4 ALUs that all work in parallel. The third technique is pipelining. The connections among the eight functional units change according to the current stage of the back-propagation algorithm. Intermediate data is sent from one functional unit to another without being stored in extra registers, and data is processed in a pipelined manner. The data is in a 24-bit floating-point format (18-bit mantissa and 6-bit exponent). The chip has about 88,000 gates, including microcode ROM for processor control. The processor is designed using 0.8-µm CMOS gate arrays, and the estimated performance at 40 MHz is 20 million connection updates per second (MCUPS). For a ring network with 4 processors, performance can be enhanced up to 90 MCUPS.},
keywords={},
doi={},
ISSN={},
month={October},}

TY - JOUR
TI - A VLSI Processor Architecture for a Back-Propagation Accelerator
T2 - IEICE TRANSACTIONS on Electronics
SP - 1223
EP - 1231
AU - Yoshio HIROSE
AU - Hideaki ANBUTSU
AU - Koichi YAMASHITA
AU - Gensuke GOTO
PY - 1992
DO -
JO - IEICE TRANSACTIONS on Electronics
SN -
VL - E75-C
IS - 10
JA - IEICE TRANSACTIONS on Electronics
Y1 - October 1992
AB - This paper describes a VLSI processor architecture designed for a back-propagation accelerator. Three techniques are used to accelerate the simulation. The first is a multi-processor approach, which exploits the fact that neural network simulation is well suited to parallel processing. By constructing a ring network of several processors, the simulation speed is multiplied by the number of processors. The second technique is internal parallel processing: each processor contains 4 multipliers and 4 ALUs that all work in parallel. The third technique is pipelining. The connections among the eight functional units change according to the current stage of the back-propagation algorithm. Intermediate data is sent from one functional unit to another without being stored in extra registers, and data is processed in a pipelined manner. The data is in a 24-bit floating-point format (18-bit mantissa and 6-bit exponent). The chip has about 88,000 gates, including microcode ROM for processor control. The processor is designed using 0.8-µm CMOS gate arrays, and the estimated performance at 40 MHz is 20 million connection updates per second (MCUPS). For a ring network with 4 processors, performance can be enhanced up to 90 MCUPS.
ER -