A processing element (PE) with 250-MIPS, 125-MFLOPS peak performance, which is being developed for an on-chip multiprocessor, has been modeled and evaluated. The PE includes the following new architectural components: a floating-point unit (FPU) shared by several integer units (IUs) to increase the efficiency of the FPU pipelines, an on-chip data cache with a prefetch mechanism to reduce the clock cycles spent waiting for memory, and an interface to high-speed DRAM such as Rambus DRAM and Synchronous DRAM. As a result, a PE model in which one FPU is shared by four or eight IUs shows only a 10% performance reduction compared to a model with unshared FPUs, while saving the cost of three FPUs. Furthermore, a PE model with prefetch runs 1.2 to 1.8 times faster than a model without prefetch at a 250-MHz clock rate when Rambus DRAM is connected. These results show that this PE architecture delivers high effective performance at over 250 MHz and is cost-effective for the on-chip multiprocessor.
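The shared-FPU result can be illustrated with a toy contention model. The following is a minimal, hypothetical sketch, not the simulator used in the paper: it assumes each IU issues one instruction per cycle, that a fraction fp_ratio of instructions are floating-point operations, and that the FPU is fully pipelined so only its issue bandwidth is contended. The parameters (fp_ratio, insts_per_iu) are illustrative assumptions; with roughly 10% floating-point instructions, four or eight IUs rarely compete for the single FPU issue slot, which is consistent with the small slowdown reported in the abstract.

# Minimal, hypothetical sketch (not the paper's simulator) of why sharing one
# fully pipelined FPU among several IUs costs little when floating-point
# instructions are a small fraction of the mix.  fp_ratio and insts_per_iu are
# illustrative assumptions, not values from the paper.
import random

def run(num_ius, shared_fpu, fp_ratio=0.1, insts_per_iu=10_000, seed=0):
    rng = random.Random(seed)
    # Per-IU instruction streams: True marks a floating-point operation.
    streams = [[rng.random() < fp_ratio for _ in range(insts_per_iu)]
               for _ in range(num_ius)]
    pos = [0] * num_ius              # index of the next instruction for each IU
    cycles = 0
    while any(p < insts_per_iu for p in pos):
        # One FP issue slot per cycle if the FPU is shared, otherwise one per IU.
        fpu_slots = 1 if shared_fpu else num_ius
        for iu in range(num_ius):    # simplistic fixed-priority arbitration
            if pos[iu] >= insts_per_iu:
                continue
            if streams[iu][pos[iu]]:          # FP op: needs an FPU issue slot
                if fpu_slots > 0:
                    fpu_slots -= 1
                    pos[iu] += 1              # issued; FPU latency assumed hidden by pipelining
                # else: this IU stalls for one cycle waiting for the shared FPU
            else:
                pos[iu] += 1                  # integer op always issues
        cycles += 1
    return cycles

if __name__ == "__main__":
    for n in (4, 8):
        shared = run(n, shared_fpu=True)
        private = run(n, shared_fpu=False)
        print(f"{n} IUs sharing one FPU: {shared / private:.2f}x the cycles of one FPU per IU")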
Masafumi TAKAHASHI
Hiroshige FUJII
Emi KANEKO
Takeshi YOSHIDA
Toshinori SATO
Hiroyuki TAKANO
Haruyuki TAGO
Seigo SUZUKI
Nobuyuki GOTO
Masafumi TAKAHASHI, Hiroshige FUJII, Emi KANEKO, Takeshi YOSHIDA, Toshinori SATO, Hiroyuki TAKANO, Haruyuki TAGO, Seigo SUZUKI, Nobuyuki GOTO, "Performance Evaluation of a Processing Element for an On-Chip Multiprocessor" in IEICE TRANSACTIONS on Electronics,
vol. E77-C, no. 7, pp. 1092-1100, July 1994.
URL: https://global.ieice.org/en_transactions/electronics/10.1587/e77-c_7_1092/_p
@ARTICLE{e77-c_7_1092,
author={Masafumi TAKAHASHI and Hiroshige FUJII and Emi KANEKO and Takeshi YOSHIDA and Toshinori SATO and Hiroyuki TAKANO and Haruyuki TAGO and Seigo SUZUKI and Nobuyuki GOTO},
journal={IEICE TRANSACTIONS on Electronics},
title={Performance Evaluation of a Processing Element for an On-Chip Multiprocessor},
year={1994},
volume={E77-C},
number={7},
pages={1092-1100},
keywords={},
doi={},
ISSN={},
month={July},}
TY - JOUR
TI - Performance Evaluation of a Processing Element for an On-Chip Multiprocessor
T2 - IEICE TRANSACTIONS on Electronics
SP - 1092
EP - 1100
AU - Masafumi TAKAHASHI
AU - Hiroshige FUJII
AU - Emi KANEKO
AU - Takeshi YOSHIDA
AU - Toshinori SATO
AU - Hiroyuki TAKANO
AU - Haruyuki TAGO
AU - Seigo SUZUKI
AU - Nobuyuki GOTO
PY - 1994
DO -
JO - IEICE TRANSACTIONS on Electronics
SN -
VL - E77-C
IS - 7
JA - IEICE TRANSACTIONS on Electronics
Y1 - July 1994
ER -