Packet Processing Architecture with Off-Chip Last Level Cache Using Interleaved 3D-Stacked DRAM

Tomohiro KORIKAWA; Akio KAWABATA; Fujun HE; Eiji OKI

doi:10.1587/transcom.2020EBP3017

IEICE TRANSACTIONS on Communications

Open Access

Summary:

The performance of packet processing applications depends on the memory access speed of network systems. Table lookup, one of the most common operations in packet processing applications, requires fast memory access and can be a dominant performance bottleneck. Therefore, in Network Function Virtualization (NFV)-aware environments, the fast on-chip cache memories of CPUs in general-purpose hardware become critical to achieving high-performance packet processing at speeds of over tens of Gbps. In addition, carrier network systems execute multiple types of applications and complex applications simultaneously on the same system, which also requires adequate cache memory capacity. In this paper, we propose a packet processing architecture that utilizes interleaved 3-Dimensional (3D)-stacked Dynamic Random Access Memory (DRAM) devices as an off-chip Last Level Cache (LLC) in addition to several levels of dedicated cache memories for each CPU core. Entries of a lookup table are distributed across every bank and vault to utilize both bank interleaving and vault-level memory parallelism. Frequently accessed entries in the 3D-stacked DRAM are also cached in the on-chip dedicated cache memories of each CPU core. The evaluation results show that the proposed architecture reduces the memory access latency by 57% and increases the throughput by 100% while reducing the blocking probability by about 10% compared to the architecture with a shared on-chip LLC. These results indicate that 3D-stacked DRAM can be practical as an off-chip LLC in parallel packet processing systems.
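
The distribution of lookup-table entries over vaults and banks can be illustrated with a minimal sketch. It is not the authors' mapping: the summary does not give the address layout, so the low-order interleaving below, the function name `locate_entry`, and the vault and bank counts are all assumptions chosen for illustration.

```python
from collections import Counter

# Assumed device geometry for illustration only; the summary does not state it.
NUM_VAULTS = 32
BANKS_PER_VAULT = 8


def locate_entry(index, num_vaults=NUM_VAULTS, banks_per_vault=BANKS_PER_VAULT):
    """Map a table entry index to (vault, bank, row) by low-order interleaving.

    Consecutive indices go to different vaults first, then to different banks
    within a vault, so neighbouring entries can be accessed in parallel.
    This is a hypothetical mapping, not the one used in the paper.
    """
    vault = index % num_vaults
    bank = (index // num_vaults) % banks_per_vault
    row = index // (num_vaults * banks_per_vault)
    return vault, bank, row


if __name__ == "__main__":
    total_banks = NUM_VAULTS * BANKS_PER_VAULT
    placements = [locate_entry(i) for i in range(total_banks)]
    # Count how many of the first `total_banks` entries land in each (vault, bank).
    per_bank = Counter((vault, bank) for vault, bank, _ in placements)
    print("banks used:", len(per_bank))
    print("max entries per bank:", max(per_bank.values()))
```

Under this assumed mapping, any run of consecutive entries no longer than the total number of banks touches each bank at most once, which is the property that bank interleaving and vault-level parallelism rely on.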

Publication: IEICE TRANSACTIONS on Communications Vol.E104-B No.2 pp.149-157

Publication Date: 2021/02/01

Publicized: 2020/08/06

Online ISSN: 1745-1345

DOI: 10.1587/transcom.2020EBP3017

Type of Manuscript: PAPER

Category: Network System

Authors

Tomohiro KORIKAWA
  NTT Corporation
Akio KAWABATA
  NTT Corporation
Fujun HE
  Kyoto University
Eiji OKI
  Kyoto University

Keywords

cache memory, communication system, memory architecture, network function virtualization

Cite this

Tomohiro KORIKAWA, Akio KAWABATA, Fujun HE, Eiji OKI, "Packet Processing Architecture with Off-Chip Last Level Cache Using Interleaved 3D-Stacked DRAM" in IEICE TRANSACTIONS on Communications, vol. E104-B, no. 2, pp. 149-157, February 2021, doi: 10.1587/transcom.2020EBP3017.
URL: https://global.ieice.org/en_transactions/communications/10.1587/transcom.2020EBP3017/_p

@ARTICLE{e104-b_2_149,
author={Tomohiro KORIKAWA and Akio KAWABATA and Fujun HE and Eiji OKI},
journal={IEICE TRANSACTIONS on Communications},
title={Packet Processing Architecture with Off-Chip Last Level Cache Using Interleaved 3D-Stacked DRAM},
year={2021},
volume={E104-B},
number={2},
pages={149-157},
keywords={cache memory, communication system, memory architecture, network function virtualization},
doi={10.1587/transcom.2020EBP3017},
ISSN={1745-1345},
month={February},}

TY - JOUR
TI - Packet Processing Architecture with Off-Chip Last Level Cache Using Interleaved 3D-Stacked DRAM
T2 - IEICE TRANSACTIONS on Communications
SP - 149
EP - 157
AU - Tomohiro KORIKAWA
AU - Akio KAWABATA
AU - Fujun HE
AU - Eiji OKI
PY - 2021
DO - 10.1587/transcom.2020EBP3017
JO - IEICE TRANSACTIONS on Communications
SN - 1745-1345
VL - E104-B
IS - 2
JA - IEICE TRANSACTIONS on Communications
Y1 - February 2021
ER -
