Network-on-Chips (NoCs) are important components for scalable many-core processors. Because the performance of parallel applications is usually sensitive to the latency of NoCs, reducing it is a primary requirement. In this study, a compression router that hides the (de)compression-operation delay is proposed. The compression router (de)compresses the contents of the incoming packet before the switch arbitration is completed, thus shortening the packet length without latency penalty and reducing the network injection-and-ejection latency. Evaluation results show that the compression router improves up to 33% of the parallel application performance (conjugate gradients (CG), fast Fourier transform (FT), integer sort (IS), and traveling salesman problem (TSP)) and 63% of the effective network throughput by 1.8 compression ratio on NoC. The cost is an increase in router area and its energy consumption by 0.22mm2 and 1.6 times compared to the conventional virtual-channel router. Another finding is that off-loading the decompressor onto a network interface decreases the compression-router area by 57% at the expense of the moderate increase in communication latency.
Naoya NIWA
Keio University
Yoshiya SHIKAMA
Keio University
Hideharu AMANO
Keio University
Michihiro KOIBUCHI
National Institute of Informatics,PRESTO JST
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Naoya NIWA, Yoshiya SHIKAMA, Hideharu AMANO, Michihiro KOIBUCHI, "A Compression Router for Low-Latency Network-on-Chip" in IEICE TRANSACTIONS on Information,
vol. E106-D, no. 2, pp. 170-180, February 2023, doi: 10.1587/transinf.2022EDP7080.
Abstract: Network-on-Chips (NoCs) are important components for scalable many-core processors. Because the performance of parallel applications is usually sensitive to the latency of NoCs, reducing it is a primary requirement. In this study, a compression router that hides the (de)compression-operation delay is proposed. The compression router (de)compresses the contents of the incoming packet before the switch arbitration is completed, thus shortening the packet length without latency penalty and reducing the network injection-and-ejection latency. Evaluation results show that the compression router improves up to 33% of the parallel application performance (conjugate gradients (CG), fast Fourier transform (FT), integer sort (IS), and traveling salesman problem (TSP)) and 63% of the effective network throughput by 1.8 compression ratio on NoC. The cost is an increase in router area and its energy consumption by 0.22mm2 and 1.6 times compared to the conventional virtual-channel router. Another finding is that off-loading the decompressor onto a network interface decreases the compression-router area by 57% at the expense of the moderate increase in communication latency.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7080/_p
Copy
@ARTICLE{e106-d_2_170,
author={Naoya NIWA, Yoshiya SHIKAMA, Hideharu AMANO, Michihiro KOIBUCHI, },
journal={IEICE TRANSACTIONS on Information},
title={A Compression Router for Low-Latency Network-on-Chip},
year={2023},
volume={E106-D},
number={2},
pages={170-180},
abstract={Network-on-Chips (NoCs) are important components for scalable many-core processors. Because the performance of parallel applications is usually sensitive to the latency of NoCs, reducing it is a primary requirement. In this study, a compression router that hides the (de)compression-operation delay is proposed. The compression router (de)compresses the contents of the incoming packet before the switch arbitration is completed, thus shortening the packet length without latency penalty and reducing the network injection-and-ejection latency. Evaluation results show that the compression router improves up to 33% of the parallel application performance (conjugate gradients (CG), fast Fourier transform (FT), integer sort (IS), and traveling salesman problem (TSP)) and 63% of the effective network throughput by 1.8 compression ratio on NoC. The cost is an increase in router area and its energy consumption by 0.22mm2 and 1.6 times compared to the conventional virtual-channel router. Another finding is that off-loading the decompressor onto a network interface decreases the compression-router area by 57% at the expense of the moderate increase in communication latency.},
keywords={},
doi={10.1587/transinf.2022EDP7080},
ISSN={1745-1361},
month={February},}
Copy
TY - JOUR
TI - A Compression Router for Low-Latency Network-on-Chip
T2 - IEICE TRANSACTIONS on Information
SP - 170
EP - 180
AU - Naoya NIWA
AU - Yoshiya SHIKAMA
AU - Hideharu AMANO
AU - Michihiro KOIBUCHI
PY - 2023
DO - 10.1587/transinf.2022EDP7080
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E106-D
IS - 2
JA - IEICE TRANSACTIONS on Information
Y1 - February 2023
AB - Network-on-Chips (NoCs) are important components for scalable many-core processors. Because the performance of parallel applications is usually sensitive to the latency of NoCs, reducing it is a primary requirement. In this study, a compression router that hides the (de)compression-operation delay is proposed. The compression router (de)compresses the contents of the incoming packet before the switch arbitration is completed, thus shortening the packet length without latency penalty and reducing the network injection-and-ejection latency. Evaluation results show that the compression router improves up to 33% of the parallel application performance (conjugate gradients (CG), fast Fourier transform (FT), integer sort (IS), and traveling salesman problem (TSP)) and 63% of the effective network throughput by 1.8 compression ratio on NoC. The cost is an increase in router area and its energy consumption by 0.22mm2 and 1.6 times compared to the conventional virtual-channel router. Another finding is that off-loading the decompressor onto a network interface decreases the compression-router area by 57% at the expense of the moderate increase in communication latency.
ER -