Like many processors, GPGPUs suffer from the memory wall. The traditional solutions to this problem are to use efficient schedulers to hide long memory access latency or to use data prefetch mechanisms to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of the GPU pipeline and analyze the relationship between the capacity of a GPU kernel and its instruction miss rate. We improve the next-line prefetch mechanism to fit the SIMT model of the GPU and determine the optimal parameters of the prefetch mechanism on the GPU through experiments. The experimental results show that the prefetch mechanism achieves a 12.17% performance improvement on average. Compared with the solution of enlarging the I-cache, the prefetch mechanism has the advantages of more beneficiaries and lower cost.
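The next-line prefetch idea described in the abstract can be sketched as a toy I-cache simulation. Everything below (cache geometry, FIFO eviction, the warp-interleaved fetch stream) is an illustrative assumption, not a parameter from the paper:

```python
# Toy simulation of next-line instruction prefetch (illustrative only;
# cache geometry and the fetch stream are assumptions, not taken from the paper).
from collections import OrderedDict

def simulate(fetch_pcs, line_size=32, cache_lines=64, prefetch=True):
    """Count I-cache misses for a stream of instruction fetch addresses.

    Fully associative cache with FIFO eviction; when `prefetch` is set,
    each demand miss also brings in the next sequential cache line.
    """
    cache = OrderedDict()  # line tag -> None, in FIFO insertion order

    def insert(tag):
        if tag in cache:
            return
        if len(cache) >= cache_lines:
            cache.popitem(last=False)  # evict the oldest line
        cache[tag] = None

    misses = 0
    for pc in fetch_pcs:
        tag = pc // line_size
        if tag not in cache:
            misses += 1
            insert(tag)
            if prefetch:
                insert(tag + 1)  # next-line prefetch
    return misses

# Two warps fetching the same 4 KB kernel body in lockstep: sequential
# 8-byte fetches, interleaved at warp granularity.
stream = [pc for pc in range(0, 4096, 8) for _ in range(2)]
base = simulate(stream, prefetch=False)  # one miss per distinct line
pref = simulate(stream, prefetch=True)   # every other line is prefetched
```

For this sequential stream, prefetching on each miss hides half of the demand misses, which is the best case for next-line prefetch; branchy kernels would benefit less.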
Jianli CAO
Dalian University of Technology
Zhikui CHEN
Dalian University of Technology
Yuxin WANG
Dalian University of Technology
He GUO
Dalian University of Technology
Pengcheng WANG
Jianghuai College of Anhui University
Jianli CAO, Zhikui CHEN, Yuxin WANG, He GUO, Pengcheng WANG, "Instruction Prefetch for Improving GPGPU Performance" in IEICE TRANSACTIONS on Fundamentals,
vol. E104-A, no. 5, pp. 773-785, May 2021, doi: 10.1587/transfun.2020EAP1105.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2020EAP1105/_p
@ARTICLE{e104-a_5_773,
author={Jianli CAO and Zhikui CHEN and Yuxin WANG and He GUO and Pengcheng WANG},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Instruction Prefetch for Improving GPGPU Performance},
year={2021},
volume={E104-A},
number={5},
pages={773-785},
doi={10.1587/transfun.2020EAP1105},
ISSN={1745-1337},
month={May},}
TY - JOUR
TI - Instruction Prefetch for Improving GPGPU Performance
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 773
EP - 785
AU - Jianli CAO
AU - Zhikui CHEN
AU - Yuxin WANG
AU - He GUO
AU - Pengcheng WANG
PY - 2021
DO - 10.1587/transfun.2020EAP1105
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E104-A
IS - 5
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - May 2021
ER -