Like many processors, GPGPUs suffer from the memory wall. The traditional solutions to this problem are to use efficient schedulers to hide long memory access latency or to use data prefetch mechanisms to reduce the latency caused by data transfer. In this paper, we study the instruction fetch stage of the GPU pipeline and analyze the relationship between the capacity of a GPU kernel and its instruction miss rate. We improve the next-line prefetch mechanism to fit the SIMT model of the GPU and determine the optimal parameters of the prefetch mechanism on the GPU through experiments. The experimental results show that the prefetch mechanism achieves a 12.17% performance improvement on average. Compared with the solution of enlarging the I-cache, the prefetch mechanism has the advantages of more beneficiaries and lower cost.
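The next-line prefetch idea described in the abstract can be sketched as a toy I-cache simulation. Everything below (cache geometry, FIFO eviction, the warp-interleaved fetch stream) is an illustrative assumption, not a parameter from the paper:

```python
# Toy simulation of next-line instruction prefetch (illustrative only;
# cache geometry and the fetch stream are assumptions, not taken from the paper).
from collections import OrderedDict

def simulate(fetch_pcs, line_size=32, cache_lines=64, prefetch=True):
    """Count I-cache misses for a stream of instruction fetch addresses.

    Fully associative cache with FIFO eviction; when `prefetch` is set,
    each demand miss also brings in the next sequential cache line.
    """
    cache = OrderedDict()  # line tag -> None, in FIFO insertion order

    def insert(tag):
        if tag in cache:
            return
        if len(cache) >= cache_lines:
            cache.popitem(last=False)  # evict the oldest line
        cache[tag] = None

    misses = 0
    for pc in fetch_pcs:
        tag = pc // line_size
        if tag not in cache:
            misses += 1
            insert(tag)
            if prefetch:
                insert(tag + 1)  # next-line prefetch
    return misses

# Two warps fetching the same 4 KB kernel body in lockstep: sequential
# 8-byte fetches, interleaved at warp granularity.
stream = [pc for pc in range(0, 4096, 8) for _ in range(2)]
base = simulate(stream, prefetch=False)  # one miss per distinct line
pref = simulate(stream, prefetch=True)   # every other line is prefetched
```

For this sequential stream, prefetching on each miss hides half of the demand misses, which is the best case for next-line prefetch; branchy kernels would benefit less.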
Jianli CAO
Dalian University of Technology
Zhikui CHEN
Dalian University of Technology
Yuxin WANG
Dalian University of Technology
He GUO
Dalian University of Technology
Pengcheng WANG
Jianghuai College of Anhui University
Jianli CAO, Zhikui CHEN, Yuxin WANG, He GUO, Pengcheng WANG, "Instruction Prefetch for Improving GPGPU Performance" in IEICE TRANSACTIONS on Fundamentals,
vol. E104-A, no. 5, pp. 773-785, May 2021, doi: 10.1587/transfun.2020EAP1105.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.2020EAP1105/_p
@ARTICLE{e104-a_5_773,
author={Jianli CAO and Zhikui CHEN and Yuxin WANG and He GUO and Pengcheng WANG},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Instruction Prefetch for Improving GPGPU Performance},
year={2021},
volume={E104-A},
number={5},
pages={773-785},
doi={10.1587/transfun.2020EAP1105},
ISSN={1745-1337},
month={May},}
TY - JOUR
TI - Instruction Prefetch for Improving GPGPU Performance
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 773
EP - 785
AU - Jianli CAO
AU - Zhikui CHEN
AU - Yuxin WANG
AU - He GUO
AU - Pengcheng WANG
PY - 2021
DO - 10.1587/transfun.2020EAP1105
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E104-A
IS - 5
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - May 2021
ER -