Exploiting the Task-Pipelined Parallelism of Stream Programs on Many-Core GPUs

Shuai MU, Dongdong LI, Yubei CHEN, Yangdong DENG, Zhihua WANG

Summary:

By exploiting data-level parallelism, Graphics Processing Units (GPUs) have become a high-throughput, general-purpose computing platform. However, many real-world applications, especially those following a stream processing pattern, feature interleaved task-pipelined and data parallelism. Current GPUs are ill-equipped for such applications because of insufficient utilization of computing resources and/or excessive off-chip memory traffic. In this paper, we focus on microarchitectural enhancements to enable task-pipelined execution of data-parallel kernels on GPUs. We propose an efficient adaptive dynamic scheduling mechanism and a moderately modified L2 design. With minor hardware overhead, our techniques orchestrate both task-pipeline and data parallelism in a unified manner. Results obtained with a cycle-accurate simulator on real-world applications show that the proposed GPU microarchitecture improves computing throughput by 18% and reduces overall accesses to off-chip GPU memory by 13%.

Publication
IEICE TRANSACTIONS on Information Vol.E96-D No.10 pp.2194-2207
Publication Date
2013/10/01
Online ISSN
1745-1361
DOI
10.1587/transinf.E96.D.2194
Type of Manuscript
PAPER
Category
Computer System

Authors

Shuai MU
  Tsinghua University
Dongdong LI
  Tsinghua University
Yubei CHEN
  Tsinghua University
Yangdong DENG
  Tsinghua University
Zhihua WANG
  Tsinghua University