This paper presents a stream programming framework, named GPU-chariot, for accelerating stream applications running on graphics processing units (GPUs). The main contribution of our framework is that it realizes efficient software pipelines on multi-GPU systems by enabling out-of-order execution of CPU functions, kernels, and data transfers. To achieve this out-of-order execution, we apply a runtime scheduler that not only maximizes the utilization of system resources but also encapsulates the number of GPUs available in the system. In addition, we implement a load-balancing capability to flow data efficiently through multiple GPUs. Furthermore, a callback interface enables overlapping execution of functions in third-party libraries. By using kernels with different performance bottlenecks, we show that our out-of-order execution is up to 20% faster than in-order execution. Finally, we conduct several case studies on a 4-GPU system and demonstrate the advantages of GPU-chariot over a manually pipelined code. We conclude that GPU-chariot can be useful when developing stream applications with software pipelines on multiple GPUs and CPUs.
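GPU-chariot's own API is not shown on this page, so the following is a minimal, hand-written sketch (an assumption, not GPU-chariot code) of the kind of multi-GPU software pipeline the framework is described as automating: plain CUDA streams overlap host-to-device copies, kernels, device-to-host copies, and a host callback (the hook where a CPU function or third-party library call would be chained in), with data chunks distributed round-robin across however many GPUs are present. The kernel name (scale), chunk count, and scheduling policy are illustrative placeholders.

#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e_)); return 1; } } while (0)

// Placeholder kernel standing in for one application stage.
__global__ void scale(float *buf, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= a;
}

// Host callback: fires after all preceding work in the stream has finished,
// which is where a CPU post-processing function would run.
static void CUDART_CB onChunkDone(cudaStream_t, cudaError_t, void *userData) {
    printf("chunk %d finished\n", (int)(size_t)userData);
}

int main() {
    const int kChunks = 8, kElems = 1 << 20;
    int numGpus = 0;
    CHECK(cudaGetDeviceCount(&numGpus));
    if (numGpus > 4) numGpus = 4;  // static arrays below are sized for 4 GPUs

    // One stream and one buffer pair per GPU; pinned host memory is required
    // for cudaMemcpyAsync to overlap with kernel execution.
    cudaStream_t stream[4];
    float *dBuf[4], *hBuf[4];
    for (int g = 0; g < numGpus; ++g) {
        CHECK(cudaSetDevice(g));
        CHECK(cudaStreamCreate(&stream[g]));
        CHECK(cudaMalloc(&dBuf[g], kElems * sizeof(float)));
        CHECK(cudaMallocHost(&hBuf[g], kElems * sizeof(float)));
    }

    // Round-robin the stream of chunks over the GPUs. Within each CUDA
    // stream, the copy-in, kernel, copy-out, and host callback of one chunk
    // execute in order, while chunks assigned to different GPUs overlap.
    for (int c = 0; c < kChunks; ++c) {
        int g = c % numGpus;
        CHECK(cudaSetDevice(g));
        CHECK(cudaMemcpyAsync(dBuf[g], hBuf[g], kElems * sizeof(float),
                              cudaMemcpyHostToDevice, stream[g]));
        scale<<<(kElems + 255) / 256, 256, 0, stream[g]>>>(dBuf[g], kElems, 2.0f);
        CHECK(cudaMemcpyAsync(hBuf[g], dBuf[g], kElems * sizeof(float),
                              cudaMemcpyDeviceToHost, stream[g]));
        CHECK(cudaStreamAddCallback(stream[g], onChunkDone, (void *)(size_t)c, 0));
    }

    for (int g = 0; g < numGpus; ++g) {
        CHECK(cudaSetDevice(g));
        CHECK(cudaStreamSynchronize(stream[g]));
    }
    return 0;
}

According to the abstract, GPU-chariot's runtime scheduler replaces the fixed round-robin and in-order issue used in this hand-written version: it issues CPU functions, kernels, and transfers out of order, balances load across the GPUs, and hides the number of GPUs from the programmer.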
Fumihiko INO (Osaka University), Shinta NAKAGAWA (NEC Corporation), Kenichi HAGIHARA (Osaka University)
Fumihiko INO, Shinta NAKAGAWA, Kenichi HAGIHARA, "GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems" in IEICE TRANSACTIONS on Information, vol. E96-D, no. 12, pp. 2604-2616, December 2013, doi: 10.1587/transinf.E96.D.2604.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E96.D.2604/_p
@ARTICLE{e96-d_12_2604,
author={Fumihiko INO and Shinta NAKAGAWA and Kenichi HAGIHARA},
journal={IEICE TRANSACTIONS on Information},
title={GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems},
year={2013},
volume={E96-D},
number={12},
pages={2604-2616},
doi={10.1587/transinf.E96.D.2604},
ISSN={1745-1361},
month={December},
}
TY - JOUR
TI - GPU-Chariot: A Programming Framework for Stream Applications Running on Multi-GPU Systems
T2 - IEICE TRANSACTIONS on Information
SP - 2604
EP - 2616
AU - Fumihiko INO
AU - Shinta NAKAGAWA
AU - Kenichi HAGIHARA
PY - 2013
DO - 10.1587/transinf.E96.D.2604
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E96-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2013
ER -