Cache-Aware GPU Optimization for Out-of-Core Cone Beam CT Reconstruction of High-Resolution Volumes

Yuechao LU; Fumihiko INO; Kenichi HAGIHARA

doi:10.1587/transinf.2016EDP7174

Summary:

This paper proposes a cache-aware optimization method to accelerate out-of-core cone beam computed tomography reconstruction on a graphics processing unit (GPU). The proposed method extends a previous method by increasing the cache hit rate, thereby speeding up the reconstruction of high-resolution volumes that exceed the capacity of device memory. More specifically, our approach accelerates the well-known Feldkamp-Davis-Kress algorithm using three strategies: (1) a loop organization strategy that identifies the best tradeoff point between the cache hit rate and the number of off-chip memory accesses; (2) a data structure that exploits high locality within a layered texture; and (3) a fully pipelined strategy that hides file input/output (I/O) time behind GPU execution and data transfer times. We implement the proposed method on NVIDIA's latest Maxwell architecture and provide guidelines for tuning the execution parameters to maximize reconstruction performance; these parameters include the granularity and shape of thread blocks and the granularity of the I/O data streamed through the pipeline. In our experiments, reconstructing a 2048³-voxel volume from 1200 2048²-pixel projection images took less than three minutes on a single GPU, a speedup of approximately 1.47 over the previous method.
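To see why a problem of this size has to be handled out-of-core, a rough footprint estimate may help. The sketch below is only illustrative: it assumes 4 bytes per voxel and per pixel (single-precision floats), which the summary does not state, and the 8 GiB device budget and the `slabs_needed` helper are hypothetical names and values chosen for the example, not taken from the paper.

```python
# Back-of-the-envelope memory footprint for the problem size quoted in the summary.
# ASSUMPTION: 4 bytes per sample (float32); the summary does not give the data type.
BYTES_PER_SAMPLE = 4
GIB = 2**30


def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def slabs_needed(total_bytes: int, budget_bytes: int) -> int:
    """Hypothetical helper: number of equal-sized pieces needed so that
    each piece fits within the given memory budget (ceiling division)."""
    return -(-total_bytes // budget_bytes)


# Sizes taken from the summary: a 2048^3-voxel volume, 1200 projections of 2048^2 pixels.
volume_bytes = 2048**3 * BYTES_PER_SAMPLE
projection_bytes = 1200 * 2048**2 * BYTES_PER_SAMPLE

print(f"volume:      {gib(volume_bytes):.2f} GiB")
print(f"projections: {gib(projection_bytes):.2f} GiB")

# ASSUMPTION: an 8 GiB budget for the volume on the device (illustrative value only).
print("slabs for an 8 GiB budget:", slabs_needed(volume_bytes, 8 * GIB))
```

Under these assumptions the volume alone occupies 32 GiB and the projections a further 18.75 GiB, so the data must be streamed through the GPU in pieces rather than held in device memory at once.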

Publication: IEICE TRANSACTIONS on Information Vol.E99-D No.12 pp.3060-3071

Publication Date: 2016/12/01

Publicized: 2016/09/05

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016EDP7174

Type of Manuscript: PAPER

Category: Computer System

Authors

Yuechao LU
  Osaka University
Fumihiko INO
  Osaka University
Kenichi HAGIHARA
  Osaka University

Keywords

cone beam reconstruction, GPU, CUDA, cache optimization

Yuechao LU, Fumihiko INO, Kenichi HAGIHARA, "Cache-Aware GPU Optimization for Out-of-Core Cone Beam CT Reconstruction of High-Resolution Volumes" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 12, pp. 3060-3071, December 2016, doi: 10.1587/transinf.2016EDP7174.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016EDP7174/_p

@ARTICLE{e99-d_12_3060,
author={Yuechao LU and Fumihiko INO and Kenichi HAGIHARA},
journal={IEICE TRANSACTIONS on Information},
title={Cache-Aware {GPU} Optimization for Out-of-Core Cone Beam {CT} Reconstruction of High-Resolution Volumes},
year={2016},
volume={E99-D},
number={12},
pages={3060--3071},
abstract={This paper proposes a cache-aware optimization method to accelerate out-of-core cone beam computed tomography reconstruction on a graphics processing unit (GPU) device. Our proposed method extends a previous method by increasing the cache hit rate so as to speed up the reconstruction of high-resolution volumes that exceed the capacity of device memory. More specifically, our approach accelerates the well-known Feldkamp-Davis-Kress algorithm by utilizing the following three strategies: (1) a loop organization strategy that identifies the best tradeoff point between the cache hit rate and the number of off-chip memory accesses; (2) a data structure that exploits high locality within a layered texture; and (3) a fully pipelined strategy for hiding file input/output (I/O) time with GPU execution and data transfer times. We implement our proposed method on NVIDIA's latest Maxwell architecture and provide tuning guidelines for adjusting the execution parameters, which include the granularity and shape of thread blocks as well as the granularity of I/O data to be streamed through the pipeline, which maximizes reconstruction performance. Our experimental results show that it took less than three minutes to reconstruct a 2048³-voxel volume from 1200 2048²-pixel projection images on a single GPU; this translates to a speedup of approximately 1.47 as compared to the previous method.},
keywords={cone beam reconstruction, GPU, CUDA, cache optimization},
doi={10.1587/transinf.2016EDP7174},
ISSN={1745-1361},
month={December}
}

TY  - JOUR
TI  - Cache-Aware GPU Optimization for Out-of-Core Cone Beam CT Reconstruction of High-Resolution Volumes
T2  - IEICE TRANSACTIONS on Information
SP  - 3060
EP  - 3071
AU  - Yuechao LU
AU  - Fumihiko INO
AU  - Kenichi HAGIHARA
PY  - 2016
DO  - 10.1587/transinf.2016EDP7174
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E99-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2016
AB  - This paper proposes a cache-aware optimization method to accelerate out-of-core cone beam computed tomography reconstruction on a graphics processing unit (GPU) device. Our proposed method extends a previous method by increasing the cache hit rate so as to speed up the reconstruction of high-resolution volumes that exceed the capacity of device memory. More specifically, our approach accelerates the well-known Feldkamp-Davis-Kress algorithm by utilizing the following three strategies: (1) a loop organization strategy that identifies the best tradeoff point between the cache hit rate and the number of off-chip memory accesses; (2) a data structure that exploits high locality within a layered texture; and (3) a fully pipelined strategy for hiding file input/output (I/O) time with GPU execution and data transfer times. We implement our proposed method on NVIDIA's latest Maxwell architecture and provide tuning guidelines for adjusting the execution parameters, which include the granularity and shape of thread blocks as well as the granularity of I/O data to be streamed through the pipeline, which maximizes reconstruction performance. Our experimental results show that it took less than three minutes to reconstruct a 2048³-voxel volume from 1200 2048²-pixel projection images on a single GPU; this translates to a speedup of approximately 1.47 as compared to the previous method.
ER  - 