The 3D holographic display has long been anticipated as a future human interface because it does not require users to wear special devices. However, its heavy computational requirements have prevented such displays from being realized. A recent study indicates that objects and holograms of several gigapixels must be processed in real time to achieve high resolution and a wide viewing angle. To address this problem, we first adapted a conventional FFT algorithm to a GPU cluster environment so as to avoid heavy inter-node communication. We then applied several single-node and multi-node optimization and parallelization techniques. The single-node optimizations include a change in the object decomposition method, reduced data transfer between the CPU and GPU, kernel integration, stream processing, and the use of multiple GPUs within a node. The multi-node optimizations include methods for distributing object data from the host node to the other nodes. Experimental results show that the intra-node optimizations attain an 11.52-times speed-up over the original single-node code. Furthermore, the multi-node optimizations, using 8 nodes with 2 GPUs per node, attain an execution time of 4.28 s for generating a 1.6-gigapixel hologram from a 3.2-gigapixel object. This corresponds to a 237.92-times speed-up over sequential CPU processing and a 41.78-times speed-up over multi-threaded execution on a multicore CPU, both using a conventional FFT-based algorithm.
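As a rough reading aid, the speed-up figures in the abstract can be turned back into approximate baseline run times. The sketch below assumes that both speed-up factors are measured against the 4.28 s cluster result for the same 1.6-gigapixel hologram; the abstract suggests this but does not state it outright. The function and variable names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
CLUSTER_TIME_S = 4.28             # 8 nodes, 2 GPUs per node
SPEEDUP_VS_SEQUENTIAL = 237.92    # versus sequential CPU processing
SPEEDUP_VS_MULTITHREADED = 41.78  # versus multi-threaded multicore CPU
HOLOGRAM_PIXELS = 1.6e9           # size of the generated hologram


def implied_baseline_seconds(optimized_seconds: float, speedup: float) -> float:
    """Baseline time implied by speedup = baseline / optimized.

    Assumption: the baseline and the optimized run solve the same problem.
    """
    return optimized_seconds * speedup


sequential_s = implied_baseline_seconds(CLUSTER_TIME_S, SPEEDUP_VS_SEQUENTIAL)
multithreaded_s = implied_baseline_seconds(CLUSTER_TIME_S, SPEEDUP_VS_MULTITHREADED)
pixels_per_second = HOLOGRAM_PIXELS / CLUSTER_TIME_S

print(f"implied sequential CPU time:     {sequential_s:.1f} s")
print(f"implied multi-threaded CPU time: {multithreaded_s:.1f} s")
print(f"cluster throughput:              {pixels_per_second / 1e6:.1f} Mpixel/s")
```

Under that assumption, the sequential baseline would be on the order of 17 minutes and the multi-threaded baseline about 3 minutes per hologram, which gives a sense of the gap between the reported 4.28 s and real-time generation.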
Takanobu BABA
Utsunomiya University
Shinpei WATANABE
Acs Co., Ltd.
Boaz JESSIE JACKIN
National Institute of Information and Communications Technology
Kanemitsu OOTSU
Utsunomiya University
Takeshi OHKAWA
Utsunomiya University
Takashi YOKOTA
Utsunomiya University
Yoshio HAYASAKI
Utsunomiya University
Toyohiko YATAGAI
Utsunomiya University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takanobu BABA, Shinpei WATANABE, Boaz JESSIE JACKIN, Kanemitsu OOTSU, Takeshi OHKAWA, Takashi YOKOTA, Yoshio HAYASAKI, Toyohiko YATAGAI, "Fast Computation with Efficient Object Data Distribution for Large-Scale Hologram Generation on a Multi-GPU Cluster" in IEICE TRANSACTIONS on Information,
vol. E102-D, no. 7, pp. 1310-1320, July 2019, doi: 10.1587/transinf.2018EDP7346.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018EDP7346/_p
@ARTICLE{e102-d_7_1310,
author={Takanobu BABA and Shinpei WATANABE and Boaz JESSIE JACKIN and Kanemitsu OOTSU and Takeshi OHKAWA and Takashi YOKOTA and Yoshio HAYASAKI and Toyohiko YATAGAI},
journal={IEICE TRANSACTIONS on Information},
title={Fast Computation with Efficient Object Data Distribution for Large-Scale Hologram Generation on a Multi-GPU Cluster},
year={2019},
volume={E102-D},
number={7},
pages={1310-1320},
keywords={},
doi={10.1587/transinf.2018EDP7346},
ISSN={1745-1361},
month={July},}
TY - JOUR
TI - Fast Computation with Efficient Object Data Distribution for Large-Scale Hologram Generation on a Multi-GPU Cluster
T2 - IEICE TRANSACTIONS on Information
SP - 1310
EP - 1320
AU - Takanobu BABA
AU - Shinpei WATANABE
AU - Boaz JESSIE JACKIN
AU - Kanemitsu OOTSU
AU - Takeshi OHKAWA
AU - Takashi YOKOTA
AU - Yoshio HAYASAKI
AU - Toyohiko YATAGAI
PY - 2019
DO - 10.1587/transinf.2018EDP7346
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 7
JA - IEICE TRANSACTIONS on Information
Y1 - July 2019
ER -