Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models

Koji NAKANO

doi:10.1587/transinf.E96.D.2626

IEICE TRANSACTIONS on Information

Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models

Koji NAKANO

Full Text Views

0

Cite this

Summary :

The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters, the number p of threads, and the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. We then go on to show that $Omega({nover w}+{nlover p}+llog n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of n numbers in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $sqrt{n} imessqrt{n}$ can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.

Publication: IEICE TRANSACTIONS on Information Vol.E96-D No.12 pp.2626-2634

Publication Date: 2013/12/01

Publicized

Online ISSN: 1745-1361

DOI: 10.1587/transinf.E96.D.2626

Type of Manuscript: Special Section PAPER (Special Section on Parallel and Distributed Computing and Networking)

Category

Authors

Koji NAKANO
Hiroshima University

Keyword

memory machine models, prefix-sums computation, parallel algorithm, GPU, CUDA

Cite this

Copy

Koji NAKANO, "Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models" in IEICE TRANSACTIONS on Information, vol. E96-D, no. 12, pp. 2626-2634, December 2013, doi: 10.1587/transinf.E96.D.2626.
Abstract: The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters, the number p of threads, and the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. We then go on to show that $Omega({nover w}+{nlover p}+llog n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of n numbers in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $sqrt{n} imessqrt{n}$ can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E96.D.2626/_p

Copy

@ARTICLE{e96-d_12_2626,
author={Koji NAKANO, },
journal={IEICE TRANSACTIONS on Information},
title={Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models},
year={2013},
volume={E96-D},
number={12},
pages={2626-2634},
abstract={The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters, the number p of threads, and the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. We then go on to show that $Omega({nover w}+{nlover p}+llog n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of n numbers in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $sqrt{n} imessqrt{n}$ can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.},
keywords={},
doi={10.1587/transinf.E96.D.2626},
ISSN={1745-1361},
month={December},}

Copy

TY - JOUR
TI - Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models
T2 - IEICE TRANSACTIONS on Information
SP - 2626
EP - 2634
AU - Koji NAKANO
PY - 2013
DO - 10.1587/transinf.E96.D.2626
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E96-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2013
AB - The main contribution of this paper is to show optimal parallel algorithms to compute the sum, the prefix-sums, and the summed area table on two memory machine models, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). The DMM and the UMM are theoretical parallel computing models that capture the essence of the shared memory and the global memory of GPUs. These models have three parameters, the number p of threads, and the width w of the memory, and the memory access latency l. We first show that the sum of n numbers can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. We then go on to show that $Omega({nover w}+{nlover p}+llog n)$ time units are necessary to compute the sum. We also present a parallel algorithm that computes the prefix-sums of n numbers in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Finally, we show that the summed area table of size $sqrt{n} imessqrt{n}$ can be computed in $O({nover w}+{nlover p}+llog n)$ time units on the DMM and the UMM. Since the computation of the prefix-sums and the summed area table is at least as hard as the sum computation, these parallel algorithms are also optimal.
ER -

IEICE TRANSACTIONS on Information