Fully Parallelized LZW Decompression for CUDA-Enabled GPUs

Shunji FUNASAKA; Koji NAKANO; Yasuaki ITO

doi:10.1587/transinf.2016PAP0011

IEICE TRANSACTIONS on Information

Summary :

The main contribution of this paper is to present a work-optimal parallel algorithm for LZW decompression and to implement it on a CUDA-enabled GPU. Since sequential LZW decompression creates a dictionary table by reading codes in a compressed file one by one, it is not easy to parallelize. We first present a work-optimal parallel LZW decompression algorithm on the CREW-PRAM (Concurrent-Read Exclusive-Write Parallel Random Access Machine), which is a standard theoretical parallel computing model with a shared memory. We then go on to present an efficient implementation of this parallel algorithm on a GPU. The experimental results show that our GPU implementation performs LZW decompression in 1.15 milliseconds for a grayscale TIFF image with 4096×3072 pixels stored in the global memory of a GeForce GTX 980. On the other hand, sequential LZW decompression for the same image stored in the main memory of an Intel Core i7 CPU takes 50.1 milliseconds. Thus, our parallel LZW decompression on the global memory of the GPU is 43.6 times faster than a sequential LZW decompression on the main memory of the CPU for this image. To show the applicability of our GPU implementation for LZW decompression, we evaluated the SSD-GPU data loading time for three scenarios. The experimental results show that the scenario using our LZW decompression on the GPU is faster than the others.
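To illustrate why the sequential procedure resists parallelization, here is a minimal sketch of textbook sequential LZW decompression in Python. It is not the paper's parallel algorithm or its CUDA implementation. It assumes a 256-entry initial byte dictionary, codes already unpacked into a list of integers, and no TIFF-specific clear or end-of-information codes; the function name `lzw_decompress` is hypothetical.

```python
def lzw_decompress(codes):
    """Textbook sequential LZW decompression (illustrative sketch only).

    Assumptions: codes are plain integers, the initial dictionary holds
    the 256 single-byte strings, and there are no clear/EOI codes.
    """
    # Initial dictionary: one entry per byte value 0..255.
    table = [bytes([i]) for i in range(256)]
    if not codes:
        return b""

    prev = table[codes[0]]
    out = bytearray(prev)

    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        elif code == len(table):
            # The code names the entry that is being created in this step,
            # so it must be the previous string plus its own first byte.
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")

        out += entry
        # The new entry depends on the previous decoded string, so every
        # iteration depends on the one before it.
        table.append(prev + entry[:1])
        prev = entry

    return bytes(out)
```

For example, `lzw_decompress([65, 66, 256, 258])` returns `b"ABABABA"`. Each dictionary entry is built from the string decoded in the previous iteration, which is the loop-carried dependency the summary refers to.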

Publication: IEICE TRANSACTIONS on Information Vol.E99-D No.12 pp.2986-2994

Publication Date: 2016/12/01

Publicized: 2016/08/25

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016PAP0011

Type of Manuscript: Special Section PAPER (Special Section on Parallel and Distributed Computing and Networking)

Category: GPU computing

Authors

Shunji FUNASAKA
  Hiroshima University
Koji NAKANO
  Hiroshima University
Yasuaki ITO
  Hiroshima University

Keyword

data compression, big data, parallel algorithm, GPU, CUDA

Shunji FUNASAKA, Koji NAKANO, Yasuaki ITO, "Fully Parallelized LZW Decompression for CUDA-Enabled GPUs" in IEICE TRANSACTIONS on Information, vol. E99-D, no. 12, pp. 2986-2994, December 2016, doi: 10.1587/transinf.2016PAP0011.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2016PAP0011/_p

@ARTICLE{e99-d_12_2986,
author={Shunji FUNASAKA and Koji NAKANO and Yasuaki ITO},
journal={IEICE TRANSACTIONS on Information},
title={Fully Parallelized LZW Decompression for CUDA-Enabled GPUs},
year={2016},
volume={E99-D},
number={12},
pages={2986--2994},
abstract={The main contribution of this paper is to present a work-optimal parallel algorithm for LZW decompression and to implement it in a CUDA-enabled GPU. Since sequential LZW decompression creates a dictionary table by reading codes in a compressed file one by one, it is not easy to parallelize it. We first present a work-optimal parallel LZW decompression algorithm on the CREW-PRAM (Concurrent-Read Exclusive-Write Parallel Random Access Machine), which is a standard theoretical parallel computing model with a shared memory. We then go on to present an efficient implementation of this parallel algorithm on a GPU. The experimental results show that our GPU implementation performs LZW decompression in 1.15 milliseconds for a gray scale TIFF image with 4096×3072 pixels stored in the global memory of GeForce GTX 980. On the other hand, sequential LZW decompression for the same image stored in the main memory of Intel Core i7 CPU takes 50.1 milliseconds. Thus, our parallel LZW decompression on the global memory of the GPU is 43.6 times faster than a sequential LZW decompression on the main memory of the CPU for this image. To show the applicability of our GPU implementation for LZW decompression, we evaluated the SSD-GPU data loading time for three scenarios. The experimental results show that the scenario using our LZW decompression on the GPU is faster than the others.},
keywords={data compression, big data, parallel algorithm, GPU, CUDA},
doi={10.1587/transinf.2016PAP0011},
ISSN={1745-1361},
month={December},}

TY - JOUR
TI - Fully Parallelized LZW Decompression for CUDA-Enabled GPUs
T2 - IEICE TRANSACTIONS on Information
SP - 2986
EP - 2994
AU - Shunji FUNASAKA
AU - Koji NAKANO
AU - Yasuaki ITO
PY - 2016
DO - 10.1587/transinf.2016PAP0011
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E99-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2016
AB - The main contribution of this paper is to present a work-optimal parallel algorithm for LZW decompression and to implement it in a CUDA-enabled GPU. Since sequential LZW decompression creates a dictionary table by reading codes in a compressed file one by one, it is not easy to parallelize it. We first present a work-optimal parallel LZW decompression algorithm on the CREW-PRAM (Concurrent-Read Exclusive-Write Parallel Random Access Machine), which is a standard theoretical parallel computing model with a shared memory. We then go on to present an efficient implementation of this parallel algorithm on a GPU. The experimental results show that our GPU implementation performs LZW decompression in 1.15 milliseconds for a gray scale TIFF image with 4096×3072 pixels stored in the global memory of GeForce GTX 980. On the other hand, sequential LZW decompression for the same image stored in the main memory of Intel Core i7 CPU takes 50.1 milliseconds. Thus, our parallel LZW decompression on the global memory of the GPU is 43.6 times faster than a sequential LZW decompression on the main memory of the CPU for this image. To show the applicability of our GPU implementation for LZW decompression, we evaluated the SSD-GPU data loading time for three scenarios. The experimental results show that the scenario using our LZW decompression on the GPU is faster than the others.
ER -
