In off-line analysis, the demand for high-precision signal processing has brought forward a new method, Empirical Mode Decomposition (EMD), for analyzing complex data sets. Unfortunately, EMD is highly compute-intensive. In this paper, we present a parallel implementation of EMD on a GPU. We propose a “partial+total” switching method that increases performance while preserving precision. We also focus on reducing the computational complexity of this method from O(N) on a single CPU to O((N/P) log N) on a GPU. Evaluation results show that our single-GPU implementation on a Tesla C2050 (Fermi architecture) achieves speedups of 29.9x (partial) and 11.8x (total) compared with a single Intel dual-core CPU.
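For readers unfamiliar with EMD, the following is a minimal sequential sketch of the standard sifting procedure, written in Python with NumPy. It is not the authors' GPU implementation and does not model their “partial+total” switching method. The following assumptions are all illustrative: natural cubic-spline envelopes, envelopes pinned to the first and last samples as a simple boundary treatment, and a fixed number of sifting passes instead of a convergence criterion. The function names (`natural_cubic_spline`, `local_extrema`, `sift`, `emd`) are hypothetical.

```python
import numpy as np


def natural_cubic_spline(xk, yk, t):
    """Evaluate a natural cubic spline through knots (xk, yk) at points t."""
    xk = np.asarray(xk, dtype=float)
    yk = np.asarray(yk, dtype=float)
    n = len(xk)
    h = np.diff(xk)
    # Second derivatives at the knots; natural end conditions fix m[0] = m[-1] = 0.
    m = np.zeros(n)
    if n > 2:
        a = np.zeros((n - 2, n - 2))
        rhs = np.zeros(n - 2)
        for i in range(1, n - 1):
            r = i - 1
            a[r, r] = 2.0 * (h[i - 1] + h[i])
            if r > 0:
                a[r, r - 1] = h[i - 1]
            if r < n - 3:
                a[r, r + 1] = h[i]
            rhs[r] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
        m[1:-1] = np.linalg.solve(a, rhs)
    # Index of the knot interval [xk[j], xk[j + 1]] that contains each t.
    j = np.clip(np.searchsorted(xk, t) - 1, 0, n - 2)
    x0, x1, hj = xk[j], xk[j + 1], h[j]
    return (m[j] * (x1 - t) ** 3 / (6 * hj)
            + m[j + 1] * (t - x0) ** 3 / (6 * hj)
            + (yk[j] / hj - m[j] * hj / 6) * (x1 - t)
            + (yk[j + 1] / hj - m[j + 1] * hj / 6) * (t - x0))


def local_extrema(x):
    """Indices of strict interior local maxima and minima (plateaus are ignored)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def sift(x, t, max_iter=10):
    """Extract one IMF candidate by repeatedly subtracting the envelope mean."""
    h = x.copy()
    for _ in range(max_iter):  # fixed pass count (assumption), no stopping test
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        # Simple boundary handling (assumption): pin both envelopes to the end samples.
        kmax = np.concatenate(([0], maxima, [len(h) - 1]))
        kmin = np.concatenate(([0], minima, [len(h) - 1]))
        upper = natural_cubic_spline(t[kmax], h[kmax], t)
        lower = natural_cubic_spline(t[kmin], h[kmin], t)
        h = h - 0.5 * (upper + lower)
    return h


def emd(x, max_imfs=8, max_iter=10):
    """Decompose x into IMFs plus a residue, with x == sum(imfs) + residue."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    imfs = []
    residue = x.copy()
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        # Stop once the residue has too few extrema to build envelopes.
        if len(maxima) < 2 or len(minima) < 2:
            break
        imf = sift(residue, t, max_iter)
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue
```

Each sifting pass locates extrema, fits two splines, and evaluates them at every sample. This repeats for every pass and every IMF, and that repeated per-sample envelope work is what makes EMD costly on long signals. It is plausibly what a data-parallel GPU version targets. Because each IMF is subtracted from the running residue, the sketch reconstructs its input up to floating-point error: `np.allclose(sum(imfs) + residue, x)` holds.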
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Pulung WASKITO, Shinobu MIWA, Yasue MITSUKURA, Hironori NAKAJO, "Evaluation of GPU-Based Empirical Mode Decomposition for Off-Line Analysis," in IEICE TRANSACTIONS on Information, vol. E94-D, no. 12, pp. 2328-2337, December 2011, doi: 10.1587/transinf.E94.D.2328.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E94.D.2328/_p
@ARTICLE{e94-d_12_2328,
author={Pulung WASKITO and Shinobu MIWA and Yasue MITSUKURA and Hironori NAKAJO},
journal={IEICE TRANSACTIONS on Information},
title={Evaluation of {GPU}-Based {Empirical Mode Decomposition} for {Off-Line} Analysis},
year={2011},
volume={E94-D},
number={12},
pages={2328-2337},
abstract={In off-line analysis, the demand for high-precision signal processing has brought forward a new method, Empirical Mode Decomposition (EMD), for analyzing complex data sets. Unfortunately, EMD is highly compute-intensive. In this paper, we present a parallel implementation of EMD on a GPU. We propose a ``partial+total'' switching method that increases performance while preserving precision. We also focus on reducing the computational complexity of this method from O(N) on a single CPU to O((N/P) log N) on a GPU. Evaluation results show that our single-GPU implementation on a Tesla C2050 (Fermi architecture) achieves speedups of 29.9x (partial) and 11.8x (total) compared with a single Intel dual-core CPU.},
keywords={},
doi={10.1587/transinf.E94.D.2328},
ISSN={1745-1361},
month={December},}
TY  - JOUR
TI  - Evaluation of GPU-Based Empirical Mode Decomposition for Off-Line Analysis
T2  - IEICE TRANSACTIONS on Information
SP  - 2328
EP  - 2337
AU  - Pulung WASKITO
AU  - Shinobu MIWA
AU  - Yasue MITSUKURA
AU  - Hironori NAKAJO
PY  - 2011
DO  - 10.1587/transinf.E94.D.2328
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E94-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - 2011/12//
AB  - In off-line analysis, the demand for high-precision signal processing has brought forward a new method, Empirical Mode Decomposition (EMD), for analyzing complex data sets. Unfortunately, EMD is highly compute-intensive. In this paper, we present a parallel implementation of EMD on a GPU. We propose a “partial+total” switching method that increases performance while preserving precision. We also focus on reducing the computational complexity of this method from O(N) on a single CPU to O((N/P) log N) on a GPU. Evaluation results show that our single-GPU implementation on a Tesla C2050 (Fermi architecture) achieves speedups of 29.9x (partial) and 11.8x (total) compared with a single Intel dual-core CPU.
ER  - 