1-2hit |
Pulung WASKITO Shinobu MIWA Yasue MITSUKURA Hironori NAKAJO
In off-line analysis, the demand for high precision signal processing has introduced a new method called Empirical Mode Decomposition (EMD), which is used for analyzing a complex set of data. Unfortunately, EMD is highly compute-intensive. In this paper, we show parallel implementation of Empirical Mode Decomposition on a GPU. We propose the use of “partial+total” switching method to increase performance while keeping the precision. We also focused on reducing the computation complexity in the above method from O(N) on a single CPU to O(N/P log (N)) on a GPU. Evaluation results show our single GPU implementation using Tesla C2050 (Fermi architecture) achieves a 29.9x speedup partially, and a 11.8x speedup totally when compared to a single Intel dual core CPU.
Computation methods using custom circuits are frequently employed to improve the throughput and power efficiency of computing systems. Hardware development, however, can incur significant development costs because designs at the register-transfer level (RTL) with a hardware description language (HDL) are time-consuming. This paper proposes a hardware and software co-design environment, named Mulvery, which is designed for non-professional hardware designer We focus on the similarities between functional reactive programming (FRP) and dataflow in computation. This study provides an idea to design hardware with a dynamic typing language, such as Ruby, using FRP and provides the proof-of-concept of the method. Mulvery, which is a hardware and software co-design tool based on our method, reduces development costs. Mulvery exhibited high performance compared with software processing techniques not equipped with hardware knowledge. According to the experiment, the method allows us to design hardware without degradation of performance. The sample application applied a Laplacian filter to an image with a size of 128×128 and processed a convolution operation within one clock.