1-5hit |
Since first introduced in 2008 with the 1.0 specification, OpenCL has steadily evolved over the decade to increase its support for heterogeneous parallel systems. In this paper, we accelerate stochastic simulation of biochemical reaction networks on modern GPUs (graphics processing units) by means of the OpenCL programming language. In implementing the OpenCL version of the stochastic simulation algorithm, we carefully apply its data-parallel execution model to optimize the performance provided by the underlying hardware parallelism of the modern GPUs. To evaluate our OpenCL implementation of the stochastic simulation algorithm, we perform a comparative analysis in terms of the performance using the CPU-based cluster implementation and the NVidia CUDA implementation. In addition to the initial report on the performance of OpenCL on GPUs, we also discuss applicability and programmability of OpenCL in the context of GPU-based scientific computing.
We present an OpenACC-based parallelization implementation of stochastic algorithms for simulating biochemical reaction networks on modern GPUs (graphics processing units). To investigate the effectiveness of using OpenACC for leveraging the massive hardware parallelism of the GPU architecture, we carefully apply OpenACC's language constructs and mechanisms to implementing a parallel version of stochastic simulation algorithms on the GPU. Using our OpenACC implementation in comparison to both the NVidia CUDA and the CPU-based implementations, we report our initial experiences on OpenACC's performance and programming productivity in the context of GPU-accelerated scientific computing.
We present a GPU (graphics processing unit) accelerated stochastic algorithm implementation for simulating biochemical reaction networks using the latest NVidia architecture. To effectively utilize the massive parallelism offered by the NVidia Pascal hardware, we apply a set of performance tuning methods and guidelines such as exploiting the architecture's memory hierarchy in our algorithm implementation. Based on our experimentation results as well as comparative analysis using CPU-based implementations, we report our initial experiences on the performance of modern GPUs in the context of scientific computing.
Liqiang ZHANG Chao LI Haoliang SUN Changwen ZHENG Pin LV
Due to the complicated composition of cloud and its disordered transformation, the rendering of cloud does not perfectly meet actual prospect by current methods. Based on physical characteristics of cloud, a physical cellular automata model of Dynamic cloud is designed according to intrinsic factor of cloud, which describes the rules of hydro-movement, deposition and accumulation and diffusion. Then a parallel computing architecture is designed to compute the large-scale data set required by the rendering of dynamical cloud, and a GPU-based ray-casting algorithm is implemented to render the cloud volume data. The experiment shows that cloud rendering method based on physical cellular automata model is very efficient and able to adequately exhibit the detail of cloud.
Ryusuke MIYAMOTO Hiroki SUGANO
Pedestrian detection from visual images, which is used for driver assistance or video surveillance, is a recent challenging problem. Co-occurrence histograms of oriented gradients (CoHOG) is a powerful feature descriptor for pedestrian detection and achieves the highest detection accuracy. However, its calculation cost is too large to calculate it in real-time on state-of-the-art processors. In this paper, to obtain optimal parallel implementation for an NVIDIA GPU, several kinds of parallelism of CoHOG-based detection are shown and evaluated suitability for implementation. The experimental result shows that the detection process can be performed at 16.5 fps in QVGA images on NVIDIA Tesla C1060 by optimized parallel implementation. By our evaluation, it is shown that the optimal strategy of parallel implementation for an NVIDIA GPU is different from that of FPGA. We discuss about the reason and show the advantages of each device. To show the scalability and portability of GPU implementation, the same object code is executed on other NVIDA GPUs. The experimental result shows that GTX570 can perform the CoHOG-based pedestiran detection 21.3 fps in QVGA images.