
Keyword Search Result

[Keyword] GPU(89hit)

81-89hit(89hit)

  • A Parallel Framework for Fast Photomosaics

    Dongwann KANG  Sang-Hyun SEO  Seung-Taek RYOO  Kyung-Hyun YOON  

     
    PAPER-Computer Graphics

    Vol: E94-D No:10  Page(s): 2036-2042

    The main bottleneck of the photomosaic algorithm is the search for the best-matching image. Unlike several techniques that use fast approximate search to increase speed, we propose a parallel framework for fast photomosaics using a programmable GPU. This paper suggests a vertex structure design for best-match searching on each cell of the photomosaic grid and shows a texture representation of the image database. The shader programs used for searching for the best match and rendering image tiles to a display are presented. In addition, simple duplicate reduction and color correction methods are proposed. Our algorithm not only offers a dramatic speed enhancement, but also always guarantees the 'exact' result.
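
    The 'exact' result mentioned above corresponds to an exhaustive best-match search: every grid cell is compared against every database tile. The following is a minimal CPU sketch of that idea, not the paper's shader implementation. The sum-of-squared-differences metric, the flat pixel lists, and all function names are assumptions made for illustration.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(cell, tiles):
    """Index of the tile closest to the cell, found by exhaustive search.

    Every tile is examined, so the result is exact with respect to the
    chosen metric (SSD here, which is an assumption, not the paper's metric).
    """
    return min(range(len(tiles)), key=lambda i: ssd(cell, tiles[i]))


def photomosaic_indices(cells, tiles):
    """Best-matching tile index for each grid cell.

    Cells are independent of each other, which is what makes the search
    a natural fit for parallel hardware.
    """
    return [best_match(cell, tiles) for cell in cells]


if __name__ == "__main__":
    # Hypothetical 2x2 grayscale tiles, flattened to 4 pixel values each.
    tiles = [[0, 0, 0, 0], [128, 128, 128, 128], [255, 255, 255, 255]]
    cells = [[10, 5, 0, 3], [250, 255, 240, 251], [120, 130, 125, 140]]
    print(photomosaic_indices(cells, tiles))
```

    Because each cell's search reads only the shared tile database, the outer loop can be distributed across processing elements without synchronization.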

  • NUFFT- & GPU-Based Fast Imaging of Vegetation

    Amedeo CAPOZZOLI  Claudio CURCIO  Antonio DI VICO  Angelo LISENO  

     
    PAPER-Sensing

    Vol: E94-B No:7  Page(s): 2092-2103

    We develop an effective algorithm, based on the filtered backprojection (FBP) approach, for the imaging of vegetation. Under the FBP scheme, the reconstruction amounts to a non-trivial Fourier inversion, since the data are Fourier samples arranged on a non-Cartesian grid. The computational issue is efficiently tackled by Non-Uniform Fast Fourier Transforms (NUFFTs), whose complexity grows asymptotically as that of a standard FFT. Furthermore, significant speed-ups, as compared to fast CPU implementations, are obtained by a parallel version of the NUFFT algorithm, purposely designed to run on Graphics Processing Units (GPUs) using the CUDA language. The performance of the parallel algorithm has been assessed in comparison to a CPU-multicore accelerated Matlab implementation of the same routine, to other CPU-multicore accelerated implementations based on standard FFT and employing linear, cubic, spline and sinc interpolations, and to a different parallel algorithm exploiting a parallel linear interpolation stage. The proposed approach proved the most computationally convenient. Furthermore, an indoor polarimetric experimental setup is developed, capable of isolating and introducing, one at a time, different non-idealities of a real acquisition, such as the sources (wind, rain) of temporal decorrelation. Experimental far-field polarimetric measurements on a Thuja plicata (western redcedar) tree demonstrate the performance of the developed algorithm, its robustness against data truncation and temporal decorrelation, as well as the possibility of discriminating scatterers with different features within the investigated scene.

  • Real-Time Object Detection Using Adaptive Background Model and Margined Sign Correlation

    Ayaka YAMAMOTO  Yoshio IWAI  Hiroshi ISHIGURO  

     
    PAPER-Image Recognition, Computer Vision

    Vol: E94-D No:2  Page(s): 325-335

    Background subtraction is widely used in detecting moving objects; however, changing illumination conditions, color similarity, and real-time performance remain important problems. In this paper, we introduce a sequential method for adaptively estimating background components using Kalman filters, and a novel method for detecting objects using margined sign correlation (MSC). By applying MSC to our adaptive background model, the proposed system can perform object detection robustly and accurately. The proposed method is suitable for implementation on a graphics processing unit (GPU) and as such, the system realizes real-time performance efficiently. Experimental results demonstrate the performance of the proposed system.
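
    The adaptive background estimation described above can be illustrated with a scalar Kalman filter applied to one pixel. This is only a sketch under stated assumptions: a constant-background state model, illustrative noise parameters, and hypothetical function names. It does not reproduce the paper's model, and the margined sign correlation (MSC) detection step is not shown.

```python
def kalman_background_update(estimate, variance, pixel,
                             process_var=0.01, noise_var=1.0):
    """One scalar Kalman step for a single pixel's background intensity.

    Assumes the background is constant between frames; process_var and
    noise_var are illustrative values, not taken from the paper.
    """
    # Predict: the state is unchanged, but its uncertainty grows.
    predicted_var = variance + process_var
    # Update: blend the prediction with the new observation.
    gain = predicted_var / (predicted_var + noise_var)
    new_estimate = estimate + gain * (pixel - estimate)
    new_variance = (1.0 - gain) * predicted_var
    return new_estimate, new_variance


def track_background(pixels, initial=0.0, initial_var=1.0):
    """Run the filter over a sequence of observations of one pixel."""
    estimate, variance = initial, initial_var
    for pixel in pixels:
        estimate, variance = kalman_background_update(estimate, variance, pixel)
    return estimate


if __name__ == "__main__":
    # A pixel that settles at intensity 100; the estimate moves toward it.
    print(track_background([100.0] * 50))
```

    Each pixel carries its own estimate and variance and is updated independently, which is one reason per-pixel filters of this kind map well onto a GPU.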

  • Acceleration of Computing the Kleene Star in Max-Plus Algebra Using CUDA GPUs

    Hiroyuki GOTO  

     
    LETTER-Fundamentals of Information Systems

    Vol: E94-D No:2  Page(s): 371-374

    This research aims to accelerate the computation module in max-plus algebra using CUDA technology on graphics processing units (GPUs) designed for high-performance computing. Our target is the Kleene star of a weighted adjacency matrix for directed acyclic graphs (DAGs). Using an inexpensive GPU card for our experiments, we obtained more than a 16-fold speedup compared with an Athlon 64 X2.
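
    For readers unfamiliar with the operation, the Kleene star of an n x n max-plus matrix A is A* = e ⊕ A ⊕ A^2 ⊕ ... ⊕ A^(n-1), where ⊕ is the element-wise maximum, matrix products use max in place of sum and + in place of multiplication, and e is the max-plus identity. For a DAG the higher powers vanish, so the series terminates. The sketch below is a plain sequential reference written for this listing, not the CUDA implementation from the paper; all function names are illustrative.

```python
NEG_INF = float("-inf")  # the max-plus "zero" element (epsilon)


def mp_identity(n):
    """Max-plus identity: 0 on the diagonal, -inf elsewhere."""
    return [[0.0 if i == j else NEG_INF for j in range(n)] for i in range(n)]


def mp_add(a, b):
    """Max-plus matrix addition: element-wise maximum."""
    return [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mp_mul(a, b):
    """Max-plus matrix product: (a (x) b)[i][j] = max_k (a[i][k] + b[k][j])."""
    inner, cols = len(b), len(b[0])
    return [[max(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def kleene_star(a):
    """Kleene star of a square max-plus matrix, summing powers up to n - 1.

    Assumes the precedence graph of `a` is acyclic, so the series is finite.
    """
    n = len(a)
    result = mp_identity(n)
    power = mp_identity(n)
    for _ in range(n - 1):
        power = mp_mul(power, a)
        result = mp_add(result, power)
    return result


if __name__ == "__main__":
    e = NEG_INF
    # Hypothetical 3-node DAG; a[i][j] is the weight linking node j to node i.
    a = [[e, e, e],
         [2, e, e],
         [1, 3, e]]
    for row in kleene_star(a):
        print(row)
```

    Entry (i, j) of the result is the largest total weight over all paths linking node j to node i, which is why the operation appears in longest-path and scheduling computations.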

  • Acceleration of Differential Power Analysis through the Parallel Use of GPU and CPU

    Sung Jae LEE  Seog Chung SEO  Dong-Guk HAN  Seokhie HONG  Sangjin LEE  

     
    LETTER-Cryptography and Information Security

    Vol: E93-A No:9  Page(s): 1688-1692

    This paper proposes methods for accelerating differential power analysis (DPA) by using the CPU and the GPU in parallel. The overhead of naive DPA evaluation software increases excessively as the number of points in a trace or the number of traces grows, owing to the rapid increase in file I/O overhead. This paper presents techniques, concerning DPA arithmetic and file handling, that make the overhead of DPA software grow gradually rather than excessively as the amount of trace data to be processed increases. Through generic experiments, we show that software equipped with the proposed methods and using both the CPU and the GPU can shorten the time for evaluating the DPA resistance of devices by almost half.

  • Accelerating Smith-Waterman Algorithm for Biological Database Search on CUDA-Compatible GPUs

    Yuma MUNEKAWA  Fumihiko INO  Kenichi HAGIHARA  

     
    PAPER-Parallel and Distributed Architecture

    Vol: E93-D No:6  Page(s): 1479-1488

    This paper presents a fast method capable of accelerating the Smith-Waterman algorithm for biological database search on a cluster of graphics processing units (GPUs). Our method is implemented using compute unified device architecture (CUDA), which is available on the nVIDIA GPU. As compared with previous methods, our method has four major contributions. (1) The method efficiently uses on-chip shared memory to reduce the data amount being transferred between off-chip video memory and processing elements in the GPU. (2) It also reduces the number of data fetches by applying a data reuse technique to query and database sequences. (3) A pipelined method is also implemented to overlap GPU execution with database access. (4) Finally, a master/worker paradigm is employed to accelerate hundreds of database searches on a cluster system. In experiments, the peak performance on a GeForce GTX 280 card reaches 8.32 giga cell updates per second (GCUPS). We also find that our method reduces the amount of data fetches to 1/140, achieving approximately three times higher performance than a previous CUDA-based method. Our 32-node cluster version is approximately 28 times faster than a single GPU version. Furthermore, the effective performance reaches 75.6 giga instructions per second (GIPS) using 32 GeForce 8800 GTX cards.
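
    As background for the figures above, the Smith-Waterman algorithm fills a dynamic-programming matrix with one cell per (query residue, database residue) pair, and each filled cell counts as one "cell update" in the GCUPS metric. The sketch below is a sequential reference for the local alignment score only. The linear gap penalty and the scoring values are assumptions chosen for brevity and are not taken from the paper.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two sequences.

    Uses a linear gap penalty and keeps only two rows of the matrix.
    Scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]  # first column is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            score = max(0, diag, up, left)  # one cell update
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

    Each cell depends only on its upper, left, and upper-left neighbors, so cells along the same anti-diagonal can be computed together, and separate database sequences can be scored independently.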

  • Design and Implementation of a Real-Time Video-Based Rendering System Using a Network Camera Array

    Yuichi TAGUCHI  Keita TAKAHASHI  Takeshi NAEMURA  

     
    PAPER-Image Processing and Video Processing

    Vol: E92-D No:7  Page(s): 1442-1452

    We present a real-time video-based rendering system using a network camera array. Our system consists of 64 commodity network cameras that are connected to a single PC through a gigabit Ethernet. To render a high-quality novel view, our system estimates a view-dependent per-pixel depth map in real time by using a layered representation. The rendering algorithm is fully implemented on the GPU, which allows our system to efficiently perform capturing and rendering processes as a pipeline by using the CPU and GPU independently. Using QVGA input video resolution, our system renders a free-viewpoint video at up to 30 frames per second, depending on the output video resolution and the number of depth layers. Experimental results show high-quality images synthesized from various scenes.

  • Media Processing LSI Architectures for Automotives -- Challenges and Future Trends --

    Ichiro KURODA  Shorin KYO  

     
    INVITED PAPER

    Vol: E90-C No:10  Page(s): 1850-1857

    This paper presents media processor architectures for automotive applications. Media processing applications and their requirements for LSI implementation are first described for vision-based driver assistance as well as for graphical user interfaces for car navigation using 3D graphics. Then, parallel processing architectures for vision and graphics in these applications are reviewed with their performance and cost. After that, future trends in automotive media processing, such as the integration of vision and 3D graphics functions, are shown with their applications and the required performance. Moreover, parallel processing architectures are discussed for the integration of vision and graphics. Finally, a prospect of a next-generation media processing LSI for automotives is provided.

  • Real-Time Space Carving Using Graphics Hardware

    Christian NITSCHKE  Atsushi NAKAZAWA  Haruo TAKEMURA  

     
    PAPER

    Vol: E90-D No:8  Page(s): 1175-1184

    Reconstruction of real-world scenes from a set of multiple images is a topic in computer vision and 3D computer graphics with many interesting applications. Attempts have been made at real-time reconstruction on PC cluster systems. While these provide enough performance, they are expensive and less flexible. Approaches that use GPU hardware acceleration on single workstations achieve real-time frame rates for novel-view synthesis, but do not provide an explicit volumetric representation. This work shows our efforts in developing a GPU hardware-accelerated framework for providing a photo-consistent reconstruction of a dynamic 3D scene. High performance is achieved by employing a shape-from-silhouette technique in advance. Since the entire processing is done on a single PC, the framework can be applied in mobile environments, enabling a wide range of further applications. We explain our approach using programmable vertex and fragment processors and compare it to highly optimized CPU implementations. We show that the new approach can outperform the latter by more than one order of magnitude and give an outlook on interesting future enhancements.
