
Keyword Search Result

[Keyword] graphics(78hit)

1-20hit(78hit)

  • Multiparallel MMT: Faster ISD Algorithm Solving High-Dimensional Syndrome Decoding Problem

    Shintaro NARISADA  Kazuhide FUKUSHIMA  Shinsaku KIYOMOTO  

     
    PAPER

      Publicized:
    2022/11/09
      Vol:
    E106-A No:3
      Page(s):
    241-252

    The hardness of the syndrome decoding problem (SDP) is the primary evidence for the security of code-based cryptosystems, which are among the finalists in a project to standardize post-quantum cryptography conducted by the U.S. National Institute of Standards and Technology (NIST-PQC). Information set decoding (ISD) is a general term for algorithms that solve SDP efficiently. In this paper, we conducted a concrete analysis of the time complexity of the latest ISD algorithms under limited memory using the syndrome decoding estimator proposed by Esser et al. As a result, we show that theoretically nonoptimal ISDs, such as May-Meurer-Thomae (MMT) and May-Ozerov, have lower time complexity than other ISDs on some actual SDP instances. Based on these facts, we further studied the possibility of multiple parallelization for these ISDs and proposed the first GPU algorithm for MMT, the multiparallel MMT algorithm. In the experiments, we show that the multiparallel MMT algorithm is faster than existing ISD algorithms. In addition, we report the first successful attempts to solve the 510-, 530-, 540- and 550-dimensional SDP instances in the Decoding Challenge contest using the multiparallel MMT.

  • Benchmarking Modern Edge Devices for AI Applications

    Pilsung KANG  Jongmin JO  

     
    PAPER-Computer System

      Publicized:
    2020/12/08
      Vol:
    E104-D No:3
      Page(s):
    394-403

    AI (artificial intelligence) has grown at an overwhelming speed for the last decade, to the extent that it has become one of the mainstream tools that drive the advancements in science and technology. Meanwhile, the paradigm of edge computing has emerged as one of the foremost areas in which applications using the AI technology are being most actively researched, due to its potential benefits and impact on today's widespread networked computing environments. In this paper, we evaluate two major entry-level offerings in the state-of-the-art edge device technology, which highlight increased computing power and specialized hardware support for AI applications. We perform a set of deep learning benchmarks on the devices to measure their performance. By comparing the performance with other GPU (graphics processing unit) accelerated systems in different platforms, we assess the computational capability of the modern edge devices featuring a significant amount of hardware parallelism.

  • Rootkit inside GPU Kernel Execution

    Ohmin KWON  Hyun KWON  Hyunsoo YOON  

     
    LETTER-Dependable Computing

      Publicized:
    2019/08/19
      Vol:
    E102-D No:11
      Page(s):
    2261-2264

    We propose a rootkit installation method inside a GPU kernel execution process which works through GPU context manipulation. In GPU-based applications such as deep learning computations and cryptographic operations, the proposed method exploits the fact that the execution flow of the GPU kernel obeys the GPU context information in GPU memory. The proposed method consists of two key ideas. The first is GPU code manipulation, which hijacks the execution flow of the original GPU kernel to execute an injected payload without affecting the original GPU computation result. The second is a self-page-table update execution, during which the GPU kernel updates its page table to access any location in system memory. After the installation, the malicious payload is executed only in the GPU kernel, and no evidence remains in system memory. Thus, it cannot be detected by conventional rootkit detection methods.

  • View Priority Based Threads Allocation and Binary Search Oriented Reweight for GPU Accelerated Real-Time 3D Ball Tracking

    Yilin HOU  Ziwei DENG  Xina CHENG  Takeshi IKENAGA  

     
    PAPER-Image Recognition, Computer Vision

      Publicized:
    2018/08/31
      Vol:
    E101-D No:12
      Page(s):
    3190-3198

    In real-time 3D ball tracking for sports analysis, complex algorithms that ensure accuracy can be time-consuming. A particle filter based algorithm has large potential for acceleration, since the per-particle computation can be parallelized on a heterogeneous CPU-GPU platform. Still, the target multi-view 3D ball tracking algorithm poses challenges: 1) the serial flow of each step in the algorithm; 2) repeated processing across multiple views; 3) the low degree of parallelism in the reweight and resampling steps due to sequential processing. For the CPU-GPU platform, this paper proposes the double stream system flow, the view priority based threads allocation, and the binary search oriented reweight. The double stream system flow assigns tasks with no data dependency between them to different streams for each frame, achieving parallelism at the system structure level. The view priority based threads allocation manages threads in the multi-view observation task: the number of threads is the number of views multiplied by the number of particles, and assigning them with view priority helps both memory access and computation achieve parallelism. The binary search oriented reweight reduces the time complexity by avoiding generation of the cumulative distribution function and uses an unordered array to implement a binary search. The experiment is based on videos recording the final game of an official volleyball match (2014 Inter-High School Games of Men's Volleyball, held in Tokyo Metropolitan Gymnasium in Aug. 2014); the test sequences were taken by a multiple-view system of 4 cameras located at the four corners of the court. The success rate reaches 99.23%, the same as the target algorithm, while the processing time is reduced from 75.1ms/frame in the CPU environment to 3.05ms/frame in the proposed system, a 24.62 times speedup; it also achieves a 2.33 times speedup compared with a basic GPU implementation.

  • Energy-Based Tree Illustration System: ETIS

    Katsuto NAKAJIMA  Azusa MAMA  Yuki MORIMOTO  

     
    LETTER-Computer Graphics

      Publicized:
    2016/05/25
      Vol:
    E99-D No:9
      Page(s):
    2417-2421

    We propose a system named ETIS (Energy-based Tree Illustration System) for automatically generating tree illustrations characteristic of two-dimensional ones with features such as exaggerated branch curves, leaves, and flowers. The growth behavior of the trees can be controlled by adjusting the energy. The canopy shape and the region to fill with leaves and flowers are also controlled by hand-drawn guide lines.

  • Design and Comparison of Immersive Gesture Interfaces for HMD Based Virtual World Navigation

    Bong-Soo SOHN  

     
    LETTER-Computer Graphics

      Publicized:
    2016/04/05
      Vol:
    E99-D No:7
      Page(s):
    1957-1960

    Mass-market head mounted displays (HMDs) are currently attracting wide interest from consumers because they allow immersive virtual reality (VR) experiences at an affordable cost. Flying over a virtual environment is a common application of HMDs. However, conventional keyboard- or mouse-based interfaces decrease the level of immersion. Motivated by this, we design three types of immersive gesture interfaces (bird, superman, and hand) for flyover navigation. A Kinect depth camera is used to recognize each gesture by extracting and analyzing the user's body skeleton. We evaluate the usability of each interface through a user study. As a result, we analyze the advantages and disadvantages of each interface, and demonstrate that our gesture interfaces are preferable for obtaining a high level of immersion and fun in an HMD based VR environment.

  • Controlling the Simulation of Cumuliform Clouds Based on Fluid Dynamics

    Tatsuki KAWAGUCHI  Yoshinori DOBASHI  Tsuyoshi YAMAMOTO  

     
    LETTER-Computer Graphics

      Publicized:
    2015/07/24
      Vol:
    E98-D No:11
      Page(s):
    2034-2037

    Controlling fluid simulation is one of the important research topics in computer graphics. In this paper, we focus on controlling the simulation of cumuliform cloud formation. With a previously proposed method for controlling cloud simulation, convergence is very slow; therefore, it takes a long time before the clouds form the desired shapes. We improved the method and accelerated the convergence by introducing a new mechanism for controlling the amount of water vapor added. We demonstrate the effectiveness of the proposed method with several examples.

  • Contour Gradient Tree for Automatic Extraction of Salient Object Surfaces from 3D Imaging Data

    Bong-Soo SOHN  

     
    LETTER-Computer Graphics

      Publicized:
    2015/07/31
      Vol:
    E98-D No:11
      Page(s):
    2038-2042

    Isosurface extraction is one of the most popular techniques for visualizing scalar volume data. However, volume data contains infinitely many isosurfaces. Furthermore, a single isosurface might contain many connected components, or contours, with each representing a different object surface. Hence, it is often a tedious and time-consuming manual process to find and extract contours that are interesting to users. This paper describes a novel method for automatically extracting salient contours from volume data. For this purpose, we propose a contour gradient tree (CGT) that contains the information of salient contours and their saliency magnitude. We organize the CGT in a hierarchical way to generate a sequence of contours in saliency order. Our method was applied to various medical datasets. Experimental results show that our method can automatically extract salient contours that represent regions of interest in the data.

  • Parallelization of Dynamic Time Warping on a Heterogeneous Platform

    Yao ZHENG  Limin XIAO  Wenqi TANG  Lihong SHANG  Guangchao YAO  Li RUAN  

     
    LETTER-Algorithms and Data Structures

      Vol:
    E97-A No:11
      Page(s):
    2258-2262

    The dynamic time warping (DTW) algorithm is widely used for time series similarity search. As DTW has quadratic time complexity, the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. In this paper, we present a parallel approach for DTW on a heterogeneous platform with a graphics processing unit (GPU). In order to exploit fine-grained data-level parallelism, we propose a specific parallel decomposition in DTW. Furthermore, we introduce an optimization technique called diamond tiling to improve the utilization of threads. Results show that our approach substantially reduces computational time.
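
    For orientation, the quadratic cost mentioned in this abstract comes from the standard DTW recurrence, which fills an (n+1) x (m+1) table. The following is a minimal sequential sketch of that textbook recurrence, assuming an absolute-difference local cost; the function name `dtw_distance` is hypothetical, and the sketch does not reproduce the paper's parallel decomposition or its diamond tiling.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumption: the local cost is |a[i] - b[j]|. Runs in O(len(a) * len(b)).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Each cell depends only on its left, upper, and upper-left
            # neighbours, so all cells with the same i + j are independent.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

    Because each cell depends only on three neighbours, cells on the same anti-diagonal can in principle be computed concurrently, which is the general property that GPU formulations of DTW rely on.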

  • Efficient Screen Space Anisotropic Blurred Soft Shadows

    Zhongxiang ZHENG  Suguru SAITO  

     
    PAPER-Rendering

      Vol:
    E97-D No:8
      Page(s):
    2038-2045

    Shadow mapping is an efficient method to generate shadows in real time computer graphics and has broad variations from hard to soft shadow synthesis. Soft shadowing based on shadow mapping is a blurring technique on a shadow map or on screen space. Blurring on screen space has an advantage for efficient sampling on a shadow map, since the blurred target array has exactly the same coordinates as the screen. However, a previous blurring method on screen space has a drawback: the generated shadow is not correct when the view direction makes a large angle with the normal of the shadowed plane. In this paper, we introduce a new screen space based method for soft shadowing that is fast and generates soft shadows more accurately than the previous screen space soft shadow mapping method. The resultant images show that shadows produced by our method stay in the same place, while the penumbrae of shadows produced by the previous method change as the view moves. Surprisingly, although our method is more complex than the previous method, the measured calculation times show that our method has almost the same performance. This is because it controls the blurring area more accurately and thus successfully reduces multiplications for blurring.

  • Throughput and Power Efficiency Evaluation of Block Ciphers on Kepler and GCN GPUs Using Micro-Benchmark Analysis

    Naoki NISHIKAWA  Keisuke IWAI  Hidema TANAKA  Takakazu KUROKAWA  

     
    PAPER-Fundamentals of Information Systems

      Vol:
    E97-D No:6
      Page(s):
    1506-1515

    Computer systems with GPUs are expected to become a strong methodology for high-speed encryption processing. Moreover, power consumption has remained a primary deterrent for such processing on devices of all sizes. However, GPU vendors are currently announcing their future roadmaps of GPU architecture development: Nvidia Corp. promotes the Kepler architecture and AMD Corp. emphasizes the GCN architecture. Therefore, we evaluated throughput and power efficiency of three 128-bit block ciphers on GPUs with recent Nvidia Kepler and AMD GCN architectures. In our experiments, the throughput and per-watt throughput of AES-128 on the Radeon HD 7970 (2048 cores) with GCN architecture are 205.0Gbps and 1.3Gbps/W respectively, whereas those on the GeForce GTX 680 (1536 cores) with Kepler architecture are, respectively, 63.9Gbps and 0.43Gbps/W; an approximately 3.2 times throughput difference occurs between AES-128 on the two GPUs. Next, we investigate the reasons for the throughput difference using our micro-benchmark suites. According to the results, we speculate that to improve Kepler GPUs as co-processors for block ciphers, the arithmetic and logical instructions must be improved in terms of software and hardware.

  • Fast Density-Based Clustering Using Graphics Processing Units

    Woong-Kee LOH  Yang-Sae MOON  Young-Ho PARK  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E97-D No:5
      Page(s):
    1349-1352

    Due to recent technical advances, GPUs are used for general applications as well as screen display. Many research results have been proposed to improve the performance of previous CPU-based algorithms by a few hundred times using GPUs. In this paper, we propose a density-based clustering algorithm called GSCAN, which reduces the number of unnecessary distance computations using a grid structure. In our experiments, GSCAN outperformed CUDA-DClust [2] and DBSCAN [3] by up to 13.9 and 32.6 times, respectively.
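
    To illustrate the general idea of pruning distance computations with a grid, the sketch below buckets 2D points into square cells of side eps, so that an eps-neighborhood query only has to examine the 3 x 3 block of cells around the query point. This is a generic CPU illustration under stated assumptions, not GSCAN itself; the names `build_grid` and `eps_neighbors` are hypothetical.

```python
import math
from collections import defaultdict


def build_grid(points, eps):
    """Bucket point indices by the square cell of side eps that contains them."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(idx)
    return grid


def eps_neighbors(points, grid, eps, i):
    """Indices of points within distance eps of points[i], excluding i itself.

    Any point within eps differs by at most eps in each coordinate, so it
    must lie in the query cell or one of its 8 adjacent cells.
    """
    x, y = points[i]
    cx, cy = math.floor(x / eps), math.floor(y / eps)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if j != i and math.dist(points[i], points[j]) <= eps:
                    result.append(j)
    return sorted(result)
```

    With uniformly spread data, each query then touches only the points in nine cells rather than the whole data set, which is where the savings in distance computations come from.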

  • A Line Smoothing Method of Hand-Drawn Strokes Using Adaptive Moving Average for Illustration Tracing Tasks

    Hotaka KAWASE  Mikio SHINYA  Michio SHIRAISHI  

     
    PAPER-Computer Graphics

      Vol:
    E95-D No:11
      Page(s):
    2704-2709

    There are many web sites where net users can post and distribute their illustration images. A typical way to draw a digital illustration is first to draw rough lines on paper and then to trace the lines on a graphics tablet by hand. The input lines usually contain fluctuation due to hand-drawing, which limits the quality of the illustration. Therefore, it is important to remove the fluctuation and to smooth the lines while maintaining sharp features such as corners. Although naive applications of moving average filters can smooth input lines, they may cause over-smoothing artifacts in which sharp features are lost by the filtering. This paper describes an improved line smoothing method using adaptive moving averages, which smooths input lines while keeping high curvature points. The proposed method evaluates curvatures of input lines and adaptively controls the filter size to reduce the over-smoothing artifacts. Experiments demonstrated advantages of the proposed method over the previous method in terms of achieving a smoothing effect while still preserving sharp features.
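
    The idea of shrinking the filter where curvature is high can be sketched as follows. This is only an illustrative sketch, not the paper's method: it assumes the turning angle between consecutive segments as a curvature proxy and a linear rule that reduces the window to a single point once the angle reaches a threshold. The names `turning_angle` and `adaptive_smooth` and both default parameters are hypothetical.

```python
import math


def turning_angle(p_prev, p, p_next):
    """Absolute turning angle (radians) at p; a simple curvature proxy."""
    a1 = math.atan2(p[1] - p_prev[1], p[0] - p_prev[0])
    a2 = math.atan2(p_next[1] - p[1], p_next[0] - p[0])
    d = abs(a2 - a1)
    return min(d, 2 * math.pi - d)


def adaptive_smooth(points, max_half_width=3, angle_threshold=math.pi / 4):
    """Moving average whose window shrinks as the local turning angle grows."""
    n = len(points)
    out = []
    for i, p in enumerate(points):
        if i == 0 or i == n - 1:
            out.append(p)  # endpoints are kept as they are
            continue
        angle = turning_angle(points[i - 1], p, points[i + 1])
        # Assumed rule: full window on straight runs, no smoothing at corners.
        scale = max(0.0, 1.0 - angle / angle_threshold)
        h = int(round(max_half_width * scale))
        h = min(h, i, n - 1 - i)  # keep the window symmetric and in range
        window = points[i - h:i + h + 1]
        out.append((sum(q[0] for q in window) / len(window),
                    sum(q[1] for q in window) / len(window)))
    return out
```

    With these defaults, a right-angle corner has a turning angle above the threshold and is left untouched, while a small bump on an otherwise straight stroke is averaged with its neighbours.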

  • OpenGL SC Implementation on the OpenGL Hardware

    Nakhoon BAEK  Hwanyong LEE  

     
    LETTER-Computer Graphics

      Vol:
    E95-D No:10
      Page(s):
    2589-2592

    The need for the OpenGL family of 3D rendering APIs is rapidly increasing, especially for graphical human-machine interfaces on various systems. In the safety-critical market for avionics, military, medical and automotive applications, OpenGL SC, the safety critical profile of the OpenGL standard, plays the major role for graphical interfaces. In this paper, we present an efficient way of implementing the OpenGL SC 3D graphics API for environments with hardware-supported OpenGL 1.1 and its multi-texture extension facility, which is widely available on recent embedded systems. Our approach achieved the OpenGL SC features at a low development cost on embedded systems and also on general personal computers. Our final result shows its compliance with the OpenGL SC standard specification. From the efficiency point of view, we measured its execution times for various application programs, to show a remarkable speed-up.

  • Design of an OpenVG Hardware Rendering Engine

    Yong-Luo SHEN  Seok-Jae KIM  Sang-Woo SEO  Hyun-Goo LEE  Hyeong-Cheol OH  

     
    PAPER-Computer System

      Vol:
    E94-D No:12
      Page(s):
    2409-2417

    This paper introduces a hardware engine for rendering two-dimensional vector graphics based on the OpenVG standard in portable devices. We focus on two design challenges posed by the rendering engines: the number of vertices to represent the images and the amount of memory usage. Redundant vertices are eliminated using adaptive tessellation, in which the redundancy can be judged using a proposed cost-per-quality measure. A simplified edge-flag rendering algorithm and the scanline-based rendering scheme are adopted to reduce external memory access. The designed rendering engine occupies approximately 173 K gates and can satisfy real-time requirements of many applications when it is implemented using a 0.18 µm, 1.8 V CMOS standard cell library. An FPGA prototype using a system-on-a-chip platform has been developed and tested.

  • NUFFT- & GPU-Based Fast Imaging of Vegetation

    Amedeo CAPOZZOLI  Claudio CURCIO  Antonio DI VICO  Angelo LISENO  

     
    PAPER-Sensing

      Vol:
    E94-B No:7
      Page(s):
    2092-2103

    We develop an effective algorithm, based on the filtered backprojection (FBP) approach, for the imaging of vegetation. Under the FBP scheme, the reconstruction amounts to a non-trivial Fourier inversion, since the data are Fourier samples arranged on a non-Cartesian grid. The computational issue is efficiently tackled by Non-Uniform Fast Fourier Transforms (NUFFTs), whose complexity grows asymptotically as that of a standard FFT. Furthermore, significant speed-ups, as compared to fast CPU implementations, are obtained by a parallel version of the NUFFT algorithm, purposely designed to be run on Graphics Processing Units (GPUs) by using the CUDA language. The performance of the parallel algorithm has been assessed in comparison to a CPU-multicore accelerated Matlab implementation of the same routine, to other CPU-multicore accelerated implementations based on standard FFT and employing linear, cubic, spline and sinc interpolations, and to a different parallel algorithm exploiting a parallel linear interpolation stage. The proposed approach proved the most computationally convenient. Furthermore, an indoor polarimetric experimental setup is developed, capable of isolating and introducing, one at a time, different non-idealities of a real acquisition, such as the sources (wind, rain) of temporal decorrelation. Experimental far-field polarimetric measurements on a thuja plicata (western redcedar) tree demonstrate the performance of the developed algorithm, its robustness against data truncation and temporal decorrelation, as well as the possibility of discriminating scatterers with different features within the investigated scene.

  • Accurate Human Detection by Appearance and Motion

    Shaopeng TANG  Satoshi GOTO  

     
    PAPER

      Vol:
    E93-D No:10
      Page(s):
    2728-2736

    In this paper, a human detection method is developed. An appearance based detector and a motion based detector are proposed. A multi scale block histogram of template feature (MB-HOT) is used to detect humans by appearance. It integrates the gray value information and the gradient value information, and represents the relationship of three blocks. Experiments on the INRIA dataset show that this feature is more discriminative than other features, such as histogram of orientation gradient (HOG). A motion based feature is also proposed to capture the relative motion of the human body. This feature is calculated in the optical flow domain, and experimental results on our dataset show that this feature outperforms other motion based features. The detection responses obtained by the two features are combined to reduce false detections. A graphics processing unit (GPU) based implementation is proposed to accelerate the calculation of the two features and make it suitable for real time applications.

  • A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering

    Ken-ichi SUZUKI  Yoshiyuki KAERIYAMA  Kazuhiko KOMATSU  Ryusuke EGAWA  Nobuyuki OHBA  Hiroaki KOBAYASHI  

     
    PAPER-Computer Graphics

      Vol:
    E93-D No:4
      Page(s):
    891-902

    Ray tracing is one of the most popular techniques for generating photo-realistic images. Extensive research and development work has made interactive static scene rendering realistic. This paper deals with interactive dynamic scene rendering in which not only the eye point but also the objects in the scene change their 3D locations every frame. In order to realize interactive dynamic scene rendering, RTRPS (Ray Tracing based on Ray Plane and Bounding Sphere), which utilizes the coherency in rays, objects, and grouped-rays, is introduced. RTRPS uses bounding spheres as the spatial data structure which utilizes the coherency in objects. By using bounding spheres, RTRPS can ignore the rotation of moving objects within a sphere, and shorten the update time between frames. RTRPS utilizes the coherency in rays by merging rays into a ray-plane, assuming that the secondary rays and shadow rays are shot through an aligned grid. Since a pair of ray-planes shares an original ray, the intersection for the ray can be completed using the coherency in the ray-planes. Because of the three kinds of coherency, RTRPS can significantly reduce the number of intersection tests for ray tracing. Further acceleration techniques for ray-plane-sphere and ray-triangle intersection are also presented. A parallel projection technique converts a 3D vector inner product operation into a 2D operation and reduces the number of floating point operations. Techniques based on frustum culling and binary-tree structured ray-planes optimize the order of intersection tests between ray-planes and a sphere, resulting in a 50% to 90% reduction of intersection tests. Two ray-triangle intersection techniques are also introduced, which are effective when a large number of rays are packed into a ray-plane. Our performance evaluations indicate that RTRPS gives a 13 to 392 times speedup in comparison with a ray tracing algorithm without organized rays and spheres. We also found that RTRPS provides competitive performance even if only primary rays are used.

  • A System-Level Model of Design Space Exploration for a Tile-Based 3D Graphics SoC Refinement

    Liang-Bi CHEN  Chi-Tsai YEH  Hung-Yu CHEN  Ing-Jer HUANG  

     
    PAPER-Embedded, Real-Time and Reconfigurable Systems

      Vol:
    E92-A No:12
      Page(s):
    3193-3202

    3D graphics applications are widely used in consumer electronics, a trend that is expected to continue. In general, a higher abstraction level is used to model a complex system such as a 3D graphics SoC. However, the issue of concern is how to use efficient methods to traverse the design space hierarchically, reduce simulation time, and refine the performance quickly. This paper demonstrates a system-level design space exploration model for a tile-based 3D graphics SoC refinement. This model uses UML tools, which help designers traverse the whole system, and reduces simulation time dramatically by adopting SystemC. As a result, the system performance is improved by 198% for the geometry function and 69% for the rendering function.

  • Adaptive Scanline Filling Algorithm for OpenVG 2D Vector Graphics Accelerator

    Daewoong KIM  Kilhyung CHA  Soo-Ik CHAE  

     
    LETTER-Computer Graphics

      Vol:
    E92-D No:7
      Page(s):
    1500-1502

    We propose an optimized scanline filling algorithm for OpenVG two-dimensional vector graphics. For each scanline of a path, it adaptively selects a left or right scanning direction that minimizes the number of pixels visited during scanning. According to the experimental results, the proposed algorithm reduces the number of pixels visited by 6 to 37% relative to that with a constant scanning direction for all the scanlines.
