IEICE global.ieice.org Site

Keyword Search Result

[Keyword] graphics(78hit)

1-20hit(78hit)

Multiparallel MMT: Faster ISD Algorithm Solving High-Dimensional Syndrome Decoding Problem
Shintaro NARISADA Kazuhide FUKUSHIMA Shinsaku KIYOMOTO

PAPER

Publicized:
2022/11/09
Vol:
E106-A No:3
Page(s):
241-252
The hardness of the syndrome decoding problem (SDP) is the primary evidence for the security of code-based cryptosystems, which are one of the finalists in a project to standardize post-quantum cryptography conducted by the U.S. National Institute of Standards and Technology (NIST-PQC). Information set decoding (ISD) is a general term for algorithms that solve SDP efficiently. In this paper, we conducted a concrete analysis of the time complexity of the latest ISD algorithms under limited memory using the syndrome decoding estimator proposed by Esser et al. As a result, we show that theoretically nonoptimal ISDs, such as May-Meurer-Thomae (MMT) and May-Ozerov, have lower time complexity than other ISDs in some actual SDP instances. Based on these facts, we further studied the possibility of multiple parallelization for these ISDs and proposed the first GPU algorithm for MMT, the multiparallel MMT algorithm. In the experiments, we show that the multiparallel MMT algorithm is faster than existing ISD algorithms. In addition, we report the first successful attempts to solve the 510-, 530-, 540- and 550-dimensional SDP instances in the Decoding Challenge contest using the multiparallel MMT.
Benchmarking Modern Edge Devices for AI Applications
Pilsung KANG Jongmin JO

PAPER-Computer System

Publicized:
2020/12/08
Vol:
E104-D No:3
Page(s):
394-403
AI (artificial intelligence) has grown at an overwhelming speed for the last decade, to the extent that it has become one of the mainstream tools that drive the advancements in science and technology. Meanwhile, the paradigm of edge computing has emerged as one of the foremost areas in which applications using the AI technology are being most actively researched, due to its potential benefits and impact on today's widespread networked computing environments. In this paper, we evaluate two major entry-level offerings in the state-of-the-art edge device technology, which highlight increased computing power and specialized hardware support for AI applications. We perform a set of deep learning benchmarks on the devices to measure their performance. By comparing the performance with other GPU (graphics processing unit) accelerated systems in different platforms, we assess the computational capability of the modern edge devices featuring a significant amount of hardware parallelism.
Rootkit inside GPU Kernel Execution
Ohmin KWON Hyun KWON Hyunsoo YOON

LETTER-Dependable Computing

Publicized:
2019/08/19
Vol:
E102-D No:11
Page(s):
2261-2264
We propose a rootkit installation method inside a GPU kernel execution process which works through GPU context manipulation. In GPU-based applications such as deep learning computations and cryptographic operations, the proposed method uses the feature by which the execution flow of the GPU kernel obeys the GPU context information in GPU memory. The proposed method consists of two key ideas. The first is GPU code manipulation, which is able to hijack the execution flow of the original GPU kernel to execute an injected payload without affecting the original GPU computation result. The second is a self-page-table update execution during which the GPU kernel updates its page table to access any location in system memory. After the installation, the malicious payload is executed only in the GPU kernel, and no evidence remains in system memory. Thus, it cannot be detected by conventional rootkit detection methods.
View Priority Based Threads Allocation and Binary Search Oriented Reweight for GPU Accelerated Real-Time 3D Ball Tracking
Yilin HOU Ziwei DENG Xina CHENG Takeshi IKENAGA

PAPER-Image Recognition, Computer Vision

Publicized:
2018/08/31
Vol:
E101-D No:12
Page(s):
3190-3198
In real-time 3D ball tracking for sports analysis, complex algorithms that assure accuracy can be time-consuming. Particle filter based algorithms have large potential for acceleration, since the per-particle computation can be parallelized on a heterogeneous CPU-GPU platform. Still, the target multi-view 3D ball tracking algorithm poses challenges: 1) a serial flow for each step in the algorithm; 2) repeated processing for multiple views; 3) a low degree of parallelism in the reweight and resampling steps due to sequential processing. On the CPU-GPU platform, this paper proposes the double stream system flow, the view priority based threads allocation, and the binary search oriented reweight. The double stream system flow assigns tasks with no data dependency between them to different streams for each frame, achieving parallelism at the system structure level. The view priority based threads allocation manages threads in the multi-view observation task: the number of threads is the number of views multiplied by the number of particles, and threads are assigned with view priority, which helps both memory access and computation achieve parallelism. The binary search oriented reweight reduces the time complexity by avoiding generation of the cumulative distribution function and uses an unordered array to implement a binary search. The experiment is based on videos that record the final game of an official volleyball match (2014 Inter-High School Games of Men's Volleyball held in Tokyo Metropolitan Gymnasium in Aug. 2014); the test sequences were taken by a multiple-view system made of 4 cameras located at the four corners of the court.
The success rate reaches 99.23%, the same as the target algorithm, while the processing time is reduced from 75.1ms/frame in the CPU environment to 3.05ms/frame in the proposed system, a 24.62 times speedup. It also achieves a 2.33 times speedup compared with a basic GPU implementation.
Energy-Based Tree Illustration System: ETIS
Katsuto NAKAJIMA Azusa MAMA Yuki MORIMOTO

LETTER-Computer Graphics

Publicized:
2016/05/25
Vol:
E99-D No:9
Page(s):
2417-2421
We propose a system named ETIS (Energy-based Tree Illustration System) for automatically generating tree illustrations characteristic of two-dimensional ones with features such as exaggerated branch curves, leaves, and flowers. The growth behavior of the trees can be controlled by adjusting the energy. The canopy shape and the region to fill with leaves and flowers are also controlled by hand-drawn guide lines.
Design and Comparison of Immersive Gesture Interfaces for HMD Based Virtual World Navigation
Bong-Soo SOHN

LETTER-Computer Graphics

Publicized:
2016/04/05
Vol:
E99-D No:7
Page(s):
1957-1960
Mass-market head mounted displays (HMDs) are currently attracting wide interest from consumers because they allow immersive virtual reality (VR) experiences at an affordable cost. Flying over a virtual environment is a common application of HMDs. However, conventional keyboard- or mouse-based interfaces decrease the level of immersion. Motivated by this, we design three types of immersive gesture interfaces (bird, superman, and hand) for flyover navigation. A Kinect depth camera is used to recognize each gesture by extracting and analyzing the user's body skeleton. We evaluate the usability of each interface through a user study. As a result, we analyze the advantages and disadvantages of each interface, and demonstrate that our gesture interfaces are preferable for obtaining a high level of immersion and fun in an HMD based VR environment.
Controlling the Simulation of Cumuliform Clouds Based on Fluid Dynamics
Tatsuki KAWAGUCHI Yoshinori DOBASHI Tsuyoshi YAMAMOTO

LETTER-Computer Graphics

Publicized:
2015/07/24
Vol:
E98-D No:11
Page(s):
2034-2037
Controlling fluid simulation is one of the important research topics in computer graphics. In this paper, we focus on controlling the simulation of cumuliform cloud formation. With a previously proposed method for controlling cloud simulation, convergence is very slow; therefore, it takes a long time before the clouds form the desired shapes. We improved the method and accelerated the convergence by introducing a new mechanism for controlling the amount of water vapor added. We demonstrate the effectiveness of the proposed method with several examples.
Contour Gradient Tree for Automatic Extraction of Salient Object Surfaces from 3D Imaging Data
Bong-Soo SOHN

LETTER-Computer Graphics

Publicized:
2015/07/31
Vol:
E98-D No:11
Page(s):
2038-2042
Isosurface extraction is one of the most popular techniques for visualizing scalar volume data. However, volume data contains infinitely many isosurfaces. Furthermore, a single isosurface might contain many connected components, or contours, with each representing a different object surface. Hence, it is often a tedious and time-consuming manual process to find and extract contours that are interesting to users. This paper describes a novel method for automatically extracting salient contours from volume data. For this purpose, we propose a contour gradient tree (CGT) that contains the information of salient contours and their saliency magnitude. We organize the CGT in a hierarchical way to generate a sequence of contours in saliency order. Our method was applied to various medical datasets. Experimental results show that our method can automatically extract salient contours that represent regions of interest in the data.
Parallelization of Dynamic Time Warping on a Heterogeneous Platform
Yao ZHENG Limin XIAO Wenqi TANG Lihong SHANG Guangchao YAO Li RUAN

LETTER-Algorithms and Data Structures

Vol:
E97-A No:11
Page(s):
2258-2262
The dynamic time warping (DTW) algorithm is widely used for time series similarity search. As DTW has quadratic time complexity, the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. In this paper, we present a parallel approach for DTW on a heterogeneous platform with a graphics processing unit (GPU). In order to exploit fine-grained data-level parallelism, we propose a specific parallel decomposition in DTW. Furthermore, we introduce an optimization technique called diamond tiling to improve the utilization of threads. Results show that our approach substantially reduces computational time.
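For readers unfamiliar with DTW, the following is a minimal Python sketch of the textbook quadratic-time recurrence, together with an anti-diagonal (wavefront) traversal. It is not the decomposition or the diamond tiling proposed in the paper; the function names and the absolute-difference local cost are assumptions made for illustration. The wavefront version only shows why data-level parallelism is available: every cell on the same anti-diagonal depends solely on the two previous anti-diagonals.

```python
INF = float("inf")


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with |x - y| as the local cost (an assumption)."""
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def dtw_wavefront(a, b):
    """Same recurrence, traversed by anti-diagonals k = i + j.

    Cells that share k are mutually independent, so the inner loop is the
    part that could be handed to parallel threads (done serially here).
    """
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for k in range(2, n + m + 1):
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 3]
    y = [1, 2, 2, 3, 4, 4, 3]
    print(dtw_distance(x, y), dtw_wavefront(x, y))
```

Both functions fill the same table, so they return the same distance; only the visiting order differs.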
Efficient Screen Space Anisotropic Blurred Soft Shadows
Zhongxiang ZHENG Suguru SAITO

PAPER-Rendering

Vol:
E97-D No:8
Page(s):
2038-2045
Shadow mapping is an efficient method to generate shadows in real time computer graphics and has broad variations from hard to soft shadow synthesis. Soft shadowing based on shadow mapping is a blurring technique on a shadow map or on screen space. Blurring on screen space has an advantage for efficient sampling on a shadow map, since the blurred target array has exactly the same coordinates as the screen. However, a previous blurring method on screen space has a drawback: the generated shadow is not correct when a view direction has a large angle to the normal of the shadowed plane. In this paper, we introduce a new screen space based method for soft shadowing that is fast and generates soft shadows more accurately than the previous screen space soft shadow mapping method. The resulting images show that shadows produced by our method stay in the same place, while the penumbrae of shadows produced by the previous method change as the view moves. Surprisingly, although our method is more complex than the previous method, the measured calculation times show that our method has almost the same performance. This is because it controls the blurring area more accurately and thus successfully reduces multiplications for blurring.
Throughput and Power Efficiency Evaluation of Block Ciphers on Kepler and GCN GPUs Using Micro-Benchmark Analysis
Naoki NISHIKAWA Keisuke IWAI Hidema TANAKA Takakazu KUROKAWA

PAPER-Fundamentals of Information Systems

Vol:
E97-D No:6
Page(s):
1506-1515
Computer systems with GPUs are expected to become a strong methodology for high-speed encryption processing. Moreover, power consumption has remained a primary deterrent for such processing on devices of all sizes. However, GPU vendors are currently announcing their future roadmaps of GPU architecture development: Nvidia Corp. promotes the Kepler architecture and AMD Corp. emphasizes the GCN architecture. Therefore, we evaluated throughput and power efficiency of three 128-bit block ciphers on GPUs with recent Nvidia Kepler and AMD GCN architectures. From our experiments, whereas the throughput and per-watt throughput of AES-128 on Radeon HD 7970 (2048 cores) with GCN architecture are 205.0Gbps and 1.3Gbps/W respectively, those on GeForce GTX 680 (1536 cores) with Kepler architecture are, respectively, 63.9Gbps and 0.43Gbps/W; an approximately 3.2 times throughput difference occurs between AES-128 on the two GPUs. Next, we investigate the reasons for the throughput difference using our micro-benchmark suites. According to the results, we speculate that to improve Kepler GPUs as co-processors for block ciphers, the arithmetic and logical instructions must be improved in terms of software and hardware.
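The "approximately 3.2 times" figure can be checked directly from the numbers quoted in the abstract. The short Python sketch below does only that arithmetic; the variable names are illustrative, and the per-watt ratio is derived here from the quoted values rather than stated in the abstract.

```python
# AES-128 figures as quoted in the abstract above.
radeon_hd_7970 = {"gbps": 205.0, "gbps_per_watt": 1.3}    # GCN, 2048 cores
geforce_gtx_680 = {"gbps": 63.9, "gbps_per_watt": 0.43}   # Kepler, 1536 cores

# Ratio of raw throughput (the abstract reports "approximately 3.2 times").
throughput_ratio = radeon_hd_7970["gbps"] / geforce_gtx_680["gbps"]

# Ratio of per-watt throughput (derived here, not quoted in the abstract).
efficiency_ratio = radeon_hd_7970["gbps_per_watt"] / geforce_gtx_680["gbps_per_watt"]

if __name__ == "__main__":
    print(f"throughput ratio: {throughput_ratio:.2f}")
    print(f"per-watt throughput ratio: {efficiency_ratio:.2f}")
```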
Fast Density-Based Clustering Using Graphics Processing Units
Woong-Kee LOH Yang-Sae MOON Young-Ho PARK

LETTER-Artificial Intelligence, Data Mining

Vol:
E97-D No:5
Page(s):
1349-1352
- Errata[Uploaded on July 1,2014]
Due to recent technical advances, GPUs are used for general applications as well as screen display. Many research results have been proposed to improve the performance of previous CPU-based algorithms by a few hundred times using GPUs. In this paper, we propose a density-based clustering algorithm called GSCAN, which reduces the number of unnecessary distance computations using a grid structure. As a result of our experiments, GSCAN outperformed CUDA-DClust [2] and DBSCAN [3] by up to 13.9 and 32.6 times, respectively.
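As a rough illustration of how a grid structure can cut down distance computations in density-based clustering, the Python sketch below buckets 2D points into square cells of side eps and answers an eps-neighborhood query by scanning only the 3x3 block of cells around the query point. This is a generic technique, not GSCAN itself; the abstract does not describe GSCAN's grid layout or GPU kernels, and the names `build_grid` and `eps_neighbors` are hypothetical.

```python
import math
from collections import defaultdict


def build_grid(points, eps):
    """Map each cell (floor(x/eps), floor(y/eps)) to the indices of the points in it."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / eps), math.floor(y / eps))].append(idx)
    return grid


def eps_neighbors(points, grid, eps, idx):
    """Indices of points within distance eps of points[idx] (including idx itself).

    Two points closer than eps differ by at most one cell per axis, so only
    the 3x3 block of cells around the query point has to be examined.
    """
    x, y = points[idx]
    cx, cy = math.floor(x / eps), math.floor(y / eps)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                px, py = points[j]
                if (px - x) ** 2 + (py - y) ** 2 <= eps * eps:
                    found.append(j)
    return sorted(found)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (0.9, 0.9), (-0.4, 0.2)]
    g = build_grid(pts, 1.0)
    print(eps_neighbors(pts, g, 1.0, 0))
```

Points in far-away cells, such as (3.0, 3.0) in the usage example, are never compared against the query point at all, which is the saving a grid provides over an all-pairs scan.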
A Line Smoothing Method of Hand-Drawn Strokes Using Adaptive Moving Average for Illustration Tracing Tasks
Hotaka KAWASE Mikio SHINYA Michio SHIRAISHI

PAPER-Computer Graphics

Vol:
E95-D No:11
Page(s):
2704-2709
There are many web sites where net users can post and distribute their illustration images. A typical way to draw a digital illustration is first to draw rough lines on paper and then to trace the lines on a graphics tablet by hand. The input lines usually contain fluctuation due to hand-drawing, which limits the quality of the illustration. Therefore, it is important to remove the fluctuation and to smooth the lines while maintaining sharp features such as corners. Although naive applications of moving average filters can smooth input lines, they may cause over-smoothing artifacts in which sharp features are lost by the filtering. This paper describes an improved line smoothing method using adaptive moving averages, which smooths input lines while keeping high curvature points. The proposed method evaluates curvatures of input lines and adaptively controls the filter size to reduce the over-smoothing artifacts. Experiments demonstrated advantages of the proposed method over the previous method in terms of achieving a smoothing effect while still preserving sharp features.
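The idea of controlling the filter size by curvature can be sketched as follows. This Python example is an illustration under stated assumptions, not the paper's method: the curvature measure (turning angle between neighboring segments), the linear rule that shrinks the window as the angle grows, and the parameters `max_half_window` and `angle_threshold` are all hypothetical choices.

```python
import math


def turning_angle(p_prev, p, p_next):
    """Absolute angle (radians) between the incoming and outgoing segments at p."""
    ax, ay = p[0] - p_prev[0], p[1] - p_prev[1]
    bx, by = p_next[0] - p[0], p_next[1] - p[1]
    return abs(math.atan2(ax * by - ay * bx, ax * bx + ay * by))


def adaptive_smooth(points, max_half_window=3, angle_threshold=math.pi / 4):
    """Moving average whose half-window shrinks to 0 as the turning angle grows.

    Points whose turning angle reaches angle_threshold are left untouched,
    so sharp corners survive; endpoints are always kept as they are.
    """
    n = len(points)
    smoothed = []
    for i, p in enumerate(points):
        if i == 0 or i == n - 1:
            smoothed.append((float(p[0]), float(p[1])))
            continue
        angle = turning_angle(points[i - 1], p, points[i + 1])
        scale = max(0.0, 1.0 - angle / angle_threshold)
        half = min(int(round(max_half_window * scale)), i, n - 1 - i)
        window = points[i - half:i + half + 1]
        smoothed.append((
            sum(x for x, _ in window) / len(window),
            sum(y for _, y in window) / len(window),
        ))
    return smoothed


if __name__ == "__main__":
    stroke = [(0, 0), (1, 0.2), (2, -0.2), (3, 0.1), (4, 0), (4, 1), (4, 2)]
    for pt in adaptive_smooth(stroke):
        print(pt)
```

A limitation of this simple rule is that a low-curvature point next to a corner still averages across the corner; a fuller treatment would also clip the window at detected corners.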
OpenGL SC Implementation on the OpenGL Hardware
Nakhoon BAEK Hwanyong LEE

LETTER-Computer Graphics

Vol:
E95-D No:10
Page(s):
2589-2592
The need for the OpenGL family of 3D rendering APIs is rapidly increasing, especially for graphical human-machine interfaces on various systems. In the safety-critical market for avionics, military, medical and automotive applications, OpenGL SC, the safety-critical profile of the OpenGL standard, plays the major role for graphical interfaces. In this paper, we present an efficient way of implementing the OpenGL SC 3D graphics API for environments with hardware-supported OpenGL 1.1 and its multi-texture extension facility, which is widely available on recent embedded systems. Our approach achieved the OpenGL SC features at low development cost on embedded systems and also on general personal computers. Our final result shows its compliance with the OpenGL SC standard specification. From the efficiency point of view, we measured its execution times for various application programs, to show a remarkable speed-up.
Design of an OpenVG Hardware Rendering Engine
Yong-Luo SHEN Seok-Jae KIM Sang-Woo SEO Hyun-Goo LEE Hyeong-Cheol OH

PAPER-Computer System

Vol:
E94-D No:12
Page(s):
2409-2417
This paper introduces a hardware engine for rendering two-dimensional vector graphics based on the OpenVG standard in portable devices. We focus on two design challenges posed by the rendering engines: the number of vertices to represent the images and the amount of memory usage. Redundant vertices are eliminated using adaptive tessellation, in which the redundancy can be judged using a proposed cost-per-quality measure. A simplified edge-flag rendering algorithm and the scanline-based rendering scheme are adopted to reduce external memory access. The designed rendering engine occupies approximately 173 K gates and can satisfy real-time requirements of many applications when it is implemented using a 0.18 µm, 1.8 V CMOS standard cell library. An FPGA prototype using a system-on-a-chip platform has been developed and tested.
NUFFT- & GPU-Based Fast Imaging of Vegetation
Amedeo CAPOZZOLI Claudio CURCIO Antonio DI VICO Angelo LISENO

PAPER-Sensing

Vol:
E94-B No:7
Page(s):
2092-2103
We develop an effective algorithm, based on the filtered backprojection (FBP) approach, for the imaging of vegetation. Under the FBP scheme, the reconstruction amounts to a non-trivial Fourier inversion, since the data are Fourier samples arranged on a non-Cartesian grid. The computational issue is efficiently tackled by Non-Uniform Fast Fourier Transforms (NUFFTs), whose complexity grows asymptotically as that of a standard FFT. Furthermore, significant speed-ups, as compared to fast CPU implementations, are obtained by a parallel version of the NUFFT algorithm, purposely designed to be run on Graphics Processing Units (GPUs) by using the CUDA language. The performance of the parallel algorithm has been assessed in comparison to a CPU-multicore accelerated Matlab implementation of the same routine, to other CPU-multicore accelerated implementations based on standard FFT and employing linear, cubic, spline and sinc interpolations, and to a different parallel algorithm exploiting a parallel linear interpolation stage. The proposed approach proved to be the most computationally convenient. Furthermore, an indoor, polarimetric experimental setup is developed, capable of isolating and introducing, one at a time, different non-idealities of a real acquisition, such as the sources (wind, rain) of temporal decorrelation. Experimental far-field polarimetric measurements on a thuja plicata (western redcedar) tree point out the performance of the developed algorithm, its robustness against data truncation and temporal decorrelation, as well as the possibility of discriminating scatterers with different features within the investigated scene.
Accurate Human Detection by Appearance and Motion
Shaopeng TANG Satoshi GOTO

PAPER

Vol:
E93-D No:10
Page(s):
2728-2736
In this paper, a human detection method is developed. An appearance based detector and a motion based detector are proposed. A multi scale block histogram of template feature (MB-HOT) is used to detect humans by appearance. It integrates the gray value information and the gradient value information, and represents the relationship of three blocks. Experiments on the INRIA dataset show that this feature is more discriminative than other features, such as histogram of orientation gradient (HOG). A motion based feature is also proposed to capture the relative motion of the human body. This feature is calculated in the optical flow domain, and experimental results on our dataset show that this feature outperforms other motion based features. The detection responses obtained by the two features are combined to reduce false detections. A graphics processing unit (GPU) based implementation is proposed to accelerate the calculation of the two features and make it suitable for real time applications.
A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering
Ken-ichi SUZUKI Yoshiyuki KAERIYAMA Kazuhiko KOMATSU Ryusuke EGAWA Nobuyuki OHBA Hiroaki KOBAYASHI

PAPER-Computer Graphics

Vol:
E93-D No:4
Page(s):
891-902
Ray tracing is one of the most popular techniques for generating photo-realistic images. Extensive research and development work has made interactive static scene rendering realistic. This paper deals with interactive dynamic scene rendering in which not only the eye point but also the objects in the scene change their 3D locations every frame. In order to realize interactive dynamic scene rendering, RTRPS (Ray Tracing based on Ray Plane and Bounding Sphere), which utilizes the coherency in rays, objects, and grouped-rays, is introduced. RTRPS uses bounding spheres as the spatial data structure which utilizes the coherency in objects. By using bounding spheres, RTRPS can ignore the rotation of moving objects within a sphere, and shorten the update time between frames. RTRPS utilizes the coherency in rays by merging rays into a ray-plane, assuming that the secondary rays and shadow rays are shot through an aligned grid. Since a pair of ray-planes shares an original ray, the intersection for the ray can be completed using the coherency in the ray-planes. Because of the three kinds of coherency, RTRPS can significantly reduce the number of intersection tests for ray tracing. Further acceleration techniques for ray-plane-sphere and ray-triangle intersection are also presented. A parallel projection technique converts a 3D vector inner product operation into a 2D operation and reduces the number of floating point operations. Techniques based on frustum culling and binary-tree structured ray-planes optimize the order of intersection tests between ray-planes and a sphere, resulting in 50% to 90% reduction of intersection tests. Two ray-triangle intersection techniques are also introduced, which are effective when a large number of rays are packed into a ray-plane. Our performance evaluations indicate that RTRPS gives 13 to 392 times speed up in comparison with a ray tracing algorithm without organized rays and spheres. 
We found out that RTRPS also provides competitive performance even if only primary rays are used.
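The basic primitive behind a bounding-sphere structure is a test of whether a ray can reach a sphere at all; objects whose sphere is missed need no triangle tests. The Python sketch below shows that single ray-sphere rejection test only. It is not the ray-plane-sphere test, the parallel projection technique, or the frustum culling described in the abstract, and the function name and argument layout are assumptions.

```python
def ray_hits_sphere(origin, direction, center, radius):
    """Return True if the ray origin + t * direction (t >= 0) touches the sphere.

    direction must be non-zero but does not have to be normalized.
    """
    to_center = [c - o for o, c in zip(origin, center)]
    dir_len_sq = sum(d * d for d in direction)
    # Parameter of the point on the ray's supporting line closest to the center,
    # clamped to 0 so that spheres behind the origin are rejected.
    t = max(0.0, sum(a * b for a, b in zip(to_center, direction)) / dir_len_sq)
    closest = [o + t * d for o, d in zip(origin, direction)]
    dist_sq = sum((c - p) ** 2 for c, p in zip(center, closest))
    return dist_sq <= radius * radius


if __name__ == "__main__":
    eye = (0.0, 0.0, 0.0)
    forward = (0.0, 0.0, 1.0)
    print(ray_hits_sphere(eye, forward, (0.0, 0.0, 5.0), 1.0))   # sphere straight ahead
    print(ray_hits_sphere(eye, forward, (3.0, 0.0, 5.0), 1.0))   # sphere off to the side
    print(ray_hits_sphere(eye, forward, (0.0, 0.0, -5.0), 1.0))  # sphere behind the eye
```

Because a sphere is invariant under rotation about its center, an object rotating inside its bounding sphere does not change the outcome of this test, which matches the abstract's remark that rotation of moving objects within a sphere can be ignored.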
A System-Level Model of Design Space Exploration for a Tile-Based 3D Graphics SoC Refinement
Liang-Bi CHEN Chi-Tsai YEH Hung-Yu CHEN Ing-Jer HUANG

PAPER-Embedded, Real-Time and Reconfigurable Systems

Vol:
E92-A No:12
Page(s):
3193-3202
3D graphics applications are widely used in consumer electronics, a trend that is expected to continue. In general, a higher abstraction level is used to model a complex system such as a 3D graphics SoC. However, the issue is how to efficiently traverse the design space hierarchically, reduce simulation time, and refine the performance quickly. This paper demonstrates a system-level design space exploration model for a tile-based 3D graphics SoC refinement. This model uses UML tools, which assist designers in traversing the whole system, and reduces simulation time dramatically by adopting SystemC. As a result, the system performance is improved by 198% for the geometry function and 69% for the rendering function.
Adaptive Scanline Filling Algorithm for OpenVG 2D Vector Graphics Accelerator
Daewoong KIM Kilhyung CHA Soo-Ik CHAE

LETTER-Computer Graphics

Vol:
E92-D No:7
Page(s):
1500-1502
We propose an optimized scanline filling algorithm for OpenVG two-dimensional vector graphics. For each scanline of a path, it adaptively selects a left or right scanning direction that minimizes the number of pixels visited during scanning. According to the experimental results, the proposed algorithm reduces the number of pixels visited by 6 to 37% relative to that with a constant scanning direction for all the scanlines.

