Author Search Result

[Author] Yuichiro SHIBATA (13 hits)

1-13 of 13 hits
  • Implementation of Data Driven Applications on a Multi-Context Reconfigurable Device

    Masaki UNO  Yuichiro SHIBATA  Hideharu AMANO  

     
    PAPER

    Vol: E86-D No:5
    Page(s): 841-849

    WASMII is a virtual hardware system that executes dataflow algorithms using a dynamically reconfigurable multi-context device with a data driven control mechanism. Although the effectiveness of the system has been evaluated through simulations and using an emulator, implementation of WASMII was infeasible due to the unavailability of such a device. However, the first prototype of a practical dynamically reconfigurable multi-context device called DRL has been developed by NEC, and we developed a reconfigurable test bed using four sample DRL chips. On this board, we have implemented and executed some simple applications of the WASMII mechanism. Evaluation results show that the performance of the parallel implementation of WASMII is almost twice that of a PC with a CPU based on the corresponding technology.

  • Flexible Load-Dependent Soft-Start Method for Digital PID Control DC-DC Converter in 380Vdc System

    Hidenori MARUTA  Tsutomu SAKAI  Suguru SAGARA  Yuichiro SHIBATA  Keiichi HIROSE  Fujio KUROKAWA  

     
    PAPER-Energy in Electronics Communications

    Publicized: 2016/10/17
    Vol: E100-B No:4
    Page(s): 518-528

    The purpose of this paper is to propose a flexible load-dependent digital soft-start control method for dc-dc converters in a 380Vdc system. The soft-start operation is needed to prevent negative effects on a power supply, such as large inrush current and output overshoot, in the start-up process of dc-dc converters. In the conventional soft-start operation, a dc-dc converter has a very slow start-up to deal with the light load condition. Therefore, it always takes a long time in any load condition to start up a power supply and obtain the desired output. In the proposed soft-start control method, the speed of the start-up process is flexibly controlled depending on the load condition. To obtain the optimal speed for any load condition, the speed of the soft-start is determined from an approximated function of load current, which is estimated from experimental results in advance. The proposed soft-start control method is evaluated both in simulations and experiments. The results confirm that the proposed method has superior soft-start characteristics compared to the conventional one.

  • A Hardware Oriented Approximate Convex Hull Algorithm and its FPGA Implementation (Open Access)

    Tatsuma MORI  Taito MANABE  Yuichiro SHIBATA  

     
    PAPER

    Publicized: 2021/09/02
    Vol: E105-A No:3
    Page(s): 459-467

    The convex hull is the minimum convex set enclosing a given set of points. Since the process of finding convex hulls has various practical application fields including embedded real-time systems, efficient acceleration of convex hull algorithms is an important problem in computational geometry. In this paper, we discuss an FPGA acceleration approach to address this problem. In order to compute the convex hull of an unsorted point set, it is necessary to store all the points during the computation, and thus the capacity of an on-chip memory is likely to be a major constraint for efficient FPGA implementation. On the other hand, approximate convex hulls are often sufficient for practical applications. Therefore, we propose a hardware oriented approximate convex hull algorithm, which can process the input points as a stream without storing all the points in the memory. We also propose some computation reduction techniques for efficient FPGA implementation. Then, we present an FPGA implementation of the proposed algorithm, which is parallelized both in temporal and spatial domains, and evaluate its effectiveness in terms of performance and accuracy. As a result, we demonstrated 11 to 30 times faster performance compared to the widely-used convex hull software library Qhull. In addition, accuracy assessment revealed that the maximum approximation error normalized to the diameters of point sets was 0.038%, which was reasonably small for practical use cases.
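The abstract does not spell out the proposed algorithm, so the following is only an illustrative stand-in for the idea of processing points as a stream without storing them all. It uses a well-known directional-extremes approximation: for each of `k` fixed directions, only the point with the largest projection is kept, so memory use is O(k) regardless of the input size. The function name `streaming_approx_hull` and the parameter `k` are assumptions made for this sketch, not names from the paper.

```python
import math

def streaming_approx_hull(points, k=16):
    """Approximate convex hull of a point stream using k fixed directions.

    Illustrative stand-in, not the paper's algorithm: for each direction
    only the current extreme point is stored, so memory is O(k).
    """
    # k unit vectors evenly spaced around the circle.
    dirs = [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
            for i in range(k)]
    best = [None] * k
    best_proj = [-math.inf] * k

    # Single pass over the stream; each point is seen once and then dropped.
    for x, y in points:
        for i, (dx, dy) in enumerate(dirs):
            proj = x * dx + y * dy
            if proj > best_proj[i]:
                best_proj[i] = proj
                best[i] = (x, y)

    # Extremes are already in counter-clockwise direction order;
    # remove duplicates while keeping that order.
    hull = []
    for pt in best:
        if pt is not None and pt not in hull:
            hull.append(pt)
    return hull
```

Every returned point is a vertex of the exact hull, so the approximation can only miss vertices, never add spurious ones. Increasing `k` tightens the approximation at the cost of more comparisons per point, which in hardware would map naturally onto parallel comparators.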

  • FPGA Implementation and Evaluation of a Real-Time Image-Based Vibration Detection System with Adaptive Filtering

    Taito MANABE  Kazuya UETSUHARA  Akane TAHARA  Yuichiro SHIBATA  

     
    PAPER

    Vol: E103-A No:12
    Page(s): 1472-1480

    This paper presents the design and implementation of an image-based vibration detection system on a field-programmable gate array (FPGA), aiming at application to tremor suppression for microsurgery assistance systems. The system can extract a vibration component within a user-specified frequency band from moving images in real time. For fast and robust detection, we employ a statistical approach using dense optical flow to derive the vibration component, and design custom hardware based on the Lucas-Kanade (LK) method to compute optical flow. For band-pass filtering without phase delay, we implement the band-limited multiple Fourier linear combiner (BMFLC), a kind of adaptive band-pass filter which can recompose an input signal as a mixture of sinusoidal signals with multiple frequencies within the specified band. The whole system is implemented as a deep pipeline on a Xilinx Kintex-7 XC7K325T FPGA without using any external memory. We employ fixed-point arithmetic to reduce resource utilization while maintaining accuracy close to double-precision floating-point arithmetic. Empirical experiments reveal that the proposed system extracts a high-frequency tremor component from hand motions, with intentional low-frequency motions successfully filtered out. The system can process VGA moving images at 60fps, with a delay of less than 1 µs for the BMFLC, suggesting effectiveness of the deep pipelined architecture. In addition, we are planning to integrate a CNN-based segmentation system for improving detection accuracy, and show preliminary software evaluation results.
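As a rough floating-point sketch of how a BMFLC recomposes a signal from in-band sinusoids, the code below follows the commonly published formulation: sine and cosine references at evenly spaced frequencies inside the band, combined by weights adapted with the LMS rule. The sample rate, band edges, frequency spacing `df` and step size `mu` are illustrative assumptions; the paper's fixed-point hardware design is not reproduced here.

```python
import math

def bmflc(signal, fs, f_lo, f_hi, df=1.0, mu=0.1):
    """Band-limited multiple Fourier linear combiner (software sketch).

    Returns the per-sample estimates and the final weight vector.
    The LMS update is stable for 0 < mu < 1/n, where n is the number
    of reference frequencies; all defaults are assumptions.
    """
    n = int(round((f_hi - f_lo) / df)) + 1
    freqs = [f_lo + i * df for i in range(n)]
    w = [0.0] * (2 * n)          # weights for the sine and cosine references
    estimates = []
    for k, s in enumerate(signal):
        t = k / fs
        # Reference vector: sines followed by cosines of each in-band frequency.
        x = ([math.sin(2 * math.pi * f * t) for f in freqs] +
             [math.cos(2 * math.pi * f * t) for f in freqs])
        y = sum(wi * xi for wi, xi in zip(w, x))   # in-band estimate
        e = s - y                                  # estimation error
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, x)]  # LMS update
        estimates.append(y)
    return estimates, w
```

Because the output is a weighted sum of references evaluated at the current time instant, the estimate carries no filter-induced phase delay, which is the property the abstract highlights.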

  • Implementation of a GPU-Oriented Absorbing Boundary Condition for 3D-FDTD Electromagnetic Simulation

    Keisuke DOHI  Yuichiro SHIBATA  Kiyoshi OGURI  Takafumi FUJIMOTO  

     
    PAPER-Parallel and Distributed Computing

    Vol: E95-D No:12
    Page(s): 2787-2795

    In this paper, we propose and discuss efficient GPU implementation techniques of absorbing boundary conditions (ABCs) for a 3D finite-difference time-domain (FDTD) electromagnetic field simulation for antenna design. In view of the architectural nature of GPUs, the idea of a periodic boundary condition is introduced to the implementation of perfectly matched layers (PMLs), as well as a transformation technique of PML equations for partial boundaries. We also present an efficient implementation method for a non-uniform grid. The evaluation results with a typical simulation model reveal that our proposed technique almost doubles the simulation performance and eventually achieves 55.8% of the peak memory bandwidth of a target GPU.

  • FPGA Implementation of Human Detection by HOG Features with AdaBoost

    Keisuke DOHI  Kazuhiro NEGI  Yuichiro SHIBATA  Kiyoshi OGURI  

     
    PAPER-Application

    Vol: E96-D No:8
    Page(s): 1676-1684

    We present an external-memory-free, deeply pipelined FPGA implementation including HOG feature extraction and AdaBoost classification. To fit our design on a compact FPGA, we introduce some simplifications of the algorithm and make aggressive use of stream oriented architectures. We present comparison results between our simplified fixed-point scheme and an original floating-point scheme in terms of quality of results, and the results suggest that the negative impact of the simplified scheme for hardware implementation is limited. We empirically show that our system is able to detect humans in 640×480 VGA images at up to 112 FPS on a Xilinx Virtex-5 XC5VLX50 FPGA.

  • Performance Modeling of Stencil Computing on a Stream-Based FPGA Accelerator for Efficient Design Space Exploration

    Keisuke DOHI  Koji OKINA  Rie SOEJIMA  Yuichiro SHIBATA  Kiyoshi OGURI  

     
    PAPER-Application

    Publicized: 2014/11/19
    Vol: E98-D No:2
    Page(s): 298-308

    In this paper, we discuss performance modeling of 3-D stencil computing on an FPGA accelerator with a high-level synthesis environment, aiming for efficient exploration of user-space design parameters. First, we analyze resource utilization and performance to formulate these relationships as mathematical models. Then, in order to evaluate our proposed models, we implement heat conduction simulations as a benchmark application, by using MaxCompiler, which is a high-level synthesis tool for FPGAs, and MaxGenFD, which is a domain specific framework of the MaxCompiler for finite-difference equation solvers. The experimental results with various settings of architectural design parameters show the best combination of design parameters for pipeline structure can be systematically found by using our models. The effects of changing arithmetic accuracy and using data stream compression are also discussed.

  • FPGA Implementation of a Real-Time Super-Resolution System Using Flips and an RNS-Based CNN

    Taito MANABE  Yuichiro SHIBATA  Kiyoshi OGURI  

     
    PAPER

    Vol: E101-A No:12
    Page(s): 2280-2289

    Super-resolution technology is one of the solutions to fill the gap between high-resolution displays and lower-resolution images. There are various algorithms to interpolate the lost information, one of which uses a convolutional neural network (CNN). This paper shows an FPGA implementation and a performance evaluation of a novel CNN-based super-resolution system, which can process moving images in real time. We apply horizontal and vertical flips to input images instead of enlargement. This flip method prevents information loss and enables the network to make the best use of its patch size. In addition, we adopted the residue number system (RNS) in the network to reduce FPGA resource utilization. Efficient multiplication and addition with LUTs increased the network scale that can be implemented on the same FPGA by approximately 54% compared to an implementation with fixed-point operations. The proposed system can perform super-resolution from 960×540 to 1920×1080 at 60fps with a latency of less than 1ms. Despite the resource restrictions of the FPGA, the system can generate clear super-resolution images with smooth edges. The evaluation results also revealed superior quality in terms of the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index, compared to systems with other methods.
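To illustrate why an RNS suits LUT-based arithmetic, the sketch below represents an integer by its residues modulo a set of pairwise coprime moduli. Addition and multiplication then act on each small residue independently, with no carries between channels, and the result is recovered with the Chinese remainder theorem. The moduli `(7, 11, 13, 15)` are an arbitrary choice for this example; the abstract does not state which moduli the system uses.

```python
from math import prod

# Pairwise coprime moduli (hypothetical choice, not taken from the paper).
# The representable range is 0 .. prod(MODULI) - 1 = 0 .. 15014.
MODULI = (7, 11, 13, 15)

def to_rns(x, moduli=MODULI):
    """Convert a non-negative integer to its residue tuple."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition; each channel is small enough for a LUT."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication with no carries between channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def from_rns(r, moduli=MODULI):
    """Reconstruct the integer via the Chinese remainder theorem."""
    M = prod(moduli)
    total = 0
    for ri, m in zip(r, moduli):
        Mi = M // m
        total += ri * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return total % M
```

For example, a multiply-accumulate such as 12 × 34 + 5 can be computed entirely on residues and converted back once at the end, as long as the true result stays within the representable range.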

  • Real-Time Image-Based Vibration Extraction with Memory-Efficient Optical Flow and Block-Based Adaptive Filter

    Taito MANABE  Yuichiro SHIBATA  

     
    PAPER

    Publicized: 2022/09/05
    Vol: E106-A No:3
    Page(s): 504-513

    In this paper, we propose a real-time vibration extraction system, which extracts a vibration component within a given frequency range from videos in real time, for realizing tremor suppression used in microsurgery assistance systems. To overcome the problems in our previous system based on the mean Lucas-Kanade (LK) optical flow of the whole frame, we have introduced a new architecture combining dense optical flow calculated with simple feature matching and block-based band-pass filtering using the band-limited multiple Fourier linear combiner (BMFLC). As a feature for optical flow calculation, we use the simplified rotation-invariant histogram of oriented gradients (RIHOG) based on a gradient angle quantized to 1, 2, or 3 bits, which greatly reduces the usage of memory resources for a frame buffer. An obtained optical flow map is then divided into multiple blocks, and BMFLC is applied to the mean optical flow of each block independently. By using the L1-norm of adaptive weight vectors in BMFLC as a criterion, blocks belonging to vibrating objects can be isolated from the background at low cost, leading to better extraction accuracy compared to the previous system. The whole system for 480p and 720p resolutions can be implemented on a single Xilinx Zynq-7000 XC7Z020 FPGA without any external memory, and can process a video stream supplied directly from a camera at 60fps.

  • Pipelined ADPCM Compression for HDR Synthesis on an FPGA

    Masahiro NISHIMURA  Taito MANABE  Yuichiro SHIBATA  

     
    PAPER-VLSI Design Technology and CAD

    Publicized: 2023/08/31
    Vol: E107-A No:3
    Page(s): 531-539

    This paper presents an FPGA implementation of real-time high dynamic range (HDR) synthesis, which expresses a wide dynamic range by combining multiple images with different exposures using image pyramids. We have implemented a pipeline that performs streaming processing on images without using external memory. However, implementation for high-resolution images has been difficult due to large memory usage for line buffers. Therefore, we propose an image compression algorithm based on adaptive differential pulse code modulation (ADPCM). Compression modules based on the algorithm can be easily integrated into the pipeline. When the image resolution is 4K and the pyramid depth is 7, memory usage can be halved from 168.48% to 84.32% by introducing the compression modules, resulting in better quality.
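The abstract does not detail the compression algorithm, so the following is a generic ADPCM sketch rather than the authors' design. Each sample is coded as a small signed difference from the previous reconstructed value, and the quantization step grows after large codes and shrinks after small ones. The 4-bit code width, the 8-bit pixel range, the initial step and the doubling/halving rule are all assumptions made for illustration.

```python
def _clamp(v, lo, hi):
    return max(lo, min(hi, v))

def _update(pred, step, code, qmax):
    """Shared predictor/step update so encoder and decoder stay in sync."""
    pred = _clamp(pred + code * step, 0, 255)   # assumes 8-bit pixel values
    if abs(code) >= qmax - 1:
        step = min(step * 2, 64)    # large difference: coarser step
    elif abs(code) <= 1:
        step = max(step // 2, 1)    # small difference: finer step
    return pred, step

def adpcm_encode(samples, bits=4, init_step=4):
    """Encode samples into signed codes of the given bit width."""
    qmax = (1 << (bits - 1)) - 1
    qmin = -(1 << (bits - 1))
    pred, step = 0, init_step
    codes = []
    for s in samples:
        code = _clamp(round((s - pred) / step), qmin, qmax)
        codes.append(code)
        pred, step = _update(pred, step, code, qmax)
    return codes

def adpcm_decode(codes, bits=4, init_step=4):
    """Rebuild the (lossy) sample sequence from the codes."""
    qmax = (1 << (bits - 1)) - 1
    pred, step = 0, init_step
    out = []
    for code in codes:
        pred, step = _update(pred, step, code, qmax)
        out.append(pred)
    return out
```

Because the encoder tracks the decoder's reconstructed value rather than the raw input, quantization errors do not accumulate along a line. Storing 4-bit codes in place of 8-bit pixels is one way a line buffer could be roughly halved in size.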

  • Evaluation and Comparison of Implementation Alternatives for Look-up Tables for Plastic Cell Architecture

    Jun'ichiro TAKEMOTO  Toshihiro GOTO  Yuichiro SHIBATA  Kiyoshi OGURI  

     
    PAPER

    Vol: E86-D No:5
    Page(s): 850-858

    In this paper, the efficient structure of an LUT (look-up table) for an asynchronous reconfigurable PCA (Plastic Cell Architecture) device is investigated. A total of 15 types of implementation alternatives for LUTs are evaluated and compared in an empirical manner in which full custom layout design is developed and simulated. The evaluation results show that by introducing transmission gates in memory cells in an LUT, read time can be improved by 14.3% at the cost of a 13.6% area increase compared to a conventional speed oriented implementation. It is also shown that the use of transmission gates reduces area by 6.4% and read time by 19.2% compared to a conventional area oriented LUT implementation.

  • Comparative Evaluation of FPGA Implementation Alternatives for Real-Time Robust Ellipse Estimation based on RANSAC Algorithm

    Theint Theint THU  Jimpei HAMAMURA  Rie SOEJIMA  Yuichiro SHIBATA  Kiyoshi OGURI  

     
    PAPER

    Vol: E100-A No:7
    Page(s): 1409-1417

    Field Programmable Gate Array (FPGA) based robust model fitting enjoys immense popularity in image processing because of its high efficiency. This paper focuses on the tradeoff analysis of real-time FPGA implementation of robust circle and ellipse estimations based on the random sample consensus (RANSAC) algorithm, which estimates parameters of a statistical model from a data set of feature points which contains outliers. In particular, this paper highlights implementation alternatives for solvers of simultaneous equations and compares Gauss-Jordan elimination and Cramer's rule by changing matrix size and arithmetic processes. Experimental evaluation shows that a Cramer's rule approach coupled with long integer arithmetic reduces hardware resources the most, without unacceptable degradation of estimation accuracy compared to floating point versions.
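As a small illustration of Cramer's rule with integer arithmetic, the sketch below fits a circle x² + y² + ax + by + c = 0 through three integer points, which is the kind of minimal-sample model a RANSAC iteration would hypothesize. All determinants are computed in exact integers and division is deferred to the final step. The function names are hypothetical, and Python's `Fraction` stands in for whatever final division scheme a hardware design would use.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 integer matrix using integer arithmetic only."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(A, rhs):
    """Solve the 3x3 system A x = rhs by Cramer's rule."""
    D = det3(A)
    if D == 0:
        raise ValueError("singular system (e.g. collinear sample points)")
    sol = []
    for col in range(3):
        # Replace one column of A by the right-hand side.
        Ak = [[rhs[r] if c == col else A[r][c] for c in range(3)]
              for r in range(3)]
        sol.append(Fraction(det3(Ak), D))  # the only division, done last
    return sol

def circle_from_points(p1, p2, p3):
    """Return (center_x, center_y, radius_squared) of the circle through 3 points."""
    pts = (p1, p2, p3)
    A = [[x, y, 1] for x, y in pts]
    rhs = [-(x * x + y * y) for x, y in pts]
    a, b, c = cramer3(A, rhs)
    cx, cy = -a / 2, -b / 2
    return cx, cy, cx * cx + cy * cy - c
```

For instance, `circle_from_points((1, 0), (0, 1), (-1, 0))` yields the unit circle centered at the origin. A singular determinant signals a degenerate sample, which a RANSAC loop would simply discard and redraw.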

  • FPGA Implementation of a Stream-Based Real-Time Hardware Line Segment Detector

    Taito MANABE  Taichi KATAYAMA  Yuichiro SHIBATA  

     
    PAPER

    Publicized: 2021/09/02
    Vol: E105-A No:3
    Page(s): 468-477

    Line detection is a fundamental image processing technique which has various applications in the field of computer vision. For example, lane keeping, which is required to realize autonomous vehicles, can be implemented based on line detection techniques. For such purposes, however, low detection latency and low power consumption are essential. Hardware-based stream processing is considered an effective way to achieve such properties since it eliminates the need to store the whole frame in energy-consuming external memory. In addition, adopting FPGAs enables us to keep the flexibility of software processing. The line segment detector (LSD) is an algorithm based on intensity gradient, and performs better than the well-known Hough transform in terms of processing speed and accuracy. However, implementing the original LSD on FPGAs as a pipeline structure is difficult mainly because of its iterative region growing approach. Therefore, we propose a simple and stream-friendly line segment detection algorithm based on the concept of LSD. The whole system is implemented on a Xilinx Zynq-7000 XC7Z020-1CLG400C FPGA without any external memory. Evaluation results reveal that the implemented system is able to detect line segments successfully and is compact with 7.5% of Block RAM and less than 7.0% of the other resources used, while maintaining 60 fps throughput for VGA videos. It is also shown that the system is power-efficient compared to software processing on CPUs.