
Author Search Result

[Author] Kohei YAMAMOTO (3 hits)

1-3 of 3 hits
  • In Search of the Performance- and Energy-Efficient CNN Accelerators Open Access

    Stanislav SEDUKHIN  Yoichi TOMIOKA  Kohei YAMAMOTO  

     
    PAPER

      Publicized: 2021/12/03
      Vol: E105-C No:6
      Page(s): 209-221

    In this paper, starting from the algorithm, a performance- and energy-efficient 3D structure, or shape, of the Tensor Processing Engine (TPE) for CNN acceleration is systematically searched for and evaluated. An optimal accelerator shape maximizes the number of concurrent MAC operations per clock cycle while minimizing the number of redundant operations. The proposed 3D vector-parallel TPE architecture with an optimal shape can be used very efficiently for considerable CNN acceleration. Because inter-block image data independence is supported, multiple such TPEs can be used for additional CNN acceleration. Moreover, the proposed TPE can be used uniformly to accelerate different CNN models such as VGG, ResNet, YOLO, and SSD. We also demonstrate that our theoretical efficiency analysis matches the results of a real implementation for an SSD model to which a state-of-the-art channel pruning technique is applied.
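
    The shape search can be pictured as scoring candidate 3D engine shapes by how many of their issued MAC slots do useful work for a given convolution layer. The minimal Python sketch below illustrates that idea; the (px, py, pz) tiling, the utilization metric, and the 256-MAC budget are illustrative assumptions, not the authors' exact formulation.

        from itertools import product
        from math import ceil

        def utilization(shape, layer):
            # Fraction of issued MAC slots that do useful work for one conv layer.
            px, py, pz = shape
            H, W, Cin, Cout, K = layer              # output size, channels, kernel size
            useful_macs = H * W * Cout * Cin * K * K
            # Partial tiles are padded up to the engine shape, wasting MAC slots.
            cycles = ceil(H / px) * ceil(W / py) * ceil(Cout / pz) * Cin * K * K
            return useful_macs / (cycles * px * py * pz)

        def best_shape(layer, mac_budget=256):
            # Exhaustively score every 3D shape within the MAC budget.
            candidates = [s for s in product(range(1, 33), repeat=3)
                          if s[0] * s[1] * s[2] <= mac_budget]
            return max(candidates, key=lambda s: utilization(s, layer))

        # Example: a VGG-style 3x3 convolution with a 56x56 output and 128 channels.
        print(best_shape((56, 56, 128, 128, 3)))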

  • Local Memory Mapping of Multicore Processors on an Automatic Parallelizing Compiler

    Yoshitake OKI  Yuto ABE  Kazuki YAMAMOTO  Kohei YAMAMOTO  Tomoya SHIRAKAWA  Akimasa YOSHIDA  Keiji KIMURA  Hironori KASAHARA  

     
    PAPER

      Vol: E103-C No:3
      Page(s): 98-109

    Utilization of local memory, from real-time embedded systems to high-performance systems with multi-core processors, has become an important factor in satisfying hard deadline constraints. However, challenges remain in efficiently managing the memory hierarchy, such as decomposing large data into small blocks that fit onto local memory and transferring blocks for reuse and replacement. To address this issue, this paper presents a compiler optimization method that automatically manages the local memory of multi-core processors. The method selects and maps multi-dimensional data onto software-specified memory blocks called Adjustable Blocks. These blocks are hierarchically divisible, with varying sizes determined by the features of the input application. Moreover, the method introduces mapping structures called Template Arrays to maintain the indices of the decomposed multi-dimensional data. The proposed work is implemented in the OSCAR automatic parallelizing compiler, and evaluations were performed on the Renesas RP2 8-core processor. Experimental results from the NAS Parallel Benchmarks, the SPEC benchmarks, and multimedia applications show the effectiveness of the method, obtaining a maximum speed-up of 20.44 with 8 cores utilizing local memory over single-core sequential versions that use off-chip memory.
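
    As a rough illustration of the block-decomposition idea, the following Python sketch halves a 2D array's block shape until a double-buffered block fits in a local-memory budget and records the original index ranges of each block; the function name, the 128 KB budget, and the halving rule are assumptions for illustration, not the compiler's actual algorithm.

        # Per-core local memory budget and element size are illustrative assumptions.
        LOCAL_MEMORY_BYTES = 128 * 1024
        ELEM_BYTES = 8                      # double-precision elements

        def adjustable_blocks(shape, budget=LOCAL_MEMORY_BYTES):
            # Halve the larger block dimension until two buffers (for double
            # buffering) fit in local memory, then record each block's index ranges.
            bh, bw = shape
            while 2 * bh * bw * ELEM_BYTES > budget:
                if bh >= bw:
                    bh = (bh + 1) // 2
                else:
                    bw = (bw + 1) // 2
            template = [((i, min(i + bh, shape[0])), (j, min(j + bw, shape[1])))
                        for i in range(0, shape[0], bh)
                        for j in range(0, shape[1], bw)]
            return (bh, bw), template

        block_shape, template = adjustable_blocks((1024, 1024))
        print(block_shape, len(template))   # chosen block shape and number of blocks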

  • Complex-Valued Fully Convolutional Networks for MIMO Radar Signal Segmentation

    Motoko TACHIBANA  Kohei YAMAMOTO  Kurato MAENO  

     
    LETTER-Pattern Recognition

      Publicized: 2018/02/20
      Vol: E101-D No:5
      Page(s): 1445-1448

    Radar is expected to be used in advanced driver-assistance systems because it provides environmentally robust measurements. In this paper, we propose a novel radar signal segmentation method that uses a complex-valued fully convolutional network (CvFCN) comprising complex-valued layers, real-valued layers, and a bidirectional conversion layer between them. We also propose an efficient automatic annotation system for dataset generation. We apply the CvFCN to two-dimensional (2D) complex-valued radar signal maps (r-maps) whose axes are angle and distance. An r-map is a 2D complex-valued matrix generated from raw radar signals by 2D Fourier transformation. We annotate the r-maps automatically using LiDAR measurements. In our experiment, we semantically segment r-map signals into pedestrian and background regions, achieving accuracies of 99.7% for the background and 96.2% for pedestrians.
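
    As a rough illustration of the data path described above, the following Python sketch builds an r-map from raw radar samples with a 2D FFT and stacks its real and imaginary parts as two channels for real-valued layers; the array shapes and the stacking convention are assumptions for illustration, not the paper's exact pipeline.

        import numpy as np

        def to_rmap(raw):
            # raw: (n_antennas, n_samples) complex samples -> (angle, distance) r-map.
            return np.fft.fftshift(np.fft.fft2(raw), axes=0)

        def complex_to_real(rmap):
            # Stack real and imaginary parts as two channels for real-valued layers.
            return np.stack([rmap.real, rmap.imag], axis=0)

        raw = np.random.randn(8, 256) + 1j * np.random.randn(8, 256)
        rmap = to_rmap(raw)                  # complex-valued (8, 256) map
        features = complex_to_real(rmap)     # real-valued (2, 8, 256) tensor
        print(rmap.shape, features.shape)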