IEICE global.ieice.org Site

Author Search Result

[Author] Motoki AMAGASAKI(13hit)

1-13hit

Reconfigurable Neural Network Accelerator and Simulator for Model Implementation
Yasuhiro NAKAHARA Masato KIYAMA Motoki AMAGASAKI Qian ZHAO Masahiro IIDA

PAPER

Pubricized:
2021/09/21
Vol:
E105-A No:3
Page(s):
448-458
Low power consumption is important in edge artificial intelligence (AI) chips, where power supply is limited. Therefore, we propose reconfigurable neural network accelerator (ReNA), an AI chip that can process both a convolutional layer and fully connected layer with the same structure by reconfiguring the circuit. In addition, we developed tools for pre-evaluation of the performance when a deep neural network (DNN) model is implemented on ReNA. With this approach, we established the flow for the implementation of DNN models on ReNA and evaluated its power consumption. ReNA achieved 1.51TOPS/W in the convolutional layer and 1.38TOPS/W overall in a VGG16 model with a 70% pruning rate.
Fault-Tolerant FPGA: Architectures and Design for Programmable Logic Intellectual Property Core in SoC
Motoki AMAGASAKI Qian ZHAO Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI

PAPER-Architecture

Pubricized:
2014/11/19
Vol:
E98-D No:2
Page(s):
252-261
In this paper, we propose fault-tolerant field-programmable gate array (FPGA) architectures and their design framework for intellectual property (IP) cores in system-on-chip (SoC). Unlike discrete FPGAs, in which the integration scale can be made relatively large, programmable IP cores must correspond to arrays of various sizes. The key features of our architectures are a regular tile structure, spare modules and bypass wires for fault avoidance, and a configuration mechanism for single-cycle reconfiguration. In addition, we utilize routing tools, namely EasyRouter for proposed architecture. This tool can handle various array sizes corresponding to developed programmable IP cores. In this evaluation, we compared the performances of conventional FPGAs and the proposed fault-tolerant FPGA architectures. On average, our architectures have less than 1.82 times the area and 1.11 times the delay compared with traditional island-style FPGAs. At the same time, our FPGA shows a higher fault tolerant performance.
Fault-Injection Analysis to Estimate SEU Failure in Time by Using Frame-Based Partial Reconfiguration
Yoshihiro ICHINOMIYA Tsuyoshi KIMURA Motoki AMAGASAKI Morihiro KUGA Masahiro IIDA Toshinori SUEYOSHI

PAPER-High-Level Synthesis and System-Level Design

Vol:
E95-A No:12
Page(s):
2347-2356
SRAM-based field programmable gate arrays (FPGAs) are vulnerable to a soft-error induced by radiation. Techniques for designing dependable circuits, such as triple modular redundancy (TMR) with scrubbing, have been studied extensively. However, currently available evaluation techniques that can be used to check the dependability of these circuits are inadequate. Further, their results are restrictive because they do not represent the result in terms of general reliability indicator to decide whether the circuit is dependable. In this paper, we propose an evaluation method that provides results in terms of the realistic failure in time (FIT) by using reconfiguration-based fault-injection analysis. Current fault-injection analyses do not consider fault accumulation, and hence, they are not suitable for evaluating the dependability of a circuit such as a TMR circuit. Therefore, we configure an evaluation system that can handle fault-accumulation by using frame-based partial reconfiguration and the bootstrap method. By using the proposed method, we successfully evaluated a TMR circuit and could discuss the result in terms of realistic FIT data. Our method can evaluate the dependability of an actual system, and help with the tuning and selection in dependable system design.
SLM: A Scalable Logic Module Architecture with Less Configuration Memory
Motoki AMAGASAKI Ryo ARAKI Masahiro IIDA Toshinori SUEYOSHI

LETTER

Vol:
E99-A No:12
Page(s):
2500-2506
Most modern field programmable gate arrays (FPGAs) use a lookup table (LUT) as their basic logic cell. LUT resource requirements increase as O(2k) with an increasing number of inputs, k, so LUTs with more than six inputs negatively affect the overall FPGA performance. To address this problem, we propose a scalable logic module (SLM), which is a logic cell with less configuration memory, by using partial functions of the Shannon expansion for logics that appear frequently. In addition, we develop a technology mapping tool for SLM. The key feature of our tool is to combine a function decomposition process with traditional cut-based mapping. Experimental results show that an SLM-based FPGA with our mapping method uses much fewer configuration memory bits and has a smaller area than conventional LUT-based FPGAs.
A Genuine Power-Gatable Reconfigurable Logic Chip with FeRAM Cells
Masahiro IIDA Masahiro KOGA Kazuki INOUE Motoki AMAGASAKI Yoshinobu ICHIDA Mitsuro SAJI Jun IIDA Toshinori SUEYOSHI

PAPER

Vol:
E94-C No:4
Page(s):
548-556
An advantage of an RLD (reconfigurable logic device) such as an FPGA (field programmable gate array) is that it can be customized after being manufactured. Due to the aggressive technology scaling, device density is increasing, and it has become a serious problem in power consumption accordingly. In SoC of embedded systems, power gating is one of the major power reduction techniques. However, it is difficult to adopt SRAM-based RLDs because of the high overhead and SRAM being volatile. In this paper, we describe a TEG (test element group) chip of a reconfigurable logic based FeRAM (ferroelectric random access memory) technology. FeRAM brings reconfigurable logic devices the advantage of being a genuine power gater. The chip employs island-style routing architecture and uses a variable grain logic cell as a logic block. A NV-FF (non-volatile flip-flop), which contains FeRAM, a FF, and power-gating control circuits, is used as both configuration memories and FFs in a logic block. The NV-FF can transmit data between FeRAM and FF automatically when a power source is turned off/on. Thus chip-level power gating is possible. The hibernate/restore time is less than 1 ms. The chip has 1818 logic blocks and an area of 54.76 mm2.
FPGA Design Framework Combined with Commercial VLSI CAD
Qian ZHAO Kazuki INOUE Motoki AMAGASAKI Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI

PAPER-Design Methodology

Vol:
E96-D No:8
Page(s):
1602-1612
The most widely used open-source field programmable gate array (FPGA) placement and routing tool is the Versatile Packing, Placement and Routing (VPR) software developed at the University of Toronto, Canada. VPR calculates area and timing using target FPGA architecture and physical information. However, it cannot be used in FPGA IP design efficiently for two reasons. First, VPR cannot directly support most newly developed FPGA architectures, and modifying the C-coded VPR so that it can be used to evaluate a number of new architectures is time consuming. Second, the accuracy of the VPR performance results is inadequate for the evaluation of a complete FPGA IP in a design that targets the production of LSI. We propose an FPGA design framework that is focused on improving FPGA IP design efficiency. A novel FPGA routing tool is developed in this framework, namely the EasyRouter which uses the C# language. When an object-oriented programming method is used, there is less source code and it is easier to manage compared to VPR, thus shortening the development time. By using simple HDL code templates, EasyRouter can automatically generate the entire HDL code for a chip and the configuration bitstream. With these files, the FPGA IP can be evaluated with commercial VLSI CAD systems with high accuracy and reliability.
Relationship between Recognition Accuracy and Numerical Precision in Convolutional Neural Network Models
Yasuhiro NAKAHARA Masato KIYAMA Motoki AMAGASAKI Masahiro IIDA

LETTER-Computer System

Pubricized:
2020/08/13
Vol:
E103-D No:12
Page(s):
2528-2529
Quantization is an important technique for implementing convolutional neural networks on edge devices. Quantization often requires relearning, but relearning sometimes cannot be always be applied because of issues such as cost or privacy. In such cases, it is important to know the numerical precision required to maintain accuracy. We accurately simulate calculations on hardware and accurately measure the relationship between accuracy and numerical precision.
Physical Fault Detection and Recovery Methods for System-LSI Loaded FPGA-IP Core Open Access
Motoki AMAGASAKI Yuki NISHITANI Kazuki INOUE Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI

INVITED PAPER

Pubricized:
2017/01/13
Vol:
E100-D No:4
Page(s):
633-644
Fault tolerance is an important feature for the system LSIs used in reliability-critical systems. Although redundancy techniques are generally used to provide fault tolerance, these techniques have significantly hardware costs. However, FPGAs can easily provide high reliability due to their reconfiguration ability. Even if faults occur, the implemented circuit can perform correctly by reconfiguring to a fault-free region of the FPGA. In this paper, we examine an FPGA-IP core loaded in SoC and introduce a fault-tolerant technology based on fault detection and recovery as a CAD-level approach. To detect fault position, we add a route to the manufacturing test method proposed in earlier research and identify fault areas. Furthermore, we perform fault recovery at the logic tile and multiplexer levels using reconfiguration. The evaluation results for the FPGA-IP core loaded in the system LSI demonstrate that it was able to completely identify and avoid fault areas relative to the faults in the routing area.
Three Dimensional FPGA Architecture with Fewer TSVs
Motoki AMAGASAKI Masato IKEBE Qian ZHAO Masahiro IIDA Toshinori SUEYOSHI

PAPER-Device and Architecture

Pubricized:
2017/11/17
Vol:
E101-D No:2
Page(s):
278-287
Three-dimensional (3D) field-programmable gate arrays (FPGAs) are expected to offer higher logic density as well as improved delay and power performance by utilizing 3D integrated circuit technology. However, because through-silicon-vias (TSVs) for conventional 3D FPGA interlayer connections have a large area overhead, there is an inherent tradeoff between connectivity and small size. To find a balance between cost and performance, and to explore 3D FPGAs with realistic 3D integration processes, we propose two types of 3D FPGA and construct design tool sets for architecture exploration. In previous research, we created a TSV-free 3D FPGA with a face-down integration method; however, this was limited to two layers. In this paper, we discuss the face-up stacking of several face-down stacked FPGAs. To minimize the number of TSVs, we placed TSVs peripheral to the FPGAs for 3D-FPGA with 4 layers. According to our results, a 2-layer 3D FPGA has reasonable performance when limiting the design to two layers, but a 4-layer 3D FPGA is a better choice when area is emphasized.
An Easily Testable Routing Architecture and Prototype Chip
Kazuki INOUE Masahiro KOGA Motoki AMAGASAKI Masahiro IIDA Yoshinobu ICHIDA Mitsuro SAJI Jun IIDA Toshinori SUEYOSHI

PAPER-Architecture

Vol:
E95-D No:2
Page(s):
303-313
Generally, a programmable LSI such as an FPGA is difficult to test compared to an ASIC. There are two major reasons for this. The first is that an automatic test pattern generator (ATPG) cannot be used because of the programmability of the FPGA. The other reason is that the FPGA architecture is very complex. In this paper, we propose a new FPGA architecture that will simplify the testing of the device. The base of our architecture is general island-style FPGA architecture, but it consists of a few types of circuit blocks and orderly wire connections. This paper also presents efficient test configurations for our proposed architecture. We evaluated our architecture and test configurations using a prototype chip. As a result, the chip was fully tested using our configurations in a short test time. Moreover, our architecture can provide comparable performance to a conventional FPGA architecture.
An eFPGA Generation Suite with Customizable Architecture and IDE
Morihiro KUGA Qian ZHAO Yuya NAKAZATO Motoki AMAGASAKI Masahiro IIDA

PAPER

Pubricized:
2022/10/07
Vol:
E106-A No:3
Page(s):
560-574
From edge devices to cloud servers, providing optimized hardware acceleration for specific applications has become a key approach to improve the efficiency of computer systems. Traditionally, many systems employ commercial field-programmable gate arrays (FPGAs) to implement dedicated hardware accelerator as the CPU's co-processor. However, commercial FPGAs are designed in generic architectures and are provided in the form of discrete chips, which makes it difficult to meet increasingly diversified market needs, such as balancing reconfigurable hardware resources for a specific application, or to be integrated into a customer's system-on-a-chip (SoC) in the form of embedded FPGA (eFPGA). In this paper, we propose an eFPGA generation suite with customizable architecture and integrated development environment (IDE), which covers the entire eFPGA design generation, testing, and utilization stages. For the eFPGA design generation, our intellectual property (IP) generation flow can explore the optimal logic cell, routing, and array structures for given target applications. For the testability, we employ a previously proposed shipping test method that is 100% accurate at detecting all stuck-at faults in the entire FPGA-IP. In addition, we propose a user-friendly and customizable Web-based IDE framework for the generated eFPGA based on the NODE-RED development framework. In the case study, we show an eFPGA architecture exploration example for a differential privacy encryption application using the proposed suite. Then we show the implementation and evaluation of the eFPGA prototype with a 55nm test element group chip design.
Enabling FPGA-as-a-Service in the Cloud with hCODE Platform
Qian ZHAO Motoki AMAGASAKI Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI

PAPER-Design Methodology and Platform

Pubricized:
2017/11/17
Vol:
E101-D No:2
Page(s):
335-343
Major cloud service providers, including Amazon and Microsoft, have started employing field-programmable gate arrays (FPGAs) to build high-performance and low-power-consumption cloud capability. However, utilizing an FPGA-enabled cloud is still challenging because of two main reasons. First, the introduction of software and hardware co-design leads to high development complexity. Second, FPGA virtualization and accelerator scheduling techniques are not fully researched for cluster deployment. In this paper, we propose an open-source FPGA-as-a-service (FaaS) platform, the hCODE, to simplify the design, management and deployment of FPGA accelerators at cluster scale. The proposed platform implements a Shell-and-IP design pattern and an open accelerator repository to reduce design and management costs of FPGA projects. Efficient FPGA virtualization and accelerator scheduling techniques are proposed to deploy accelerators on the FPGA-enabled cluster easily. With the proposed hCODE, hardware designers and accelerator users can be organized on one platform to efficiently build open-hardware ecosystem.
COGRE: A Novel Compact Logic Cell Architecture for Area Minimization
Masahiro IIDA Motoki AMAGASAKI Yasuhiro OKAMOTO Qian ZHAO Toshinori SUEYOSHI

PAPER-Architecture

Vol:
E95-D No:2
Page(s):
294-302
Because of numerous circuit resources of FPGAs, there is a performance gap between FPGAs and ASICs. In this paper, we propose a small-memory logic cell, COGRE, to reduce the FPGA area. Our approach is to investigate the appearance ratio of the logic functions in a circuit implementation. Moreover, we group the logic functions on the basis of the NPN-equivalence class. The results of our investigation show that only small portions of the NPN-equivalence class can cover large portions of the logic functions used to implement circuits. Further, we found that NPN-equivalence classes with a high appearance ratio can be implemented by using a small number of AND gates, OR gates, and NOT gates. On the basis of this analysis, we develop COGRE architectures composed of several NAND gates and programmable inverters. The experimental results show that the logic area of 4-COGRE is smaller than that of 4-LUT and 5-LUT by approximately 35.79% and 54.70%, respectively. The logic area of 8-COGRE is 75.19% less than that of 8-LUT. Further, the total number of configuration memory bits of 4-COGRE is 8.26% less than the number of configuration memory bits of 4-LUT. The total number of configuration memory bits of 8-COGRE is 68.27% less than the number of configuration memory bits of 8-LUT.

Author Search Result

[Author] Motoki AMAGASAKI(13hit)

Reconfigurable Neural Network Accelerator and Simulator for Model Implementation

Fault-Tolerant FPGA: Architectures and Design for Programmable Logic Intellectual Property Core in SoC

Fault-Injection Analysis to Estimate SEU Failure in Time by Using Frame-Based Partial Reconfiguration

SLM: A Scalable Logic Module Architecture with Less Configuration Memory

A Genuine Power-Gatable Reconfigurable Logic Chip with FeRAM Cells

FPGA Design Framework Combined with Commercial VLSI CAD

Relationship between Recognition Accuracy and Numerical Precision in Convolutional Neural Network Models

Physical Fault Detection and Recovery Methods for System-LSI Loaded FPGA-IP Core Open Access

Three Dimensional FPGA Architecture with Fewer TSVs

An Easily Testable Routing Architecture and Prototype Chip

An eFPGA Generation Suite with Customizable Architecture and IDE

Enabling FPGA-as-a-Service in the Cloud with hCODE Platform

COGRE: A Novel Compact Logic Cell Architecture for Area Minimization

Latest Issue

Links

Call for Papers

Submit to IEICE Trans.

Transactions NEWS

Popular articles