Motoki AMAGASAKI Masato IKEBE Qian ZHAO Masahiro IIDA Toshinori SUEYOSHI
Three-dimensional (3D) field-programmable gate arrays (FPGAs) are expected to offer higher logic density as well as improved delay and power performance by utilizing 3D integrated circuit technology. However, because the through-silicon vias (TSVs) used for interlayer connections in conventional 3D FPGAs have a large area overhead, there is an inherent tradeoff between connectivity and area. To find a balance between cost and performance, and to explore 3D FPGAs with realistic 3D integration processes, we propose two types of 3D FPGA and construct design tool sets for architecture exploration. In previous research, we created a TSV-free 3D FPGA with a face-down integration method; however, it was limited to two layers. In this paper, we discuss the face-up stacking of several face-down stacked FPGAs. To minimize the number of TSVs in the 4-layer 3D FPGA, we place the TSVs at the periphery of the FPGAs. Our results show that a 2-layer 3D FPGA offers reasonable performance when the design is limited to two layers, but a 4-layer 3D FPGA is the better choice when area is the priority.
Kazuki INOUE Masahiro KOGA Motoki AMAGASAKI Masahiro IIDA Yoshinobu ICHIDA Mitsuro SAJI Jun IIDA Toshinori SUEYOSHI
Generally, a programmable LSI such as an FPGA is more difficult to test than an ASIC, for two major reasons. First, an automatic test pattern generator (ATPG) cannot be used because of the programmability of the FPGA. Second, the FPGA architecture is very complex. In this paper, we propose a new FPGA architecture that simplifies testing of the device. Our architecture is based on a general island-style FPGA architecture, but it consists of only a few types of circuit blocks with regular wire connections. This paper also presents efficient test configurations for the proposed architecture. We evaluated the architecture and test configurations using a prototype chip, which was fully tested with our configurations in a short test time. Moreover, our architecture provides performance comparable to that of a conventional FPGA architecture.
Yoshimasa OHNISHI Yoshinari SUGIMOTO Toshinori SUEYOSHI
We have researched and developed a Distributed Supercomputing Environment (DSE), based on the distributed shared memory model, that serves as a cluster computing environment providing parallel processing facilities. The shared memory model and the message passing model are the best-known models of parallel processing, and a hybrid programming environment should make the best use of the prominent features of both. We therefore add a new message passing mechanism to the present DSE and create a prototype, called Hybrid DSE, as a hybrid-model cluster computing environment. In this paper, we describe the implementation of the message passing mechanism on DSE and the performance evaluation of Hybrid DSE.
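To illustrate the general idea of layering message passing on top of shared memory, here is a minimal sketch, assuming a single address space where each process owns a mailbox and a condition variable signals new messages. The class `SharedMailboxes` and its `send`/`recv` methods are hypothetical and are not the Hybrid DSE interface; threads stand in for cluster nodes.

```python
import threading
from collections import deque


class SharedMailboxes:
    """Hypothetical sketch: per-process mailboxes held in shared memory.
    send() is a shared-memory write plus a notification; recv() blocks
    on a condition variable until its mailbox is non-empty."""

    def __init__(self, n_procs):
        self._boxes = [deque() for _ in range(n_procs)]
        self._cond = threading.Condition()

    def send(self, dest, src, payload):
        with self._cond:
            self._boxes[dest].append((src, payload))  # write into shared memory
            self._cond.notify_all()                   # wake any waiting receiver

    def recv(self, me):
        with self._cond:
            while not self._boxes[me]:                # guard against spurious wakeups
                self._cond.wait()
            return self._boxes[me].popleft()          # FIFO per sender order


mailboxes = SharedMailboxes(n_procs=2)
received = []
receiver = threading.Thread(target=lambda: received.append(mailboxes.recv(me=1)))
receiver.start()
mailboxes.send(dest=1, src=0, payload="hello")
receiver.join()
print(received[0])  # (0, 'hello')
```

A real distributed shared memory system would also have to handle coherence across nodes; the sketch only shows the send/notify/receive pattern.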
Qian ZHAO Motoki AMAGASAKI Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI
Major cloud service providers, including Amazon and Microsoft, have started employing field-programmable gate arrays (FPGAs) to build high-performance, low-power cloud capability. However, utilizing an FPGA-enabled cloud is still challenging for two main reasons. First, software and hardware co-design leads to high development complexity. Second, FPGA virtualization and accelerator scheduling techniques have not been fully researched for cluster deployment. In this paper, we propose hCODE, an open-source FPGA-as-a-service (FaaS) platform that simplifies the design, management and deployment of FPGA accelerators at cluster scale. The proposed platform implements a Shell-and-IP design pattern and an open accelerator repository to reduce the design and management costs of FPGA projects. We also propose efficient FPGA virtualization and accelerator scheduling techniques that make it easy to deploy accelerators on an FPGA-enabled cluster. With hCODE, hardware designers and accelerator users can be organized on one platform to efficiently build an open-hardware ecosystem.
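One common idea in accelerator scheduling is to avoid reconfiguration by reusing a node whose FPGA already holds the requested accelerator. The sketch below shows that heuristic only; it is an assumption for illustration, not hCODE's actual scheduler, and `FpgaNode`, `schedule`, and the accelerator names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FpgaNode:
    name: str
    loaded: Optional[str] = None  # accelerator bitstream currently configured
    busy: bool = False


def schedule(nodes, accel):
    """Return (node, needs_reconfig), or None if every node is busy.
    Prefer an idle node that already has `accel` loaded, so that no
    reconfiguration is needed; otherwise reconfigure any idle node."""
    idle = [n for n in nodes if not n.busy]
    for n in idle:
        if n.loaded == accel:
            n.busy = True
            return n, False
    if idle:
        n = idle[0]
        n.loaded = accel  # the shell would load the new IP here
        n.busy = True
        return n, True
    return None  # caller queues the request


nodes = [FpgaNode("fpga0", loaded="sha256"), FpgaNode("fpga1")]
first = schedule(nodes, "sha256")   # reuses fpga0, no reconfiguration
second = schedule(nodes, "gzip")    # reconfigures idle fpga1
third = schedule(nodes, "gzip")     # all nodes busy
print(first[0].name, first[1], second[0].name, second[1], third)
```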
Masahiro IIDA Motoki AMAGASAKI Yasuhiro OKAMOTO Qian ZHAO Toshinori SUEYOSHI
Because FPGAs require numerous circuit resources, there is a performance gap between FPGAs and ASICs. In this paper, we propose COGRE, a small-memory logic cell, to reduce the FPGA area. Our approach is to investigate the appearance ratio of logic functions in circuit implementations, grouping the logic functions on the basis of NPN-equivalence classes. The results of our investigation show that a small fraction of the NPN-equivalence classes covers a large fraction of the logic functions used to implement circuits. Further, we found that the NPN-equivalence classes with high appearance ratios can be implemented with a small number of AND, OR, and NOT gates. On the basis of this analysis, we develop COGRE architectures composed of several NAND gates and programmable inverters. The experimental results show that the logic area of 4-COGRE is approximately 35.79% and 54.70% smaller than that of 4-LUT and 5-LUT, respectively, and the logic area of 8-COGRE is 75.19% smaller than that of 8-LUT. Further, 4-COGRE requires 8.26% fewer configuration memory bits in total than 4-LUT, and 8-COGRE requires 68.27% fewer than 8-LUT.
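The grouping step can be made concrete with a brute-force NPN canonicalization: two functions are NPN-equivalent if one can be obtained from the other by permuting inputs, negating inputs, and negating the output. The sketch below picks the smallest truth-table integer in each orbit as the class representative and computes appearance ratios for a list of truth tables. It is a minimal illustration, assuming truth tables encoded as integers (bit m holds f(minterm m)); it is not the authors' analysis tool, and it is only practical for small n.

```python
from collections import Counter
from itertools import permutations


def npn_canonical(tt, n):
    """Smallest truth table reachable from `tt` by permuting inputs,
    negating inputs, and negating the output (the NPN representative)."""
    size = 1 << n
    best = None
    for perm in permutations(range(n)):
        for neg_mask in range(size):      # which inputs are inverted
            for neg_out in (0, 1):        # whether the output is inverted
                g = 0
                for m in range(size):
                    y = 0
                    for i in range(n):
                        bit = ((m >> i) ^ (neg_mask >> i)) & 1
                        y |= bit << perm[i]
                    if ((tt >> y) & 1) ^ neg_out:
                        g |= 1 << m
                if best is None or g < best:
                    best = g
    return best


def npn_classes(n):
    """Canonical representatives of all NPN classes of n-input functions."""
    return {npn_canonical(tt, n) for tt in range(1 << (1 << n))}


def appearance_ratio(tables, n):
    """Fraction of the given truth tables falling into each NPN class."""
    counts = Counter(npn_canonical(t, n) for t in tables)
    return {cls: c / len(tables) for cls, c in counts.most_common()}


print(len(npn_classes(3)))  # 14 classes cover all 256 3-input functions
# AND (0b1000) and NOR (0b0001) fall into the same class; XOR (0b0110) does not
print(appearance_ratio([0b1000, 0b0001, 0b0110], 2))
```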
Toshinori SUEYOSHI Masahiro IIDA
Recent DSP applications face many significant demands, such as higher system performance, lower power consumption, higher design flexibility, and faster time-to-market. Today, neither a conventional ASIC nor a conventional DSP can necessarily satisfy all of these requirements at once. Therefore, an alternative that complements the drawbacks of ASICs and DSPs is needed for DSP applications. This paper introduces a new computing paradigm, called configurable or reconfigurable computing, that has greater potential in terms of performance and flexibility. Conventional silicon platforms will not satisfy the conflicting demands of standard products and customization, whereas silicon platforms for configurable or reconfigurable computing, such as FPGAs, are standardized in manufacturing but customized in application. This paper also presents a brief survey of existing silicon platforms that support configuration or reconfiguration in digital signal processing domains such as image processing, communication processing, and audio and speech processing. Finally, we show some promising reconfigurable architectures for digital signal processing and discuss the future of reconfigurable computing.
Motoki AMAGASAKI Qian ZHAO Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI
In this paper, we propose fault-tolerant field-programmable gate array (FPGA) architectures and their design framework for intellectual property (IP) cores in systems-on-chip (SoCs). Unlike discrete FPGAs, whose integration scale can be made relatively large, programmable IP cores must support arrays of various sizes. The key features of our architectures are a regular tile structure, spare modules and bypass wires for fault avoidance, and a configuration mechanism for single-cycle reconfiguration. In addition, we use a routing tool, EasyRouter, for the proposed architectures; this tool can handle the various array sizes of the developed programmable IP cores. In the evaluation, we compared the performance of conventional FPGAs with that of the proposed fault-tolerant FPGA architectures. On average, our architectures require less than 1.82 times the area and 1.11 times the delay of traditional island-style FPGAs, while providing higher fault tolerance.
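Spare modules combined with bypass wires are often explained with a column-sparing model: the array has more physical columns than the design needs, and logical columns are shifted past faulty ones. The sketch below shows that generic remapping, assuming whole-column faults and a simple left-to-right shift; the function `remap_columns` is hypothetical and does not describe the exact spare and bypass arrangement of the proposed architectures.

```python
def remap_columns(n_logical, n_physical, faulty):
    """Map each logical column to a fault-free physical column, skipping
    (bypassing) faulty ones in order. Returns None when the spare columns
    are exhausted and the design can no longer be placed."""
    mapping = []
    phys = 0
    for _ in range(n_logical):
        while phys < n_physical and phys in faulty:
            phys += 1          # bypass wires route around this column
        if phys >= n_physical:
            return None        # not enough fault-free columns left
        mapping.append(phys)
        phys += 1
    return mapping


# 4 logical columns on 5 physical columns (one spare)
print(remap_columns(4, 5, faulty={2}))     # [0, 1, 3, 4]
print(remap_columns(4, 5, faulty={1, 3}))  # None: two faults, one spare
```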
Yoshihiro ICHINOMIYA Tsuyoshi KIMURA Motoki AMAGASAKI Morihiro KUGA Masahiro IIDA Toshinori SUEYOSHI
SRAM-based field-programmable gate arrays (FPGAs) are vulnerable to radiation-induced soft errors. Techniques for designing dependable circuits, such as triple modular redundancy (TMR) with scrubbing, have been studied extensively. However, the evaluation techniques currently available for checking the dependability of these circuits are inadequate. Further, their results are of limited use because they are not expressed in terms of a general reliability indicator that can be used to decide whether a circuit is dependable. In this paper, we propose an evaluation method that provides results in terms of realistic failure-in-time (FIT) rates by using reconfiguration-based fault-injection analysis. Current fault-injection analyses do not consider fault accumulation and hence are not suitable for evaluating the dependability of circuits such as TMR circuits. We therefore build an evaluation system that can handle fault accumulation by using frame-based partial reconfiguration and the bootstrap method. Using the proposed method, we successfully evaluated a TMR circuit and could discuss the result in terms of realistic FIT data. Our method can evaluate the dependability of an actual system and can help with tuning and selection in dependable system design.
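Why fault accumulation matters for TMR, and how a bootstrap turns repeated trials into a confidence interval, can be shown with a toy Monte Carlo model. Assumptions (all hypothetical, not the authors' system): each injected upset lands in one of the three replicas uniformly at random, corrupts it with probability `sensitive_fraction`, nothing is scrubbed, and the TMR output can fail once two replicas are corrupted. In this model the expected number of injections to failure is 1/p + 3/(2p) = 2.5/p, which serves as a sanity check. The FIT conversion is a rough approximation that assumes Poisson upset arrivals at a hypothetical rate.

```python
import random


def faults_until_tmr_failure(rng, sensitive_fraction):
    """Inject upsets one by one without scrubbing (they accumulate) and
    count injections until two of the three replicas are corrupted."""
    corrupted = [False, False, False]
    injections = 0
    while sum(corrupted) < 2:
        injections += 1
        replica = rng.randrange(3)               # upset lands in a random replica
        if rng.random() < sensitive_fraction:    # only some bits affect the logic
            corrupted[replica] = True
    return injections


def bootstrap_mean_ci(samples, rng, n_resamples=500, alpha=0.05):
    """Percentile bootstrap confidence interval for the sample mean."""
    n = len(samples)
    means = sorted(
        sum(samples[rng.randrange(n)] for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def fit_estimate(mean_faults_to_failure, upsets_per_hour):
    """Rough FIT (failures per 1e9 device-hours), taking
    MTTF ~= mean faults-to-failure / upset rate."""
    mttf_hours = mean_faults_to_failure / upsets_per_hour
    return 1e9 / mttf_hours


rng = random.Random(1)
trials = [faults_until_tmr_failure(rng, 0.1) for _ in range(1000)]
mean = sum(trials) / len(trials)
lo, hi = bootstrap_mean_ci(trials, rng)
print(f"mean faults to failure {mean:.1f}, 95% CI [{lo:.1f}, {hi:.1f}]")
print(f"FIT ~ {fit_estimate(mean, 1e-6):.1f}")  # 1e-6 upsets/hour is hypothetical
```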
Motoki AMAGASAKI Ryo ARAKI Masahiro IIDA Toshinori SUEYOSHI
Most modern field-programmable gate arrays (FPGAs) use a lookup table (LUT) as their basic logic cell. Because LUT resource requirements grow as O(2^k) with the number of inputs k, LUTs with more than six inputs degrade overall FPGA performance. To address this problem, we propose the scalable logic module (SLM), a logic cell with less configuration memory that uses partial functions of the Shannon expansion for frequently appearing logic. In addition, we develop a technology mapping tool for the SLM whose key feature is the combination of a function decomposition process with traditional cut-based mapping. Experimental results show that an SLM-based FPGA with our mapping method uses far fewer configuration memory bits and has a smaller area than conventional LUT-based FPGAs.
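The Shannon expansion that the SLM builds on states that f = (NOT x_i)·f|x_i=0 + x_i·f|x_i=1, i.e. any function is a 2:1 multiplexer, selected by x_i, over two (k-1)-input cofactors. The sketch below splits a truth table into its cofactors and rebuilds it, and shows the 2^k configuration-bit cost of a plain k-LUT. It illustrates only the identity; it does not model the SLM's partial-function structure, and the function names are hypothetical.

```python
def shannon_cofactors(tt, n, i):
    """Split an n-input truth table (bit m = f(minterm m)) into the
    cofactors f|x_i=0 and f|x_i=1, each an (n-1)-input truth table."""
    f0 = f1 = 0
    low_mask = (1 << i) - 1
    for m_rest in range(1 << (n - 1)):
        # re-insert a 0 at bit position i of the (n-1)-bit index
        m = (m_rest & low_mask) | ((m_rest >> i) << (i + 1))
        f0 |= ((tt >> m) & 1) << m_rest
        f1 |= ((tt >> (m | (1 << i))) & 1) << m_rest
    return f0, f1


def shannon_rebuild(f0, f1, n, i):
    """f = (not x_i) & f0 | x_i & f1: a 2:1 mux selected by x_i."""
    tt = 0
    low_mask = (1 << i) - 1
    for m in range(1 << n):
        m_rest = (m & low_mask) | ((m >> (i + 1)) << i)  # drop bit i
        cof = f1 if (m >> i) & 1 else f0
        tt |= ((cof >> m_rest) & 1) << m
    return tt


def lut_config_bits(k):
    return 1 << k  # a k-LUT stores one bit per input combination


# f = x0 AND x1 expanded on x1: f|x1=0 = 0, f|x1=1 = x0 (truth table 0b10)
print(shannon_cofactors(0b1000, 2, 1))      # (0, 2)
print([lut_config_bits(k) for k in range(4, 9)])
```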
Masahiro IIDA Masahiro KOGA Kazuki INOUE Motoki AMAGASAKI Yoshinobu ICHIDA Mitsuro SAJI Jun IIDA Toshinori SUEYOSHI
An advantage of an RLD (reconfigurable logic device) such as an FPGA (field-programmable gate array) is that it can be customized after manufacture. Owing to aggressive technology scaling, device density is increasing, and power consumption has accordingly become a serious problem. In SoCs for embedded systems, power gating is one of the major power reduction techniques. However, it is difficult to apply to SRAM-based RLDs because of its high overhead and the volatility of SRAM. In this paper, we describe a TEG (test element group) chip of a reconfigurable logic device based on FeRAM (ferroelectric random access memory) technology. FeRAM allows reconfigurable logic devices to perform genuine power gating. The chip employs an island-style routing architecture and uses a variable-grain logic cell as its logic block. An NV-FF (non-volatile flip-flop), which contains FeRAM, an FF, and power-gating control circuits, is used both as configuration memory and as the FFs in a logic block. The NV-FF automatically transfers data between the FeRAM and the FF when the power source is turned off or on, which makes chip-level power gating possible. The hibernate/restore time is less than 1 ms. The chip has 1818 logic blocks and an area of 54.76 mm².
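The hibernate/restore behavior of the NV-FF can be summarized as a small behavioral model: the flip-flop state is copied into the non-volatile cell before power is removed and copied back when power returns. The class below is a hypothetical sketch of that behavior only; it ignores circuit details and the sub-millisecond timing reported for the chip.

```python
class NonVolatileFlipFlop:
    """Behavioral sketch of an NV-FF: a volatile flip-flop paired with a
    non-volatile cell (FeRAM on the chip) that is written on power-down
    and read back on power-up."""

    def __init__(self):
        self.powered = True
        self.ff = 0        # volatile state, lost when power is removed
        self.nv_cell = 0   # non-volatile copy, survives power-off

    def clock(self, d):
        if not self.powered:
            raise RuntimeError("flip-flop is power-gated")
        self.ff = d

    def power_off(self):
        self.nv_cell = self.ff  # hibernate: store before the supply drops
        self.ff = None          # volatile contents become undefined
        self.powered = False

    def power_on(self):
        self.powered = True
        self.ff = self.nv_cell  # restore: reload from the non-volatile cell


nvff = NonVolatileFlipFlop()
nvff.clock(1)
nvff.power_off()
nvff.power_on()
print(nvff.ff)  # 1: state survived the power-gated interval
```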
Qian ZHAO Kazuki INOUE Motoki AMAGASAKI Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI
The most widely used open-source field-programmable gate array (FPGA) placement and routing tool is the Versatile Place and Route (VPR) software developed at the University of Toronto, Canada. VPR calculates area and timing from the target FPGA architecture and physical information. However, it cannot be used efficiently in FPGA IP design for two reasons. First, VPR cannot directly support most newly developed FPGA architectures, and modifying the C code of VPR to evaluate a number of new architectures is time consuming. Second, the accuracy of VPR's performance results is inadequate for evaluating a complete FPGA IP in a design that targets LSI production. We propose an FPGA design framework focused on improving FPGA IP design efficiency. A novel FPGA routing tool, EasyRouter, written in C#, is developed in this framework. Because it uses object-oriented programming, it needs less source code and is easier to manage than VPR, which shortens development time. Using simple HDL code templates, EasyRouter can automatically generate the HDL code for an entire chip as well as the configuration bitstream. With these files, the FPGA IP can be evaluated with commercial VLSI CAD systems with high accuracy and reliability.
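Template-based HDL generation, as described for EasyRouter, can be sketched as filling a per-tile instance template for every position in the array and wrapping the result in a top-level module. The sketch below is written in Python rather than C#, and the `logic_tile` module, its ports, and the daisy-chained configuration signal are hypothetical stand-ins, not EasyRouter's actual templates.

```python
from string import Template

# Hypothetical per-tile template; real templates would carry routing ports too.
TILE_TEMPLATE = Template(
    "  logic_tile tile_x${x}_y${y} (\n"
    "    .clk(clk),\n"
    "    .cfg_in(cfg_chain[${idx}]),\n"
    "    .cfg_out(cfg_chain[${next_idx}])\n"
    "  );\n"
)


def generate_top(width, height):
    """Emit a Verilog top module instantiating a width x height array of
    `logic_tile`, with configuration ports chained tile to tile."""
    n = width * height
    lines = [
        "module fpga_top (input wire clk, input wire cfg_in, output wire cfg_out);\n",
        f"  wire [{n}:0] cfg_chain;\n",
        "  assign cfg_chain[0] = cfg_in;\n",
        f"  assign cfg_out = cfg_chain[{n}];\n",
    ]
    idx = 0
    for y in range(height):
        for x in range(width):
            lines.append(TILE_TEMPLATE.substitute(x=x, y=y, idx=idx, next_idx=idx + 1))
            idx += 1
    lines.append("endmodule\n")
    return "".join(lines)


print(generate_top(2, 2))
```

Keeping the per-tile text in a template means a new tile type only changes the template, not the generator loop.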
Motoki AMAGASAKI Yuki NISHITANI Kazuki INOUE Masahiro IIDA Morihiro KUGA Toshinori SUEYOSHI
Fault tolerance is an important feature of system LSIs used in reliability-critical systems. Although redundancy techniques are generally used to provide fault tolerance, they incur significant hardware costs. FPGAs, however, can readily provide high reliability owing to their reconfigurability: even if faults occur, the implemented circuit can operate correctly after being reconfigured into a fault-free region of the FPGA. In this paper, we examine an FPGA-IP core embedded in an SoC and introduce a fault-tolerance technique based on fault detection and recovery as a CAD-level approach. To locate faults, we add routes to the manufacturing test method proposed in earlier research and use them to identify fault areas. Furthermore, we perform fault recovery at the logic tile and multiplexer levels using reconfiguration. Evaluation results for the FPGA-IP core embedded in the system LSI demonstrate that the method completely identified and avoided the fault areas for faults in the routing area.
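A simple way to turn pass/fail results of routing test configurations into fault locations is set-based diagnosis: every segment on a passing path is cleared, and the remaining segments on failing paths are suspects. The sketch below illustrates that generic idea under two stated assumptions, that a faulty segment always makes any path through it fail and that a passing path certifies all of its segments. `locate_faulty_segments` and the segment names are hypothetical and are not the paper's test method.

```python
def locate_faulty_segments(test_paths, results):
    """test_paths: name -> routing segments used by that test configuration.
    results: name -> True if the test passed.
    Returns segments that lie on some failing path but on no passing path."""
    good = set()
    suspects = set()
    for name, segments in test_paths.items():
        if results[name]:
            good |= set(segments)      # a pass certifies every segment on the path
        else:
            suspects |= set(segments)  # at least one of these segments is faulty
    return suspects - good


paths = {"p1": ["a", "b"], "p2": ["b", "c"], "p3": ["a", "d"]}
results = {"p1": True, "p2": False, "p3": True}
print(locate_faulty_segments(paths, results))  # {'c'}
```

The faulty segment can then be excluded when the circuit is re-routed.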
Hiroshi SHINOHARA Hideaki MONJI Masahiro IIDA Toshinori SUEYOSHI
High power consumption is a constraining factor in the growth of programmable logic devices. We propose two techniques to reduce power consumption. The first is a technique for creating contexts, which uses data-dependent circuits and wire sharing between contexts. The second is a technique for switching contexts. In this paper, we evaluate the ability of the two techniques to reduce power consumption using a multi-context logic device. Compared with the original circuits, our multi-context circuits reduced power consumption by 9.1% on average and by up to 19.0%. Furthermore, applying our resource sharing technique to these circuits, we achieved reductions of 10.6% on average and up to 18.8%.
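For readers unfamiliar with multi-context devices, the basic building block can be modeled as a LUT with several configuration planes, where a context ID selects which plane drives the output. The sketch below is a generic, hypothetical model of that building block only. It does not implement the context-creation or context-switching techniques proposed here, and it does not model power.

```python
class MultiContextLut:
    """Hypothetical k-input LUT with one truth table per context; switching
    the context selects another stored plane without reloading bits."""

    def __init__(self, k, planes):
        if any(p >> (1 << k) for p in planes):
            raise ValueError("plane does not fit in a k-input truth table")
        self.k = k
        self.planes = list(planes)
        self.context = 0

    def switch_context(self, ctx):
        self.context = ctx

    def evaluate(self, inputs):
        # inputs[i] is input x_i; bit i of the index corresponds to x_i
        index = sum(bit << i for i, bit in enumerate(inputs))
        return (self.planes[self.context] >> index) & 1


lut = MultiContextLut(2, [0b1000, 0b0110])  # context 0: AND, context 1: XOR
print(lut.evaluate([1, 1]))  # 1 (AND)
lut.switch_context(1)
print(lut.evaluate([1, 1]))  # 0 (XOR)
```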