1-8hit |
Morihiro KUGA Qian ZHAO Yuya NAKAZATO Motoki AMAGASAKI Masahiro IIDA
From edge devices to cloud servers, providing optimized hardware acceleration for specific applications has become a key approach to improve the efficiency of computer systems. Traditionally, many systems employ commercial field-programmable gate arrays (FPGAs) to implement dedicated hardware accelerator as the CPU's co-processor. However, commercial FPGAs are designed in generic architectures and are provided in the form of discrete chips, which makes it difficult to meet increasingly diversified market needs, such as balancing reconfigurable hardware resources for a specific application, or to be integrated into a customer's system-on-a-chip (SoC) in the form of embedded FPGA (eFPGA). In this paper, we propose an eFPGA generation suite with customizable architecture and integrated development environment (IDE), which covers the entire eFPGA design generation, testing, and utilization stages. For the eFPGA design generation, our intellectual property (IP) generation flow can explore the optimal logic cell, routing, and array structures for given target applications. For the testability, we employ a previously proposed shipping test method that is 100% accurate at detecting all stuck-at faults in the entire FPGA-IP. In addition, we propose a user-friendly and customizable Web-based IDE framework for the generated eFPGA based on the NODE-RED development framework. In the case study, we show an eFPGA architecture exploration example for a differential privacy encryption application using the proposed suite. Then we show the implementation and evaluation of the eFPGA prototype with a 55nm test element group chip design.
Kazunari TAKASAKI Ryoichi KIDA Nozomu TOGAWA
With the widespread use of Internet of Things (IoT) devices in recent years, we utilize a variety of hardware devices in our daily life. On the other hand, hardware security issues are emerging. Power analysis is one of the methods to detect anomalous behaviors, but it is hard to apply it to IoT devices where an operating system and various software programs are running. In this paper, we propose an anomalous behavior detection method for an IoT device by extracting application-specific power behaviors. First, we measure power consumption of an IoT device, and obtain the power waveform. Next, we extract an application-specific power waveform by eliminating a steady factor from the obtained power waveform. Finally, we extract feature values from the application-specific power waveform and detect an anomalous behavior by utilizing the local outlier factor (LOF) method. We conduct two experiments to show how our proposed method works: one runs three application programs and an anomalous application program randomly and the other runs three application programs in series and an anomalous application program very rarely. Application programs on both experiments are implemented on a single board computer. The experimental results demonstrate that the proposed method successfully detects anomalous behaviors by extracting application-specific power behaviors, while the existing approaches cannot.
Hao XIAO Ning WU Fen GE Guanyu ZHU Lei ZHOU
This paper presents a synchronization mechanism to effectively implement the lock and barrier protocols in a decentralized manner through explicit message passing. In the proposed solution, a simple and efficient synchronization control mechanism is proposed to support queued synchronization without contention. By using state-of-the-art Application-Specific Instruction-set Processor (ASIP) technology, we embed the synchronization functionality into a baseline processor, making the proposed mechanism feature ultra-low overhead. Experimental results show the proposed synchronization achieves ultra-low latency and almost ideal scalability when the number of processors increases.
Huaxi GU Zheng CHEN Yintang YANG Hui DING
Optical Network-on-Chip (ONoC) is a promising emerging technology, which can solve the bottlenecks faced by electrical on-chip interconnection. However, the existing proposals of ONoC are mostly built on fixed topologies, which are not flexible enough to support various applications. To make full use of the limited resource and provide a more efficient approach for resource allocation, RONoC (Reconfigurable Optical Network-on-Chip) is proposed in this letter. The topology can be reconfigured to meet the requirement of different applications. An 8×8 nonblocking router is also designed, together with the communication mechanism. The simulation results show that the saturation load of RONoC is 2 times better than mesh, and the energy consumption is 25% lower than mesh.
Shouyi YIN Yang HU Zhen ZHANG Leibo LIU Shaojun WEI
Hybrid wired/wireless on-chip network is a promising communication architecture for multi-/many-core SoC. For application-specific SoC design, it is important to design a dedicated on-chip network architecture according to the application-specific nature. In this paper, we propose a heuristic wireless link allocation algorithm for creating hybrid on-chip network architecture. The algorithm can eliminate the performance bottleneck by replacing multi-hop wired paths by high-bandwidth single-hop long-range wireless links. The simulation results show that the hybrid on-chip network designed by our algorithm improves the performance in terms of both communication delay and energy consumption significantly.
A systolic array is an ideal for ASICs because of its massive parallelism with a minimum communication overhead, regularity and modularity. Most of commercial FPGAs cannot handle systolic structure with fast sampling rate for their general-purpose architecture nature. This paper presents a new PLD architecture targeting a super-systolic array for application-specific arithmetic operations such as MAC. This architecture combines the high performance of ASICs with the flexibility of PLDs and it offers a significant alternative view on the programmable logic devices. The super-systolic array is ideal for a newly proposed PLD architecture when it comes to area-efficiency, P&R and clock speed.
This paper presents a novel system-level design methodology, called quality-driven design, by which application-specific optimization can be achieved; furthermore the entire functionality can be shared to maximize design reuse. As a case of study, this paper focuses on quality-driven design for video applications and introduces an output quality adaptive approach based on variable bitwidth optimization to explore a new design space. MPEG2 video is used as the driver application to illustrate the potential of the presented methodology. Experimental results show the effectiveness of the methodology.
This paper presents a novel low-energy memory design technique based on variable analysis for on-chip data memory (RAM) in application-specific systems, which called VAbM technique. It targets the exploitation of both data locality and effective data width of variables to reduce energy consumed by data transfer and storage. Variables with higher access frequency and smaller effective data width are assigned into a smaller low-energy memory with fewer bit lines and word lines, placed closer the processor. Under constraints of the number of memory banks, VAbM technique use variable analysis results to perform allocating and assigning on-chip RAM into multiple banks, which have different size with different number of word lines and different number of bit lines tailored to each application requirements. Experimental results with several real embedded applications demonstrate significant energy reduction up to 64.8% over monolithic memory, and 27.7% compared to memory designed by memory banking technique.