Hiromu MIYAZAKI Takuto KANAMORI Md Ashraful ISLAM Kenji KISE
RISC-V is a RISC-based, open, royalty-free instruction set architecture that has been developed since 2010 and can be used for cost-effective soft processors on FPGAs. The basic 32-bit integer instruction set in RISC-V is defined as RV32I, which is sufficient to support an operating system environment and is suited to embedded systems. In this paper, we propose an optimized RV32I soft processor named RVCoreP that adopts five-stage pipelining. Three effective methods are applied to the processor to improve its operating frequency: instruction fetch unit optimization, ALU optimization, and data memory optimization. We implement RVCoreP in Verilog HDL and verify its behavior using Verilog simulation and an actual Xilinx Artix-7 FPGA board. We evaluate IPC (instructions per cycle), operating frequency, hardware resource utilization, and processor performance. The evaluation results show that RVCoreP achieves a 30.0% performance improvement over VexRiscv, a high-performance, open-source RV32I processor selected from related work.
David ALEDO Benjamin CARRION SCHAFER Félix MORENO
This paper describes the advantages and disadvantages observed when describing complex parameterizable Artificial Neural Networks (ANNs) at the behavioral level using SystemC and at the Register Transfer Level (RTL) using VHDL. ANNs are complex to parameterize because they have a configurable number of layers, and each layer has a unique configuration. This kind of structure makes ANNs, a priori, challenging to parameterize using Hardware Description Languages (HDLs). Intuitively, then, ANNs would seem to benefit from raising the level of abstraction from RTL to the behavioral level. This paper presents the results of implementing an ANN at both levels of abstraction. Surprisingly, the results show that VHDL leads to better results and allows a much higher degree of parameterization than SystemC. The implementations of these parameterizable ANNs are open source and freely available online. Finally, at the end of the paper we make some recommendations for future high-level synthesis (HLS) tools to improve their parameterization capabilities.
Jung-Lin YANG Shin-Nung LU Pei-Hsuan YU
Developing a rapid prototyping environment with hardware description languages (HDLs) and conventional FPGAs can help overcome the difficulties caused by the complexity of asynchronous digital systems and recent advances in VLSI technology. We propose a design flow and an FPGA template for implementing generalized C-element (gC) style asynchronous controllers. With conventional FPGA synthesis tools, self-timed bundled-data function modules can be realized with some effort on timing validation. The proposed design flow, with its FPGA-based realization approach, is a very effective methodology for rapid prototyping and functional validation. This work could be useful for early-stage performance estimation, power reduction exploration, circuit design training, and many other applications related to asynchronous circuits. In this paper, the proposed FPGA-based asynchronous circuit design flow, a hands-on design tutorial, a generalized C-element template, and a list of synthesized benchmark circuits are documented and discussed in detail.
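The state-holding behavior that a gC controller relies on can be illustrated with a small behavioral model. The sketch below is a Python simulation, not the authors' FPGA template or HDL, and the names `GeneralizedCElement` and `muller_c_element` are hypothetical. It assumes that a gC is described by a set function and a reset function that are never true at the same time. The classic Muller C-element is the special case in which the output rises when all inputs are 1 and falls when all inputs are 0.

```python
from typing import Callable, Sequence

Inputs = Sequence[int]


class GeneralizedCElement:
    """Behavioral sketch of a generalized C-element (gC).

    The output goes high when set_fn is true, goes low when reset_fn is
    true, and otherwise holds its previous value (state-holding).
    """

    def __init__(self, set_fn: Callable[[Inputs], bool],
                 reset_fn: Callable[[Inputs], bool], init: int = 0) -> None:
        self.set_fn = set_fn
        self.reset_fn = reset_fn
        self.out = init

    def step(self, inputs: Inputs) -> int:
        s = self.set_fn(inputs)
        r = self.reset_fn(inputs)
        if s and r:
            raise ValueError("set and reset functions must be mutually exclusive")
        if s:
            self.out = 1
        elif r:
            self.out = 0
        # Neither condition holds: keep the previous output.
        return self.out


def muller_c_element(init: int = 0) -> GeneralizedCElement:
    """Classic Muller C-element expressed as a special case of the gC."""
    return GeneralizedCElement(
        set_fn=lambda x: all(v == 1 for v in x),
        reset_fn=lambda x: all(v == 0 for v in x),
        init=init,
    )


if __name__ == "__main__":
    c = muller_c_element()
    print([c.step(x) for x in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]])  # [0, 0, 1, 1, 0]
```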
Jung-Lin YANG Jau-Cheng WEI Shin-Nung LU
A modeling technique for asynchronous circuits based on hardware description languages (HDLs) is presented in this paper. An HDL handshake package has been developed for expressing handshake-style digital systems in both VHDL and Verilog. Burst-mode and extended burst-mode (BM/XBM) circuits were used to demonstrate the usefulness of this work. This research successfully prototyped comparators, adders, an RSA encoder/decoder, and several self-timed circuits for full-custom IC and FPGA designs. Furthermore, the HDL handshake package developed in this research can be used to build behavioral test benches for studying and analyzing asynchronous designs. Demonstrated applications include extracting detailed timing information from asynchronous finite state machines (AFSMs), detecting delay faults in synthesized self-timed functional modules, and locating fundamental-mode violations within realized AFSMs. The proposed HDL modeling technique and the transformation procedure are detailed in the rest of this paper.
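A behavioral test bench for handshake-style circuits typically includes a protocol monitor that checks the order of signal transitions. The sketch below is a minimal Python illustration of such a monitor. It assumes a four-phase (return-to-zero) handshake on a single `req`/`ack` pair, which may not be the only protocol the authors' package supports. The names `FOUR_PHASE_CYCLE` and `check_four_phase` are hypothetical.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (signal name, new level)

# Expected order of transitions in one four-phase (return-to-zero) cycle.
FOUR_PHASE_CYCLE: List[Event] = [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]


def check_four_phase(events: List[Event]) -> bool:
    """Check a trace of signal transitions against the four-phase protocol.

    Raises ValueError at the first out-of-order transition. Returns True if
    the trace ends after a complete cycle (both wires back at 0), and False
    if it stops in the middle of a handshake.
    """
    for i, ev in enumerate(events):
        expected = FOUR_PHASE_CYCLE[i % 4]
        if ev != expected:
            raise ValueError(f"event {i}: got {ev}, expected {expected}")
    return len(events) % 4 == 0


if __name__ == "__main__":
    trace = [("req", 1), ("ack", 1), ("req", 0), ("ack", 0), ("req", 1)]
    print(check_four_phase(trace))  # False: the second handshake is still open
```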
Youngsun HAN Seok Joong HWANG Seon Wook KIM
In this paper, we present Jaguar, a reconfigurable processor infrastructure for accelerating Java applications. The Jaguar infrastructure consists of a compiler framework and runtime environment support. The compiler framework selects a group of Java methods to be translated into hardware for the best performance under limited resources, and translates the selected methods into synthesizable Verilog modules. The runtime environment support includes a Java virtual machine (JVM) running on a host processor, which provides a Java execution environment to the generated Java accelerator through communication interface units while preserving Java semantics. Our infrastructure is a tightly integrated, solid compiler-aided solution for Java reconfigurable computing. There is no limitation on generating synthesizable Verilog modules from any Java application while preserving Java semantics. In terms of performance, our infrastructure achieves a speedup of 5.4 times on average and up to 9.4 times on the measured benchmarks relative to JVM-only execution. Furthermore, two optimization schemes, instruction folding and live buffer removal, reduce resource consumption by 24% on average and by up to 39%.
Nobuhiro DOI Takashi HORIYAMA Masaki NAKANISHI Shinji KIMURA
High-level synthesis is a method for automatically generating an RT-level hardware description from a high-level language such as C, and is used in recent digital circuit design. Floating-point to fixed-point conversion with bit-length optimization is one of the key issues for area and speed optimization in high-level synthesis. However, the conversion is tedious work for designers. This paper introduces an automatic bit-length optimization method for floating-point to fixed-point conversion in high-level synthesis. The method estimates computational errors statistically and formulates the optimization as a nonlinear programming (NLP) problem. Applying NLP techniques improves the balance between computational accuracy and total hardware cost. Various constraints, such as unit sharing and the maximum bit length of function units, can also be modeled easily. Experimental results show that our method is faster than a typical conventional method and reduces the hardware area.
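One common textbook model makes the statistical formulation concrete. It is assumed here and not taken from the paper. The rounding error of a variable with $f_i$ fractional bits is treated as uniformly distributed, and errors reach the output through first-order sensitivities $g_i$. With a hypothetical per-variable hardware cost $c_i(f_i)$ and an output error-variance limit $\sigma_{\max}^2$, choosing bit lengths becomes a nonlinear program:

```latex
e_i \sim \mathcal{U}\!\left[-2^{-(f_i+1)},\; 2^{-(f_i+1)}\right],
\qquad
\sigma_i^2 = \frac{\left(2^{-f_i}\right)^2}{12} = \frac{2^{-2f_i}}{12}

\min_{f_1,\dots,f_n}\; \sum_{i=1}^{n} c_i(f_i)
\quad\text{subject to}\quad
\sum_{i=1}^{n} g_i^2\,\frac{2^{-2f_i}}{12} \le \sigma_{\max}^2
```

Relaxing each $f_i$ to a real value yields a continuous NLP whose solution can then be rounded up to integers. One possible way to encode a unit-sharing constraint in this formulation is an equality $f_i = f_j$ for variables mapped to the same function unit.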
Ulkuhan EKINCIEL Hiroaki YAMAOKA Hiroaki YOSHIDA Makoto IKEDA Kunihiro ASADA
This paper describes the design and development of a module generator for a dual-rail PLA with embedded 2-input logic cells in 0.35 µm CMOS technology. To automatically generate logic-cell-based PLA layouts from circuit specifications, a module generator is developed as a design automation tool for logic-cell-based PLAs, with a structural improvement. The module generator is based on a timing-driven design methodology and consists of logic synthesis, transistor sizing and logic cell generation, stimulus generation, and HDL model generation parts. It uses a design constraint to achieve flexible transistor sizing in the logic cell generation part. In addition, the generated logic cells can be easily adapted to a layout generator. The layout is generated using 0.35 µm, 3-metal-layer CMOS technology. Moreover, an HDL model generator is developed to create delay behavior models easily and quickly with precise timing parameters. The module generator partially reduces design complexity, an increasingly important issue for VLSI circuits, and minimizes human-caused errors. A PLA layout in GDS-II format and an HDL behavioral model of a Boolean function with 64 inputs, 1 output, and 220 product terms can be generated within 8 minutes on a Sun UltraSPARC-III 900 MHz processor. Because compiling a module takes very little time, designers can try many different design configurations to find the best one.
Joaquín GRACIA Juan C. BARAZA Daniel GIL Pedro J. GIL
Nowadays, the use of dependable systems is becoming widespread, and diagnosis is an important step in their design. Diagnosis in early phases of the design cycle saves time and money. Fault injection can be used during the design process of the system, and with Hardware Description Languages, particularly VHDL, this early diagnosis can be accomplished. In recent years, the Time-Triggered Architecture (TTA) has emerged as a hard real-time fault-tolerant architecture for embedded systems. This novel architecture is gaining adherents mainly in the avionics and automotive industries (x-by-wire). The TTA implements a synchronous protocol with static scheduling that has been specifically targeted at hard real-time fault-tolerant distributed systems. In this work, we present a study of the VHDL model of a communication controller based on the TTA, on which a number of fault injection campaigns have been carried out. We discuss the results and suggest some solutions to the problems detected.
This paper presents the implementation of sfl2vl, a new free tool for SFL-to-Verilog conversion. It also discusses the performance of the conversion and compares logic simulation with sfl2vl+Icarus Verilog (a freeware compiler) against PARTHENON on several MPU designs.
Nobuhiro DOI Takashi HORIYAMA Masaki NAKANISHI Shinji KIMURA Katsumasa WATANABE
In hardware synthesis from a high-level language such as C, the bit length of variables is one of the key issues for area and speed optimization. Usually, designers are required to optimize the bit length of each variable manually, using time-consuming simulations on huge data sets. In this paper, we propose a method for optimizing the fractional bit length in the conversion from floating-point to fixed-point variables. The method is based on error propagation and the backward propagation of the accuracy limitation. The method is fully analytical and faster than simulation-based methods.
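The idea of pushing an accuracy limit backward to choose fractional bit lengths can be illustrated with a deliberately simple worst-case sketch. This is not the authors' algorithm, which also covers nonlinear operations. It assumes a linear output $y = \sum_i g_i x_i$, round-to-nearest quantization (error at most $2^{-(f+1)}$ for $f$ fractional bits), and an equal split of the output error budget among the inputs. The function names are hypothetical.

```python
from typing import List, Sequence


def frac_bits_for_error(max_err: float) -> int:
    """Smallest f >= 0 whose round-to-nearest error bound 2**-(f+1)
    does not exceed max_err."""
    if max_err <= 0:
        raise ValueError("error bound must be positive")
    f = 0
    while 2.0 ** -(f + 1) > max_err:
        f += 1
    return f


def allocate_fractional_bits(sensitivities: Sequence[float],
                             max_out_err: float) -> List[int]:
    """Backward step: split the output error budget equally among the
    inputs of y = sum_i g_i * x_i, giving each input
    e_i = max_out_err / (n * |g_i|), then convert e_i to a bit count."""
    n = len(sensitivities)
    bits = []
    for g in sensitivities:
        if g == 0:
            bits.append(0)  # this input does not affect the output
        else:
            bits.append(frac_bits_for_error(max_out_err / (n * abs(g))))
    return bits


def worst_case_error(sensitivities: Sequence[float], bits: Sequence[int]) -> float:
    """Forward step: propagate per-input rounding bounds to the output."""
    return sum(abs(g) * 2.0 ** -(f + 1) for g, f in zip(sensitivities, bits))


if __name__ == "__main__":
    g = [1.0, 4.0]
    bits = allocate_fractional_bits(g, 0.01)
    print(bits, worst_case_error(g, bits))  # [7, 9] 0.0078125
```

The input with the larger sensitivity receives more fractional bits. The forward check confirms that the allocation meets the 0.01 output bound.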
Wichai BOONKUMKLAO Yoshikazu MIYANAGA Kobchai DEJHAN
In this paper, we introduce a flexible design for intellectual property (IP) cores, which have become important in system LSI design. The proposed IPs are highly flexible with respect to user requirements. Design priorities are determined by setting parameters such as the number of arithmetic units, the internal bit length, the clock speed, and so on, which reduces design time. The designed IP is based on a reconfigurable architecture in which many structures can be dynamically selected. This paper shows an implementation of a Frequency Response Masking (FRM) digital filter and Principal Components Analysis (PCA) using a reconfigurable architecture. We show the method used to realize the designed circuit and the results of experiments using a field programmable gate array (FPGA).
Jinku CHOI Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
Motion estimation can choose the most suitable algorithm for different motion types, formats, and characteristics, so the video encoding system can be optimized for quality, speed, and power consumption. In this paper, we propose a reconfigurable approach to a motion estimation algorithm and hardware architecture. The proposed algorithm determines the motion type and then selects an adapted block-matching algorithm for each kind of motion sequence. The quality of our algorithm is better than that of the three-step search (TSS) and block-based gradient descent search (BBGDS) algorithms, or comparable to the better of the two, and its computational complexity is significantly lower than that of TSS. We also propose a hardware architecture that realizes two kinds of motion estimation in the same hardware. We implemented this flexible, reconfigurable architecture with an address generator unit, a delay unit, and parameters, using VHDL and Synopsys synthesis tools. We analyze the performance of the algorithm and present an adapted algorithm for low-cost real-time applications.
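For reference, the three-step search used as a baseline above can be sketched in a few lines. This is the classic TSS with a sum-of-absolute-differences (SAD) cost and an initial step of 4 (a ±7 pixel search range), written in plain Python. It is not the authors' adaptive algorithm, and BBGDS is not shown. The function names and the list-of-lists frame representation are assumptions made for illustration.

```python
import random
from typing import List, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int,
        dx: int, dy: int, n: int) -> float:
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`.

    Candidates that fall outside the reference frame cost infinity, so
    the search never selects them.
    """
    h, w = len(ref), len(ref[0])
    if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
        return float("inf")
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur: Frame, ref: Frame, bx: int, by: int,
                      n: int, step: int = 4) -> Tuple[Tuple[int, int], float]:
    """Classic three-step search: evaluate the current center and its 8
    neighbors at distance `step`, move to the best one, halve `step`, and
    repeat until step 1 has been evaluated."""
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                cand = (cx + ox, cy + oy)
                cost = sad(cur, ref, bx, by, cand[0], cand[1], n)
                if cost < best_cost:
                    best, best_cost = cand, cost
        step //= 2
    return best, best_cost


if __name__ == "__main__":
    rng = random.Random(0)
    size = 24
    ref = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
    # Build `cur` so that the 8x8 block at (8, 8) matches `ref` displaced by (4, -4).
    tx, ty = 4, -4
    cur = [[ref[y + ty][x + tx] if 0 <= y + ty < size and 0 <= x + tx < size else 0
            for x in range(size)] for y in range(size)]
    print(three_step_search(cur, ref, bx=8, by=8, n=8))
```

TSS always evaluates 25 candidates (9 + 8 + 8) instead of the 225 of a full ±7 search. It can, however, settle in a local minimum, which is one motivation for adapting the search strategy to the motion type.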
Minsuk HONG Jinsung OH Chan Young PARK Wooseok KANG Sehyeon RHEE Sang-Hui PARK
In this paper, we present the design and implementation of a cost-effective Ethernet-over-HDLC forwarding VLSI for network access systems. It supports 10/100 Mbps Ethernet PHYs and an HDLC interface of up to 50 Mbps that connects directly to a modem or transceiver. The maximum forwarding/filtering rate is 90,000 pps with a throughput latency of one frame, which supports high-speed applications. Selected by pin configuration, it supports both a master mode for an Ethernet PHY and a slave mode for a switching chip. It has been implemented as a single chip in 0.5 µm CMOS technology. Field tests show that the implemented chip achieves wire-speed packet forwarding and processing.
Pisana PLACIDI Leonardo VERDUCCI Guido MATRELLA Luca ROSELLI Paolo CIAMPOLINI
In this paper, the characteristics of a digital system dedicated to fast execution of the finite-difference time-domain (FDTD) algorithm, widely used for electromagnetic simulation, are presented. The system is conceived as a module communicating with a host personal computer via a PCI bus, and is based on a VLSI ASIC that implements the "field-update" engine. The system structure is defined in a hardware description language, which keeps the high-level system specification independent of the actual fabrication technology. A virtual implementation of the system has been carried out by mapping this description in a standard-cell style onto a commercial 0.35 µm technology. Simulations show that a significant speed-up can be achieved with respect to state-of-the-art software implementations of the same algorithm.
Kenichi SUZUKI Mitsuhiro TAKEDA Atsushi KAMO Hideki ASAI
This letter presents a novel application of Verilog-A, a hardware description language for analog circuits, to the modeling and simulation of high-speed interconnects in the time/frequency transform domain for signal integrity problems. The modeling method with Verilog-A handles the transfer function approximation and admittance matrices, which are expressed by dominant poles and residues as in the asymptotic waveform evaluation (AWE) technique. Finally, it is shown that high-speed interconnects with nonlinear terminations can be modeled and simulated easily.
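A pole-residue description like the one above has simple closed-form frequency- and time-domain responses. The Python sketch below shows only the underlying math, $H(s) = d + \sum_i r_i/(s - p_i)$ with impulse response $h(t) = \sum_i r_i e^{p_i t}$ for $t \ge 0$. It is not the letter's Verilog-A model, and the function names and the example pole pair are hypothetical.

```python
import cmath
from typing import Sequence


def pole_residue_tf(s: complex, poles: Sequence[complex],
                    residues: Sequence[complex], d: float = 0.0) -> complex:
    """Evaluate H(s) = d + sum_i r_i / (s - p_i)."""
    return d + sum(r / (s - p) for p, r in zip(poles, residues))


def impulse_response(t: float, poles: Sequence[complex],
                     residues: Sequence[complex]) -> float:
    """h(t) = sum_i r_i * exp(p_i * t) for t >= 0 (a direct term d would add
    a delta at t = 0 and is omitted). Complex poles must appear in conjugate
    pairs with conjugate residues for the result to be real."""
    if t < 0:
        return 0.0
    return sum(r * cmath.exp(p * t) for p, r in zip(poles, residues)).real


def step_response(t: float, poles: Sequence[complex],
                  residues: Sequence[complex], d: float = 0.0) -> float:
    """Inverse Laplace transform of H(s)/s:
    d + sum_i (r_i / p_i) * (exp(p_i * t) - 1), valid for nonzero poles."""
    if t < 0:
        return 0.0
    return (d + sum(r / p * (cmath.exp(p * t) - 1)
                    for p, r in zip(poles, residues))).real


if __name__ == "__main__":
    # Hypothetical dominant pole pair -1 +/- 2j with unit residues.
    p = complex(-1.0, 2.0)
    poles, residues = [p, p.conjugate()], [1.0, 1.0]
    for t in (0.0, 0.5, 1.0):
        print(t, impulse_response(t, poles, residues))
```

Because each term is a damped exponential, keeping only a few dominant poles gives a compact model that a behavioral simulator can evaluate cheaply.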
For FA (factory automation) and ATE (automatic test equipment) in industry, a standard bus is required to increase system performance in multiprocessor environments. The VME (Versa Module Eurocard) bus is appropriate as such a standard bus, but it is characterized by small package size and low board density. Besides, board and semiconductor density have become significant issues that affect development time, project cost, and field diagnostics. To follow this trend, in this paper the author built an integrated environment based on Revision C.1 (IEEE Std. P1014-1987) for the main functions, such as arbitration, interrupt handling, and interfacing between the VMEbus and several control modules. The designed VME system controller is implemented on an FPGA and can be placed even in Slot 1. The control and function modules are coded with a VHDL mid-fixed description method, and their operation is verified by simulation. Experiments confirmed, most importantly, that the bus timer asserts the bus error signal within 56 µs, and that the control and function modules interoperate correctly. Thus, the constructed VHDL library can be applied to VMEbus-based systems and ASIC design.
Makiko ITOH Yoshinori TAKEUCHI Masaharu IMAI Akichika SHIOMI
A synthesizable HDL generation method for pipelined processors is proposed. With the proposed method, data-path and control logic descriptions of a target processor are generated from a clock-based instruction set specification. The experimental results confirm the feasibility of the proposed method and show that processor design time was drastically reduced compared with conventional manual RT-level design in HDL.
Osamu OGAWA Kazuyoshi TAKAGI Yasufumi ITOH Shinji KIMURA Katsumasa WATANABE
In hardware synthesis from high-level languages such as C, the optimization quality of the compiler has a great influence on the area and speed of the synthesized circuits. Among the hardware-oriented optimizations required in such compilers, minimizing the bit length of the data-paths is one of the most important. In this paper, we propose an algorithm for estimating the necessary bit length of variables for this purpose. The algorithm analyzes the control/data-flow graph translated from C programs and decides the bit length of each variable. In several experiments, the bit length of variables was reduced by half with respect to the declared length. This method is effective not only for reducing circuit area but also for reducing the delay of operation units such as adders.
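The core of such a bit-length estimate is value-range (interval) propagation through the data-flow graph. The Python sketch below handles only straight-line three-address code, so loops, which would need fixed-point iteration or widening, are out of scope. It is an illustration under those assumptions and not the authors' algorithm. The names `Interval`, `required_bits`, and `analyze` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Interval:
    """Closed integer value range [lo, hi] of a variable."""
    lo: int
    hi: int

    def __add__(self, o: "Interval") -> "Interval":
        return Interval(self.lo + o.lo, self.hi + o.hi)

    def __sub__(self, o: "Interval") -> "Interval":
        return Interval(self.lo - o.hi, self.hi - o.lo)

    def __mul__(self, o: "Interval") -> "Interval":
        products = [self.lo * o.lo, self.lo * o.hi, self.hi * o.lo, self.hi * o.hi]
        return Interval(min(products), max(products))


def required_bits(iv: Interval) -> int:
    """Bits needed to hold every value in iv: unsigned if lo >= 0,
    otherwise two's complement."""
    if iv.lo >= 0:
        return max(1, iv.hi.bit_length())
    n = 1
    while not (-(1 << (n - 1)) <= iv.lo and iv.hi <= (1 << (n - 1)) - 1):
        n += 1
    return n


OPS = {"+": Interval.__add__, "-": Interval.__sub__, "*": Interval.__mul__}

Stmt = Tuple[str, str, str, str]  # (dest, op, src1, src2)


def analyze(stmts: List[Stmt], inputs: Dict[str, Interval]) -> Dict[str, int]:
    """Propagate ranges forward through straight-line three-address code
    and return the bit length each variable needs."""
    env = dict(inputs)
    for dest, op, a, b in stmts:
        env[dest] = OPS[op](env[a], env[b])
    return {name: required_bits(iv) for name, iv in env.items()}


if __name__ == "__main__":
    # C fragment: unsigned char a, b; int sum = a + b, diff = a - b, prod = a * b;
    program = [("sum", "+", "a", "b"), ("diff", "-", "a", "b"), ("prod", "*", "a", "b")]
    print(analyze(program, {"a": Interval(0, 255), "b": Interval(0, 255)}))
    # {'a': 8, 'b': 8, 'sum': 9, 'diff': 9, 'prod': 16}
```

In this toy fragment, variables declared as 32-bit `int` need at most 16 bits. The figure only shows the kind of saving involved and does not reproduce the paper's results.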
This paper presents a new approach to the simulation of Dynamically Reconfigurable Logic (DRL) systems that models dynamic reconfiguration more accurately than previously reported techniques. Our method, named Clock Morphing (CM), models dynamic reconfiguration via the clock signal of the reconfigured module, using a dedicated signal value to indicate dynamic reconfiguration. We discuss problems associated with other approaches to DRL simulation and describe the main principles behind the proposed technique. We further demonstrate the feasibility of CM DRL simulation with an example implementation in VHDL.
Crosstalk from digital to analog circuits can cause operational failures in mixed analog-digital LSIs. This paper describes modeling techniques and simulation strategies for substrate coupling noise. A macroscopic substrate noise model that expresses the noise as a function of logic state transition frequencies among digital blocks is proposed. A simulation system based on the model is implemented in a mixed-signal simulation environment, where the performance degradation of a 2nd-order ΔΣ ADC coupled to digital noise sources is clearly simulated. These results indicate that the proposed behavioral modeling approach makes full-chip substrate noise simulation practicable.