
Keyword Search Result

[Keyword] fpga(330hit)

161-180hit(330hit)

  • Energy Minimum Operation with Self Synchronous Gate-Level Autonomous Power Gating and Voltage Scaling

    Benjamin DEVLIN  Makoto IKEDA  Kunihiro ASADA  

     
    PAPER
    Vol: E95-C No:4, Page(s): 546-554

    A 65 nm self synchronous field programmable gate array (SSFPGA) that uses autonomous gate-level power gating with minimal control circuitry overhead for energy minimum operation is presented. Self synchronous signaling allows the FPGA to operate at voltages down to 370 mV without any parameter tuning. Compared with the non-power-gated SSFPGA, we show a 2.6x reduction in total energy and a 6.4x performance improvement simultaneously at the energy minimum operating point; compared with the latest prior research, we show a 1.8x improvement in power-delay product (PDP) and a 2x performance improvement. Compared with a synchronous FPGA in a similar process, the PDP improvement is up to 84.6x. We also show that energy minimum operation at maximum throughput on the power-gated SSFPGA is achieved at 0.6 V, with 27 fJ/operation at 264 MHz.

  • Asynchronous Circuit Design on Field Programmable Gate Array Devices

    Jung-Lin YANG  Shin-Nung LU  Pei-Hsuan YU  

     
    PAPER
    Vol: E95-C No:4, Page(s): 516-522

    A rapid prototyping environment based on hardware description languages (HDLs) and conventional FPGAs can help overcome the difficulties posed by the complexity of asynchronous digital systems and by recent advances in VLSI technology. We propose a design flow and an FPGA template for implementing generalized C-element (gC) style asynchronous controllers. Using conventional FPGA synthesis tools, self-timed bundled-data function modules can be realized with some additional effort on timing validation. The proposed design flow, combined with FPGA-based realization, is an effective methodology for rapid prototyping and functional validation. This work could be useful for early-stage performance estimation, power reduction exploration, circuit design training, and many other applications involving asynchronous circuits. In this paper, the proposed FPGA-based asynchronous circuit design flow, a hands-on design tutorial, a generalized C-element template, and a list of synthesized benchmark circuits are documented and discussed in detail.
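
    To make the C-element behavior concrete, the sketch below models a Muller C-element and a generalized C-element (gC) as Python state-update functions, one call per input change. It is a behavioral illustration only, not the paper's FPGA template; the function names and the `set_fn`/`reset_fn` formulation are assumptions chosen for clarity.

```python
from typing import Callable, Mapping, Sequence

def c_element(inputs: Sequence[int], prev: int) -> int:
    """Muller C-element: output goes to 1 when all inputs are 1, to 0 when all
    inputs are 0, and otherwise holds its previous value."""
    if all(v == 1 for v in inputs):
        return 1
    if all(v == 0 for v in inputs):
        return 0
    return prev

def gc_element(set_fn: Callable[[Mapping[str, int]], bool],
               reset_fn: Callable[[Mapping[str, int]], bool],
               signals: Mapping[str, int], prev: int) -> int:
    """Generalized C-element: set when set_fn holds, reset when reset_fn holds,
    hold otherwise. A correct gC never enables set and reset together."""
    s, r = set_fn(signals), reset_fn(signals)
    if s and r:
        raise ValueError("set and reset functions overlap")
    if s:
        return 1
    if r:
        return 0
    return prev

# Two-input C-element stepped through a sequence of input changes.
state = 0
trace = []
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    state = c_element((a, b), state)
    trace.append(state)
print(trace)  # [0, 0, 1, 1, 0]
```

    A plain C-element is the special case of a gC whose set function is the AND of its inputs and whose reset function is the AND of their complements; the hold state is what makes both elements sequential.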

  • A Design Method of a Regular Expression Matching Circuit Based on Decomposed Automaton

    Hiroki NAKAHARA  Tsutomu SASAO  Munehiro MATSUURA  

     
    PAPER-Design Methodology
    Vol: E95-D No:2, Page(s): 364-373

    This paper shows a design method for a regular expression matching circuit based on a decomposed automaton. To implement a regular expression matching circuit, we first convert a regular expression into a non-deterministic finite automaton (NFA). Then, to reduce the number of states, we convert the NFA into a merged-states non-deterministic finite automaton with unbounded string transition (MNFAU) using a greedy algorithm. Next, to realize it with a feasible amount of hardware, we decompose the MNFAU into a deterministic finite automaton (DFA) and an NFA. The DFA part is implemented by an off-chip memory and a simple sequencer, while the NFA part is implemented by a cascade of logic cells. We also show that the MNFAU-based implementation has lower area complexity than the DFA- and NFA-based ones. Experiments using regular expressions from SNORT show that, in embedded memory size per character, the MNFAU is 17.17-148.70 times smaller than DFA methods, and in the number of LCs (logic cells) per character, it is 1.56-5.12 times smaller than NFA methods. This paper also describes in detail our design for the MEMOCODE2010 HW/SW co-design contest, for which we won the first place award.
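
    For a literal string, an NFA can be simulated one character per step with a bit vector holding one bit per state, which is close in spirit to a hardware cascade in which each logic cell holds one NFA state. The sketch below is the classic Shift-And algorithm for exact strings; it does not implement the paper's MNFAU or its DFA/NFA decomposition.

```python
def shift_and_matches(pattern: str, text: str) -> list[int]:
    """Return the end index of every exact occurrence of `pattern` in `text`.

    Bit i of `state` is 1 when the NFA state "pattern[:i+1] just matched" is
    active, so one integer holds the whole NFA state set (Shift-And).
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # masks[ch] has bit i set wherever pattern[i] == ch.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (len(pattern) - 1)
    state = 0
    ends = []
    for pos, ch in enumerate(text):
        # Advance every active state, re-enter the start state, then keep only
        # the transitions labeled with the current character.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            ends.append(pos)
    return ends

print(shift_and_matches("abab", "xababab"))  # [4, 6]
```

    Overlapping matches are reported naturally because all active states advance in parallel, just as all cells in a hardware cascade update on the same clock edge.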

  • Region-Oriented Placement Algorithm for Coarse-Grained Power-Gating FPGA Architecture

    Ce LI  Yiping DONG  Takahiro WATANABE  

     
    PAPER-Design Methodology
    Vol: E95-D No:2, Page(s): 314-323

    FPGAs play an essential role in industrial products because they are fast, stable, and flexible. However, the power consumption of FPGAs used in portable devices is a critical issue. The top-down hierarchical design method is commonly used in both ASIC and FPGA design. However, when multiple modules are integrated in an FPGA and some of them may be in sleep mode, current FPGA architectures cannot exploit this effectively. In this paper, we propose a coarse-grained power-gating FPGA architecture in which the whole FPGA area is partitioned into several regions whose power supplies are controlled individually, so that modules in sleep mode can be effectively powered off. We also propose a region-oriented FPGA placement algorithm, based on VPR [1], that follows the user's hierarchical design. Simulation results show that the proposed method can reduce FPGA power consumption by 38% on average by putting unused modules or regions into sleep mode.
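
    Why placement must be region-aware can be seen with a toy model: a region can be powered off only if no logic block of an awake module is placed in it. The data model below (a list of `(module, region)` pairs) and the function name are hypothetical, not VPR's data structures.

```python
def gateable_regions(blocks: list[tuple[str, int]], sleeping: set[str],
                     num_regions: int) -> set[int]:
    """Regions that can be powered off: no block of an awake module lives there.

    `blocks` lists (module, region) for every placed logic block.
    """
    awake = {region for module, region in blocks if module not in sleeping}
    return set(range(num_regions)) - awake

# Same two modules, two placements; module B is asleep.
scattered = [("A", 0), ("A", 1), ("B", 0), ("B", 1)]  # modules interleaved
grouped = [("A", 0), ("A", 0), ("B", 1), ("B", 1)]    # one module per region
print(gateable_regions(scattered, {"B"}, 2))  # set()
print(gateable_regions(grouped, {"B"}, 2))    # {1}
```

    With the interleaved placement, the sleeping module saves nothing because every region still hosts part of module A.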

  • A Physical Design Method for a New Memory-Based Reconfigurable Architecture without Switch Blocks

    Masatoshi NAKAMURA  Masato INAGI  Kazuya TANIGAWA  Tetsuo HIRONAKA  Masayuki SATO  Takashi ISHIGURO  

     
    PAPER-Design Methodology
    Vol: E95-D No:2, Page(s): 324-334

    In this paper, we propose a placement and routing method for a new memory-based programmable logic device (MPLD) and confirm its capability by placing and routing benchmark circuits. An MPLD consists of multiple-output look-up tables (MLUTs) that can be used as logic and/or routing elements, whereas field programmable gate arrays (FPGAs) consist of LUTs (logic elements) and switch blocks (routing elements). MPLDs implement logic circuits more efficiently than FPGAs because of their flexibility and area efficiency. However, applying existing FPGA placement and routing algorithms directly to MPLDs packs the placed logic cells too densely and leaves too little routing area between them. Our simulated annealing-based method uses a cost function that accounts for detailed wire congestion and the nearness of logic cells, and it reserves area for routing. In the experiments, our method reduced wire congestion and successfully placed and routed 27 of 31 circuits, 13 of which could not be placed or routed with the versatile place and route tool (VPR), a well-known tool for FPGAs.
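
    Simulated annealing placement can be sketched generically: propose a move, accept it if it lowers the cost, and otherwise accept it with probability exp(-Δ/T) while T cools. The sketch below uses only half-perimeter wirelength (HPWL) on a plain grid as the cost, a simplified bounding-box cost in the spirit of VPR; the paper's congestion and nearness terms and its routing-area reservation for MLUTs are not modeled, and all names are hypothetical.

```python
import math
import random

def hpwl(pos: dict[str, tuple[int, int]], nets: list[tuple[str, ...]]) -> int:
    """Half-perimeter wirelength: sum of each net's bounding-box half-perimeter."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(cells, nets, width, height, steps=5000, t0=5.0, seed=0):
    """Simulated-annealing placement on a width x height grid.

    Returns (initial_cost, best_cost, best_placement).
    """
    rng = random.Random(seed)
    slots = [(x, y) for x in range(width) for y in range(height)]
    if len(cells) > len(slots):
        raise ValueError("more cells than grid slots")
    rng.shuffle(slots)
    pos = {c: slots[i] for i, c in enumerate(cells)}
    occupant = {p: c for c, p in pos.items()}
    cost = initial = hpwl(pos, nets)
    best_cost, best_pos = cost, dict(pos)
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9   # linear cooling schedule
        cell = rng.choice(cells)
        old, target = pos[cell], rng.choice(slots)
        other = occupant.get(target)            # None if the slot is empty
        pos[cell] = target                      # tentative move / swap
        if other is not None:
            pos[other] = old
        delta = hpwl(pos, nets) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            occupant.pop(old)                   # commit the move
            if other is not None:
                occupant[old] = other
            occupant[target] = cell
            cost += delta
            if cost < best_cost:
                best_cost, best_pos = cost, dict(pos)
        else:                                   # reject: undo the move
            pos[cell] = old
            if other is not None:
                pos[other] = target
    return initial, best_cost, best_pos

# Hypothetical netlist: a chain of 8 cells on a 4 x 4 grid.
cells = [f"c{i}" for i in range(8)]
nets = [(cells[i], cells[i + 1]) for i in range(7)]
initial, best, placement = anneal(cells, nets, 4, 4)
print(initial, best)
```

    A congestion-aware variant would add a penalty term to `hpwl` for crowded grid areas, which is the kind of cost the abstract describes.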

  • Hierarchical MFMO Circuit Modules for an Energy-Efficient SDR DBF

    Jeich MAR  Chi-Cheng KUO  Shin-Ru WU  You-Rong LIN  

     
    PAPER-Application
    Vol: E95-D No:2, Page(s): 413-425

    The hierarchical multi-function matrix operation (MFMO) circuit modules are designed using the coordinate rotation digital computer (CORDIC) algorithm to carry out computation-intensive matrix operations. The paper emphasizes that the hierarchical MFMO circuit modules can be used to develop a power-efficient software-defined radio (SDR) digital beamformer (DBF). Processing-time formulas for the scalable MFMO circuit modules implemented on a field programmable gate array (FPGA) are derived in order to allocate appropriate logic resources for hardware reconfiguration. The hierarchical MFMO circuit modules scale with the changing number of array branches used by the SDR DBF, which saves power. Efficient reuse of the common MFMO circuit modules in the SDR DBF also reduces energy. Finally, the power dissipation and reconfiguration behavior of the SDR DBF in its different modes are examined experimentally.
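
    CORDIC performs vector rotations using only shifts, adds, and a small arctangent table, which is why it suits FPGA matrix-operation hardware. The sketch below is textbook rotation-mode CORDIC in floating point (hardware would use fixed point); it is not the paper's MFMO module design.

```python
import math

def cordic_rotate(angle: float, iterations: int = 24) -> tuple[float, float]:
    """Return (cos(angle), sin(angle)) using CORDIC rotation mode.

    Converges for |angle| up to about 1.743 rad (the sum of atan(2^-i)).
    Each iteration needs only a multiply by 2^-i (a shift in hardware),
    additions, and one table entry.
    """
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Aggregate gain K = prod 1/sqrt(1 + 2^-2i); start x at K to cancel it.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate so the residual angle shrinks
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return x, y

c, s = cordic_rotate(math.pi / 6)
print(round(c, 6), round(s, 6))  # 0.866025 0.5
```

    Each extra iteration adds roughly one bit of accuracy, so the iteration count is a direct area/latency versus precision trade-off in hardware.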

  • FPGA Implementation of Metastability-Based True Random Number Generator

    Hisashi HATA  Shuichi ICHIKAWA  

     
    PAPER-Application
    Vol: E95-D No:2, Page(s): 426-436

    True random number generators (TRNGs) are important as a basis for computer security. Although some TRNGs are built from analog circuits, digital circuits are preferable for integrating TRNGs into logic LSIs. Some digital TRNGs use jitter in free-running ring oscillators as an entropy source, but they consume a large amount of power. Another type of TRNG exploits the metastability of a latch to generate entropy. Although this kind of TRNG has mostly been implemented with full-custom LSI technology, this study presents an implementation based on common FPGA technology. Our TRNG consists only of logic gates and can be integrated into any kind of logic LSI. The RS latch in our TRNG is implemented as a hard macro to guarantee the quality of randomness by minimizing the signal skew and load imbalance of internal nodes. To improve quality and throughput, the outputs of 64–256 latches are XORed. The design was verified on a Xilinx Virtex-4 FPGA (XC4VFX20) and passed the NIST statistical test suite without post-processing. Our TRNG with 256 latches occupies 580 slices and achieves a throughput of 12.5 Mbps.
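
    XORing many latch outputs helps because, if the latches are independent (an idealization, since latches on one die can be correlated), the piling-up lemma says the bias of the XOR is 2^(n-1) times the product of the individual biases, so it shrinks geometrically with n. The sketch below evaluates that formula exactly; the per-latch bias of 0.1 is an illustrative assumption, not a measured value from the paper.

```python
from fractions import Fraction

def xor_bias(biases: list[Fraction]) -> Fraction:
    """Bias of the XOR of independent bits (piling-up lemma).

    Each input has bias eps_i = P(bit = 0) - 1/2; the XOR's bias is
    2**(n-1) * prod(eps_i), written here as (1/2) * prod(2 * eps_i).
    """
    result = Fraction(1, 2)
    for eps in biases:
        result *= 2 * eps
    return result

# Illustrative per-latch bias: each latch outputs 0 with probability 0.6.
eps = Fraction(1, 10)
for n in (1, 2, 64, 256):
    print(n, float(xor_bias([eps] * n)))  # n=1 -> 0.1, n=2 -> 0.02
```

    Under these assumptions, 64 latches already push the bias to about 9.2e-46, which illustrates why a design can pass statistical tests without post-processing even when single latches are imperfect.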

  • Efficient Sequential Architecture of AES CCM for the IEEE 802.16e

    Jae Deok JI  Seok Won JUNG  Jongin LIM  

     
    LETTER-Privacy
    Vol: E95-D No:1, Page(s): 185-187

    In this paper, we propose an efficient sequential AES CCM architecture for IEEE 802.16e. In the proposed architecture, only one AES encryption core is used, and the CTR and CBC-MAC operations are processed concurrently within one round. With this approach, the sequential AES CCM architecture achieves a throughput of 570 Mbps at 102.4 MHz and occupies 1,397 slices on a Spartan3 3s5000 device.
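
    The concurrency is easier to see in the data flow: for each 16-byte block, CCM needs one cipher call to advance the CBC-MAC chain and one to produce the CTR keystream, and both use only the forward (encryption) direction. The sketch below shows that flow with a SHA-256-based stand-in for AES, an assumption made only so the example runs with the standard library (it is not AES and not secure). It omits CCM's B0 and counter-block formatting, associated data, padding, and tag truncation as specified in NIST SP 800-38C.

```python
import hashlib

BLOCK = 16

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for AES-128 encryption (keyed SHA-256, truncated).
    NOT AES and NOT secure. CCM only uses the cipher's forward direction,
    so any keyed function is enough to show the data flow."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def ccm_flow(key: bytes, b0: bytes, ctr0: bytes, blocks: list[bytes]):
    """CTR encryption plus CBC-MAC over full 16-byte plaintext blocks.

    For each block there is one CBC-MAC cipher call and one CTR cipher call;
    a single sequential core can interleave the two.
    """
    mac = toy_encrypt(key, b0)
    counter = int.from_bytes(ctr0, "big")
    ciphertext = []
    for p in blocks:
        mac = toy_encrypt(key, xor(mac, p))                # CBC-MAC step
        counter = (counter + 1) % (1 << (8 * BLOCK))
        keystream = toy_encrypt(key, counter.to_bytes(BLOCK, "big"))  # CTR step
        ciphertext.append(xor(p, keystream))
    tag = xor(mac, toy_encrypt(key, ctr0))  # MAC value encrypted with counter 0
    return ciphertext, tag

ct, tag = ccm_flow(b"k" * 16, b"\x00" * 16, b"\x01" + b"\x00" * 15,
                   [b"A" * 16, b"B" * 16])
print(len(ct), tag.hex())
```

    Because the two cipher calls per block are independent once the block is available, a single core can serve them on alternating rounds, which is the general idea behind sharing one encryption core between CTR and CBC-MAC.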

  • Development and Outdoor Evaluation of an Experimental Platform in an 80-MHz Bandwidth 2×2 MIMO-OFDM System in 5.2-GHz Band

    Hisayoshi KANO  Shingo YOSHIZAWA  Takashi GUNJI  Shougo OKAMOTO  Morio TAWARAYAMA  Yoshikazu MIYANAGA  

     
    PAPER-Computer System
    Vol: E94-D No:12, Page(s): 2400-2408

    The IEEE802.11ac task group has announced the use of a wider channel that extends the channel bandwidth to more than 80 MHz. We present an experimental platform consisting of a baseband unit and an RF unit in a 2×2 MIMO-OFDM system for the wider channel and report system performance results from a field experiment. The MIMO-OFDM transceiver in the baseband unit performs real-time MIMO detection and provides a maximum data rate of 600 Mbps. OFDM tends to have a high peak-to-average power ratio (PAPR) for wider channels, which degrades the performance of the power amplifier in the RF unit. We reduced this non-linear distortion by optimizing the OFDM preamble and evaluated its performance in a simulation that integrates baseband processing with the RF unit. In the field experiment, we tested the communication performance of our platform in a farm and in a passage environment.
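
    PAPR grows with the number of subcarriers because the time-domain OFDM signal is a sum of many carriers that occasionally add in phase. The sketch below computes the PAPR of one OFDM symbol with a naive inverse DFT; the 256-subcarrier size, the QPSK mapping, and the random seed are illustrative assumptions, not the platform's parameters.

```python
import cmath
import math
import random

def idft(symbols: list[complex]) -> list[complex]:
    """Naive inverse DFT (O(N^2)); an FFT would be used in practice."""
    n = len(symbols)
    return [sum(symbols[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def papr_db(samples: list[complex]) -> float:
    """Peak-to-average power ratio in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))

rng = random.Random(1)
qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(256)]
print(round(papr_db(idft(qpsk)), 2))
# Worst case: identical symbols on every subcarrier add coherently -> 10*log10(N)
print(round(papr_db(idft([1 + 0j] * 256)), 2))  # 24.08
```

    The worst case, 10·log10(N) dB, is reached when all subcarriers align, which is why a poorly chosen preamble can drive the power amplifier into its non-linear region.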

  • A Graph Rewriting Approach for Converting Asynchronous ROMs into Synchronous Ones

    Md. Nazrul Islam MONDAL  Koji NAKANO  Yasuaki ITO  

     
    PAPER
    Vol: E94-D No:12, Page(s): 2378-2388

    Most FPGAs have Configurable Logic Blocks (CLBs) to implement combinational and sequential circuits and block RAMs to implement Random Access Memories (RAMs) and Read Only Memories (ROMs). Designing a circuit that minimizes the number of clock cycles is easy if asynchronous read operations are available. However, most FPGAs support synchronous read operations but not asynchronous ones. The main contribution of this paper is an effective approach to this problem. We assume that we are given a circuit that uses asynchronous ROMs, designed either by a non-expert or quickly by an expert. Our goal is to convert this circuit into an equivalent circuit with synchronous ROMs, which can then be embedded into FPGAs. We also discuss several techniques to decrease the latency and increase the clock frequency of the resulting circuits.

  • Low Power Placement and Routing for the Coarse-Grained Power Gating FPGA Architecture

    Ce LI  Yiping DONG  Takahiro WATANABE  

     
    PAPER-Physical Level Design
    Vol: E94-A No:12, Page(s): 2519-2527

    Because an FPGA consumes more power than an ASIC performing the same function at the same technology scaling, FPGA use is limited, especially in portable electronic devices. In this paper, we propose a novel low-power FPGA architecture based on coarse-grained power gating. A new placement algorithm and a routing resource graph for sleep regions are also presented. Using the enhanced CAD framework, we discuss in detail the different region sizes supported by the new FPGA architecture. As a result, the proposed FPGA architecture combined with the new placement and routing algorithm reduces total power consumption by 19.4% compared with a traditional FPGA. The proposed method makes FPGAs a promising option for wide use in portable devices.

  • Compact Architecture for ASIC and FPGA Implementation of the KASUMI Block Cipher

    Dai YAMAMOTO  Kouichi ITOH  Jun YAJIMA  

     
    PAPER-High-Level Synthesis and System-Level Design
    Vol: E94-A No:12, Page(s): 2628-2638

    Compact design is very important for embedded systems such as wireless sensor nodes, RFID tags, and mobile devices because of their limited hardware (H/W) resources. This paper proposes a compact H/W implementation of the KASUMI block cipher, the 3GPP standard encryption algorithm. In [8] and [9], Yamamoto et al. proposed a method of reducing the register size for the MISTY1 FO function (YYI-08) and implemented very compact MISTY1 H/W. In this paper, we aim to implement the smallest KASUMI H/W to date by applying the YYI-08 configuration to KASUMI, whose FO function has a structure similar to that of MISTY1. However, we found that a straightforward application of YYI-08 raises problems. We therefore propose a new YYI-08 configuration tailored to KASUMI, together with a compact H/W architecture. The new YYI-08 configuration consists of new FL function calculation schemes and a suitable calculation order. According to our logic synthesis in a 0.11-µm ASIC process, the circuit size is 2.99 K gates, which, to our knowledge, is the smallest to date.
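
    For reference, the FL function targeted by the new calculation schemes is small: two 16-bit halves mixed with AND/OR, a 1-bit left rotation, and XOR, as specified for KASUMI in 3GPP TS 35.202. The sketch below is a plain software model of FL and its inverse on Python integers; it does not reflect the paper's register-sharing schedule or calculation order.

```python
MASK16 = 0xFFFF

def rol16(x: int) -> int:
    """Rotate a 16-bit value left by one bit."""
    return ((x << 1) | (x >> 15)) & MASK16

def kasumi_fl(data: int, kl1: int, kl2: int) -> int:
    """KASUMI FL function on a 32-bit input with 16-bit subkeys KL_i,1 / KL_i,2."""
    left, right = data >> 16, data & MASK16
    right ^= rol16(left & kl1)   # R' = R xor ROL(L and KL_i,1)
    left ^= rol16(right | kl2)   # L' = L xor ROL(R' or KL_i,2)
    return (left << 16) | right

def kasumi_fl_inverse(data: int, kl1: int, kl2: int) -> int:
    """Undo kasumi_fl by running its two steps in reverse order."""
    left, right = data >> 16, data & MASK16
    left ^= rol16(right | kl2)
    right ^= rol16(left & kl1)
    return (left << 16) | right

print(hex(kasumi_fl(0x12345678, 0x1234, 0xABCD)))
```

    Because each half is updated in place from the other, FL can be computed with a single 16-bit working register per half, which hints at why its calculation order matters for register-size reduction.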

  • Multi-Operand Adder Synthesis Targeting FPGAs

    Taeko MATSUNAGA  Shinji KIMURA  Yusuke MATSUNAGA  

     
    PAPER-Logic Synthesis, Test and Verification
    Vol: E94-A No:12, Page(s): 2579-2586

    Multi-operand adders, which calculate the sum of more than two operands, usually consist in ASIC implementations of a compressor tree, which reduces the number of operands to two without any carry propagation, followed by a carry-propagate adder for the two remaining operands. In ASICs the compressor tree is usually built from full adders or (3;2) counters, as in Wallace trees, while on FPGAs adder trees or dedicated hardware are used. In this paper, we propose an approach to realizing compressor trees on FPGAs. On an FPGA with m-input LUTs, any counter with up to m inputs can be realized with one LUT per output. Our approach uses generalized parallel counters (GPCs) with up to m inputs and synthesizes high-performance compressor trees by setting intermediate height limits in the compression process, as in Dadda multipliers. Experimental results show that the number of GPCs is reduced by up to 22% compared with the existing heuristic. The approach's effectiveness in reducing delay compared with existing approaches is also demonstrated on Altera's Stratix III.
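
    A compressor tree keeps operands in carry-save form until only two remain, so carries propagate only once at the end. The sketch below models this at word level with (3;2) counters (full adders applied to every bit position at once); it illustrates the ASIC-style Wallace reduction that the abstract contrasts with, not the paper's GPC-based synthesis or its height limits.

```python
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor on whole words: a + b + c == s + carry, no carry chain."""
    s = a ^ b ^ c                                  # per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, shifted
    return s, carry

def multi_operand_sum(operands: list[int]) -> int:
    """Reduce non-negative operands to two with layers of 3:2 compressors
    (Wallace-style), then finish with one carry-propagate addition."""
    rows = list(operands)
    while len(rows) > 2:
        next_rows = []
        for i in range(0, len(rows) - 2, 3):       # compress each full triple
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[len(rows) - len(rows) % 3:])  # leftovers pass through
        rows = next_rows
    return sum(rows)  # the final carry-propagate adder

print(multi_operand_sum([3, 5, 7, 11, 13]))  # 39
```

    Each layer turns every three rows into two, so the tree depth grows only logarithmically with the operand count; GPCs on LUT-based FPGAs compress more inputs per level than a (3;2) counter does.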

  • FPGA-Specific Custom VLIW Architecture for Arbitrary Precision Floating-Point Arithmetic

    Yuanwu LEI  Yong DOU  Jie ZHOU  

     
    PAPER-Computer System
    Vol: E94-D No:11, Page(s): 2173-2183

    Many scientific applications require efficient variable-precision floating-point arithmetic. This paper presents a special-purpose Very Long Instruction Word (VLIW) architecture for variable-precision floating-point arithmetic (VV-Processor) on FPGA. The proposed processor uses a unified hardware structure, equipped with multiple custom variable-precision arithmetic units, to implement various variable-precision algebraic and transcendental functions. Performance is improved through the explicit parallelism of VLIW instructions and by dynamically varying the precision of intermediate computations. We use division and the exponential function as examples to illustrate the design of variable-precision elementary algorithms in VV-Processor. Finally, we prototype a VV-Processor unit on a Xilinx XC6VLX760-2FF1760 FPGA chip. The experimental results show that one VV-Processor unit, running at 253 MHz, outperforms a software-based library running on an Intel Core i3 530 CPU at 2.93 GHz by a factor of 5-37 for basic variable-precision arithmetic operations and elementary functions.
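
    One common way to vary intermediate precision dynamically is Newton-Raphson reciprocal iteration, x ← x·(2 − d·x), where each step roughly doubles the number of correct digits, so early steps can run at low precision. The sketch below shows the idea with Python's `decimal` module; it is a generic technique offered as an illustration, not necessarily the division algorithm used in VV-Processor.

```python
from decimal import Decimal, localcontext

def reciprocal(d: Decimal, digits: int) -> Decimal:
    """1/d to about `digits` significant digits by Newton-Raphson,
    doubling the working precision each iteration (x <- x*(2 - d*x))."""
    if d <= 0:
        raise ValueError("sketch assumes d > 0")
    x = Decimal(1 / float(d))  # ~15-digit seed from a hardware double
    prec = 15
    while prec < digits:
        prec = min(2 * prec, digits)     # correct digits roughly double per step
        with localcontext() as ctx:
            ctx.prec = prec + 5          # a few guard digits
            x = x * (2 - d * x)
    with localcontext() as ctx:
        ctx.prec = digits
        return +x                        # round to the requested precision

print(reciprocal(Decimal(7), 40))
```

    Only the last iteration runs at full precision, so total cost is dominated by one full-width multiply pair rather than by every iteration.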

  • 3D-DCT Processor and Its FPGA Implementation

    Yuki IKEGAKI  Toshiaki MIYAZAKI  Stanislav G. SEDUKHIN  

     
    PAPER-Computer System
    Vol: E94-D No:7, Page(s): 1409-1418

    Conventional array processors randomly access input/coefficient data stored in memory many times during three-dimensional discrete cosine transform (3D-DCT) calculations, which causes a calculation bottleneck. In this paper, a 3D array processor dedicated to the 3D-DCT is proposed. The array processor drastically reduces data swapping or replacement during the calculation and thus improves performance. The time complexity of the proposed N×N×N array processor is O(N) for an N³-size input data cube, whereas that of the sequential 3D-DCT calculation is O(N⁴). A specific I/O architecture, throughput-improved architectures, and a more scalable architecture are also discussed with a view to practical implementation. Experimental results of an implementation on an FPGA (field-programmable gate array) suggest that our architecture provides good performance for real-time 3D-DCT calculations.
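
    The O(N⁴) figure for sequential computation follows from separability: a 3D-DCT is three passes of 1D DCTs, and each pass performs N² transforms of length N at O(N²) cost each. The sketch below is a direct (non-fast) orthonormal DCT-II in Python that makes this count visible; it does not model the array processor.

```python
import math

def dct_1d(v: list[float]) -> list[float]:
    """Orthonormal DCT-II of a length-N vector (O(N^2) direct form)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out

def dct_3d(cube: list[list[list[float]]]) -> list[list[list[float]]]:
    """3D-DCT of an N x N x N cube indexed cube[z][y][x]: 1D DCTs along x,
    then y, then z. Each pass runs N^2 transforms costing O(N^2) each,
    so the whole computation takes O(N^4) operations sequentially."""
    n = len(cube)
    c = [[list(cube[z][y]) for y in range(n)] for z in range(n)]
    for z in range(n):                  # pass 1: along x
        for y in range(n):
            c[z][y] = dct_1d(c[z][y])
    for z in range(n):                  # pass 2: along y
        for x in range(n):
            col = dct_1d([c[z][y][x] for y in range(n)])
            for y in range(n):
                c[z][y][x] = col[y]
    for y in range(n):                  # pass 3: along z
        for x in range(n):
            col = dct_1d([c[z][y][x] for z in range(n)])
            for z in range(n):
                c[z][y][x] = col[z]
    return c

ones = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
print(round(dct_3d(ones)[0][0][0], 6))  # 8.0
```

    Each pass reads the cube along a different axis, which is exactly the memory-access pattern that causes the data-swapping bottleneck on conventional processors.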

  • ROM-Less Phase to Amplitude Converter Using Sine Wave Approximation Based on Harmonic Removal from Trapezoid Wave

    Hiroomi HIKAWA  

     
    LETTER-Cryptography and Information Security
    Vol: E94-A No:7, Page(s): 1581-1584

    This paper proposes a new sine wave approximation method for the phase to amplitude converter (PAC) of a direct digital frequency synthesizer (DDFS). The sine wave is approximated by removing harmonic components from a trapezoid waveform. Experimental results show that the proposed PAC is advantageous for spurious-free dynamic ranges (SFDR) below 60 dBc because of its small hardware cost.
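
    A trapezoid is a convenient starting point because its harmonic content is easy to shape: an odd trapezoid wave with linear ramps of half-width a around each zero crossing has odd harmonics proportional to sin(n·a)/n², so choosing a = π/3 (60°) removes the third harmonic entirely. The sketch below checks this numerically; the ramp width and sample count are illustrative choices, and the letter's actual harmonic-removal scheme is not reproduced.

```python
import math

def trapezoid(theta: float, a: float) -> float:
    """Odd, unit-amplitude trapezoid wave with linear ramps of half-width a
    (radians) centered on each zero crossing."""
    t = theta % (2 * math.pi)
    sign = 1.0
    if t >= math.pi:              # second half-cycle mirrors the first
        t -= math.pi
        sign = -1.0
    if t < a:
        return sign * t / a
    if t > math.pi - a:
        return sign * (math.pi - t) / a
    return sign

def harmonic_amplitude(samples: list[float], h: int) -> float:
    """Amplitude of the h-th harmonic of one sampled period (a single DFT bin)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * h * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * h * i / n) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

n = 3072  # divisible by 6, so the ramp corners fall exactly on samples
wave = [trapezoid(2 * math.pi * i / n, math.pi / 3) for i in range(n)]
fund = harmonic_amplitude(wave, 1)
for h in (3, 5, 7):
    level = max(harmonic_amplitude(wave, h), 1e-15) / fund
    print(h, round(20 * math.log10(level), 1))
```

    The 5th and 7th harmonics remain (about 1/25 and 1/49 of the fundamental), so a trapezoid alone gives only a modest SFDR; further harmonic removal is needed to approach a clean sine.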

  • A Proposition of 600 Mbps WLAN-Like System with Low-Complexity MIMO Decoder for FPGA Implementation

    Wahyul Amien SYAFEI  Yuhei NAGAO  Ryuta IMASHIOYA  Masayuki KUROSAKI  Baiko SAI  Hiroshi OCHI  

     
    PAPER-Wireless Communication Technologies
    Vol: E94-B No:2, Page(s): 491-498

    This paper describes our work on developing a high-throughput wireless LAN using a group layered space-time (GLST) system with a low-complexity MIMO decoder. The system achieves a throughput of 600 Mbps over a 30 m propagation distance by using an 80 MHz bandwidth in the 5 GHz frequency band. Tests under channel model B of IEEE802.11TGn demonstrate its excellent performance. Register-transfer-level results show that the system is synthesized successfully, and prototyping on the target Stratix II EP2S180F1508C4 FPGA chips gives the expected results.

  • How to Maximize the Potential of FPGA-Based DSPs for Modular Exponentiation

    Daisuke SUZUKI  Tsutomu MATSUMOTO  

     
    PAPER-Implementation
    Vol: E94-A No:1, Page(s): 211-222

    This paper describes a modular exponentiation processing method and circuit architecture that can extract the maximum performance from FPGA resources. Our proposed modular exponentiation architecture comprises three main techniques. The first improves the Montgomery multiplication algorithm to maximize the performance of the multiplication units in an FPGA. The second balances and improves the circuit delay. The third ensures the scalability of the circuit. Our architecture performs fast operations using small-scale resources; in particular, it completes a 512-bit modular exponentiation in as little as 0.26 ms on the smallest Virtex-4 FPGA, XC4VF12-10SF363. Only about 4200 SLICEs are used, which demonstrates the compactness of our design. Moreover, the scalability of our design allows 1024-, 1536-, and 2048-bit modular exponentiations to be processed in the same circuit.
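
    Montgomery multiplication replaces division by the modulus with multiplications, masks, and shifts by a power of two, which maps well onto FPGA multiplier blocks. The sketch below is the textbook REDC form combined with left-to-right binary exponentiation on Python integers; it does not model the paper's improved algorithm, DSP mapping, or word sizes.

```python
def montgomery_setup(n: int, r_bits: int) -> tuple[int, int]:
    """Return (R, n_prime) with R = 2**r_bits > n and n * n_prime == -1 (mod R)."""
    if n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r   # modular inverse needs Python 3.8+
    return r, n_prime

def mont_mul(a: int, b: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery product a*b*R^-1 mod n (REDC) for a, b < n.
    Division by R is just a right shift because R is a power of two."""
    mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # makes t + m*n divisible by R
    u = (t + m * n) >> r_bits           # u < 2n
    return u - n if u >= n else u

def mod_exp(base: int, exp: int, n: int) -> int:
    """Left-to-right binary exponentiation in the Montgomery domain."""
    r_bits = n.bit_length()
    r, n_prime = montgomery_setup(n, r_bits)
    x = (base * r) % n        # base in Montgomery form
    acc = r % n               # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r_bits, n_prime)   # square
        if bit == "1":
            acc = mont_mul(acc, x, n, r_bits, n_prime) # multiply
    return mont_mul(acc, 1, n, r_bits, n_prime)        # leave Montgomery form

print(mod_exp(7, 560, 561) == pow(7, 560, 561))  # True
```

    Operands stay in Montgomery form for the whole exponentiation, so the conversion cost is paid only once at each end, and every inner step is a multiply-and-shift.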

  • A Domain Partition Model Approach to the Online Fault Recovery of FPGA-Based Reconfigurable Systems

    Lihong SHANG  Mi ZHOU  Yu HU  Erfu YANG  

     
    PAPER-Nonlinear Problems
    Vol: E94-A No:1, Page(s): 290-299

    Field programmable gate arrays (FPGAs) are widely used in reliability-critical systems because of their reconfigurability. However, as device feature sizes shrink and die areas increase, modern FPGAs can be significantly affected by errors induced by electromigration and radiation. To improve the reliability of FPGA-based reconfigurable systems, this paper proposes a permanent fault recovery approach using a domain partition model. In the proposed approach, the FPGA recovers from faults by reloading a suitable configuration from a pool of multiple alternative configurations with overlaps. The overlaps are represented as a set of vectors in the domain partition model. To enhance reliability, a procedure is also presented in which the set of vectors is heuristically filtered so that small overlaps can be merged into large ones. Experimental results on several benchmark circuits demonstrate the effectiveness of the proposed approach. Compared with previous approaches, the proposed approach increases the mean time to failure (MTTF) by up to 18.87%.

  • A VGA 30 fps Affine Motion Model Estimation VLSI for Real-Time Video Segmentation

    Yoshiki YUNBE  Masayuki MIYAMA  Yoshio MATSUDA  

     
    PAPER-Computer System
    Vol: E93-D No:12, Page(s): 3284-3293

    This paper describes an affine motion estimation processor for real-time video segmentation. The processor estimates the dominant motion of a target region with affine parameters and is based on the Pseudo-M-estimator algorithm. Introducing an image division method and a binary weight method into the original algorithm reduces data traffic and hardware costs. A proposed pixel sampling method reduces the clock frequency by 50%. The pixel pipeline architecture and a frame overlap method double the throughput. The processor was prototyped on an FPGA, and its function and performance were verified. It was also implemented as an ASIC with a core size of 5.0 × 5.0 mm² in a 0.18 µm standard cell process. The ASIC can process VGA video at 30 fps with a 120 MHz clock frequency.
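
    The binary weight idea can be illustrated with iteratively reweighted least squares in which every weight is either 1 (inlier) or 0 (outlier). The sketch below fits the six-parameter affine model u = p0 + p1·x + p2·y, v = q0 + q1·x + q2·y to given per-point motion vectors. Taking motion vectors as input, the fixed 0/1 threshold rule, and all function names are assumptions for illustration; the paper's Pseudo-M-estimator, image division, and pixel sampling are not modeled.

```python
def solve3(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_affine(pts, flow, w):
    """Weighted least squares for u = p0 + p1*x + p2*y and v = q0 + q1*x + q2*y.
    The weights must not all be zero, or the normal equations are singular."""
    ata = [[0.0] * 3 for _ in range(3)]
    atu, atv = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v), wi in zip(pts, flow, w):
        basis = (1.0, x, y)
        for i in range(3):
            for j in range(3):
                ata[i][j] += wi * basis[i] * basis[j]
            atu[i] += wi * basis[i] * u
            atv[i] += wi * basis[i] * v
    return solve3(ata, atu), solve3(ata, atv)

def robust_affine(pts, flow, threshold, iterations=5):
    """Refit with binary weights: 1 if a point's residual under the current
    estimate is within `threshold`, otherwise 0."""
    w = [1.0] * len(pts)
    for _ in range(iterations):
        p, q = fit_affine(pts, flow, w)
        w = []
        for (x, y), (u, v) in zip(pts, flow):
            du = u - (p[0] + p[1] * x + p[2] * y)
            dv = v - (q[0] + q[1] * x + q[2] * y)
            w.append(1.0 if du * du + dv * dv <= threshold ** 2 else 0.0)
    return fit_affine(pts, flow, w)

# Synthetic data: dominant motion u = 1 + 0.1x, v = -2 + 0.05y on a 10 x 10 grid,
# with the row y = 3 replaced by an outlier motion (50, 50).
pts = [(x, y) for x in range(10) for y in range(10)]
flow = [(50.0, 50.0) if y == 3 else (1 + 0.1 * x, -2 + 0.05 * y) for x, y in pts]
p, q = robust_affine(pts, flow, threshold=20.0)
print([round(c, 6) for c in p], [round(c, 6) for c in q])
```

    With 0/1 weights, each refit is an ordinary least-squares problem over the current inliers, so hardware only needs to accumulate sums for pixels whose weight is 1, which is one plausible reason binary weights reduce cost.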
