The search functionality is under construction.

Keyword Search Result

[Keyword] reconfigurable architectures (7 hits)

Hits 1-7
  • Range Limiter Using Connection Bounding Box for SA-Based Placement of Mixed-Grained Reconfigurable Architecture

    Takashi KISHIMOTO  Wataru TAKAHASHI  Kazutoshi WAKABAYASHI  Hiroyuki OCHI  

     
    PAPER

    Vol: E99-A  No: 12  Page(s): 2328-2334

    In this paper, we propose a novel placement algorithm for mixed-grained reconfigurable architectures (MGRAs). An MGRA consists of coarse-grained and fine-grained clusters so that it can implement digital systems that combine high-speed data paths with multi-bit operands and random logic circuits for state machines and bit-wise operations. To accelerate simulated-annealing-based FPGA placement, a range limiter has been proposed that controls the distance between the two blocks to be interchanged. However, it is not applicable to MGRAs because of their heterogeneous structure. The proposed range limiter, which uses a connection bounding box, effectively keeps the size of the range limiter to encourage moves across fine-grained blocks in non-adjacent clusters. In experiments, the proposed method achieved a 47.8% reduction in cost in the best case compared with conventional methods.
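The abstract above names two ingredients: a conventional range limiter that restricts how far apart two swapped blocks may be, and a limiter built from a connection bounding box. The following is a minimal Python sketch of one possible reading of those two ideas, not the paper's algorithm. The function names, the grid coordinates, the net representation as sets of block names, and the choice to grow the box by `rlim` are all assumptions made for illustration.

```python
def window_candidates(block, pos, rlim):
    """Conventional limiter (assumed form): blocks within rlim of `block` on both axes."""
    bx, by = pos[block]
    return sorted(
        b for b, (x, y) in pos.items()
        if b != block and abs(x - bx) <= rlim and abs(y - by) <= rlim
    )


def connection_bounding_box(block, nets, pos):
    """Smallest rectangle covering `block` and every block that shares a net with it."""
    connected = {block}
    for net in nets:
        if block in net:
            connected |= set(net)
    xs = [pos[b][0] for b in connected]
    ys = [pos[b][1] for b in connected]
    return min(xs), min(ys), max(xs), max(ys)


def box_candidates(block, nets, pos, rlim):
    """Hypothetical box-based limiter: blocks inside the connection box grown by rlim."""
    x0, y0, x1, y1 = connection_bounding_box(block, nets, pos)
    return sorted(
        b for b, (x, y) in pos.items()
        if b != block
        and x0 - rlim <= x <= x1 + rlim
        and y0 - rlim <= y <= y1 + rlim
    )


# Illustrative placement: block "a" is connected to the distant block "b".
pos = {"a": (0, 0), "b": (5, 5), "c": (1, 0), "d": (9, 9)}
nets = [{"a", "b"}]
print(window_candidates("a", pos, 1))
print(box_candidates("a", nets, pos, 0))
```

In this toy placement the fixed window around "a" only reaches its immediate neighbor, while the box-based variant also admits the connected but distant block, which is the kind of long move the abstract says a plain range limiter discourages.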

  • A Tightly Coupled General Purpose Reconfigurable Accelerator LAPP and Its Power States for HotSpot-Based Energy Reduction

    Jun YAO  Yasuhiko NAKASHIMA  Naveen DEVISETTI  Kazuhiro YOSHIMURA  Takashi NAKADA  

     
    PAPER-Architecture

    Vol: E97-D  No: 12  Page(s): 3092-3100

    General-purpose many-core architectures (MCAs) such as GPGPUs have recently been widely used to continue performance scaling now that increases in working frequency have approached the manufacturing limit. However, both the general-purpose MCA and its building block, the general-purpose processor (GPP), lack a tuning capability to boost energy efficiency for individual applications, especially computation-intensive ones. As an alternative to these MCA platforms, we propose the LAPP (Linear Array Pipeline) architecture, which adopts a special-purpose reconfigurable structure for optimal MIPS/W while keeping backward binary compatibility, a feature that most special-purpose hardware lacks. More specifically, we use a general-purpose VLIW processor, interpreting a commercial VLIW ISA, as the baseline frontend to provide backward binary compatibility. We also extend the functional unit (FU) stage into an FU array that forms the reconfigurable backend, which executes program hotspots efficiently by exploiting parallelism. The hardware modules in this general-purpose reconfigurable architecture are locally zoned into several groups so that suitable low-power techniques can be applied according to each module's hardware features. Our results show that, at comparable performance, the tightly coupled general/special-purpose hardware, based on a 180 nm cell library, achieves 10.8 times the MIPS/W of an MCA with the same technology features. When a 65 nm technology node is assumed, a similar 9.4x MIPS/W can be achieved by using the LAPP without changing program binaries.

  • A Data Prefetch and Reuse Strategy for Coarse-Grained Reconfigurable Architectures

    Wei GE  Zhi QI  Yue DU  Lu MA  Longxing SHI  

     
    PAPER-Computer System

    Vol: E96-D  No: 3  Page(s): 616-623

    Coarse-Grained Reconfigurable Architectures (CGRAs) have been proposed as a new option for enhancing parallel processing capability. Data transfer throughput between the Reconfigurable Cell Array (RCA) and on-chip local memory is usually the main performance bottleneck of CGRAs. To relieve this bottleneck, we propose a novel data transfer strategy called Heuristic Data Prefetch and Reuse (HDPR), the first such strategy for explicit CGRAs. The HDPR strategy provides not only a flexible data access schedule but also the high data throughput needed to realize fast pipelined implementations of various loop kernels. To improve data utilization efficiency, a dual-bank cache-like data reuse structure is proposed. Furthermore, heuristic data prefetch is introduced to decrease data access latency. Experimental results demonstrate that, compared with conventional explicit data transfer strategies, our work achieves an average speedup of 1.73 times at the expense of only a 5.86% increase in area.

  • Design of a Direct Sampling Mixer with a Complex Coefficient Transfer Function

    Yohei MORISHITA  Noriaki SAITO  Koji TAKINAMI  Kiyomichi ARAKI  

     
    PAPER

    Vol: E95-C  No: 6  Page(s): 999-1007

    A Direct Sampling Mixer (DSM) with a complex coefficient transfer function is demonstrated. The operation theory and the detailed design methodology are discussed for the high-order complex DSM, which can achieve a large image rejection ratio by introducing an attenuation pole in the image frequency band. The proposed architecture was fabricated in a 65 nm CMOS process. The measured results agree well with the theoretical calculation, which supports the validity of the proposed architecture and the design methodology. With the proposed design method, circuit designers can design a DSM with a large image rejection ratio without repeated lengthy simulations.

  • Configuration Sharing to Reduce Reconfiguration Overhead Using Static Partial Reconfiguration

    Sungjoon JUNG  Tag Gon KIM  

     
    PAPER-Computer Systems

    Vol: E91-D  No: 11  Page(s): 2675-2684

    Reconfigurable architectures are among the most promising solutions for satisfying both performance and flexibility. However, the reconfiguration overhead of these architectures makes them unsuitable for repetitive reconfiguration. In this paper, we introduce a configuration sharing technique that uses static partial reconfiguration to reduce reconfiguration overhead between similar applications. Whereas traditional resource sharing configures multiple temporal partitions simultaneously and employs a time-multiplexing technique, the proposed configuration sharing reconfigures a device incrementally as the application changes and requires a backend adaptation to reuse configurations between applications. Adopting a data-flow intermediate representation, our compiler framework extends a min-cut placer and a negotiation-based router to handle configuration sharing. The results show that the framework reduces configuration time by 20% at the expense of 1.9% of computation time on average.

  • Accelerating the CKY Parsing Using FPGAs

    Jacir L. BORDIM  Yasuaki ITO  Koji NAKANO  

     
    PAPER

    Vol: E86-D  No: 5  Page(s): 803-810

    The main contribution of this paper is an FPGA-based implementation of instance-specific hardware that accelerates CKY (Cocke-Kasami-Younger) parsing for context-free grammars. Given a context-free grammar G and a string x, CKY parsing determines whether G derives x. We have developed a hardware generator that creates Verilog HDL source to perform CKY parsing for any given context-free grammar G. The generated source is embedded in an FPGA using the design software provided by the FPGA vendor. We evaluated the instance-specific hardware produced by our hardware generator using a timing analyzer and tested it on Altera FPGAs. The generated hardware attains a speed-up factor of approximately 750 over the software CKY parsing algorithm.
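For readers unfamiliar with the software baseline mentioned in the abstract above, the following is a minimal Python sketch of textbook CKY recognition. It is not the paper's hardware design or its software implementation. It assumes the grammar is already in Chomsky normal form, and the function name, the `unary` and `binary` dictionaries, and the example grammar (for strings of the form a^n b^n with n >= 1) are all illustrative assumptions.

```python
def cky_derives(unary, binary, start, x):
    """Return True if the CNF grammar derives the non-empty string x.

    unary maps a terminal to the set of nonterminals A with a rule A -> terminal.
    binary maps a pair (B, C) to the set of nonterminals A with a rule A -> B C.
    """
    n = len(x)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][l - 1] holds the nonterminals that derive the substring x[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(x):
        table[i][0] = set(unary.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b in left:
                    for c in right:
                        table[i][length - 1] |= binary.get((b, c), set())
    return start in table[0][n - 1]


# Assumed example grammar: S -> A B | A T, T -> S B, A -> a, B -> b
unary = {"a": {"A"}, "b": {"B"}}
binary = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

print(cky_derives(unary, binary, "S", "aabb"))
print(cky_derives(unary, binary, "S", "aab"))
```

The three nested loops over substring length, start position, and split point give the cubic running time in the length of x that makes the algorithm a natural target for acceleration.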

  • Remarkable Cycles Reduction in GSM Voice Coding by Reconfigurable Coprocessor with Standard Interface

    Salvatore M. CARTA  Luigi RAFFO  

     
    PAPER-Architecture and Algorithms

    Vol: E86-C  No: 4  Page(s): 546-552

    A reconfigurable coprocessor for the ETSI-GSM voice coding application domain is presented, synthesized, and tested. An average overall reduction of more than 55% in cycles with respect to standard RISC processors with DSP features is obtained. This improvement, together with locality and temporal correlation, allows a reduction in power consumption, while the standard interfacing technique ensures maximum flexibility.