
Keyword Search Result

[Keyword] dynamically reconfigurable processor (8 hits)

  • Acceleration of Block Matching on a Low-Power Heterogeneous Multi-Core Processor Based on DTU Data-Transfer with Data Re-Allocation

    Yoshitaka HIRAMATSU  Hasitha Muthumala WAIDYASOORIYA  Masanori HARIYAMA  Toru NOJIRI  Kunio UCHIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics
    Vol: E95-C No:12  Page(s): 1872-1882

    Long data-transfer times among different cores are a major problem in heterogeneous multi-core processors. This paper presents a method to accelerate data transfers by exploiting data-transfer units together with complex memory allocation. We used block matching, which is very common in image processing, to evaluate the technique. The proposed method reduces the data-transfer time by more than 42% compared to earlier works that use CPU-based data transfers. Moreover, the total processing time is only 15 ms for a VGA image with 16×16-pixel blocks.
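
    For readers unfamiliar with the kernel being accelerated, the following is a minimal Python sketch of plain SAD-based block matching; the DTU-based transfers, data re-allocation and heterogeneous cores that are the paper's actual contribution are not modeled, and the toy frame size is chosen only to keep the example fast.

```python
import numpy as np

def block_matching_sad(ref, cur, block=16, search=8):
    """Exhaustive SAD block matching: for every block x block tile of `cur`,
    find the best-matching tile in `ref` within +/- `search` pixels."""
    h, w = cur.shape
    vectors = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur_blk = cur[by:by + block, bx:bx + block].astype(np.int32)
            best, best_sad = (0, 0), None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    ref_blk = ref[y:y + block, x:x + block].astype(np.int32)
                    sad = int(np.abs(cur_blk - ref_blk).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            vectors[(by, bx)] = best  # motion vector for this block
    return vectors

# Toy 80x64 frames; the paper evaluates full VGA (640x480) frames with
# 16x16-pixel blocks on the heterogeneous multi-core processor.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 80), dtype=np.uint8)
cur = rng.integers(0, 256, (64, 80), dtype=np.uint8)
print(len(block_matching_sad(ref, cur)), "blocks matched")
```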

  • Iterative Synthesis Methods Estimating Programmable-Wire Congestion in a Dynamically Reconfigurable Processor

    Takao TOI  Takumi OKAMOTO  Toru AWASHIMA  Kazutoshi WAKABAYASHI  Hideharu AMANO  

     
    PAPER-High-Level Synthesis and System-Level Design
    Vol: E94-A No:12  Page(s): 2619-2627

    Iterative synthesis methods that account for wire congestion are proposed for a multi-context dynamically reconfigurable processor (DRP) with a large number of processing elements (PEs) and programmable-wire connections. Although complex data paths can be synthesized with the programmable wire, its delay becomes long, especially when wire connections are congested. We propose two iterative synthesis techniques that couple a high-level synthesizer (HLS) with the place & route tool to shorten the prolonged wire delay. First, we feed back the wire delays of each context to the scheduler in the HLS; experimental results showed that the critical-path delay was shortened by 21% on average for applications with timing-closure problems. Second, we skip routing and estimate wire delays from the congestion; the synthesis time was reduced to one-third, while degrading the delay-improvement rate by two percentage points on average.
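
    As a rough illustration of the first feedback technique described above, the following sketch loops a scheduler and a place & route step until the critical path meets a target clock; `schedule` and `place_and_route` are toy stand-ins written for this example, not NEC's actual HLS or routing tools.

```python
def schedule(design, fed_back_delays):
    """Toy HLS scheduling stub: record the wire delay fed back for each
    context so the next place & route pass can relieve its congestion."""
    return [{"ops": ops, "fed_back": fed_back_delays.get(cid, 0.0)}
            for cid, ops in enumerate(design)]

def place_and_route(contexts):
    """Toy place & route stub: a context whose long wire delay was fed
    back (and thus rescheduled) gets a shorter routed delay."""
    return {cid: max(0.5, 3.0 - 0.5 * ctx["fed_back"])
            for cid, ctx in enumerate(contexts)}

def iterative_synthesis(design, clock_ns=2.0, max_iters=10):
    """Iterate HLS scheduling and place & route, feeding the per-context
    wire delays back to the scheduler until the critical path meets the
    target clock (the first feedback technique described in the abstract)."""
    delays = {}
    critical_path = float("inf")
    for _ in range(max_iters):
        contexts = schedule(design, delays)
        delays = place_and_route(contexts)
        critical_path = max(delays.values())
        if critical_path <= clock_ns:       # timing closure reached
            break
    return critical_path

print(iterative_synthesis([["add", "mul"], ["ld", "st"]]))  # converges to 1.5 ns
```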

  • Resource Minimization Method Satisfying Delay Constraint for Replicating Large Contents

    Sho SHIMIZU  Hiroyuki ISHIKAWA  Yutaka ARAKAWA  Naoaki YAMANAKA  Kosuke SHIBA  

     
    PAPER-Fundamental Theories for Communications
    Vol: E92-B No:10  Page(s): 3102-3110

    How to minimize the number of mirroring resources under a QoS constraint (the resource minimization problem) is an important issue in content delivery networks. This paper proposes a novel approach that exploits the parallelism of dynamically reconfigurable processors (DRPs) to solve the resource minimization problem, which is NP-hard. Our proposal obtains the optimal solution by running an exhaustive search algorithm suited to DRPs. Greedy algorithms, which have been widely studied for tackling the resource minimization problem, cannot always obtain the optimal solution. The proposed method is implemented on an actual DRP and, in experiments, reduces the execution time by a factor of 40 compared to the conventional exhaustive search algorithm on a Pentium 4 (2.8 GHz).
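
    A minimal software sketch of the exhaustive search the paper maps onto a DRP, assuming the resource minimization problem is stated as: place the fewest mirrors such that every node reaches some mirror within the delay bound. The DRP's parallel evaluation of candidate sets is not modeled here.

```python
from itertools import combinations

def min_mirrors(delay, bound):
    """Exhaustive search: smallest set of mirror nodes such that every node
    is within `bound` of some mirror; delay[i][j] is the delay from node i
    to node j. Brute force, so optimal by construction."""
    n = len(delay)
    for k in range(1, n + 1):                         # try smaller sets first
        for mirrors in combinations(range(n), k):
            if all(any(delay[i][m] <= bound for m in mirrors) for i in range(n)):
                return mirrors
    return None

# Tiny example: 4 nodes, symmetric delays, delay (QoS) bound of 2.
delay = [[0, 1, 3, 4],
         [1, 0, 2, 3],
         [3, 2, 0, 1],
         [4, 3, 1, 0]]
print(min_mirrors(delay, bound=2))   # (0, 2): two mirrors cover every node
```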

  • A Preemption Algorithm for a Multitasking Environment on Dynamically Reconfigurable Processors

    Vu Manh TUAN  Hideharu AMANO  

     
    PAPER-Computer Systems
    Vol: E91-D No:12  Page(s): 2793-2803

    Task preemption is a critical mechanism for building an effective multi-tasking environment on dynamically reconfigurable processors. When a task is preempted, its necessary state information must be correctly preserved so that the task can be resumed later. Not only do coarse-grained Dynamically Reconfigurable Processing Array (DRPA) devices have different architectures and use a variety of development tools, but the large amount of state data of hardware tasks executing on such devices is usually distributed across many different storage elements. To address these difficulties, this paper studies a general method for capturing the state data of hardware tasks on coarse-grained DRPAs. Based on resource usage, algorithms are proposed for identifying preemption points and inserting preemption states subject to a user-specified preemption latency. A modification that automatically incorporates the proposed steps into the system design flow is also discussed. The performance degradation caused by the additional preemption states is minimized by allowing preemption only at predefined points where the demanded resources are small. Evaluation using a model based on NEC Electronics' DRP-1 shows that the proposed method can produce preemption points that satisfy a given preemption latency with reasonable hardware overhead (from 6% to 15%).
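
    The preemption-point idea can be sketched as follows, under the assumption that each state of a hardware task has a known amount of live data to save and that the preemption-latency bound is expressed as a maximum number of states between consecutive points; the actual DRP-1 design-flow integration is not modeled.

```python
def choose_preemption_points(state_costs, max_gap):
    """Pick preemption points so that (a) consecutive points are at most
    `max_gap` states apart (the preemption-latency constraint) and
    (b) within each window the state with the least live data to save
    is chosen; state_costs[i] is the amount of state data live at state i."""
    points, i, n = [], 0, len(state_costs)
    while i < n:
        window = range(i, min(i + max_gap, n))
        best = min(window, key=lambda s: state_costs[s])  # cheapest state in window
        points.append(best)
        i = best + 1                                      # next window starts after it
    return points

# Example: 10 states of a hardware task, latency bound of 4 states.
costs = [120, 40, 90, 60, 10, 80, 30, 70, 20, 50]
print(choose_preemption_points(costs, max_gap=4))   # [1, 4, 8, 9]
```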

  • A Mapping Method for Multi-Process Execution on Dynamically Reconfigurable Processors

    Vu Manh TUAN  Hideharu AMANO  

     
    PAPER-Computer Systems
    Vol: E91-D No:9  Page(s): 2312-2322

    Multi-process execution on dynamically reconfigurable processors is a technique that enhances throughput by exploiting more of the inherent parallelism of applications. The overall processing of an application is divided into small processes, assigned to limited areas of a reconfigurable array, and executed concurrently in a pipelined manner. To improve the efficiency of multi-process execution, a systematic method for mapping processes onto a reconfigurable array consisting of multiple hardware execution units is essential. This paper proposes and investigates a systematic method for mapping an application modeled as a Kahn Process Network onto a dynamically reconfigurable processing array. To execute streaming applications in a pipelined manner, the size of a Tile, the unit area of the dynamically reconfigurable array, and the grouping of processes are adjusted. Using real applications such as a DCT, a JPEG encoder and a Turbo encoder, the performance impact of different mappings onto the NEC Dynamically Reconfigurable Processor is evaluated. Evaluation results show that our proposed mapping algorithm achieves the best performance in terms of throughput and execution time.
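
    A toy sketch of the process-grouping step, assuming each Kahn process has a known PE requirement and consecutive processes are packed into fixed-size Tiles in pipeline order; the process names, PE counts and Tile capacity below are hypothetical.

```python
def group_processes(process_pes, tile_capacity):
    """Pack pipeline-ordered processes into Tiles: each Tile receives
    consecutive processes as long as their total PE demand fits."""
    tiles, current, used = [], [], 0
    for name, pes in process_pes:
        if pes > tile_capacity:
            raise ValueError(f"process {name} does not fit in a single Tile")
        if used + pes > tile_capacity:        # close the current Tile
            tiles.append(current)
            current, used = [], 0
        current.append(name)
        used += pes
    if current:
        tiles.append(current)
    return tiles

# Hypothetical JPEG-encoder-like pipeline mapped onto Tiles of 64 PEs.
pipeline = [("color_conv", 24), ("dct", 48), ("quantize", 20), ("huffman", 40)]
print(group_processes(pipeline, tile_capacity=64))
# -> [['color_conv'], ['dct'], ['quantize', 'huffman']]
```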

  • A Self-Test of Dynamically Reconfigurable Processors with Test Frames

    Tomoo INOUE  Takashi FUJII  Hideyuki ICHIHARA  

     
    PAPER-High-Level Testing
    Vol: E91-D No:3  Page(s): 756-762

    This paper proposes a self-test method for coarse-grained dynamically reconfigurable processors (DRPs) without hardware overhead. In the method, processor elements (PEs) compose a test frame, consisting of test pattern generators (TPGs), processor elements under test (PEUTs) and response analyzers (RAs), and test one another as the test frames are changed appropriately. We design several test frames with different structures and discuss how the structures relate to the numbers of contexts and test frames required to test all the functions of the PEs. A case study shows that there exists an optimal test frame that minimizes the test application time under a constraint.
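
    The test-frame idea can be illustrated with a toy schedule on a ring of PEs in which the TPG, PEUT and RA roles rotate one position per frame; real test frames on a DRP test several PEs per context, which this sketch does not capture.

```python
def test_schedule(num_pes):
    """Rotating test-frame schedule on a ring of PEs: in frame k, PE k acts
    as the test pattern generator (TPG), PE k+1 is the PE under test (PEUT)
    and PE k+2 analyzes the responses (RA), so after num_pes frames every
    PE has been tested exactly once."""
    return [{"TPG": k % num_pes,
             "PEUT": (k + 1) % num_pes,
             "RA": (k + 2) % num_pes}
            for k in range(num_pes)]

for frame in test_schedule(4):
    print(frame)
```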

  • A Survey on Dynamically Reconfigurable Processors  [Open Access]

    Hideharu AMANO  

     
    INVITED PAPER
    Vol: E89-B No:12  Page(s): 3179-3187

    Dynamically reconfigurable processors consist of an array of processing elements whose functions and interconnections can be changed dynamically. Nine commercial systems are surveyed, and their array structures, processing elements and interconnection architectures are classified.

  • Dynamically Reconfigurable Processor Implemented with IPFlex's DAPDNA Technology

    Takayuki SUGAWARA  Keisuke IDE  Tomoyoshi SATO  

     
    INVITED PAPER
    Vol: E87-D No:8  Page(s): 1997-2003

    The DAPDNA®-2 is the world's first general-purpose dynamically reconfigurable processor for commercial use. It is a dual-core processor consisting of a custom RISC core, the Digital Application Processor (DAP), and a two-dimensional array of dynamically reconfigurable processing elements, the Distributed Network Architecture (DNA). The DAP has a 32-bit instruction set architecture with an 8 KB instruction cache and an 8 KB data cache, both accessible in one clock cycle, and an interrupt control function to detect the completion of data processing in the DNA-Matrix. The DNA-Matrix has different types of processing elements, such as ALU, delay and memory elements, to carry out fully parallel computations, and it includes 32 independent 16 KB high-speed SRAM elements (512 KB in total). Even with its parallel computational capability, the DNA-Matrix is synchronized with and operates at the same clock frequency as the DAP. The processor runs at 166 MHz and is fabricated in a 0.11 µm CMOS process. Up to 16 DAPDNA-2 devices can be connected directly, with linear scalability in processing performance, provided the required bandwidth stays within the 32 Gbps maximum communication speed between DNAs. The DAPDNA-2 performs at a level two orders of magnitude higher than conventional high-performance processors.
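
    A small sanity check of the scalability claim above, treating the 16-device limit and the 32 Gbps inter-DNA link as hard constraints; the per-device throughput figure used in the example is purely hypothetical.

```python
def chain_performance(units, per_unit_gops, required_gbps_between_dnas):
    """Toy model of DAPDNA-2 chaining: performance scales linearly with the
    number of devices (up to 16) as long as the traffic between neighbouring
    DNAs stays within the 32 Gbps link limit quoted in the abstract."""
    if not 1 <= units <= 16:
        raise ValueError("up to 16 DAPDNA-2 devices can be connected directly")
    if required_gbps_between_dnas > 32:
        raise ValueError("inter-DNA traffic exceeds the 32 Gbps link")
    return units * per_unit_gops

# 32 SRAM elements x 16 KB each = 512 KB of DNA-Matrix memory per device.
print(32 * 16, "KB per device")
print(chain_performance(4, per_unit_gops=10, required_gbps_between_dnas=20), "GOPS")
```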