The search functionality is under construction.

Keyword Search Result

[Keyword] hardware/software cosynthesis(10hit)

1-10hit
  • A SIMD Instruction Set and Functional Unit Synthesis Algorithm with SIMD Operation Decomposition

    Nozomu TOGAWA  Koichi TACHIKAKE  Yuichiro MIYAOKA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Programmable Logic, VLSI, CAD and Layout

      Vol:
    E88-D No:7
      Page(s):
    1340-1349

    This paper focuses on SIMD processor synthesis and proposes a SIMD instruction set/functional unit synthesis algorithm. Given an initial assembly code and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with optimal SIMD functional units. It also synthesizes a SIMD instruction set. The input initial assembly code is assumed to run on a full-resource SIMD processor (virtual processor) which has all the possible SIMD functional units. In our algorithm, we introduce the SIMD operation decomposition and apply it to the initial assembly code and the full-resource SIMD processor. By gradually reducing SIMD operations or decomposing SIMD operations, we can finally find a processor core with small area under the given timing constraint. The promising experimental results are also shown.

  • Sub-operation Parallelism Optimization in SIMD Processor Core Synthesis

    Hideki KAWAZU  Jumpei UCHIDA  Yuichiro MIYAOKA  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E88-A No:4
      Page(s):
    876-884

    A b-bit SIMD functional unit has n k-bit sub-functional units in itself, where b = k n. It can execute n-parallel k-bit operations. However, all the b-bit functional units in a processor core do not necessarily execute n-parallel operations. Depending on an application program, some of them just execute n/2-parallel operations or even n/4-parallel operations. This means that we can modify a b-bit SIMD functional unit so that it has n/2 k-bit sub-functional units or n/4 k-bit sub-functional units. The number of k-bit sub-functional units in a SIMD functional unit is called sub-operation parallelism. We incorporate a sub-operation parallelism optimization algorithm into SIMD functional unit optimization. Our proposed algorithm gradually reduces sub-operation parallelism of a SIMD functional unit while the timing constraint of execution time satisfied. Thereby, we can finally find a processor core with small area under the given timing constraint. We expect that we can obtain processor core configurations of smaller area in the same timing constraint rather than a conventional system. The promising experimental results are also shown.

  • A Hardware/Software Cosynthesis Algorithm for Processors with Heterogeneous Datapaths

    Yuichiro MIYAOKA  Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E87-A No:4
      Page(s):
    830-836

    This paper proposes a hardware/software cosynthesis algorithm for processors with heterogeneous registers. Given a CDFG corresponding to an application program and a timing constraint, the algorithm generates a processor configuration minimizing area of the processor and an assembly code on the processor. First, the algorithm configures a datapath which can execute several DFG nodes with data dependency at one cycle. The datapath can execute the application program at the least number of cycles. The branch and bound algorithm is applied and all the number of functional units and memory banks are tried. For an assumed number of functional units and memory banks, an appropriate number of heterogeneous registers and connections to functional units and registers are explored. The experimental results show effectiveness and efficiency of the algorithm.

  • A Hardware/Software Partitioning Algorithm for Processor Cores with Packed SIMD-Type Instructions

    Nozomu TOGAWA  Koichi TACHIKAKE  Yuichiro MIYAOKA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    LETTER-Design Methodology

      Vol:
    E86-A No:12
      Page(s):
    3218-3224

    This letter proposes a new hardware/software partitioning algorithm for processor cores with SIMD instructions. Given a compiled assembly code including SIMD instructions and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with a new assembly code. Firstly, we assume for each operation type a super SIMD functional unit which can execute all the SIMD instructions. Secondly we reduce a SIMD instruction or "sub-function" of each super functional unit, one by one, while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new processor configuration. By repeating this process, we finally find SIMD functional unit configuration as well as a processor core architecture. The promising experimental results are also shown.

  • A Retargetable Simulator Generator for DSP Processor Cores with Packed SIMD-type Instructions

    Nozomu TOGAWA  Kyosuke KASAHARA  Yuichiro MIYAOKA  Jinku CHOI  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Simulation Accelerator

      Vol:
    E86-A No:12
      Page(s):
    3099-3109

    A packed SIMD type operation or a SIMD operation is n-parallel b/n-bit sub-operations executed by the modified n-bit functional unit. Such a functional unit is called a SIMD functional unit and a processor core which can execute SIMD operations is called a SIMD processor core. SIMD operations can be effectively applied to image processing applications. This paper focuses on hardware/software cosynthesis of SIMD processor cores and particularly proposes a new simulator generator which simulates pipelined instructions for a SIMD processor. Generally, a SIMD functional unit has many options and then we can have so many different SIMD functional unit instances. However, since our hardware/software cosynthesis system synthesizes a special-purpose processor core for an input application program, it uses very limited SIMD functional unit instances. In the proposed approach, we consider a SIMD operation to be a set of SIMD sub-operations. By adding up the appropriate SIMD sub-operations, we construct a single SIMD operation. Then a SIMD functional unit behavior can be characterized by a collection of SIMD operations. This approach has the advantage that: if we have a small number of behavior libraries for SIMD sub-operations, we can instantiate a particular SIMD functional unit behavior. Experimental results demonstrate the effectiveness of the proposed approach.

  • A Hardware/Software Cosynthesis System for Processor Cores with Content Addressable Memories

    Nozomu TOGAWA  Takao TOTSUKA  Tatsuhiko WAKUI  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E86-A No:5
      Page(s):
    1082-1092

    Content addressable memory (CAM) is one of the functional memories which realize word-parallel equivalence search. Since a CAM unit is generally used in a particular application program, we consider that appropriate design for CAM units is required depending on the requirements for the application program. This paper proposes a hardware/software cosynthesis system for CAM processors. The input of the system is an application program written in C including CAM functions and a constraint for execution time (or CAM processor area). Its output is hardware descriptions of a synthesized processor and a binary code executed on it. Based on the branch-and-bound method, the system determines which CAM function is realized by a hardware and which CAM function is realized by a software with meeting the given timing constraint (or area constraint) and minimizing the CAM processor area (or execution time of the application program). We expect that we can realize optimal CAM processor design for an application program. Experimental results for several application programs show that we can obtain a CAM processor whose area is minimum with meeting the given timing constraint.

  • A New Hardware/Software Partitioning Algorithm for DSP Processor Cores with Two Types of Register Files

    Nozomu TOGAWA  Takashi SAKURAI  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    LETTER-Hardware/Software Codesign

      Vol:
    E84-A No:11
      Page(s):
    2802-2807

    This letter proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint of execution time, the proposed algorithm generates a processor core configuration with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each of register files. Moreover the algorithm considers two or more types of functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing performance penalty. A generated processor core will have small area compared with processor cores which have a single register file or those which consider only one type of functional units for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.

  • Area and Delay Estimation in Hardware/Software Cosynthesis for Digital Signal Processor Cores

    Nozomu TOGAWA  Yoshiharu KATAOKA  Yuichiro MIYAOKA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER-Hardware/Software Codesign

      Vol:
    E84-A No:11
      Page(s):
    2639-2647

    Hardware/software partitioning is one of the key processes in a hardware/software cosynthesis system for digital signal processor cores. In hardware/software partitioning, area and delay estimation of a processor core plays an important role since the hardware/software partitioning process must determine which part of a processor core should be realized by hardware units and which part should be realized by a sequence of instructions based on execution time of an input application program and area of a synthesized processor core. This paper proposes area and delay estimation equations for digital signal processor cores. For area estimation, we show that total area for a processor core can be derived from the sum of area for a processor kernel and area for additional hardware units. Area for a processor kernel can be mainly obtained by minimum area for a processor kernel and overheads for adding hardware units and registers. Area for a hardware unit can be mainly obtained by its type and operation bit width. For delay estimation, we show that critical path delay for a processor core can be derived from the delay of a hardware unit which is on the critical path in the processor core. Experimental results demonstrate that errors of area estimation are less than 2% and errors of delay estimation are less than 2 ns when comparing estimated area and delay with logic-synthesized area and delay.

  • A Hardware/Software Cosynthesis System for Digital Signal Processor Cores with Two Types of Register Files

    Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E83-A No:3
      Page(s):
    442-451

    In digital signal processing, bit width of intermediate variables should be longer than that of input and output variables in order to execute intermediate operations with high precision. Then a processor core for digital signal processing is required to have two types of register files, one of which is used by input and output variables and the other one is used by intermediate variables. This paper proposes a hardware/software cosynthesis system for digital signal processor cores with two types of register files. Given an application program and its data, the system synthesizes a hardware description of a processor core, an object code running on the processor core, and software environments. A synthesized processor core can be composed of a processor kernel, multiple data memory buses, hardware loop units, addressing units, and multiple functional units. Furthermore it can have two types of register files RF1 and RF2. The bit width and number of registers in RF1 or RF2 will be determined based on a given application program. Thus a synthesized processor core will have small area with keeping high precision of intermediate operations compared with a processor core with only one register file. The experimental results demonstrate the effectiveness of the proposed system.

  • A Hardware/Software Cosynthesis System for Digital Signal Processor Cores

    Nozomu TOGAWA  Masao YANAGISAWA  Tatsuo OHTSUKI  

     
    PAPER

      Vol:
    E82-A No:11
      Page(s):
    2325-2337

    This paper proposes a hardware/software cosynthesis system for digital signal processor cores and a hardware/software partitioning algorithm which is one of the key issues for the system. The target processor has a VLIW-type core which can be composed of a processor kernel, multiple data memory buses (X-bus and Y-bus), hardware loop units, addressing units, and multiple functional units. The processor kernel includes five pipeline stages (RISC-type kernel) or three pipeline stages (DSP-type kernel). Given an application program written in the C language and a set of application data, the system synthesizes a processor core by selecting an appropriate kernel (RISC-type or DSP-type kernel) and required hardware units according to the application program/data and the hardware costs. The system also generates the object code for the application program and a software environment (compiler and simulator) for the processor core. The experimental results demonstrate that the system synthesizes processor cores effectively according to the features of an application program and the synthesized processor cores execute most application programs with the minimum number of clock cycles compared with several existing processors.