
Keyword Search Result

[Keyword] codesign (14 hits)

1-14 hits
  • A Coarse-Grain Hierarchical Technique for 2-Dimensional FFT on Configurable Parallel Computers

    Xizhen XU  Sotirios G. ZIAVRAS  

     
    PAPER-Parallel/Distributed Algorithms

    Vol: E89-D  No: 2  Page(s): 639-646

    FPGAs (Field-Programmable Gate Arrays) have been widely used as coprocessors to boost the performance of data-intensive applications [1],[2]. However, there are several challenges to further boost FPGA performance: the communication overhead between the host workstation and the FPGAs can be substantial; large-scale applications cannot fit in a single FPGA because of its limited capacity; mapping an application algorithm to FPGAs still remains a daunting job in configurable system design. To circumvent these problems, we propose in this paper the FPGA-based Hierarchical-SIMD (H-SIMD) machine with its codesign of the Pyramidal Instruction Set Architecture (PISA). PISA comprises high-level instructions implemented as FPGA functions of coarse-grain SIMD (Single-Instruction, Multiple-Data) tasks to facilitate ease of program development, code portability across different H-SIMD implementations and high performance. We assume a multi-FPGA board where each FPGA is configured as a separate SIMD machine. Multiple FPGA chips can work in unison at a higher SIMD level, if needed, controlled by the host. Additionally, by using a memory switching scheme and the high-level PISA to partition applications into coarse-grain tasks, host-FPGA communication overheads can be hidden. We enlist the two-dimensional Fast Fourier Transform (2D FFT) to test the effectiveness of H-SIMD. The test results show sustained high performance for this problem. The H-SIMD machine even outperforms a Xeon processor for this problem.
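
The abstract does not say how the 2D FFT is split into coarse-grain tasks. As a rough illustration only, the sketch below uses the standard row-column decomposition, in which every row transform (and then every column transform) is an independent task that could in principle be handed to a separate SIMD unit. The function names `fft1d` and `fft2d` are hypothetical and are not part of PISA or H-SIMD.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft2d(matrix):
    """2D FFT by the row-column method (assumed here, not taken from the paper)."""
    # Step 1: each row is an independent 1D FFT (one coarse-grain task per row).
    rows = [fft1d(row) for row in matrix]
    # Step 2: transpose, then each column is again an independent 1D FFT.
    cols = [fft1d(list(col)) for col in zip(*rows)]
    # Step 3: transpose back so the result is indexed [row][column].
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    ones = [[1, 1, 1, 1] for _ in range(4)]
    spectrum = fft2d(ones)
    print(abs(spectrum[0][0]))  # all energy of a constant image sits in the DC term
```

Because the row transforms share no data, the only global step is the transpose between the two passes, which is where host-FPGA or inter-FPGA data movement would have to be scheduled.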

  • Efficient Hardware-Software Partitioning for a Digital Dental X-Ray System

    Jong Dae KIM  Yong Up LEE  Seokyu KIM  

     
    PAPER-Systems and Control

    Vol: E86-A  No: 4  Page(s): 859-865

    This paper presents the design considerations for a digital dental X-ray system with a commercial CCD sensor. In particular, the system should be able to work with several X-ray machines, even those designed for classical film. A hardware-software co-design methodology is employed to optimize the system. A fully digital implementation is assumed for the reliability of the system. The functions considered cover pre-processing, such as exposure detection, clamping, and dark level correction, and post-processing, such as gray level compensation. These functions are analyzed together with other constraints in order to make the final partition. The entire system based on that partition is described.

  • Synthesising Application-Specific Heterogeneous Multiprocessors Using Differential Evolution

    Allan RAE  Sri PARAMESWARAN  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E84-A  No: 12  Page(s): 3125-3131

    This paper presents an application-specific, heterogeneous multiprocessor synthesis system, named HeMPS, that combines a form of Evolutionary Computation known as Differential Evolution with a scheduling heuristic to search the design space efficiently. We demonstrate the effectiveness of our technique by comparing it to similar existing systems. The proposed strategy is shown to be faster than recent systems on large problems while providing equivalent or improved final solutions.
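
For readers unfamiliar with Differential Evolution, the following is a minimal sketch of the generic DE/rand/1/bin scheme applied to a toy continuous cost function. It is not HeMPS: the abstract does not describe how designs are encoded or how the scheduling heuristic is coupled to the search, so the function `differential_evolution`, its parameters, and the `sphere` cost are all assumptions made for illustration.

```python
import random


def differential_evolution(cost, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Generic DE/rand/1/bin minimizer (illustrative, not the HeMPS search)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])  # mutation
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clamp to the search bounds
                else:
                    v = pop[i][d]  # inherit the gene from the target vector
                trial.append(v)
            s = cost(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy cost function standing in for a real design-quality metric."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
    print(best_x, best_cost)
```

In a synthesis setting the cost function would presumably evaluate a candidate architecture (for example, by scheduling the task graph onto it), which is the role the abstract assigns to the scheduling heuristic.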

  • A System Level Optimization Technique for Application Specific Low Power Memories

    Tohru ISHIHARA  Kunihiro ASADA  

     
    PAPER-Optimization of Power and Timing

    Vol: E84-A  No: 11  Page(s): 2755-2761

    A system-level approach to memory power reduction is proposed in this paper. The basic idea is to allocate frequently executed object code to a small subprogram memory and to optimize the supply voltage and threshold voltage of that memory. Since a large-scale memory contains many direct paths from power supply to ground, power dissipation caused by subthreshold leakage current is more serious than dynamic power dissipation. Our approach optimizes the size of the subprogram memory, the supply voltage, and the threshold voltage so as to minimize memory power dissipation, including static power dissipation caused by leakage current. A heuristic algorithm that determines code allocation, supply voltage, and threshold voltage simultaneously so as to minimize the power dissipation of the memories is also proposed. Our experiments with several benchmark programs demonstrate energy reductions of up to 80% over a program memory that does not employ our approach.

  • DESC: A Hardware-Software Codesign Methodology for Distributed Embedded Systems

    Trong-Yen LEE  Pao-Ann HSIUNG  Sao-Jie CHEN  

     
    PAPER-VLSI Systems

    Vol: E84-D  No: 3  Page(s): 326-339

    The hardware-software codesign of distributed embedded systems is a challenging task because each phase of codesign, such as copartitioning, cosynthesis, cosimulation, and coverification, must consider the physical restrictions imposed by the distributed characteristics of such systems. Distributed systems often contain several similar parts to which design reuse techniques can be applied. An object-oriented (OO) codesign approach, which accommodates physical restrictions and object design reuse, is adopted in our newly proposed Distributed Embedded System Codesign (DESC) methodology. The DESC methodology uses three types of models: Object Modeling Technique (OMT) models for system description and input, Linear Hybrid Automata (LHA) models for internal modeling and verification, and SES/workbench simulation models for performance evaluation. A two-level partitioning algorithm is proposed specifically for distributed systems. Software is synthesized by task scheduling, and hardware is synthesized by system-level and object-oriented techniques. Design alternatives for synthesized hardware-software systems are then checked for design feasibility through rapid prototyping using hardware-software emulators. Through a case study on a Vehicle Parking Management System (VPMS), we illustrate each design phase of the DESC methodology to show the benefits of OO codesign and the necessity of a two-level partitioning algorithm.

  • Synthesis of Application-Specific Coprocessor for Core-Based ASIC Design

    Dae-Hyun LEE  In-Cheol PARK  Chong-Min KYUNG  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E84-A  No: 2  Page(s): 604-613

    This paper presents an efficient approach to a hardware/software partitioning problem: synthesis of an application-specific coprocessor that accelerates embedded software running on a main processor. Given a set of data flow graphs (DFGs), most previous hardware/software partitioning approaches have focused on mapping DFGs to hardware or software. Their common weaknesses are that 1) they ignore various implementation alternatives in realizing DFGs as hardware, based on the assumption that only a single hardware implementation exists for a DFG, and that 2) they do not consider the effect of merging on hardware area when synthesizing a coprocessor by merging DFGs. To deal with the first issue, we formulate both the mapping of DFGs to hardware or software and the selection of the appropriate hardware implementation for each DFG as a single integer programming problem, and then apply an iterative algorithm based on the Kernighan-Lin heuristic to solve the problem. To reduce the CPU time, we have devised data structures that quickly calculate the costs of hardware implementations. To deal with the second issue, our method links DFGs with dummy nodes to produce a single large DFG, and then synthesizes a target coprocessor by globally scheduling the DFG and allocating its datapath. Experimental results demonstrate that our approach outperforms the previous approach based on a genetic algorithm (GA) in both coprocessor area and CPU time.

  • Hardware-Software Multi-Level Partitioning for Distributed Embedded Multiprocessor Systems

    Trong-Yen LEE  Pao-Ann HSIUNG  Sao-Jie CHEN  

     
    PAPER-VLSI Design Technology and CAD

    Vol: E84-A  No: 2  Page(s): 614-626

    A novel Multi-Level Partitioning (MLP) technique that takes into account real-world constraints for hardware-software partitioning in Distributed Embedded Multiprocessor Systems (DEMS) is proposed. The MLP algorithm uses a gradient metric based on hardware-software cost and performance as the core metric for selecting optimal partitions, and it consists of three nested levels. The innermost level is a simple binary search that allows quick evaluation of a large number of possible partitions. The middle level iterates over different possible allocations of processors (that execute software) to subsystems. The outermost level iterates over the number of processors and the hardware cost range. Heuristics are applied at each level to avoid an expensive exhaustive search. The application of MLP in a recently proposed Distributed Embedded System Codesign (DESC) methodology shows its feasibility. Comparisons between real-world examples partitioned using MLP and using other existing techniques demonstrate contrasting strengths of MLP. Sharing, clustering, and a hierarchical system model are some important features of MLP that contribute towards producing better partition results.

  • Multicriteria Codesign Optimization for Embedded Multimedia Communication System

    I-Horng JENG  Feipei LAI  

     
    PAPER-Co-design and High-level Synthesis

    Vol: E83-A  No: 12  Page(s): 2474-2487

    At the beginning of the new century, many information appliance (IA) products will replace traditional electronic appliances to help people in smart, efficient, and low-cost ways. These successful products must be capable of communicating multimedia information, a capability that is embedded into the electronic appliances with high integration, innovation, and a power-throughput tradeoff. In this paper, we develop a codesign procedure to analyze, compare, and emulate multimedia communication applications in order to find candidate implementations under different criteria. The experimental results demonstrate that, in general, memory technology dominates the optimal tradeoff and that ALU improvements greatly affect particular applications. The results also show that the proposed procedure is effective and quite efficient.

  • Hardware-Software Timing Coverification of Distributed Embedded Systems

    Jih-Ming FU  Trong-Yen LEE  Pao-Ann HSIUNG  Sao-Jie CHEN  

     
    PAPER-VLSI Systems

    Vol: E83-D  No: 9  Page(s): 1731-1740

    Most current codesign tools and methodologies support validation only in the form of cosimulation and testing of design alternatives. The results of hardware-software codesign of a distributed system are often not verified because they are not easily verifiable. In this paper, we propose a new formal coverification approach based on linear hybrid automata, and an algorithm for automatically converting codesign results to the linear hybrid automata framework. Our coverification approach allows automatic verification of real-time constraints such as hard deadlines. Another advantage is that the proposed approach is suitable for verifying distributed systems with arbitrary communication patterns and system architectures. The feasibility of our approach is demonstrated through several application examples. The proposed approach has also been used successfully to verify deadline violations when there are inter-task communications between tasks with different period lengths.

  • System LSI Design Methods for Low Power LSIs

    Hiroto YASUURA  Tohru ISHIHARA  

     
    INVITED PAPER

    Vol: E83-C  No: 2  Page(s): 143-152

    Low-power design has emerged as a theme that is both practically and theoretically attractive in modern LSI system design. This paper presents system-level power optimization techniques. A brief survey of system-level low-power design approaches is given, and several examples are described in detail. The paper reviews some techniques that have been proposed to overcome the power issue and gives guidelines for prospective system-level solutions.

  • Hardware Synthesis from C Programs with Estimation of Bit Length of Variables

    Osamu OGAWA  Kazuyoshi TAKAGI  Yasufumi ITOH  Shinji KIMURA  Katsumasa WATANABE  

     
    PAPER

    Vol: E82-A  No: 11  Page(s): 2338-2346

    In hardware synthesis from high-level languages such as C, the optimization quality of the compiler has a great influence on the area and speed of the synthesized circuits. Among the hardware-oriented optimization methods required in such compilers, minimization of the bit length of the datapaths is one of the most important issues. In this paper, we propose an algorithm for estimating the necessary bit length of variables for this purpose. The algorithm analyzes the control/data-flow graph translated from C programs and decides the bit length of each variable. In several experiments, the bit length of variables was reduced by half relative to the declared length. This method is effective not only for reducing the circuit area but also for reducing the delay of operation units such as adders.
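
The abstract does not give the estimation algorithm itself. One common way to bound the bit length of a variable is value-range (interval) propagation through the operations that define it, and the sketch below shows that idea on two operations. It is an assumed stand-in rather than the authors' method, and the helper names `bits_for_range`, `add_range`, and `mul_range` are hypothetical.

```python
def bits_for_range(lo, hi):
    """Smallest two's-complement width that can hold every integer in [lo, hi]."""
    width = 1
    while not (-(1 << (width - 1)) <= lo and hi <= (1 << (width - 1)) - 1):
        width += 1
    return width


def add_range(a, b):
    """Value range of a + b, given ranges a = (lo, hi) and b = (lo, hi)."""
    return (a[0] + b[0], a[1] + b[1])


def mul_range(a, b):
    """Value range of a * b; the extremes always occur at endpoint products."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


if __name__ == "__main__":
    # Hypothetical input ranges for two variables declared as 32-bit ints.
    x = (0, 100)
    y = (0, 15)
    t = mul_range(x, y)   # t = x * y
    s = add_range(t, x)   # s = t + x
    for name, rng in (("x", x), ("y", y), ("t", t), ("s", s)):
        print(name, rng, bits_for_range(*rng), "bits")
```

Loops and conditionals in a real control/data-flow graph need extra handling (for example, iterating ranges to a fixed point), which this straight-line sketch omits.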

  • A Memory Power Optimization Technique for Application Specific Embedded Systems

    Tohru ISHIHARA  Hiroto YASUURA  

     
    PAPER

    Vol: E82-A  No: 11  Page(s): 2366-2374

    In this paper, a novel application-specific power optimization technique is proposed that uses a small instruction ROM placed between an instruction cache or a main program memory and the CPU core. Our optimization technique targets embedded systems that assume the following: (i) instruction memories are organized as two on-chip memories, a main program memory and a subprogram memory, (ii) these two memories can be independently powered up or powered down by a special instruction of the core processor, and (iii) a compiler optimizes the allocation of object code to these two memories so as to minimize average read energy consumption. In many application programs, only a few basic blocks are frequently executed. Therefore, allocating these frequently executed basic blocks to the low-power subprogram memory leads to a significant energy reduction. Our experiments with actual ROM (Read Only Memory) modules created with a 0.5 µm CMOS process technology and an MPEG2 codec program demonstrate energy reductions of more than 50% in the best case over the previous approach, which applies only a divided bit and word line structure.
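
The allocation step can be pictured as a small knapsack problem: the basic blocks executed most often go into the subprogram memory until it is full. The sketch below is a simple greedy heuristic under that reading, not the compiler algorithm from the paper. The block names, sizes, execution counts, and per-word read energies are invented for illustration.

```python
def allocate_blocks(blocks, capacity):
    """Greedily place the most frequently executed blocks in the small memory.

    blocks: list of (name, size_in_words, exec_count) tuples.
    capacity: size of the subprogram memory in words.
    """
    chosen, used = [], 0
    # Energy saved per word of capacity is proportional to the execution
    # count, so sorting by count is sorting by benefit density.
    for name, size, count in sorted(blocks, key=lambda b: b[2], reverse=True):
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen


def read_energy(blocks, in_sub, e_main, e_sub):
    """Total instruction-read energy, assuming every word of a block is fetched
    on each execution. e_main and e_sub are per-word read energies."""
    total = 0.0
    for name, size, count in blocks:
        per_word = e_sub if name in in_sub else e_main
        total += size * count * per_word
    return total


if __name__ == "__main__":
    # Hypothetical profile data, not taken from the paper.
    blocks = [
        ("loop_body", 40, 10000),
        ("init", 200, 1),
        ("error", 120, 0),
        ("inner", 30, 50000),
        ("setup", 60, 10),
    ]
    placed = allocate_blocks(blocks, capacity=100)
    baseline = read_energy(blocks, [], e_main=1.0, e_sub=0.2)
    optimized = read_energy(blocks, placed, e_main=1.0, e_sub=0.2)
    print(placed, baseline, optimized)
```

A greedy pass like this is not guaranteed to be optimal when block sizes vary widely, so a production compiler would more likely use an exact or refined knapsack formulation.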

  • Language and Compiler for Optimizing Datapath Widths of Embedded Systems

    Akihiko INOUE  Hiroyuki TOMIYAMA  Takanori OKUMA  Hiroyuki KANBARA  Hiroto YASUURA  

     
    PAPER-Co-design

    Vol: E81-A  No: 12  Page(s): 2595-2604

    The datapath width of a core processor has a strong effect on the cost, power consumption, and performance of an embedded system integrated with memories into a single chip. However, it is difficult for designers to determine the appropriate datapath width for each application because of the limited reusability of software and the lack of compilation techniques. The purpose of this paper is to clarify the software support required for determining the optimal datapath width. As a solution, an embedded programming language called Valen-C and a retargetable Valen-C compiler are proposed. The paper also describes the syntax and semantics of Valen-C, the mechanism of the retargetable Valen-C compiler, and how the accuracy of a program's computation is preserved across various datapath widths. Experiments with practical applications show that the total cost of the system, including a core processor, ROM, and RAM, is drastically reduced with little performance loss by reducing the datapath width.

  • PEAS-I: A Hardware/Software Codesign System for ASIP Development

    Jun SATO  Alauddin Y. ALOMARY  Yoshimichi HONMA  Takeharu NAKATA  Akichika SHIOMI  Nobuyuki HIKICHI  Masaharu IMAI  

     
    PAPER-Computer Aided Design (CAD)

    Vol: E77-A  No: 3  Page(s): 483-491

    This paper describes the current implementation and experimental results of a hardware/software codesign system for ASIP (Application Specific Integrated Processor) development: the PEAS-I system. The PEAS-I system accepts a set of application programs written in C, an associated data set, a module database, and design constraints such as chip area and power consumption. The system then generates an optimized CPU core design in the form of an HDL description as well as a set of application program development tools such as a C compiler, an assembler, and a simulator. Another important feature of the PEAS-I system is that it can give accurate estimates of chip area and performance before the detailed design of the ASIP is completed. According to the experimental results, the PEAS-I system has been found to be highly effective and efficient for ASIP development.