Author Search Result

[Author] Hamid NOORI (4 hits)

  • A Reconfigurable Functional Unit with Conditional Execution for Multi-Exit Custom Instructions

    Hamid NOORI  Farhad MEHDIPOUR  Koji INOUE  Kazuaki MURAKAMI  

     
    PAPER

    Vol: E91-C  No: 4  Page(s): 497-508

    Encapsulating critical computation subgraphs as application-specific instruction set extensions is an effective technique to enhance the performance of embedded processors. However, the addition of custom functional units to the base processor is required to support the execution of these custom instructions. Although automated tools have been developed to reduce the long design time needed to produce a new extensible processor for each application, short time-to-market requirements and significant non-recurring engineering and design costs remain issues. To address these concerns, we introduce an adaptive extensible processor in which custom instructions are generated and added after chip fabrication. To support this feature, custom functional units (CFUs) are replaced by a reconfigurable functional unit (RFU). The proposed RFU is based on a multi-cycle matrix of functional units with the capability of conditional execution. A quantitative approach is used to propose an efficient architecture for the RFU and fix its constraints. To generate more effective custom instructions, they are extended beyond a single basic block, and hence multi-exit custom instructions are proposed. Conditional execution has been added to the RFU to support the multi-exit feature of custom instructions. Experimental results show that multi-exit custom instructions enhance performance by an average of 67% compared to custom instructions limited to one basic block. A maximum speedup of 4.7, compared to a general embedded processor, and an average speedup of 1.85 were achieved on the MiBench benchmark suite.

  • Temperature-Aware Configurable Cache to Reduce Energy in Embedded Systems

    Hamid NOORI  Maziar GOUDARZI  Koji INOUE  Kazuaki MURAKAMI  

     
    PAPER

    Vol: E91-C  No: 4  Page(s): 418-431

    Energy consumption is a major concern in embedded computing systems. Several studies have shown that cache memories account for 40% or more of the total energy consumed in these systems. Active power used to be the primary contributor to the total power dissipation of CMOS designs, but with technology scaling, the share of leakage in the total power consumption of digital systems continues to grow. Moreover, leakage current increases exponentially with temperature. In this paper, we show the effect of temperature on the optimal (minimum-energy-consuming) cache configuration for low-energy embedded systems. Our results show that for a given application and technology, the optimal cache size moves toward smaller caches at higher temperatures, due to the larger leakage. Consequently, a Temperature-Aware Configurable Cache (TACC) is an effective way to save energy in finer technologies when the embedded system is used at different temperatures. Our results show that using a TACC, up to 61% of the energy can be saved for the instruction cache and 77% for the data cache compared to a configurable cache that has been configured for only the corner-case temperature (100 °C). Furthermore, the TACC also enhances performance by up to 28% for the instruction cache and up to 17% for the data cache.

  • Improving Performance and Energy Saving in a Reconfigurable Processor via Accelerating Control Data Flow Graphs

    Farhad MEHDIPOUR  Hamid NOORI  Morteza SAHEB ZAMANI  Koji INOUE  Kazuaki MURAKAMI  

     
    PAPER-Reconfigurable Device and Design Tools

    Vol: E90-D  No: 12  Page(s): 1956-1966

    Extracting frequently executed (hot) portions of an application and executing their corresponding data flow graphs (DFGs) on a hardware accelerator brings about more speedup and energy saving for embedded systems comprising a base processor integrated with a tightly coupled accelerator. Extending DFGs to support control instructions and using Control DFGs (CDFGs) instead of DFGs increases the portion of application code that is accelerated and hence yields more speedup and energy saving. In this paper, motivations for extending DFGs to CDFGs and handling control instructions are introduced. In addition, basic requirements for an accelerator with conditional execution support are proposed. Then, two algorithms are presented for temporal partitioning of CDFGs that take the architectural constraints of the target accelerator into account. To demonstrate the effectiveness of the proposed ideas, they are applied to the accelerator of a reconfigurable processor called AMBER. Experimental results confirm the effectiveness of covering control instructions and using CDFGs rather than DFGs in terms of performance and energy reduction.

  • Rapid Design Space Exploration of a Reconfigurable Instruction-Set Processor

    Farhad MEHDIPOUR  Hamid NOORI  Koji INOUE  Kazuaki MURAKAMI  

     
    PAPER-Embedded, Real-Time and Reconfigurable Systems

    Vol: E92-A  No: 12  Page(s): 3182-3192

    The multitude of parameters in the design process of a reconfigurable instruction-set processor (RISP) may lead to a large design space and remarkable complexity. A quantitative design approach uses data collected from applications to satisfy design constraints and optimize the design goals while considering the applications' characteristics; however, it depends heavily on the designer's observations and analyses. Design space exploration can be considered an effective technique for finding a proper balance among various design parameters. However, this approach is computationally expensive when the performance evaluation of the design points is based on the synthesis-and-simulation technique. A combined analytical and simulation-based model (CAnSO) is proposed and validated for performance evaluation of a typical RISP. The proposed model consists of an analytical core that incorporates statistics collected from cycle-accurate simulation to make a reasonable evaluation and provide valuable insight. CAnSO has clear speed advantages and can therefore be used to ease the otherwise cumbersome design space exploration of a RISP and to quickly evaluate the performance of slightly modified architectures.