
Keyword Search Result

[Keyword] reorder buffer (4 hits)

1-4 hits
  • An Energy Efficient Instruction Window for Scalable Processor Architecture

    Min CHOI  Seungryoul MAENG  

     
    PAPER

    Vol: E91-C No:9, Page(s): 1427-1436

    Modern microprocessors achieve high application performance at an acceptable level of power dissipation. In terms of the power-performance trade-off, the instruction window is particularly important: enlarging the window improves performance, but naively scaling the conventional instruction window can severely increase its complexity and power consumption. In this paper, we propose low-power instruction window techniques for contemporary microprocessors. First, the small reorder buffer (SROB) reduces power dissipation through deferred allocation and early release. Deferred allocation delays an instruction's SROB allocation until all of its data dependencies are resolved. The instructions are then executed in program order and released from the SROB sooner. This results in higher resource utilization and lower power consumption. Second, we replace the conventional issue queue with a direct lookup table (DLT) combined with an efficient tag translation technique. The translation scheme resolves instruction dependencies, particularly when one producer has multiple consumers. Its efficiency stems from the fact that the vast majority of instruction dependencies lie within a basic block. Experimental results show that the proposed design significantly reduces power consumption on the SPEC2000 benchmarks.
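
    Why deferred allocation shortens buffer occupancy can be seen with simple entry-cycle accounting. The sketch below is a hypothetical simplification, not the SROB design itself: it assumes each instruction in a made-up trace has a dispatch cycle, a cycle at which its operands become ready, and a commit cycle, and it ignores early release and all structural limits.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dispatch: int  # cycle the instruction enters the window (hypothetical trace value)
    ready: int     # cycle all of its source operands become available
    commit: int    # cycle the instruction retires


def entry_cycles(trace, deferred):
    """Total cycles during which reorder-buffer entries are held.

    Conventional allocation: an entry is held from dispatch until commit.
    Deferred allocation (simplified assumption): an entry is held only from
    the cycle the instruction's dependencies are resolved until commit.
    """
    total = 0
    for ins in trace:
        start = ins.ready if deferred else ins.dispatch
        total += ins.commit - start
    return total


trace = [Instr(0, 0, 3), Instr(0, 5, 8), Instr(1, 9, 12)]
print(entry_cycles(trace, deferred=False))  # 3 + 8 + 11 = 22
print(entry_cycles(trace, deferred=True))   # 3 + 3 + 3 = 9
```

    Fewer occupied entry-cycles means a smaller buffer can sustain the same instruction stream, which is the intuition behind the power savings claimed above.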

  • Asynchronous Reorder Buffer for Asynchronous On-Chip Bus

    Eun-Gu JUNG  Dong-Soo HAR  

     
    LETTER-Integrated Electronics

    Vol: E88-C No:12, Page(s): 2391-2394

    In this letter, a new asynchronous Re-Order Buffer (ROB) with fully distributed control is proposed for an asynchronous on-chip bus. Because control is fully distributed among dedicated controllers, the proposed ROB is highly modular and scalable. Simulation results show that the proposed asynchronous ROB can operate on an asynchronous on-chip bus with 2.01 Gbit/s throughput and 0.232 nJ energy consumption per bus transaction.
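
    The ordering function an on-chip bus ROB provides (responses may return out of order but are delivered in request order) can be modeled behaviorally as below. This is a sequential sketch under stated assumptions: it models only the reordering, not the asynchronous handshaking or distributed controllers of the proposed design, and the class and method names are hypothetical.

```python
class TransactionReorderBuffer:
    """Circular buffer whose slot index serves as the transaction tag."""

    def __init__(self, size: int):
        self.size = size
        self.data = [None] * size
        self.done = [False] * size  # True once a response has arrived for the slot
        self.head = 0               # oldest outstanding transaction
        self.tail = 0               # next slot to allocate
        self.count = 0

    def issue(self) -> int:
        """Allocate a slot for a new request and return its tag."""
        if self.count == self.size:
            raise RuntimeError("reorder buffer full")
        tag = self.tail
        self.done[tag] = False
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return tag

    def respond(self, tag: int, value) -> list:
        """Record a response, then release every completed response at the head."""
        self.data[tag] = value
        self.done[tag] = True
        released = []
        while self.count and self.done[self.head]:
            released.append(self.data[self.head])
            self.done[self.head] = False
            self.head = (self.head + 1) % self.size
            self.count -= 1
        return released


rob = TransactionReorderBuffer(4)
a, b, c = rob.issue(), rob.issue(), rob.issue()
print(rob.respond(c, "C"))  # [] : an older request is still outstanding
print(rob.respond(a, "A"))  # ['A']
print(rob.respond(b, "B"))  # ['B', 'C']
```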

  • Reorder Buffer Structure with Shelter Buffer for Out-of-Order Issue Superscalar Processors

    Mun-Suek CHANG  Choung-Shik PARK  Sang-Bang CHOI  

     
    PAPER

    Vol: E83-A No:6, Page(s): 1091-1099

    The reorder buffer is usually employed to maintain correct instruction execution order in a superscalar pipeline with out-of-order issue. In this paper, we propose a reorder buffer structure with a shelter buffer for out-of-order issue superscalar processors, both to control stagnation efficiently and to reduce the buffer size. A remarkable performance improvement is obtained with a shelter buffer of only one or two entries. Simulation results show that when the reorder buffer size is between 8 and 32, the performance gain from the shelter buffer is noticeable. A shelter buffer of size 4 yields no performance improvement over one of size 2, which means that a shelter buffer of size 2 is large enough to handle most of the stagnation. With a shelter buffer of size 2, the reorder buffer can be reduced by 44% in Whetstone, 50% in FFT, 60% in FM, and 75% in the Linpack benchmark without any loss of throughput. Using the shelter buffer also improves execution time by 19.78% in Whetstone, 19.67% in FFT, 23.93% in FM, and 8.65% in Linpack.
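
    The stagnation problem can be illustrated with a toy cycle model of a conventional reorder buffer: a long-latency instruction at the head blocks in-order commit, and once the buffer fills, dispatch stalls. The sketch below models only a plain reorder buffer (the shelter buffer mechanism is not described in enough detail here to model), and the latency trace and parameter names are hypothetical.

```python
from collections import deque


def simulate_rob(latencies, rob_size, width=1):
    """Return (total_cycles, stall_cycles) for an in-order-commit window.

    latencies[i]: cycles from dispatch until instruction i completes.
    Each cycle: commit up to `width` completed instructions from the head,
    then dispatch up to `width` new instructions while entries are free.
    stall_cycles counts cycles in which dispatch was cut short by a full buffer.
    """
    rob = deque()  # completion cycle of each in-flight instruction, oldest first
    nxt = cycle = stalls = 0
    while nxt < len(latencies) or rob:
        committed = 0
        while rob and committed < width and rob[0] <= cycle:
            rob.popleft()  # retire strictly in program order
            committed += 1
        dispatched = 0
        while nxt < len(latencies) and dispatched < width and len(rob) < rob_size:
            rob.append(cycle + latencies[nxt])
            nxt += 1
            dispatched += 1
        if nxt < len(latencies) and dispatched < width and len(rob) == rob_size:
            stalls += 1
        cycle += 1
    return cycle, stalls


trace = [10, 1, 1, 1, 1]  # one long-latency instruction followed by short ones
print(simulate_rob(trace, rob_size=2))  # (15, 8): dispatch stalls behind the head
print(simulate_rob(trace, rob_size=8))  # (15, 0): large enough to absorb the stall
```

    In this tiny trace commit bandwidth hides the stalls in total cycle count; with longer traces and wider machines, the blocked dispatch slots translate into lost throughput.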

  • System Performance Analyses of Out-of-Order Superscalar Processors Using Analytical Method

    Hak-Jun KIM  Sun-Mo KIM  Sang-Bang CHOI  

     
    PAPER

    Vol: E82-A No:6, Page(s): 927-938

    This research presents a novel analytic model that predicts the instruction execution rate of superscalar processors using a queuing model with a finite buffer size and a synchronous operation mode. The model can also analyze the performance relationship between the cache and the pipeline. It takes into account various architectural parameters, such as instruction-level parallelism, branch probability, branch prediction accuracy, and cache misses. To verify the model's correctness, we performed extensive simulations and compared the results with the analytic model. The simulation results showed that the proposed model estimates the average execution rate to within 10% error in most cases. The model can explain causes of performance bottlenecks that cannot be uncovered by simulation alone. It can also show the effect of cache misses on the performance of out-of-order issue superscalar processors, which provides valuable information for designing a balanced system.
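
    To show how a finite buffer caps throughput in a queuing model, the sketch below uses the textbook continuous-time M/M/1/K queue. This is a swapped-in illustration, not the authors' synchronous model: it assumes Poisson arrivals at rate lam, exponential service at rate mu, and room for at most k items, so arrivals that find the buffer full are lost.

```python
import math


def mm1k_state_probs(lam: float, mu: float, k: int) -> list:
    """Steady-state probabilities P0..Pk of an M/M/1/K queue."""
    rho = lam / mu
    if math.isclose(rho, 1.0):
        return [1.0 / (k + 1)] * (k + 1)  # uniform when arrival and service rates match
    norm = (1 - rho) / (1 - rho ** (k + 1))
    return [norm * rho ** n for n in range(k + 1)]


def mm1k_throughput(lam: float, mu: float, k: int) -> float:
    """Accepted rate: arrivals that do not find the buffer full (state k)."""
    p = mm1k_state_probs(lam, mu, k)
    return lam * (1 - p[k])


print(mm1k_throughput(1.0, 2.0, 2))  # 6/7, about 0.857
```

    Raising k drives the blocking probability toward zero and the throughput toward lam, which mirrors the buffer-sizing trade-offs such analytic models are used to expose.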