IEICE global.ieice.org Site

Keyword Search Result

[Keyword] register renaming (4 hits)

Improvement of Renamed Trace Cache through the Reduction of Dependent Path Length for High Energy Efficiency
Ryota SHIOYA Hideki ANDO

PAPER-Computer System

Publicized: 2015/12/04
Vol: E99-D No: 3
Page(s): 630-640
Out-of-order superscalar processors rename register numbers to remove false dependencies between instructions. The renaming logic is a high-cost module in a superscalar processor and consumes considerable energy. A renamed trace cache (RTC) was proposed to reduce the energy consumption of the renaming logic. An RTC caches and reuses renamed operands, so register renaming can be omitted on RTC hits. However, conventional RTCs suffer from several performance, energy consumption, and hardware overhead problems. We propose a semi-global renamed trace cache (SGRTC) that caches only renamed operands that are a short distance from producers outside traces, which solves the problems of conventional RTCs. Evaluation results show that the SGRTC achieves 64% lower energy consumption for renaming with a 0.2% performance overhead compared to a conventional processor.
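As background for the renaming step that this abstract takes as its starting point, the following is a minimal sketch of conventional register renaming with a map table and a free list. It is not the RTC or SGRTC design from the paper. The names `Renamer` and `rename_program` and the register counts are assumptions made for illustration, and the sketch never returns registers to the free list.

```python
from collections import deque


class Renamer:
    """Toy rename stage: a map table plus a free list (illustrative only)."""

    def __init__(self, num_logical, num_physical):
        # Assumption: logical register i initially maps to physical register i.
        self.map_table = list(range(num_logical))
        self.free_list = deque(range(num_logical, num_physical))

    def rename(self, dst, srcs):
        # Sources read the current mappings, so true dependencies are kept.
        phys_srcs = [self.map_table[s] for s in srcs]
        if not self.free_list:
            raise RuntimeError("no free physical register; a real pipeline would stall")
        # The destination gets a fresh physical register, which removes
        # write-after-write and write-after-read (false) dependencies.
        phys_dst = self.free_list.popleft()
        self.map_table[dst] = phys_dst
        return phys_dst, phys_srcs


def rename_program(program, num_logical=4, num_physical=8):
    """Rename a list of (dst, [srcs]) tuples given as logical register numbers."""
    renamer = Renamer(num_logical, num_physical)
    return [renamer.rename(dst, srcs) for dst, srcs in program]


# r1 = r2 + r3 ; r2 = r1 + r1 ; r1 = r3 + r3
program = [(1, [2, 3]), (2, [1, 1]), (1, [3, 3])]
renamed = rename_program(program)
for phys_dst, phys_srcs in renamed:
    print(f"p{phys_dst} <- {['p%d' % s for s in phys_srcs]}")
```

In this example the two writes to `r1` land in different physical registers, so the third instruction no longer has to wait for the second one to read the old value of `r1`.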
Physical Register Sharing through Value Similarity Detection
In Pyo HONG Ha Young JEONG Yong Surk LEE

LETTER-Computer Systems

Vol: E89-D No: 10
Page(s): 2678-2681
Modern processors have large instruction windows to improve performance. They usually adopt register renaming, in which every active instruction with a valid destination needs a physical register. As instruction windows grow, however, larger physical register files are required. To solve this problem, we propose a physical register sharing technique that shares a physical register among multiple instructions based on value similarity. As a result, we achieve a performance improvement without increasing the size of the physical register file. In addition, the proposed technique can be used to reduce the timing, complexity, and area overhead of the physical register file.
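The general idea of sharing one physical register among several instructions can be sketched with reference counting. This is only an illustration under strong assumptions: "value similarity" is taken here to mean exact equality, the value is assumed to be known at allocation time (real hardware learns it only at writeback), and the class `SharedRegisterFile` is hypothetical. The abstract does not describe the paper's actual detection mechanism.

```python
class SharedRegisterFile:
    """Toy physical register file where equal values share one register."""

    def __init__(self, num_physical):
        self.values = [None] * num_physical
        self.ref_counts = [0] * num_physical
        self.free = list(range(num_physical))
        self.by_value = {}  # value -> physical register currently holding it

    def allocate(self, value):
        # Assumption: sharing is triggered by exact value equality.
        reg = self.by_value.get(value)
        if reg is not None:
            self.ref_counts[reg] += 1
            return reg
        if not self.free:
            raise RuntimeError("physical register file is full")
        reg = self.free.pop(0)
        self.values[reg] = value
        self.ref_counts[reg] = 1
        self.by_value[value] = reg
        return reg

    def release(self, reg):
        # A register returns to the free list only when its last user is gone.
        self.ref_counts[reg] -= 1
        if self.ref_counts[reg] == 0:
            del self.by_value[self.values[reg]]
            self.values[reg] = None
            self.free.append(reg)


rf = SharedRegisterFile(num_physical=2)
a = rf.allocate(0)   # first instruction producing 0
b = rf.allocate(0)   # second instruction producing 0 shares the register
c = rf.allocate(7)   # a different value needs its own register
print(a, b, c, rf.ref_counts)
```

With two physical registers, three in-flight results fit because two of them hold the same value, which is the effect the abstract describes as improving performance without enlarging the register file.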
A Low-Cost Recovery Mechanism for Processors with Large Instruction Windows
In Pyo HONG Byung In MOON Yong Surk LEE

LETTER-VLSI Systems

Vol: E89-D No: 6
Page(s): 1967-1970
The latest processors employ large instruction windows and longer pipelines to achieve higher performance. Although current branch predictors are highly accurate, the misprediction penalty grows in proportion to the number of pipeline stages and the pipeline width. The same penalty applies to exceptions and interrupts. Therefore, it is important to recover the processor state quickly and restart processing immediately. In this letter, we propose a low-cost recovery mechanism for processors with large instruction windows.
Dynamic Fast Issue (DFI) Mechanism for Dynamic Scheduled Processors
Abderazek BEN ABDALLAH Mudar SAREM Masahiro SOWA

PAPER-VLSI Architecture

Vol: E83-A No: 12
Page(s): 2417-2425
Superscalar processors can achieve higher performance by issuing instructions out of order (OoO) with respect to the original instruction stream. Implementing an OoO issue scheme requires a hardware mechanism that prevents incorrectly executed instructions from updating register values. In addition, performance decreases when data dependencies, branches, or traps occur among instructions. To this end, we propose a new mechanism, named Dynamic Fast Issue (DFI), that issues instructions in an OoO fashion to multiple parallel functional units without considerable hardware complexity. The system, which will be implemented in our Superscalar Functional Assignments Register Microprocessor (FARM), resolves data dependencies and supports precise interrupts and branch prediction, which are the main problems associated with the dynamic scheduling of instructions in superscalar machines. Results are written only once (write-once), directly into the register file (RF). To ensure that results are written in order to their appropriate output registers, a status buffer (STB) maintains a record of instruction order and state. A 64-entry integrated register file holds both renamed and logical registers. To recover the processor state after an interrupt or a branch misprediction, the STB and a recovery list table (RLT) are used. Novel aspects of the system architecture, the principle underlying this process, and the constraints that must be met are presented. Performance is evaluated using a full pipeline-level architectural simulator and the SPECint95 benchmark programs.
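The role of a buffer that records instruction order and state can be illustrated with a generic in-order retirement queue. This is a sketch of the general technique only, not the DFI design: the class `StatusBuffer`, its methods, and the string tags are hypothetical, and the abstract does not specify how the actual STB and RLT are organized.

```python
from collections import deque


class StatusBuffer:
    """Toy order-tracking buffer: out-of-order completion, in-order retirement."""

    def __init__(self):
        self.entries = deque()  # oldest instruction on the left

    def dispatch(self, tag):
        # Record the instruction in program order, not yet completed.
        self.entries.append({"tag": tag, "done": False})

    def complete(self, tag):
        # Execution may finish in any order.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["done"] = True
                return
        raise KeyError(tag)

    def retire(self):
        # Only a completed instruction at the head may retire, so the
        # architectural state is always updated in program order.
        retired = []
        while self.entries and self.entries[0]["done"]:
            retired.append(self.entries.popleft()["tag"])
        return retired

    def squash_after(self, tag):
        # Drop every instruction younger than `tag`, for example after a
        # mispredicted branch; returns the squashed tags, youngest first.
        if all(entry["tag"] != tag for entry in self.entries):
            raise KeyError(tag)
        squashed = []
        while self.entries[-1]["tag"] != tag:
            squashed.append(self.entries.pop()["tag"])
        return squashed


stb = StatusBuffer()
for tag in ("i0", "i1", "i2"):
    stb.dispatch(tag)
stb.complete("i2")
first = stb.retire()    # nothing retires: i0 is still pending
stb.complete("i0")
second = stb.retire()   # i0 retires; i1 blocks i2
stb.complete("i1")
third = stb.retire()    # i1 and i2 retire together, in order
print(first, second, third)
```

Because retirement is strictly from the head, an interrupt or misprediction always finds a precise boundary between retired and in-flight instructions, which is the property the abstract refers to as precise interrupt support.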