Hideki ANDO Chikako NAKANISHI Hirohisa MACHIDA Tetsuya HARA Masao NAKAYA
Superscalar processors improve performance by exploiting instruction-level parallelism (ILP). ILP in a basic block is, however, not sufficient on non-numerical applications for gaining substantial speedup. Instructions across branches are required to be executed in parallel to dramatically improve performance. That is, speculative execution is strongly required. Boosting is a general solution to achieving speculative execution. Boosting labels an instruction to be speculatively executed, and the hardware handles side-effects. This paper describes the efficient implementation of boosting in terms of cost/performance trade-offs. Our policy in implementation is beneficial in code scheduling heuristics, penalties imposed by code duplication to maintain program semantics, and area cost. This paper also describes a branch scheme which minimizes branch penalty. Branch delay causes crucial penalties on the performance of superscalar processors since multiple delay slots exist even in a single delay cycle. Our scheme is the fetching of both sequential and target instructions, and either of them is selected on a branch. No delay cycle can be imposed. This scheme is realized by a combination of static code movement and hardware support. As a result, we reduce branch penalty with small cost. Simulation results show that our ideas are highly effective in improving the performance of a superscalar processor.
Alberto Palacios PAWLOVSKY Makoto HANAWA Osamu NISHII Tadahiko NISHIMUKAI
Advances in semiconductor technology have made it possible to develop an experimental 1000 MIPS superscalar RISC processor. The high performance of this processor was obtained using architectural concepts such as multiple CPU configuration, superscalar microarchitecture, and high-speed device technology. This paper focuses on the novel features of this RISC processor, its device technology, architectural characteristics and one technology that has been devised to make its integer CPU cores fault-tolerant.