Modern micro-architectures employ superscalar techniques to enhance system performance. Since the superscalar microprocessors must fetch at least one instruction cache line at a time to support high issue rate and large amount speculative executions. There are cases that multiple branches are often encountered in one cycle. And in practical implementation this would cause serious problem while there are variable number of instruction addresses that look up the Branch Target Buffer simultaneously. In this paper, we propose a Range Associative Branch Target Buffer (RABTB) that can recognize and predict multiple branches in the same instruction cache line for a wide-issue micro-architecture. Several configurations of the RABTB are simulated and compared using the SPECint95 benchmarks. We show that with a reasonable size of prediction scope, branch prediction can be improved by supporting multiple / up to 8 branch predictions in one cache line in one cycle. Our simulation results show that the optimal RABTB should be 2048 entry, 8-column range-associate and 8-entry modified ring buffer architecture using PAs prediction algorithm. It has an average 5.2 IPC_f and branch penalty per branch of 0.54 cycles. This is almost two times better than a mechanism that makes prediction only on the first encountered branch.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Copy
Shu-Lin HWANG, Che-Chun CHEN, Feipei LAI, "Multiple Branch Prediction for Wide-Issue Superscalar" in IEICE TRANSACTIONS on Information,
vol. E82-D, no. 8, pp. 1154-1166, August 1999, doi: .
Abstract: Modern micro-architectures employ superscalar techniques to enhance system performance. Since the superscalar microprocessors must fetch at least one instruction cache line at a time to support high issue rate and large amount speculative executions. There are cases that multiple branches are often encountered in one cycle. And in practical implementation this would cause serious problem while there are variable number of instruction addresses that look up the Branch Target Buffer simultaneously. In this paper, we propose a Range Associative Branch Target Buffer (RABTB) that can recognize and predict multiple branches in the same instruction cache line for a wide-issue micro-architecture. Several configurations of the RABTB are simulated and compared using the SPECint95 benchmarks. We show that with a reasonable size of prediction scope, branch prediction can be improved by supporting multiple / up to 8 branch predictions in one cache line in one cycle. Our simulation results show that the optimal RABTB should be 2048 entry, 8-column range-associate and 8-entry modified ring buffer architecture using PAs prediction algorithm. It has an average 5.2 IPC_f and branch penalty per branch of 0.54 cycles. This is almost two times better than a mechanism that makes prediction only on the first encountered branch.
URL: https://global.ieice.org/en_transactions/information/10.1587/e82-d_8_1154/_p
Copy
@ARTICLE{e82-d_8_1154,
author={Shu-Lin HWANG, Che-Chun CHEN, Feipei LAI, },
journal={IEICE TRANSACTIONS on Information},
title={Multiple Branch Prediction for Wide-Issue Superscalar},
year={1999},
volume={E82-D},
number={8},
pages={1154-1166},
abstract={Modern micro-architectures employ superscalar techniques to enhance system performance. Since the superscalar microprocessors must fetch at least one instruction cache line at a time to support high issue rate and large amount speculative executions. There are cases that multiple branches are often encountered in one cycle. And in practical implementation this would cause serious problem while there are variable number of instruction addresses that look up the Branch Target Buffer simultaneously. In this paper, we propose a Range Associative Branch Target Buffer (RABTB) that can recognize and predict multiple branches in the same instruction cache line for a wide-issue micro-architecture. Several configurations of the RABTB are simulated and compared using the SPECint95 benchmarks. We show that with a reasonable size of prediction scope, branch prediction can be improved by supporting multiple / up to 8 branch predictions in one cache line in one cycle. Our simulation results show that the optimal RABTB should be 2048 entry, 8-column range-associate and 8-entry modified ring buffer architecture using PAs prediction algorithm. It has an average 5.2 IPC_f and branch penalty per branch of 0.54 cycles. This is almost two times better than a mechanism that makes prediction only on the first encountered branch.},
keywords={},
doi={},
ISSN={},
month={August},}
Copy
TY - JOUR
TI - Multiple Branch Prediction for Wide-Issue Superscalar
T2 - IEICE TRANSACTIONS on Information
SP - 1154
EP - 1166
AU - Shu-Lin HWANG
AU - Che-Chun CHEN
AU - Feipei LAI
PY - 1999
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E82-D
IS - 8
JA - IEICE TRANSACTIONS on Information
Y1 - August 1999
AB - Modern micro-architectures employ superscalar techniques to enhance system performance. Since the superscalar microprocessors must fetch at least one instruction cache line at a time to support high issue rate and large amount speculative executions. There are cases that multiple branches are often encountered in one cycle. And in practical implementation this would cause serious problem while there are variable number of instruction addresses that look up the Branch Target Buffer simultaneously. In this paper, we propose a Range Associative Branch Target Buffer (RABTB) that can recognize and predict multiple branches in the same instruction cache line for a wide-issue micro-architecture. Several configurations of the RABTB are simulated and compared using the SPECint95 benchmarks. We show that with a reasonable size of prediction scope, branch prediction can be improved by supporting multiple / up to 8 branch predictions in one cache line in one cycle. Our simulation results show that the optimal RABTB should be 2048 entry, 8-column range-associate and 8-entry modified ring buffer architecture using PAs prediction algorithm. It has an average 5.2 IPC_f and branch penalty per branch of 0.54 cycles. This is almost two times better than a mechanism that makes prediction only on the first encountered branch.
ER -