A new technique for reducing load latency is presented. This technique, named tunneling-load, uses a register specifier buffer to reduce the load latency without accessing the data cache speculatively, and thus avoids the drawbacks of load address prediction techniques. As a consequence of the trend toward increasing clock frequency, the internal cache is no longer able to fill the speed gap between the processor and the external memory, and the data cache latency degrades processor performance. To hide this latency, several techniques that predict the load address have been proposed. These techniques access the data cache speculatively, which causes an explosion of memory traffic and pollution of the data cache. Tunneling-load solves these problems. We have evaluated the effects of tunneling-load and found that, on an in-order-issue superscalar platform, instruction-level parallelism increases by approximately 10%.
Toshinori SATO Hiroshige FUJII Seigo SUZUKI
A new prediction method for the effective address is presented. This method works with a buffer named the address prediction buffer and allows the data cache to be accessed speculatively. As a consequence of the trend toward increasing clock frequency, the internal cache is no longer able to fill the speed gap between the processor and the external memory, and the data cache latency degrades processor performance. The prediction method is proposed to hide this latency. With this method, the load address is predicted, and the data is fetched earlier than the memory access stage. If the prediction is correct, the latency is hidden. Even if the prediction is incorrect, performance is not degraded by any miss penalty. We have found that the prediction accuracy is 81.9% on average, and thus performance is improved by 6.6% on average and by a maximum of 12.1% for the integer programs.
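
The abstract does not describe how the address prediction buffer is organized, so the following is only a minimal sketch under stated assumptions: a direct-mapped table indexed by the load instruction's program counter that predicts the last effective address seen for that load. The class name, the table size, the indexing function, and the last-address policy are all hypothetical and are not taken from the paper. The sketch shows how a prediction accuracy figure could be measured over a trace of (PC, effective address) pairs.

```python
class AddressPredictionBuffer:
    """Hypothetical direct-mapped table: load PC -> last effective address."""

    def __init__(self, num_entries=256):
        # Table size is an assumption, not a value from the paper.
        self.num_entries = num_entries
        # Each entry holds (tag, last_address), or None when empty.
        self.entries = [None] * num_entries

    def _index(self, pc):
        # Assume 4-byte instructions, so the two low PC bits are dropped.
        return (pc >> 2) % self.num_entries

    def predict(self, pc):
        """Return the predicted load address, or None if there is no entry."""
        entry = self.entries[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc, actual_address):
        """Record the address actually computed for this load."""
        self.entries[self._index(pc)] = (pc, actual_address)


def prediction_accuracy(trace, buffer):
    """Fraction of loads in the trace whose address was predicted correctly."""
    correct = 0
    for pc, address in trace:
        if buffer.predict(pc) == address:
            correct += 1
        buffer.update(pc, address)
    return correct / len(trace) if trace else 0.0


# Illustrative trace: one load that repeats its address, one that does not.
trace = [(0x100, 0x8000)] * 4 + [(0x104, 0x9000), (0x104, 0x9004)]
accuracy = prediction_accuracy(trace, AddressPredictionBuffer())
```

In this sketch a missing or wrong prediction simply counts as incorrect. Modeling the claim that a misprediction adds no miss penalty would require a pipeline timing model, which is outside what the abstract specifies.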