1-2hit |
Mariko SAKAMOTO Akira KATSUNO Aiichiro INOUE Takeo ASAKAWA Kuniki MORITA Tsuyoshi MOTOKURUMADA Yasunori KIMURA
We developed a SPARC-V9 processor, the SPARC64 V. It has an operating frequency of 1.35 GHz and contains 191 million transistors fabricated using 0.13-µm CMOS technology with eight-layer copper metallization. SPECjbb2000 (CPU# 32) is 492683, highest on the market and 42% higher than the next highest system. SPEC CPU2000 performance is 858 for SPECint and 1228 for SPECfp. The processor is designed to provide the high system performance and high reliability required of enterprise server systems. It is also designed to address the performance requirements of high-performance computing. During our development of several generations of mainframe processors, we conducted many related experiments, and obtained enterprise server system (EPS) development skills, an understanding of EPS workload characteristics, and technology that provides high reliability, availability, and serviceability. We used those as bases of the new processor development. The approach quite effectively moves beyond differences between mainframe and SPARC systems. At the beginning of development and before the start of hardware design, we developed a software performance simulator so we could understand the performance impacts of created specifications, thereby enabling us to make appropriate decisions about hardware design. We took this approach to solve performance problems before tape-out and avoid spending additional time on design update and physical machine reconstruction. We were successful, completing the high-performance processor development on schedule and in a short time. This paper describes the SPARC64 V microprocessor and performance analyses for development of its design.
Mariko SAKAMOTO Akira KATSUNO Go SUGIZAKI Toshio YOSHIDA Aiichiro INOUE Koji INOUE Kazuaki MURAKAMI
Broadcast and synchronization techniques are used for cache coherence control in conventional larger scale snoop-based SMP systems. The penalty for synchronization is directly proportional to system size. Meanwhile, advances in LSI technology now enable placing a memory controller on a CPU die. The latency to access directly linked memory is drastically reduced by an on-die controller. Developing an enterprise server system with these CPUs allows us an opportunity to achieve higher performance. Though the penalty of synchronization is counted whenever a cache miss occurs, it is necessary to improve the coherence method to receive the full benefit of this effect. In this paper, we demonstrate a coherence directory organization that fits into DSM enterprise server systems. Originally, a directory-based method was adopted in high performance computing systems because of its huge scalability in comparison with snoop-based method. Though directory capacity miss and long directory access latency are the major problems of this method, the relaxed scalability requirement of enterprise servers is advantageous to us to solve these problems along with an advanced LSI technology. Our proposed directory solves both problems by implementing a full bit vector level map of the coherence directory on an LSI chip. Our experimental results validate that a system controlled by our proposed directory can surpass a snoop-based system in performance even without applying data localization optimization to an online transaction processing (OLTP) workload.