Takatsugu ONO Yotaro KONISHI Teruo TANIMOTO Noboru IWAMATSU Takashi MIYOSHI Jun TANAKA
Big data analysis and data storing applications require a huge volume of storage and high I/O performance. Applications can achieve a high level of performance and cost efficiency by exploiting the high I/O performance of direct-attached storage (DAS) such as internal HDDs. However, as the volume of stored data keeps growing, replacing servers becomes difficult because the internal HDDs hold huge amounts of data. Typically, the data is copied over Ethernet when it is transferred from the internal HDDs to a new server, but with the amount of data continuing to increase rapidly, such transfers over Ethernet will become impractical because they take too long. A storage area network such as iSCSI can avoid this problem because the data can be shared among servers; however, it degrades performance and increases cost. Improving the flexibility of the DAS architecture without incurring I/O performance degradation is therefore required. In response to this issue, we propose FlexDAS, which improves the flexibility of direct-attached storage by using a disk area network (DAN) without degrading I/O performance. A resource manager connects or disconnects computation nodes to the HDDs via the FlexDAS switch, which supports the SAS and SATA protocols. This function enables servers to be replaced in a short time. We developed a prototype FlexDAS switch and quantitatively evaluated the architecture. Results show that the FlexDAS switch can disconnect and connect an HDD to a server in just 1.16 seconds. We also confirmed that FlexDAS improves the performance of data-intensive applications by up to 2.84 times compared with iSCSI.
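To make the DAN idea concrete, the Python sketch below illustrates the kind of bookkeeping a resource manager could perform: it tracks which compute node each drive is bound to through the switch and re-binds a drive to a replacement server without copying data. All class and method names here are hypothetical illustrations, not the FlexDAS implementation.

```python
# Minimal sketch of a DAN resource manager, assuming a switch that can bind a
# drive port to a compute node. Names are hypothetical, not the FlexDAS API.

class DanResourceManager:
    def __init__(self):
        self._owner = {}  # drive_id -> node_id currently attached

    def connect(self, drive_id, node_id):
        """Attach a drive to a compute node through the switch."""
        if drive_id in self._owner:
            raise RuntimeError(f"{drive_id} already attached to {self._owner[drive_id]}")
        self._owner[drive_id] = node_id

    def disconnect(self, drive_id):
        """Detach a drive so another node can claim it."""
        return self._owner.pop(drive_id, None)

    def migrate(self, drive_id, new_node_id):
        """Move a drive to a replacement server without copying its data."""
        self.disconnect(drive_id)
        self.connect(drive_id, new_node_id)


if __name__ == "__main__":
    mgr = DanResourceManager()
    mgr.connect("hdd-07", "node-a")   # drive serves node-a via the switch
    mgr.migrate("hdd-07", "node-b")   # server replacement: re-bind, no data copy
    print(mgr._owner)                 # {'hdd-07': 'node-b'}
```

The point of the sketch is that a server replacement reduces to a re-binding operation at the switch, which is why it can complete in seconds rather than the hours an Ethernet copy of the drive contents would take.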
Teruo TANIMOTO Takatsugu ONO Koji INOUE
Correctly understanding microarchitectural bottlenecks is important for optimizing the performance and energy of OoO (Out-of-Order) processors. Although the CPI (Cycles Per Instruction) stack has been used for this purpose, it stacks architectural events heuristically by counting how many times they occur, and the order of stacking affects the result, which can be misleading because the CPI stack does not consider the execution path of dynamic instructions. Critical path analysis (CPA) is a well-known method for identifying the critical path of dynamic instruction execution on OoO processors. The critical path is the sequence of events that determines the execution time of a program on a given processor. We develop a novel representation, the CPCI stack (Cycles Per Critical Instruction stack), which is a CPI stack based on CPA. The main challenge in constructing the CPCI stack is how to analyze a large number of paths, because CPA often yields numerous critical paths. In this paper, we show that there are more than 10^10 critical paths in the execution of only one thousand instructions in 35 of the 48 SPEC CPU2006 benchmarks. We then propose a statistical method to analyze all the critical paths and present a case study using these benchmarks.
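As an illustration only (the graph model and names below are assumptions for exposition, not the paper's CPCI-stack algorithm), a critical path on an out-of-order execution can be viewed as the longest path through a DAG of dynamic-instruction events; the sketch finds one such path and attributes its cycles to event types in CPI-stack style.

```python
# Illustrative sketch: longest (critical) path in a DAG of dynamic-instruction
# events, with cycles along that path attributed to event types. This is a
# generic longest-path computation, not the authors' statistical method.

from collections import defaultdict

def critical_path(nodes, edges):
    """nodes: {id: (event_type, latency)}; edges: list of (src, dst) pairs."""
    succ, indeg = defaultdict(list), defaultdict(int)
    for s, d in edges:
        succ[s].append(d)
        indeg[d] += 1

    # Topological order (Kahn's algorithm).
    order, ready = [], [n for n in nodes if indeg[n] == 0]
    while ready:
        n = ready.pop()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)

    # Longest-path dynamic programming over the topological order.
    dist = {n: nodes[n][1] for n in nodes}
    pred = {n: None for n in nodes}
    for n in order:
        for m in succ[n]:
            if dist[n] + nodes[m][1] > dist[m]:
                dist[m] = dist[n] + nodes[m][1]
                pred[m] = n

    # Walk back from the latest-finishing event and build a cycle stack.
    end = max(dist, key=dist.get)
    stack, n = defaultdict(int), end
    while n is not None:
        event, latency = nodes[n]
        stack[event] += latency
        n = pred[n]
    return dist[end], dict(stack)


if __name__ == "__main__":
    # Toy example: a cache miss dominates the path of three instructions.
    nodes = {0: ("fetch", 1), 1: ("exec", 3), 2: ("cache_miss", 20), 3: ("commit", 1)}
    edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
    print(critical_path(nodes, edges))  # (22, {'commit': 1, 'cache_miss': 20, 'fetch': 1})
```

Note that this sketch extracts only a single critical path; the difficulty the abstract highlights is that real executions admit an enormous number of equally critical paths, which is why the authors resort to a statistical treatment rather than enumerating them.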