We developed a PC cluster of 100 PCs as a test bed for massively parallel query processing. Each PC has a 200 MHz Pentium Pro CPU and is connected to the others through an ATM switch. Because query processing applications are insensitive to communication latency and mainly perform integer operations, an ATM-connected PC cluster is a reasonable, low-cost basis for high-performance database servers. To the authors' knowledge, however, no one has yet attempted to build a large-scale PC cluster for database applications. Although we used commodity components wherever possible, we developed the DBMS ourselves, because it is a key component for high performance in parallel query processing and no existing system appeared to meet our requirements. Each PC node runs a server program that acts as a database kernel and processes queries in cooperation with the other nodes. The kernel was designed to execute pipelined operators and to handle large volumes of data efficiently, so as to achieve high performance on complex decision-support queries. To verify the feasibility of our approach, we ran the standard TPC-D benchmark on a 100 GB database and compared our system with commercial parallel systems. Overall, our system performed competitively with the current top TPC-D results, even though it used no indices. For some heavy queries in the benchmark, those with high selectivity and joinability, it performed much better. For further improvement, we also applied a transposed file organization to the database. This organization vertically partitions the tuples, enabling attribute-by-attribute access to the relations. It improved performance significantly by reducing the amount of disk I/O, which shifted the bottleneck to computation.
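The effect of a transposed file organization can be illustrated with a small in-memory sketch. This is not the authors' DBMS kernel: the relation layout, the attribute names (`order_key`, `quantity`, `price`, `comment`) and the helper functions are all hypothetical, and the count of values touched is only a rough stand-in for disk I/O.

```python
from typing import Dict, List, Tuple

# Hypothetical relation layout: (order_key, quantity, price, comment).
Row = Tuple[int, int, float, str]
NAMES = ["order_key", "quantity", "price", "comment"]


def transpose(rows: List[Row], names: List[str]) -> Dict[str, list]:
    """Vertically partition row-wise tuples into one list per attribute."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def revenue_from_rows(rows: List[Row]) -> Tuple[float, int]:
    """Row-wise scan: every attribute of every tuple is read,
    even though only quantity and price are needed."""
    total = 0.0
    values_read = 0
    for row in rows:
        values_read += len(row)      # the whole tuple is fetched
        total += row[1] * row[2]     # quantity * price
    return total, values_read


def revenue_from_columns(columns: Dict[str, list]) -> Tuple[float, int]:
    """Attribute-by-attribute scan: only the two needed columns are read."""
    quantity = columns["quantity"]
    price = columns["price"]
    total = sum(q * p for q, p in zip(quantity, price))
    return total, len(quantity) + len(price)


if __name__ == "__main__":
    rows = [(1, 2, 10.0, "a"), (2, 3, 5.0, "b"), (3, 1, 7.5, "c")]
    columns = transpose(rows, NAMES)
    print(revenue_from_rows(rows))
    print(revenue_from_columns(columns))
```

Both scans compute the same aggregate, but the column-wise version touches only the attributes the query refers to. With wide tuples and queries that use few attributes, that difference is what reduces the volume of data read.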
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takayuki TAMURA, Masato OGUCHI, Masaru KITSUREGAWA, "High Performance Parallel Query Processing on a 100 Node ATM Connected PC Cluster" in IEICE TRANSACTIONS on Information,
vol. E82-D, no. 1, pp. 54-63, January 1999.
URL: https://global.ieice.org/en_transactions/information/10.1587/e82-d_1_54/_p
@ARTICLE{e82-d_1_54,
author={Takayuki TAMURA and Masato OGUCHI and Masaru KITSUREGAWA},
journal={IEICE TRANSACTIONS on Information},
title={High Performance Parallel Query Processing on a 100 Node ATM Connected PC Cluster},
year={1999},
volume={E82-D},
number={1},
pages={54-63},
keywords={},
doi={},
ISSN={},
month={January},}
TY - JOUR
TI - High Performance Parallel Query Processing on a 100 Node ATM Connected PC Cluster
T2 - IEICE TRANSACTIONS on Information
SP - 54
EP - 63
AU - Takayuki TAMURA
AU - Masato OGUCHI
AU - Masaru KITSUREGAWA
PY - 1999
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E82-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 1999
ER -