KIT | KIT-Bibliothek | Impressum

Communication Efficient Algorithms for Distributed OLAP Query Execution

Hespe, Demian

Abstract: As a result of the growing amounts of Data in todays Databases, one machine is often not sufficient to store and process these. The proper solution to this problem is to scale the system out on a cluster. However, the distribution of the data throughout the machines of the cluster results in a high percentage of communication time in the overall execution time of a query, especially for complex analytical queries. For this reason, we try to minimize the volume of communicated data to allow faster runtimes when a query cannot be executed on a single node of the cluster without any communication. We analyze techniques from previous work and propose improvements to them backed by a complexity analysis of the communication volume for both, our algorithms and the algorithms from the previous work. For the evaluation of our algorithms we implement them for chosen queries of the TPC-H benchmark and run them on a cluster of up to 128 nodes with a database of up to 30 terabytes of uncompressed data (128 TB if only a small proportion of the database is used). We provide both, scaling experiments and runtime comparisons to previous work and the current TPC-H record holder. The main contributions of this work are: • A technique to find a better partitioning of the tables in a database to allow the execution of joins without communication effort • An algorithm that selects the first k tuples of the result set of a query with a communication effort independent from the size of the database, given certain conditions of the partitioning • An analysis of the communication effort of a delayed join that can’t be evaluated locally on a node, in comparison to the communication effort when executing the join early • The application of our algorithms to solve complex queries of the TPC-H benchmark that can’t be executed without a high amount of communication effort • The implementation of the queries in a prototype and evaluation of our algorithms on a large cluster consisting of 128 nodes for a database with up to 30 terabytes of uncompressed data (or 128 TB if only a small proportion of the database is used)

Zugehörige Institution(en) am KIT Institut für Theoretische Informatik (ITI)
Publikationstyp Hochschulschrift
Jahr 2014
Sprache Englisch
Identifikator DOI(KIT): 10.5445/IR/1000058831
URN: urn:nbn:de:swb:90-588317
KITopen ID: 1000058831
Verlag Karlsruhe
Umfang 52 S.
Abschlussart Abschlussarbeit - Bachelor
KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft KITopen Landing Page