Communication Efficient Algorithms for Distributed OLAP Query Execution

Hespe, Demian

As a result of the growing amounts of data in today's databases, a single machine is often not sufficient to store and process them. The proper solution to this problem is to scale the system out on a cluster. However, distributing the data across the machines of the cluster means that communication time accounts for a high percentage of a query's overall execution time, especially for complex analytical queries. For this reason, we try to minimize the volume of communicated data to allow faster runtimes when a query cannot be executed on a single node of the cluster without any communication. We analyze techniques from previous work and propose improvements to them, backed by a complexity analysis of the communication volume for both our algorithms and the algorithms from the previous work. To evaluate our algorithms, we implement them for selected queries of the TPC-H benchmark and run them on a cluster of up to 128 nodes with a database of up to 30 terabytes of uncompressed data (128 TB if only a small proportion of the database is used). We provide both scaling experiments and runtime comparisons to previous work and th ...

Affiliated institution(s) at KIT Institut für Theoretische Informatik (ITI)
Publication type Thesis
Year 2014
Language English
Identifier DOI(KIT): 10.5445/IR/1000058831
URN: urn:nbn:de:swb:90-588317
KITopen ID: 1000058831
Publisher Karlsruhe
Extent 52 pp.
Degree type Bachelor's thesis
