Communication Efficient Algorithms for Distributed OLAP Query Execution

Hespe, Demian

doi:10.5445/IR/1000058831

Abstract:

Because of the growing amounts of data in today's databases, a single machine is often not sufficient to store and process them. The proper solution to this problem is to scale the system out across a cluster. However, distributing the data across the machines of the cluster means that communication accounts for a high percentage of a query's overall execution time, especially for complex analytical queries. For this reason, we try to minimize the volume of communicated data to allow faster runtimes when a query cannot be executed on a single node of the cluster without any communication. We analyze techniques from previous work and propose improvements to them, backed by a complexity analysis of the communication volume for both our algorithms and those from previous work. To evaluate our algorithms, we implement them for selected queries of the TPC-H benchmark and run them on a cluster of up to 128 nodes with a database of up to 30 terabytes of uncompressed data (128 TB if only a small proportion of the database is used). We provide both scaling experiments and runtime comparisons to previous work and the current TPC-H record holder.

Affiliated institution(s) at KIT	Institut für Theoretische Informatik (ITI)
Publication type	Thesis
Publication year	2014
Language	English
Identifier	urn:nbn:de:swb:90-588317; KITopen-ID: 1000058831
Publisher	Karlsruher Institut für Technologie (KIT)
Extent	52 pp.
Thesis type	Bachelor's thesis
Indexed in	OpenAlex

Repository KITopen
