[{"type":"thesis","title":"Compression Methods for Structured Floating-Point Data and their Application in Climate Research","issued":{"date-parts":[["2020"]]},"DOI":"10.5445\/IR\/1000105055","genre":"Dissertation","author":[{"family":"\u00c7ayo\u011flu","given":"U\u011fur"}],"publisher":"Karlsruhe","abstract":"The use of new technologies, such as GPU boosters, has led to a dramatic increase in the computing power of High-Performance Computing (HPC) centres. This development, coupled with new climate models that can better utilise this computing power thanks to software development and internal design, has shifted the bottleneck from solving the differential equations describing Earth\u2019s atmospheric interactions to actually storing the variables. The current approach to solving the storage problem is inadequate: either the number of variables to be stored is limited or the temporal resolution of the output is reduced. If it is subsequently determined that another variable is required which has not been saved, the simulation must be run again. This thesis deals with the development of novel compression algorithms for structured floating-point data, such as climate data, so that they can be stored in full resolution.\r\nCompression is performed by decorrelation and subsequent coding of the data. The decorrelation step eliminates redundant information in the data. During coding, the actual compression takes place and the data is written to disk. A lossy compression algorithm additionally has an approximation step to unify the data for better coding. The approximation step reduces the complexity of the data for the subsequent coding, e.g. by using quantisation. 
This work makes a new scientific contribution to each of the three steps described above.\r\nThis thesis presents a novel lossy compression method for time-series data using an Auto Regressive Integrated Moving Average (ARIMA) model to decorrelate the data. In addition, the concept of information spaces and contexts is presented to use information across dimensions for decorrelation. Furthermore, a new coding scheme is described which reduces the weaknesses of the eXclusive-OR (XOR) difference calculation and achieves a better compression factor than current lossless compression methods for floating-point numbers. Finally, a modular framework is introduced that allows the creation of user-defined compression algorithms.\r\nThe experiments presented in this thesis show that it is possible to increase the information content of lossily compressed time-series data by applying an adaptive compression technique which preserves selected data with higher precision. Lossless compression of these time series proved unsuccessful. However, the lossy ARIMA compression model proposed here is able to capture all relevant information. The reconstructed data reproduce the time series to such an extent that statistically relevant information for the description of climate dynamics is preserved.\r\nExperiments indicate that the compression factor depends significantly on the selected traversal sequence and the underlying data model. The influence of these structural dependencies on prediction-based compression methods is investigated in this thesis. For this purpose, the concept of Information Spaces (IS) is introduced. IS improves the predictions of the individual predictors by nearly 10% on average. Perhaps more importantly, the standard deviation of compression results is on average 20% lower. 
Using IS thus provides better predictions and more consistent compression results.\r\nFurthermore, it is shown that shifting the prediction and the true value leads to a better compression factor at minimal additional computational cost. This allows the use of more resource-efficient prediction algorithms to achieve the same or a better compression factor, or higher throughput during compression and decompression. The coding scheme proposed here achieves a better compression factor than current state-of-the-art methods.\r\nFinally, this thesis presents a modular framework for the development of compression algorithms. The framework supports the creation of user-defined predictors and offers functionalities such as the execution of benchmarks, the random subdivision of n-dimensional data, the quality evaluation of predictors, the creation of ensemble predictors, and the execution of validity tests for sequential and parallel compression algorithms.\r\nThis research was initiated by the needs of climate science, but the application of its contributions is not limited to that field. The results of this thesis are of major benefit for developing and improving any compression algorithm for structured floating-point data.","keyword":"Compression, climate data, climate models, algorithm engineering","number-of-pages":167,"kit-publication-id":"1000105055"}]