
Assessment of Stability in Partitional Clustering Using Resampling Techniques

Mucha, Hans-Joachim

Abstract:
The assessment of stability in cluster analysis is closely related to the difficult problem of determining the number of clusters present in the data. The latter is the subject of many investigations and papers that consider different resampling techniques as practical tools. In this paper, we consider non-parametric resampling from the empirical distribution of a given dataset in order to investigate the stability of the results of partitional clustering. Specifically, we investigate only the very popular K-means method. Estimating the sampling distribution of the adjusted Rand index (ARI) and the averaged Jaccard index seems to be the most general way to do this. In addition, we compare bootstrapping with different subsampling schemes (i.e., with different cardinalities of the drawn samples) with respect to their performance in finding the true number of clusters for both synthetic and real data.
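
The general idea of the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: it uses a plain Lloyd-style K-means with random restarts, covers only the bootstrap (n draws with replacement) and only the ARI, and compares each bootstrap partition with the partition of the original data by assigning every original point to its nearest bootstrap centroid. The function names (`adjusted_rand_index`, `kmeans`, `bootstrap_stability`), the toy data, and all parameter defaults are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def adjusted_rand_index(a, b):
    """ARI between two labelings of the same objects (needs at least 2 objects)."""
    pairs = lambda counts: sum(math.comb(c, 2) for c in counts)
    same_both = pairs(Counter(zip(a, b)).values())
    same_a = pairs(Counter(a).values())
    same_b = pairs(Counter(b).values())
    expected = same_a * same_b / math.comb(len(a), 2)
    max_index = (same_a + same_b) / 2
    if max_index == expected:  # both partitions are trivial
        return 1.0
    return (same_both - expected) / (max_index - expected)


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, rng, n_init=5, n_iter=100):
    """Lloyd's algorithm with random restarts; returns the lowest-cost centroids."""
    best, best_cost = None, math.inf
    candidates = sorted(set(points))  # distinct points, so initial centroids differ
    for _ in range(n_init):
        centroids = rng.sample(candidates, k)
        for _ in range(n_iter):
            labels = [nearest(p, centroids) for p in points]
            updated = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # Keep the old centroid if a cluster runs empty.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else centroids[j])
            if updated == centroids:
                break
            centroids = updated
        cost = sum(math.dist(p, centroids[nearest(p, centroids)]) ** 2 for p in points)
        if cost < best_cost:
            best, best_cost = centroids, cost
    return best


def bootstrap_stability(points, k, n_boot=20, seed=0):
    """Mean ARI between the original K-means partition and bootstrap partitions."""
    rng = random.Random(seed)
    reference = kmeans(points, k, rng)
    ref_labels = [nearest(p, reference) for p in points]
    scores = []
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]  # n draws with replacement
        centroids = kmeans(sample, k, rng)
        # Extend the bootstrap clustering to all original points for comparison.
        boot_labels = [nearest(p, centroids) for p in points]
        scores.append(adjusted_rand_index(ref_labels, boot_labels))
    return sum(scores) / n_boot


# Hypothetical toy data: two well-separated Gaussian blobs in the plane.
data_rng = random.Random(1)
blob = lambda cx, cy: [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5)) for _ in range(30)]
data = blob(0, 0) + blob(10, 10)
for k in (2, 3, 4):
    print(k, round(bootstrap_stability(data, k), 3))
```

In this sketch, a candidate number of clusters whose mean ARI stays close to 1 across bootstrap replicates would be read as stable. A subsampling variant would instead draw fewer than n points without replacement, for example with `rng.sample(points, m)`.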


Affiliated institution(s) at KIT Institut für Informationswirtschaft und Marketing (IISM)
Publication type Journal article
Year 2016
Language English
Identifier DOI: 10.5445/KSP/1000058747/02
ISSN: 2363-9881
URN: urn:nbn:de:swb:90-677602
KITopen ID: 1000067760
Published in Archives of Data Science, Series A
Volume 1
Issue 1
Pages 21-39
License CC BY-SA 3.0 DE: Creative Commons Attribution-ShareAlike 3.0 Germany