
Assessment of Stability in Partitional Clustering Using Resampling Techniques

Mucha, Hans-Joachim

Abstract:

The assessment of stability in cluster analysis is closely related to the central and difficult problem of determining the number of clusters present in the data. The latter has been the subject of many investigations and papers that use various resampling techniques as practical tools. In this paper, we consider non-parametric resampling from the empirical distribution of a given dataset in order to investigate the stability of partitional clustering results. Specifically, we investigate only the popular K-means method. Estimating the sampling distributions of the adjusted Rand index (ARI) and the averaged Jaccard index appears to be the most general way to do this. In addition, we compare bootstrapping with several subsampling schemes (i.e., schemes that draw samples of different cardinalities) with respect to their performance in finding the true number of clusters for both synthetic and real data.
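The resampling idea in the abstract can be illustrated with a small pure-Python sketch. It is not the author's implementation and rests on several assumptions: K-means uses k-means++ seeding with a few restarts; each resample's partition is compared with the partition of the full dataset on the distinct drawn objects; and the "averaged Jaccard index" is taken to be the mean, over reference clusters, of each cluster's best Jaccard match in the resampled partition (the paper's exact definition may differ). All function names and the synthetic three-blob data are hypothetical.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, n_init=3, n_iter=100):
    """Lloyd's K-means with k-means++ seeding; returns labels of the lowest-SSE run."""
    best_labels, best_sse = None, math.inf
    for _ in range(n_init):
        centres = [list(rng.choice(points))]
        while len(centres) < k:
            # k-means++ seeding: next centre drawn with probability proportional to D^2.
            d2 = [min(sq_dist(p, c) for c in centres) for p in points]
            pick = rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points)
            centres.append(list(pick))
        for _ in range(n_iter):
            # Assignment step: each point goes to its nearest centre.
            labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
            # Update step: centre = mean of its members; an empty cluster keeps its old centre.
            new_centres = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                new_centres.append([sum(c) / len(members) for c in zip(*members)] if members else centres[j])
            if new_centres == centres:
                break
            centres = new_centres
        sse = sum(sq_dist(p, centres[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index between two labelings of the same objects."""
    def pairs(n):
        return n * (n - 1) / 2
    sum_ij = sum(pairs(c) for c in Counter(zip(a, b)).values())
    sum_a = sum(pairs(c) for c in Counter(a).values())
    sum_b = sum(pairs(c) for c in Counter(b).values())
    expected = sum_a * sum_b / pairs(len(a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both labelings trivial (all singletons or one cluster)
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def averaged_jaccard(ref, new):
    """Mean over reference clusters of the best Jaccard match in the new partition."""
    def groups(labels):
        out = {}
        for i, lab in enumerate(labels):
            out.setdefault(lab, set()).add(i)
        return list(out.values())
    new_groups = groups(new)
    scores = [max(len(r & s) / len(r | s) for s in new_groups) for r in groups(ref)]
    return sum(scores) / len(scores)


def stability(points, k, n_rep=10, frac=None, seed=0):
    """Mean ARI and averaged Jaccard between K-means on resamples and on the full data.

    frac=None: bootstrap (n draws with replacement);
    otherwise: subsample of round(frac * n) objects drawn without replacement.
    """
    rng = random.Random(seed)
    n = len(points)
    ref = kmeans(points, k, rng)
    aris, jacs = [], []
    for _ in range(n_rep):
        if frac is None:
            idx = [rng.randrange(n) for _ in range(n)]
        else:
            idx = rng.sample(range(n), round(frac * n))
        labels = kmeans([points[i] for i in idx], k, rng)
        # Compare on distinct drawn objects; bootstrap duplicates always share a label.
        drawn = dict(zip(idx, labels))
        keys = sorted(drawn)
        a = [ref[i] for i in keys]
        b = [drawn[i] for i in keys]
        aris.append(adjusted_rand_index(a, b))
        jacs.append(averaged_jaccard(a, b))
    return sum(aris) / n_rep, sum(jacs) / n_rep


def make_demo_data(n_per=50, seed=1):
    """Hypothetical synthetic data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (6.0, 0.0), (3.0, 5.2)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)) for cx, cy in centres for _ in range(n_per)]


if __name__ == "__main__":
    data = make_demo_data()
    for k in range(2, 6):
        boot = stability(data, k)            # bootstrap
        sub = stability(data, k, frac=0.5)   # half-size subsampling
        print(f"K={k}  bootstrap ARI/Jaccard={boot[0]:.3f}/{boot[1]:.3f}  "
              f"subsample ARI/Jaccard={sub[0]:.3f}/{sub[1]:.3f}")
```

In this sketch, a value of K whose mean ARI and averaged Jaccard stay close to 1 across resamples is a candidate for the number of clusters. Calling `stability` with `frac=None` and with, say, `frac=0.5` mirrors the comparison of bootstrapping against subsampling at a different sample cardinality; a fuller analysis would inspect the whole sampling distribution of each index rather than only its mean.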


DOI: 10.5445/KSP/1000058747/02
Affiliated institution(s) at KIT: Fakultät für Wirtschaftswissenschaften – Institut für Informationswirtschaft und Marketing (IISM)
Publication type: Journal article
Publication year: 2016
Language: English
Identifiers: ISSN: 2363-9881
urn:nbn:de:swb:90-677602
KITopen-ID: 1000067760
Published in: Archives of Data Science, Series A
Volume: 1
Issue: 1
Pages: 21-39