Assessment of Stability in Partitional Clustering Using Resampling Techniques

Mucha, Hans-Joachim

doi:10.5445/KSP/1000058747/02

Abstract:

The assessment of stability in cluster analysis is closely related to the main and difficult problem of determining the number of clusters present in the data. The latter is the subject of many investigations and papers that consider different resampling techniques as practical tools. In this paper, we consider non-parametric resampling from the empirical distribution of a given dataset in order to investigate the stability of the results of partitional clustering. Specifically, we investigate only the very popular K-means method. Estimating the sampling distribution of the adjusted Rand index (ARI) and of the averaged Jaccard index seems to be the most general way to do this. In addition, we compare bootstrapping with different subsampling schemes (i.e., with different cardinalities of the drawn samples) with respect to their performance in finding the true number of clusters in both synthetic and real data.
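
The resampling idea can be illustrated with a minimal sketch in pure Python. It assumes Lloyd's K-means with k-means++ seeding (not necessarily the K-means variant used in the paper), and it reads "averaged Jaccard index" as the mean, over the clusters of the full-data partition, of each cluster's best Jaccard match in the resampled partition; that reading is an assumption. The full-data partition is compared with the partition of each bootstrap sample (n draws with replacement) or subsample (m < n draws without replacement) on the objects they share. All function names, parameters, and the synthetic three-group data are hypothetical illustrations, and no results from the paper are reproduced.

```python
import random
from collections import Counter, defaultdict
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index (Hubert-Arabie form) of two labelings of the same n >= 2 objects."""
    n = len(a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # both are one cluster, or both are all singletons
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mean_best_jaccard(ref, other):
    """Assumed variant: average over ref clusters of the best Jaccard match in other."""
    ref_sets, other_sets = defaultdict(set), defaultdict(set)
    for i, (r, o) in enumerate(zip(ref, other)):
        ref_sets[r].add(i)
        other_sets[o].add(i)
    best = [max(len(s & t) / len(s | t) for t in other_sets.values())
            for s in ref_sets.values()]
    return sum(best) / len(best)


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, rng, n_init=5, max_iter=100):
    """Lloyd's K-means with k-means++ seeding; returns labels of the lowest-SSE run."""
    best_sse, best_labels = float("inf"), None
    for _ in range(n_init):
        centers = [rng.choice(points)]
        while len(centers) < k:  # k-means++: sample proportional to squared distance
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2)[0])
        labels = None
        for _ in range(max_iter):
            new = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            if new == labels:
                break
            labels = new
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # keep the old center if a cluster becomes empty
                    centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if sse < best_sse:
            best_sse, best_labels = sse, labels
    return best_labels


def stability(points, k, scheme="subsample", fraction=0.5, n_rep=20, seed=0):
    """Mean ARI and mean best-match Jaccard of resampled partitions vs. the full-data one."""
    rng = random.Random(seed)
    n = len(points)
    reference = kmeans(points, k, rng)
    ari_sum = jac_sum = 0.0
    for _ in range(n_rep):
        if scheme == "bootstrap":
            idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        else:
            idx = rng.sample(range(n), int(fraction * n))  # m draws without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        label_of = dict(zip(idx, labels))  # bootstrap duplicates share one label
        common = sorted(label_of)
        ref = [reference[i] for i in common]
        rep = [label_of[i] for i in common]
        ari_sum += adjusted_rand_index(ref, rep)
        jac_sum += mean_best_jaccard(ref, rep)
    return ari_sum / n_rep, jac_sum / n_rep


def make_blobs(seed=1, per_blob=40):
    """Hypothetical synthetic data: three well-separated Gaussian groups in the plane."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.66)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(per_blob)]


if __name__ == "__main__":
    data = make_blobs()
    for k in range(2, 6):
        ari, jac = stability(data, k)
        print(f"K={k}: mean ARI={ari:.3f}, mean Jaccard={jac:.3f}")
```

Switching `scheme` to `"bootstrap"` or changing `fraction` mimics, on a small scale, the comparison of resampling schemes with different sample cardinalities. Picking the K with the highest mean stability is only one heuristic way to read such output.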

Affiliated institution(s) at KIT	Department of Economics and Management – Institute of Information Systems and Marketing (IISM)
Publication type	Journal article
Publication year	2016
Language	English
Identifiers	ISSN: 2363-9881; URN: urn:nbn:de:swb:90-677602; KITopen ID: 1000067760
Published in	Archives of Data Science, Series A
Volume	1
Issue	1
Pages	21-39
