Validation of K-means Clustering : Why is Bootstrapping Better Than Subsampling?

Mucha, Hans-Joachim; Bartel, Hans-Georg

doi:10.5445/KSP/1000058749/27

Abstract:

In simulation studies based on many synthetic and real datasets, we found that subsampling performs worse than bootstrapping at finding the true number of clusters K (Mucha and Bartel 2014, 2015; Mucha 2016). But why? We try to answer this question through further investigations of K-means clustering that compare bootstrapping with a special version of subsampling named “Boot2Sub”. In subsampling, a parameter H, the cardinality of the drawn subsample, usually has to be pre-specified, and choosing it is a serious additional problem. A way out is to take the bootstrap sample but discard multiple points, so that each drawn observation is kept only once. We call this special subsampling scheme “Boot2Sub”. Bootstrapping and “Boot2Sub” then yield exactly the same subset of drawn observations, which allows fair, direct comparisons of the performance of bootstrapping and subsampling. From the assessment of applications to generated and real datasets, the conjecture arises that multiple points play an important role in validating the true number of clusters in K-means clustering.
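The relation between a bootstrap sample and its “Boot2Sub” counterpart can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `bootstrap_and_boot2sub`, the `seed` parameter, and the use of Python's standard `random` module are assumptions made for illustration only. The sketch draws n indices with replacement (the bootstrap sample) and then drops the repeats, so the subsample size H is not pre-specified but follows from the draw.

```python
import random


def bootstrap_and_boot2sub(n, seed=None):
    """Draw one bootstrap sample of n indices and derive a Boot2Sub-style subsample.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    # Bootstrap: n draws with replacement, so some observations occur
    # several times ("multiple points") and others not at all.
    boot_idx = [rng.randrange(n) for _ in range(n)]
    # Boot2Sub (as described in the abstract): same drawn observations,
    # but each one is kept only once.
    sub_idx = sorted(set(boot_idx))
    return boot_idx, sub_idx


boot_idx, sub_idx = bootstrap_and_boot2sub(1000, seed=0)
# Both schemes cover the same set of observations; only multiplicities differ.
print(set(boot_idx) == set(sub_idx))
# Share of distinct observations; for large n this is expected to be
# close to 1 - 1/e (about 0.632), though it varies from draw to draw.
print(len(sub_idx) / len(boot_idx))
```

Under these assumptions, a K-means run on `boot_idx` effectively sees repeated observations with weight greater than one, while a run on `sub_idx` sees each of the same observations once, which is the only difference the comparison isolates.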

Affiliated institution(s) at KIT	Fakultät für Wirtschaftswissenschaften – Institut für Informationswirtschaft und Marketing (IISM)
Publication type	Journal article
Publication year	2017
Language	English
Identifier	ISSN: 2363-9881; urn:nbn:de:swb:90-713935; KITopen-ID: 1000071393
Published in	Archives of Data Science, Series A (Online First)
Volume	2
Issue	1
Pages	14 pp., online
Indexed in	OpenAlex
