Cluster Validation for Mixed-Type Data

Aschenbruck, Rabea; Szepannek, Gero

doi:10.5445/KSP/1000098011/02

Abstract:

For cluster analysis of mixed-type data (i.e. data consisting of both numerical and categorical variables), comparatively few clustering methods are available. One popular approach to this kind of problem is an extension of the k-means algorithm (Huang, 1998), the so-called k-prototype algorithm, which is implemented in the R package clustMixType (Szepannek and Aschenbruck, 2019).
The selection of a suitable number of clusters k is known to be particularly crucial in partitioning cluster procedures, yet many implementations of cluster validation indices in R are not suitable for mixed-type data. This paper examines the transferability of validation indices, such as the Gamma index, the Average Silhouette Width, or the Dunn index, to mixed-type data. Furthermore, the R package clustMixType is extended by these indices, and their application is demonstrated. Finally, the behaviour of the adapted indices is tested in a short simulation study using different data scenarios.
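
To illustrate what transferring a validation index to mixed-type data can look like, the following is a minimal Python sketch, not the clustMixType implementation. It assumes a k-prototypes-style dissimilarity (squared Euclidean distance on the numerical variables plus a weight times the number of mismatching categorical variables) and plugs it into the usual Average Silhouette Width. The names `mixed_distance`, `average_silhouette_width` and the weight parameter `lam` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

# One observation: (numerical values, categorical values). Illustrative type only.
Point = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(a: Point, b: Point, lam: float = 1.0) -> float:
    """Squared Euclidean distance on the numerical part plus `lam` times
    the number of mismatching categorical variables (assumed dissimilarity)."""
    num = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    cat = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return num + lam * cat


def average_silhouette_width(points: List[Point], labels: List[int],
                             lam: float = 1.0) -> float:
    """Mean silhouette value over all observations, using mixed_distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    widths = []
    for i, p in enumerate(points):
        # Sum and count of distances from observation i to each cluster,
        # leaving out observation i itself.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] += mixed_distance(p, q, lam)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            widths.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a_i = sums[own] / counts[own]  # mean distance within own cluster
        b_i = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a_i, b_i)
        widths.append(0.0 if denom == 0 else (b_i - a_i) / denom)
    return sum(widths) / len(widths)
```

With such a function, candidate partitions for different values of k could be compared and the k with the largest average silhouette width preferred; how the weight between the numerical and categorical parts is chosen is left open in this sketch.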

Published on 23.06.2020

Affiliated institution(s) at KIT	Institut für Wirtschaftsinformatik und Marketing (IISM)
Publication type	Journal article
Publication year	2020
Language	English
Identifier	ISSN: 2363-9881; KITopen-ID: 1000120412
Published in	Archives of Data Science, Series A
Volume	6
Issue	1
Pages	P02, 12 pp., online
Indexed in	OpenAlex
