DOI: 10.5445/KSP/1000085952/05
Published on 26 February 2019

K-Means Clustering on Multiple Correspondence Analysis Coordinates

Phan, Le; Liu, Hongzhe; Tortora, Cristina

On April 18, 2017, the International Federation of Classification Societies (IFCS) issued a challenge to its members and the classification community to analyze a data set of 928 low back pain patients. In this paper, we present our contribution in terms of a cluster analysis of this data set. We will discuss our data cleaning process, which we view as a two-pronged approach: inferring values that are missing not at random and imputing values that are missing at random. We will also discuss the challenges in clustering mixed data types and the required data transformation prior to applying a clustering algorithm. We call our proposed data transformation process split-then-join. Finally, we offer our interpretation of the clustering results with respect to validation variables and we present some thoughts on selecting important variables to classify new observations.
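The technique named in the title, k-means applied to multiple correspondence analysis (MCA) coordinates, can be illustrated with a minimal sketch. It is not the authors' implementation: the split-then-join transformation and the imputation steps described in the abstract are not reproduced, the toy records and their category labels are invented for illustration, and the helper names `mca_coordinates` and `kmeans` are hypothetical. The sketch assumes purely categorical input, computes MCA as a correspondence analysis of the one-hot indicator matrix, and uses a simple farthest-point initialization so that the result is deterministic.

```python
import numpy as np


def mca_coordinates(categories, n_components=2):
    """Row principal coordinates from a correspondence analysis of the indicator matrix."""
    categories = np.asarray(categories)
    n_rows, n_vars = categories.shape

    # One-hot encode every categorical column and join the blocks side by side.
    blocks = []
    for j in range(n_vars):
        levels = np.unique(categories[:, j])
        blocks.append((categories[:, j][:, None] == levels[None, :]).astype(float))
    Z = np.hstack(blocks)

    P = Z / Z.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals

    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: D_r^{-1/2} U Sigma, truncated to n_components.
    return (U[:, :n_components] * sing[:n_components]) / np.sqrt(r)[:, None]


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)

    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers


# Invented toy records (NOT the IFCS data): three categorical variables per row.
toy = np.array([
    ["high", "yes", "poor"],
    ["high", "yes", "poor"],
    ["high", "yes", "fair"],
    ["low", "no", "good"],
    ["low", "no", "good"],
    ["low", "no", "fair"],
])

coords = mca_coordinates(toy, n_components=2)
labels, centers = kmeans(coords, k=2)
print(labels)
```

Clustering on the MCA coordinates rather than on the raw categories gives k-means a Euclidean space to work in, which is what the algorithm assumes. For a real analysis, the number of retained dimensions and the number of clusters would need to be chosen and validated rather than fixed at 2 as they are here.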

Affiliated institution(s) at KIT: Institut für Informationswirtschaft und Marketing (IISM)
Publication type: Journal article
Year: 2019
Language: English
Identifier: ISSN: 2510-0564
URN: urn:nbn:de:swb:90-916726
KITopen-ID: 1000091672
Published in: Archives of Data Science, Series B (Online First)
Volume: 1
Issue: 1
Pages: 17 pp., online