K-Means Clustering on Multiple Correspondence Analysis Coordinates

Phan, Le; Liu, Hongzhe; Tortora, Cristina

doi:10.5445/KSP/1000085952/05

Abstract:

On April 18, 2017, the International Federation of Classification Societies (IFCS) issued a challenge to its members and the classification community to analyze a data set of 928 low back pain patients. In this paper, we present our contribution: a cluster analysis of this data set. We discuss our data cleaning process, which we view as a two-pronged approach: inferring values that are missing not at random and imputing values that are missing at random. We also discuss the challenges of clustering mixed data types and the data transformation required before a clustering algorithm can be applied; we call our proposed transformation process split-then-join. Finally, we interpret the clustering results with respect to validation variables and offer some thoughts on selecting important variables to classify new observations.
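
The abstract does not spell out the split-then-join transformation or the imputation steps, so the following is only a generic sketch of the technique named in the title, assuming fully observed, purely categorical data. It builds the indicator (one-hot) matrix, takes the MCA row principal coordinates from an SVD of its standardized residuals, and then runs Lloyd's k-means on the leading dimensions. The function names `mca_row_coordinates` and `kmeans`, the number of retained dimensions, and the farthest-first initialization are illustrative choices made here, not the authors' method.

```python
import numpy as np


def mca_row_coordinates(data, n_components=2):
    """Row principal coordinates from a basic MCA, i.e. a correspondence
    analysis of the indicator (complete disjunctive) matrix.
    Illustrative sketch: assumes no missing values and categorical columns only."""
    data = np.asarray(data, dtype=object)
    # One-hot encode each categorical column and stack the blocks into Z.
    blocks = []
    for j in range(data.shape[1]):
        levels = sorted(set(data[:, j]))
        blocks.append(np.array([[1.0 if v == lev else 0.0 for lev in levels]
                                for v in data[:, j]]))
    Z = np.hstack(blocks)
    P = Z / Z.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: F = D_r^{-1/2} U Sigma
    F = (U * sigma) / np.sqrt(r)[:, None]
    return F[:, :n_components]


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first initialization
    (a hypothetical choice for this sketch)."""
    # Start from the first row, then repeatedly add the point that is
    # farthest (squared Euclidean distance) from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assignment step: nearest center for every row.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Update step: mean of each cluster (keep old center if a cluster empties).
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    # Toy categorical data (hypothetical): the first two variables separate
    # two groups, the third variable is unrelated to the grouping.
    toy = [("a", "x", "r"), ("a", "x", "s"), ("a", "x", "r"), ("a", "x", "s"),
           ("b", "y", "r"), ("b", "y", "s"), ("b", "y", "r"), ("b", "y", "s")]
    coords = mca_row_coordinates(toy, n_components=2)
    labels, _ = kmeans(coords, k=2)
    print(labels)
```

On this toy input the first MCA dimension captures the a/x versus b/y contrast, so the first four and last four rows should fall into separate clusters. Running k-means on the MCA coordinates rather than on the raw one-hot columns gives Euclidean distances a chi-squared interpretation and lets the analyst discard low-inertia dimensions before clustering.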

Published on 26 February 2019

Affiliated institution(s) at KIT	Department of Economics and Management, Institute of Information Systems and Marketing (IISM)
Publication type	Journal article
Publication year	2019
Language	English
Identifiers	ISSN: 2510-0564; urn:nbn:de:swb:90-916726; KITopen-ID: 1000091672
Published in	Archives of Data Science, Series B (Online First)
Volume	1
Issue	1
Pages	B05, 17 pp. online
Indexed in	OpenAlex