Classification Method Performance in High Dimensions

Weihs, Claus; Kassner, Tobias

doi:10.5445/KSP/1000083488/03

Abstract:

We discuss standard classification methods for high-dimensional data with a small number of observations. Using designed simulations that illustrate the practical relevance of theoretical results, we show that in the 2-class case the following rules of thumb should be followed to avoid the worst error rate, namely the probability $\pi_1$ of the smaller class. Avoid “complicated” classifiers: the independence rule (ir) might be adequate, while the support vector machine (svm) should only be considered as an expensive alternative that is additionally sensitive to noise factors. From the outset, look for stochastically independent dimensions and balanced classes. Only take into account features that influence class separation sufficiently. Variable selection might help, though filters might be too rough. Compare your result with the result of the data-independent rule “Always predict the larger class”.
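The last rule of thumb can be tried out with a small simulation. The following is a minimal sketch, not the authors' simulation design: it assumes independent standard normal features, a mean shift `delta` in the first `n_informative` features of the smaller class, and it reads the independence rule as a linear discriminant with a diagonal pooled covariance matrix. All function names and default parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def simulate(n, p, pi1, delta, n_informative, rng):
    """Draw n observations with p independent N(0, 1) features (assumed design).

    Class 1 is the smaller class with share pi1; its mean is shifted by
    delta in the first n_informative features, all other features are noise.
    """
    n1 = int(round(pi1 * n))          # fixed class sizes, so no class is empty
    y = np.zeros(n, dtype=int)
    y[:n1] = 1
    X = rng.standard_normal((n, p))
    X[y == 1, :n_informative] += delta
    return X, y


def fit_independence_rule(X, y):
    """Estimate class means, pooled per-feature variances, and class priors."""
    means = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
    resid = X - means[y]              # centre each row at its own class mean
    var = (resid ** 2).sum(axis=0) / (len(y) - 2)
    priors = np.array([(y == k).mean() for k in (0, 1)])
    return means, var, priors


def predict_independence_rule(model, X):
    """Assign each row to the class with the larger diagonal-covariance score."""
    means, var, priors = model
    scores = np.stack(
        [-0.5 * (((X - m) ** 2) / var).sum(axis=1) + np.log(pr)
         for m, pr in zip(means, priors)],
        axis=1,
    )
    return scores.argmax(axis=1)


def compare(n_train=40, n_test=2000, p=1000, pi1=0.3, delta=2.0,
            n_informative=10, seed=0):
    """Return test error of the independence rule and of the majority rule."""
    rng = np.random.default_rng(seed)
    X_tr, y_tr = simulate(n_train, p, pi1, delta, n_informative, rng)
    X_te, y_te = simulate(n_test, p, pi1, delta, n_informative, rng)
    model = fit_independence_rule(X_tr, y_tr)
    ir_err = np.mean(predict_independence_rule(model, X_te) != y_te)
    # Data-independent baseline: always predict the larger training class.
    majority = np.bincount(y_tr, minlength=2).argmax()
    base_err = np.mean(y_te != majority)
    return ir_err, base_err


if __name__ == "__main__":
    ir_err, base_err = compare()
    print(f"independence rule: {ir_err:.3f}, always larger class: {base_err:.3f}")
```

With fixed class sizes, the baseline error equals the share of the smaller class in the test sample, which corresponds to $\pi_1$ in the abstract. Varying `p`, `delta`, and `n_informative` shows how the independence rule compares with that baseline as noise features are added.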

Published on 17 May 2019

Affiliated institution(s) at KIT	Fakultät für Wirtschaftswissenschaften – Institut für Informationswirtschaft und Marketing (IISM)
Publication type	Journal article
Publication year	2018
Language	English
Identifier	ISSN: 2363-9881 KITopen-ID: 1000094749
Published in	Archives of Data Science, Series A (Online First)
Volume	3
Issue	1
Pages	29 pp., online
