How to Evaluate Clustering Techniques

Delling, Daniel; Gaertler, Marco; Görke, Robert; Nikoloski, Zoran; Wagner, Dorothea


The quality of clustering algorithms is often judged by their
performance with respect to a specific quality index in an
experimental evaluation. Such experiments use either a limited
number of real-world instances or synthetic data. While
real-world data is crucial for testing such algorithms, it is
scarce and thus insufficient on its own. Therefore, synthetic
pre-clustered data has to be produced by a generator to serve as
a test bed. Evaluating clustering techniques on the basis of
synthetic data is highly non-trivial. Even worse, we reveal
several hidden dependencies between algorithms, indices, and
generators that can lead to counterintuitive results. To cope
with these dependencies, we present a testing framework based on
the concept of unit tests. Finally, we demonstrate the
feasibility and advantages of our approach in an experimental
evaluation.
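
The report's framework itself is not reproduced here. As a minimal sketch of the general idea only, the code below assumes a simple planted-partition generator (each vertex pair is joined independently with probability p_in inside a cluster and p_out between clusters) and uses coverage (the fraction of edges that lie inside clusters) as the quality index. One unit-test style check then asks whether the index rewards the generator's ground-truth clustering more as the cluster signal gets stronger. The names planted_partition, coverage, and test_coverage_tracks_generator are hypothetical and do not come from the report.

```python
import random
from itertools import combinations


def planted_partition(cluster_sizes, p_in, p_out, seed=0):
    """Hypothetical generator: a random graph with a known clustering.

    Vertices in the same cluster are joined with probability p_in,
    vertices in different clusters with probability p_out.
    Returns (number of vertices, edge list, cluster label per vertex).
    """
    rng = random.Random(seed)
    clustering = []
    for label, size in enumerate(cluster_sizes):
        clustering.extend([label] * size)
    n = len(clustering)
    edges = []
    for u, v in combinations(range(n), 2):
        p = p_in if clustering[u] == clustering[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return n, edges, clustering


def coverage(edges, clustering):
    """Quality index: fraction of edges whose endpoints share a cluster."""
    if not edges:
        return 0.0
    intra = sum(1 for u, v in edges if clustering[u] == clustering[v])
    return intra / len(edges)


def test_coverage_tracks_generator():
    """Unit-test style check of index versus generator: a stronger
    intra-cluster signal should give the planted clustering a higher
    coverage score."""
    sizes = [20, 20, 20]
    _, weak_edges, weak_truth = planted_partition(sizes, 0.3, 0.1, seed=1)
    _, strong_edges, strong_truth = planted_partition(sizes, 0.6, 0.02, seed=1)
    assert coverage(strong_edges, strong_truth) > coverage(weak_edges, weak_truth)


if __name__ == "__main__":
    test_coverage_tracks_generator()
    print("ok")
```

Coverage is chosen only because it is easy to state, and it also shows why an index has to be tested against the generator rather than trusted blindly: the trivial clustering that puts every vertex into one cluster always reaches coverage 1, so coverage alone cannot separate good clusterings from degenerate ones. This is a general property of coverage, not a summary of the dependencies the report identifies.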

Full text
DOI: 10.5445/IR/1000007104
Affiliated institution(s) at KIT: Institut für Theoretische Informatik (ITI)
Publication type: Research report
Year: 2007
Language: English
Identifier: ISSN 1432-7864
KITopen ID: 1000007104
Publisher: Universität Karlsruhe, Karlsruhe
Series: Interner Bericht. Fakultät für Informatik, Universität Karlsruhe; 2006,24