
How to Evaluate Clustering Techniques

Delling, Daniel; Gaertler, Marco; Görke, Robert; Nikoloski, Zoran; Wagner, Dorothea

Abstract:


The quality of clustering algorithms is often judged by their
performance with respect to a specific quality index in an
experimental evaluation. Experiments use either a limited number
of real-world instances or synthetic data. While real-world data
is crucial for testing such algorithms, it is scarce and thus
insufficient. Therefore, synthetic pre-clustered data
has to be assembled as a test bed by a generator. Evaluating
clustering techniques on the basis of synthetic data is highly
nontrivial. Even worse, we reveal several hidden dependencies
between algorithms, indices, and generators that can
lead to counterintuitive results. To cope with these
dependencies, we present a testing framework based on the
concept of unit tests. Moreover, we demonstrate the feasibility and
advantages of our approach in an experimental evaluation.
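
The interplay of generator, quality index, and unit-test-style checks can be illustrated with a minimal Python sketch. It is not the authors' framework: the planted-partition generator, the coverage index, and all function names and parameters below are assumptions chosen only for illustration.

```python
import random
from itertools import combinations


def planted_partition(n, k, p_in, p_out, seed=0):
    """Hypothetical generator: n nodes assigned round-robin to k clusters.

    Node pairs inside a cluster are linked with probability p_in,
    pairs across clusters with probability p_out.
    """
    rng = random.Random(seed)
    labels = [v % k for v in range(n)]
    edges = []
    for u, v in combinations(range(n), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return edges, labels


def coverage(edges, labels):
    """Assumed quality index: fraction of edges inside clusters."""
    if not edges:
        return 0.0
    intra = sum(1 for u, v in edges if labels[u] == labels[v])
    return intra / len(edges)


def test_planted_clustering_is_perfect_without_noise():
    # With p_in = 1 and p_out = 0 every edge lies inside a cluster.
    edges, planted = planted_partition(12, 3, p_in=1.0, p_out=0.0)
    assert coverage(edges, planted) == 1.0


def test_index_also_rewards_trivial_clustering():
    # Putting all nodes into one cluster scores just as well, so this
    # index alone cannot separate a good clustering from a trivial one.
    edges, _ = planted_partition(12, 3, p_in=1.0, p_out=0.0)
    assert coverage(edges, [0] * 12) == 1.0


if __name__ == "__main__":
    test_planted_clustering_is_perfect_without_noise()
    test_index_also_rewards_trivial_clustering()
    print("all checks passed")
```

The second check passes even though the one-cluster solution is useless in practice, which is one simple way an index can interact with a generator to produce a counterintuitive result.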


DOI: 10.5445/IR/1000007104
Affiliated institution(s) at KIT: Institut für Theoretische Informatik (ITI)
Publication type: Research report/Preprint
Publication year: 2007
Language: English
Identifier: ISSN: 1432-7864
urn:nbn:de:swb:90-71040
KITopen-ID: 1000007104
Publisher: Universität Karlsruhe (TH)
Series: Interner Bericht. Fakultät für Informatik, Universität Karlsruhe ; 2006,24