On a Comprehensive Metadata Framework for Artificial Data in Unsupervised Learning

Dangl, Rainer; Leisch, Friedrich

doi:10.5445/KSP/1000058749/22

Abstract:

Evaluating new methods and algorithms in unsupervised learning requires thorough benchmarking studies on data sets on which measured performance reflects, as closely as possible, performance in actual use. Designing such data sets is a challenging task in itself; the pressure to make a new method stand up well against competing methods is a further point that risks compromising the objectivity of a benchmarking study. We want to address the latter by proposing a framework that standardizes the format of artificial data, or rather of its metadata. We intend to introduce a web repository that functions as an exchange for metadata of artificial data, and an accompanying R package that can generate actual data from the descriptions obtained from the repository. This makes it much simpler to find data that others have designed and used in previous benchmarking studies. It also removes some of the temptation to design artificial data specifically so that a proposed method performs significantly better than existing ones, a claim that might not hold in real-life applications.
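The core idea, exchanging a compact description of a data set instead of the data itself and regenerating the data on demand, can be illustrated with a minimal Python sketch. The JSON layout, its field names (`seed`, `clusters`, `n`, `mean`, `sd`) and the `generate` function are hypothetical: they are not the schema of the proposed repository or the API of the accompanying R package. For simplicity, each cluster is assumed to be an axis-aligned Gaussian.

```python
import json
import random

# Hypothetical metadata record. Field names are illustrative assumptions,
# not the actual format used by the proposed repository.
METADATA_JSON = """
{
  "name": "two_gaussian_blobs",
  "seed": 2017,
  "clusters": [
    {"n": 50, "mean": [0.0, 0.0], "sd": [1.0, 1.0]},
    {"n": 30, "mean": [5.0, 5.0], "sd": [0.5, 2.0]}
  ]
}
"""


def generate(metadata):
    """Draw a labelled data set from a metadata description.

    Each cluster is sampled from an axis-aligned Gaussian with the given
    per-dimension means and standard deviations. The random generator is
    seeded from the metadata, so the same description yields the same points.
    """
    rng = random.Random(metadata["seed"])
    points, labels = [], []
    for label, cluster in enumerate(metadata["clusters"]):
        if len(cluster["mean"]) != len(cluster["sd"]):
            raise ValueError("mean and sd must have the same dimension")
        for _ in range(cluster["n"]):
            # One coordinate per dimension, drawn independently.
            points.append([rng.gauss(mu, sigma)
                           for mu, sigma in zip(cluster["mean"], cluster["sd"])])
            labels.append(label)
    return points, labels


meta = json.loads(METADATA_JSON)
X, y = generate(meta)
print(len(X), len(X[0]))  # 80 2
```

In this sketch, sharing the short JSON record is enough for another researcher to regenerate the identical benchmark data, which is what makes previously used designs easy to reuse. Note that Python only guarantees seeded reproducibility of `random.gauss` within a given interpreter version; a real exchange format would need to pin the generator and its version as part of the metadata.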

Statistics

Page views: 399
since 2018-05-04

Downloads: 174
since 2017-08-17

Affiliated institution(s) at KIT	Fakultät für Wirtschaftswissenschaften – Institut für Informationswirtschaft und Marketing (IISM)
Publication type	Journal article
Year of publication	2017
Language	English
Identifiers	ISSN: 2363-9881; URN: urn:nbn:de:swb:90-733784; KITopen-ID: 1000073378
Published in	Archives of Data Science, Series A (Online First)
Volume	2
Issue	1
Pages	16 pp. online

Repository KITopen