Mining Taxonomies from Web Menus: Rule-Based Concepts and Algorithms

Keller, M.; Hartenstein, H.

doi:10.1007/978-3-642-39200-9_23

Keller, M. ¹; Hartenstein, H. ¹
¹ Scientific Computing Center (SCC), Karlsruher Institut für Technologie (KIT)

Abstract:

The logical hierarchies of Web sites (i.e., Web site taxonomies) are obvious to humans, because humans can distinguish different menu levels and their relationships. But such accurate information about the logical structure is not yet available to machines. Many applications would benefit if Web site taxonomies could be mined from menus, but in the past this was an almost unsolvable problem. While a tag newly introduced in HTML5 and novel mining methods now make it possible to distinguish menus from other content, how the underlying taxonomies can be extracted from given menus has not yet been researched. In this paper we present the first detailed analysis of the problem and introduce rule-based concepts for addressing each identified subproblem. We report on a large-scale study on mining hierarchical menus of 350 randomly selected domains. Our methods allow extracting previously unavailable Web site taxonomy information with high precision and high recall.

Affiliated institution(s) at KIT	Institut für Telematik (TM); Scientific Computing Center (SCC); Universität Karlsruhe (TH) – Zentrale Einrichtungen (Zentrale Einrichtungen)
Publication type	Proceedings contribution
Publication year	2013
Language	English
Identifier	ISBN: 978-3-642-39199-6; ISSN: 0302-9743; KITopen-ID: 1000036017
Published in	13th International Conference on Web Engineering, ICWE 2013; Aalborg; Denmark; 8 July 2013 through 12 July 2013
Publisher	Springer-Verlag
Pages	265-282
Series	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; 7977
Indexed in	Dimensions; OpenAlex; Scopus
