
Mining Taxonomies from Web Menus: Rule-Based Concepts and Algorithms

Keller, M. 1; Hartenstein, H. 1
1 Scientific Computing Center (SCC), Karlsruher Institut für Technologie (KIT)

Abstract:

The logical hierarchies of Web sites (i.e., Web site taxonomies) are obvious to humans, because humans can distinguish different menu levels and their relationships. Such accurate information about the logical structure is not yet available to machines. Many applications would benefit if Web site taxonomies could be mined from menus, but in the past this was an almost unsolvable problem. While a tag newly introduced in HTML5 and novel mining methods now make it possible to distinguish menus from other content, it has not yet been researched how the underlying taxonomies can be extracted, given the menus. In this paper we present the first detailed analysis of the problem and introduce rule-based concepts for addressing each identified subproblem. We report on a large-scale study on mining hierarchical menus of 350 randomly selected domains. Our methods extract, with high precision and high recall, Web site taxonomy information that was not previously available.
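
To illustrate the kind of input and output the abstract describes, here is a minimal sketch, not the authors' rule-based method. It assumes that the HTML5 tag in question is `<nav>` (the abstract does not name it) and that the menu is marked up as nested `<ul>` lists, which many real menus are not. The names `NavMenuParser`, `extract_menu_tree`, and `SAMPLE_HTML` are hypothetical and introduced only for this example.

```python
from html.parser import HTMLParser


class NavMenuParser(HTMLParser):
    """Collect link texts inside <nav>, nested by <ul> depth (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0          # > 0 while inside a <nav> element
        self.root = {"label": None, "children": []}
        self.stack = [self.root]    # stack[-1] is the parent for new menu items
        self.last_item = None       # most recently created menu item
        self.in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
            self.last_item = None
        elif self.nav_depth and tag == "ul":
            # Assumption: a nested list belongs to the item that precedes it.
            self.stack.append(self.last_item or self.root)
        elif self.nav_depth and tag == "a":
            node = {"label": "", "children": []}
            self.stack[-1]["children"].append(node)
            self.last_item = node
            self.in_link = True

    def handle_data(self, data):
        if self.nav_depth and self.in_link:
            self.last_item["label"] += data

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth:
            self.nav_depth -= 1
        elif self.nav_depth and tag == "ul" and len(self.stack) > 1:
            # Leaving a list: its parent becomes the most recent item again.
            self.last_item = self.stack.pop()
        elif tag == "a" and self.in_link:
            self.in_link = False
            self.last_item["label"] = self.last_item["label"].strip()


def extract_menu_tree(html):
    """Return the top-level menu items found inside <nav> elements."""
    parser = NavMenuParser()
    parser.feed(html)
    return parser.root["children"]


SAMPLE_HTML = """
<nav><ul>
  <li><a href="/">Home</a></li>
  <li><a href="/products">Products</a>
    <ul>
      <li><a href="/products/laptops">Laptops</a></li>
      <li><a href="/products/tablets">Tablets</a></li>
    </ul>
  </li>
  <li><a href="/about">About</a></li>
</ul></nav>
<p><a href="/other">Not a menu link</a></p>
"""

menu = extract_menu_tree(SAMPLE_HTML)
```

On this sample, `menu` holds three top-level items, with "Laptops" and "Tablets" nested under "Products". Menus built from flat markup, styling, or scripts carry no such explicit nesting, so this shortcut does not address the harder cases the paper targets.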


Original publication
DOI: 10.1007/978-3-642-39200-9_23
Citations (Dimensions): 2
Affiliated institution(s) at KIT:
Institut für Telematik (TM)
Scientific Computing Center (SCC)
Universität Karlsruhe (TH) – Zentrale Einrichtungen (Zentrale Einrichtungen)
Publication type: Proceedings paper
Publication year: 2013
Language: English
Identifiers:
ISBN: 978-3-642-39199-6
ISSN: 0302-9743
KITopen-ID: 1000036017
Published in: 13th International Conference on Web Engineering, ICWE 2013; Aalborg; Denmark; 8 July 2013 through 12 July 2013
Publisher: Springer-Verlag
Pages: 265-282
Series: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; 7977
Indexed in: Dimensions, Scopus