
MenuMiner: Revealing the Information Architecture of Large Web Sites by Analyzing Maximal Cliques

Keller, Matthias 1; Nussbaumer, Martin 1
1 Steinbuch Centre for Computing (SCC), Karlsruher Institut für Technologie (KIT)


The foundation of almost all web sites' information architecture is a hierarchical content organization. Thus information architects put much effort into designing taxonomies that structure the content in a comprehensible and sound way. The taxonomies are obvious to human users from the site's system of main and sub menus. But current methods of web structure mining are not able to extract these central aspects of the information architecture. This is because they cannot interpret the visual encoding to recognize menus and their rank as humans do. In this paper we show that a web site's main navigation system can be distinguished not only by visual features but also by certain structural characteristics of the HTML tree and the web graph. We have developed a reliable and scalable solution that solves the problem of extracting menus for mining the information architecture. The novel MenuMiner algorithm allows retrieving the original content organization of large-scale web sites. These data are very valuable for many applications, e.g., the presentation of search results. In an experiment we applied the method to finding site boundaries within a large domain.
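The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea named in the title: enumerating maximal cliques in a web graph. It assumes (this is an assumption, not a claim from the paper) that pages belonging to the same menu all link to each other, so that a menu appears as a clique of mutually linked pages. The function names `mutual_link_graph`, `maximal_cliques`, and `candidate_menus`, the `min_size` threshold, and the sample site are hypothetical; the clique enumeration is the standard Bron-Kerbosch procedure with pivoting, not necessarily what MenuMiner uses.

```python
from typing import Dict, FrozenSet, List, Set


def mutual_link_graph(links: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph that keeps only reciprocal links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    graph: Dict[str, Set[str]] = {page: set() for page in pages}
    for src, targets in links.items():
        for dst in targets:
            # Keep the edge only if dst links back to src (assumed menu trait).
            if dst != src and src in links.get(dst, set()):
                graph[src].add(dst)
                graph[dst].add(src)
    return graph


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended further
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def candidate_menus(links: Dict[str, Set[str]], min_size: int = 3) -> List[FrozenSet[str]]:
    """Return cliques of mutually linked pages, largest first."""
    found = maximal_cliques(mutual_link_graph(links))
    return sorted((c for c in found if len(c) >= min_size), key=len, reverse=True)


# Hypothetical five-page site: four main-menu pages plus one sub page.
site = {
    "home": {"products", "about", "contact"},
    "products": {"home", "about", "contact", "widget"},
    "about": {"home", "products", "contact"},
    "contact": {"home", "products", "about"},
    "widget": {"home", "products", "about", "contact"},
}
menus = candidate_menus(site)
```

In this toy site the four main-menu pages link to each other and form the only clique of size three or more, while the sub page `widget` is mutually linked only with `products`. A real site would need additional signals, such as the position of the links in the HTML tree, to separate menus from other densely linked page groups.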

DOI: 10.1145/2187980.2188237
Citations: 10
Citations: 8
Affiliated institution(s) at KIT: Institut für Telematik (TM)
Steinbuch Centre for Computing (SCC)
Universität Karlsruhe (TH) – Zentrale Einrichtungen (Zentrale Einrichtungen)
Publication type: Proceedings paper
Publication year: 2012
Language: English
Identifier: ISBN 978-1-4503-1230-1
KITopen-ID: 1000028788
Published in: WWW 2012, International World Wide Web Conference : 21st World Wide Web Conference 2012, April 16 - 20, 2012, Lyon
Publisher: Association for Computing Machinery (ACM)
Pages: 1025-1034
Indexed in: Dimensions