Original publication
DOI: 10.1145/2187980.2188237

MenuMiner: Revealing the Information Architecture of Large Web Sites by Analyzing Maximal Cliques

Keller, Matthias; Nussbaumer, Martin

Abstract:
The foundation of almost every web site's information architecture is a hierarchical content organization. Information architects therefore put much effort into designing taxonomies that structure the content in a comprehensible and sound way. These taxonomies are obvious to human users from the site's system of main menus and submenus. Current methods of web structure mining, however, cannot extract these central aspects of the information architecture, because they cannot interpret the visual encoding to recognize menus and their rank as humans do. In this paper we show that a web site's main navigation system can be distinguished not only by visual features but also by certain structural characteristics of the HTML tree and the web graph. We have developed a reliable and scalable solution to the problem of extracting menus for mining the information architecture. The novel MenuMiner algorithm retrieves the original content organization of large-scale web sites. These data are valuable for many applications, e.g. the presentation of search results. In an experiment we applied the method for finding site boundaries ...
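As a rough illustration of the web-graph idea named in the title, the sketch below enumerates maximal cliques in a tiny link graph. It is not the MenuMiner algorithm itself: it assumes (as a simplification of ours, not a claim from the abstract) that pages sharing a menu all link to one another, keeps only reciprocal links, and runs the textbook Bron-Kerbosch procedure with pivoting. The function names and the example site are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def mutual_link_graph(links: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Undirected graph with an edge only where two pages link to each other."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    return {
        p: {q for q in links.get(p, set()) if q != p and p in links.get(q, set())}
        for p in pages
    }


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Bron-Kerbosch with pivoting (generic technique, assumed for illustration)."""
    cliques: List[FrozenSet[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical site: four main-menu pages, plus a submenu under "products".
main = {"home", "products", "about", "contact"}
links = {
    "home": main - {"home"},
    "products": (main - {"products"}) | {"products/a", "products/b"},
    "about": main - {"about"},
    "contact": main - {"contact"},
    "products/a": main | {"products/b"},
    "products/b": main | {"products/a"},
}

# Singletons are dropped; each remaining clique is a menu candidate.
menu_candidates = {c for c in maximal_cliques(mutual_link_graph(links)) if len(c) > 1}
```

On this toy input the two candidates correspond to the main menu and the "products" submenu; a real site would need the additional HTML-tree evidence that the abstract mentions.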


Affiliated institution(s) at KIT: Institut für Telematik (TM)
Steinbuch Centre for Computing (SCC)
Publication type: Proceedings contribution
Year: 2012
Language: English
Identifier: ISBN: 978-1-4503-1230-1
KITopen ID: 1000028788
Published in: WWW 2012, International World Wide Web Conference: 21st World Wide Web Conference 2012, April 16-20, 2012, Lyon
Publisher: ACM, New York (NY)
Pages: 1025-1034