
MenuMiner: Revealing the Information Architecture of Large Web Sites by Analyzing Maximal Cliques

Keller, M.; Nussbaumer, M.

Abstract:
The foundation of almost all web sites' information architecture is a hierarchical content organization. Information architects therefore put much effort into designing taxonomies that structure the content in a comprehensible and sound way. These taxonomies are obvious to human users from the site's system of main menus and submenus. Current methods of web structure mining, however, cannot extract these central aspects of the information architecture, because they cannot interpret the visual encoding to recognize menus and their rank as humans do. In this paper we show that a web site's main navigation system can be distinguished not only by visual features but also by certain structural characteristics of the HTML tree and the web graph. We have developed a reliable and scalable solution to the problem of extracting menus for mining the information architecture. The novel MenuMiner algorithm allows retrieving the original content organization of large-scale web sites. These data are very valuable for many applications, e.g. the presentation of search results. In an experiment we applied the method for finding site boundaries ...
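
The title refers to maximal cliques, but the abstract does not spell out how MenuMiner uses them. As a rough illustration only, the sketch below enumerates maximal cliques in a toy link graph with the standard Bron-Kerbosch algorithm (with pivoting). It rests on an assumption that is not stated in the abstract: pages sharing a menu link to one another, so each menu shows up as a clique of mutually linked pages. The page paths and the helper names `mutual_link_graph` and `maximal_cliques` are hypothetical, and this is not the authors' algorithm.

```python
from typing import Dict, List, Set


def mutual_link_graph(links: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph that keeps only pairs of pages linking to each other.

    Assumption (not from the abstract): menu items link to one another,
    so mutual links approximate "appears in the same menu".
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    return {
        p: {q for q in links.get(p, set()) if q != p and p in links.get(q, set())}
        for p in pages
    }


def maximal_cliques(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            cliques.append(set(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates;
        # only candidates that are not neighbours of the pivot need branching.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(graph), set())
    return cliques


# Hypothetical toy site: a main menu and a submenu below "/products".
links = {
    "/home": {"/products", "/about"},
    "/products": {"/home", "/about", "/products/a", "/products/b"},
    "/about": {"/home", "/products"},
    "/products/a": {"/home", "/products", "/about", "/products/b"},
    "/products/b": {"/home", "/products", "/about", "/products/a"},
}

menus = sorted(sorted(c) for c in maximal_cliques(mutual_link_graph(links)))
for menu in menus:
    print(menu)
```

In this toy graph the main-menu pages and the `/products` submenu pages come out as two separate cliques. Real sites are far noisier, so the actual method presumably needs more than this plain enumeration.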


Affiliated institution(s) at KIT: Institut für Technische Mechanik (ITM); Steinbuch Centre for Computing (SCC)
Publication type: Proceedings contribution
Year: 2012
Language: English
Identifier: KITopen ID: 1000028788
Published in: 1st International Workshop on Large Scale Network Analysis, Lyon, France, April 2012
Publisher: ACM, New York (NY)
Pages: 1025-1034