MenuMiner: Revealing the Information Architecture of Large Web Sites by Analyzing Maximal Cliques

Keller, Matthias; Nussbaumer, Martin

doi:10.1145/2187980.2188237

Keller, Matthias ¹; Nussbaumer, Martin

¹ Scientific Computing Center (SCC), Karlsruher Institut für Technologie (KIT)

Abstract:

The foundation of almost every web site's information architecture is a hierarchical content organization. Information architects therefore put much effort into designing taxonomies that structure the content in a comprehensible and sound way. These taxonomies are obvious to human users from the site's system of main menus and submenus. Current methods of web structure mining, however, cannot extract these central aspects of the information architecture, because they cannot interpret the visual encoding to recognize menus and their rank as humans do. In this paper we show that a web site's main navigation system can be distinguished not only by visual features but also by certain structural characteristics of the HTML tree and the web graph. We have developed a reliable and scalable solution to the problem of extracting menus for mining the information architecture. The novel MenuMiner algorithm allows retrieving the original content organization of large-scale web sites. These data are very valuable for many applications, e.g. the presentation of search results. In an experiment we applied the method for finding site boundaries within a large domain. ...

External links

Original publication
DOI: 10.1145/2187980.2188237

Scopus
Citations: 11

Dimensions
Citations: 9

Statistics

Page views: 467
since 28 April 2018

Affiliated institution(s) at KIT	Institut für Telematik (TM); Scientific Computing Center (SCC); Universität Karlsruhe (TH) – Zentrale Einrichtungen (Zentrale Einrichtungen)
Publication type	Proceedings paper
Publication year	2012
Language	English
Identifier	ISBN: 978-1-4503-1230-1; KITopen-ID: 1000028788
Published in	WWW 2012, International World Wide Web Conference: 21st World Wide Web Conference 2012, April 16-20, 2012, Lyon
Publisher	Association for Computing Machinery (ACM)
Pages	1025-1034
Indexed in	OpenAlex, Dimensions, Scopus

Repository KITopen
