
GRABEX: A Graph-Based Method for Web Site Block Classification and its Application on Mining Breadcrumb Trails

Keller, M. 1; Hartenstein, H. 1
1 Scientific Computing Center (SCC), Karlsruher Institut für Technologie (KIT)


In order to interact with a Web site, humans must be able to distinguish and understand the purposes of different page blocks, e.g., header, navigation bar or content area. In the case of navigational blocks, the block type determines the functionality of the hyperlinks it contains. For example, the hyperlinks in the main menu block represent the main topics of a site, while the hyperlinks in a breadcrumb trail show the location in the content hierarchy. Hence, mining navigational blocks of specific types can provide valuable input for applications in the fields of crawling, ranking or presenting search results. However, analyzing visual features in order to identify specific navigational blocks as humans do is a difficult, resource-consuming task, and a general solution does not exist yet. In this paper, we propose a novel approach to the problem and present the Graph-based block extraction method (GRABEX), which can be adapted to classify different types of navigational blocks. The fundamental concept is that a separate graph-based link analysis is conducted for each group of blocks. Each block group consists of blocks from different pages that have similar CSS class attributes. ...
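The grouping idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The input layout (`pages`), the helper names `group_blocks` and `link_graph`, and the use of exact CSS class equality in place of "similar" class attributes are all assumptions made for illustration. The truncated abstract does not say which link analysis is run on each graph, so the sketch stops after building one link graph per block group.

```python
from collections import defaultdict

# Hypothetical input: page URL -> list of (css_class, [link targets]) blocks.
pages = {
    "/": [("nav-main", ["/a", "/b"])],
    "/a": [("nav-main", ["/a", "/b"]), ("crumbs", ["/"])],
    "/a/x": [("nav-main", ["/a", "/b"]), ("crumbs", ["/", "/a"])],
}


def group_blocks(pages):
    """Group blocks from different pages by CSS class.

    Assumption: exact class equality stands in for 'similar' attributes.
    """
    groups = defaultdict(list)
    for url, blocks in pages.items():
        for css_class, links in blocks:
            groups[css_class].append((url, links))
    return dict(groups)


def link_graph(group):
    """Build a directed graph for one block group.

    Maps each page URL to the set of URLs linked from that page's block.
    """
    graph = defaultdict(set)
    for url, links in group:
        graph[url].update(links)
    return dict(graph)


# One separate link graph per block group.
graphs = {css: link_graph(g) for css, g in group_blocks(pages).items()}
```

In this toy data, the `crumbs` graph links each page only to its ancestors, while the `nav-main` graph links every page to the same targets. A per-group analysis could in principle exploit this kind of structural difference.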

DOI: 10.1109/WI-IAT.2013.42
Citations: 4
Citations: 3
Affiliated institution(s) at KIT: Institut für Telematik (TM)
Scientific Computing Center (SCC)
Universität Karlsruhe (TH) – Zentrale Einrichtungen (Zentrale Einrichtungen)
Publication type: Proceedings contribution
Publication year: 2013
Language: English
Identifier: ISBN: 978-147992902-3
KITopen-ID: 1000036015
Published in: 2013 12th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013; Atlanta, GA; United States; 17 November 2013 through 20 November 2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 290-297
Series: 1
Indexed in: Scopus