Hybrid Approach Combining Statistical and Rule-Based Models for the Automated Indexing of Bibliographic Metadata in the Area of Planning and Building Construction

Busch, Dimitri

Abstract:
ICONDA® Bibliographic (International Construction Database) is a bibliographic database of English-language documents in the area of planning and building construction. The documents are indexed with descriptors from controlled vocabularies (FINDEX thesauri, an authority list). Assigning these descriptors manually is time-consuming and expensive. To address this problem, an automated indexing system was developed. The system combines a statistical classifier based on the vector space model with a rule-based classifier. In the statistical classifier, descriptor profiles are trained automatically from already indexed documents. The results of the statistical classifier are then refined by the rule-based classifier, which filters out incorrect descriptors and adds missing ones. The rules can be created manually or derived automatically from already indexed documents. The hybrid approach is particularly useful when a descriptor cannot be trained successfully by the statistical classifier. In this case, the system can be fine-tuned by adding specific rules for that descriptor.
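
The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not specify the term weighting, the similarity measure, the threshold, or the rule format, so raw term counts, cosine similarity, a fixed threshold of 0.3, and simple keyword-triggered add/remove rules are all assumptions made here for illustration. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only (assumed)."""
    return [t for t in text.lower().split() if t.isalpha()]


def train_profiles(indexed_docs):
    """Build one term-count profile per descriptor from already indexed documents.

    indexed_docs: iterable of (text, set_of_descriptors) pairs.
    """
    profiles = defaultdict(Counter)
    for text, descriptors in indexed_docs:
        counts = Counter(tokenize(text))
        for descriptor in descriptors:
            profiles[descriptor].update(counts)
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def statistical_classify(text, profiles, threshold=0.3):
    """Stage 1: propose every descriptor whose profile is similar enough (assumed threshold)."""
    vec = Counter(tokenize(text))
    return {d for d, profile in profiles.items() if cosine(vec, profile) >= threshold}


def apply_rules(text, descriptors, rules):
    """Stage 2: keyword rules that add missing or remove incorrect descriptors.

    rules: list of (action, trigger_term, descriptor), action in {"add", "remove"}.
    This rule format is an assumption, not the format used in the paper.
    """
    tokens = set(tokenize(text))
    result = set(descriptors)
    for action, trigger, descriptor in rules:
        if trigger in tokens:
            if action == "add":
                result.add(descriptor)
            elif action == "remove":
                result.discard(descriptor)
    return result


def index_document(text, profiles, rules, threshold=0.3):
    """Run the statistical classifier, then refine its output with the rules."""
    proposed = statistical_classify(text, profiles, threshold)
    return apply_rules(text, proposed, rules)
```

In this sketch, a descriptor that the statistical stage handles poorly can be corrected without retraining by appending a rule such as `("add", "design", "structural design")` to the rule list, which mirrors the fine-tuning idea stated in the abstract.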

Publisher's version
DOI: 10.5445/KSP/1000085951/15
Published on 11 December 2019
Affiliated institution(s) at KIT: Institut für Informationswirtschaft und Marketing (IISM)
Publication type: Journal article
Publication year: 2018
Language: English
Identifier: ISSN 2363-9881
KITopen-ID: 1000100817
Published in: Archives of Data Science, Series A (Online First)
Volume: 4
Issue: 1
Pages: A15, 17 pp., online