# Hybrid Approach Combining Statistical and Rule-Based Models for the Automated Indexing of Bibliographic Metadata in the Area of Planning and Building Construction

Busch, Dimitri

##### Abstract:
ICONDA® Bibliographic (International Construction Database) is a bibliographic database of English-language documents in the area of planning and building construction. The documents are indexed with descriptors from controlled vocabularies (FINDEX thesauri, an authority list). Assigning these descriptors manually is time-consuming and expensive. To address this problem, an automated indexing system was developed. The system combines a statistical classifier based on the vector space model with a rule-based classifier. In the statistical classifier, descriptor profiles are trained automatically from already indexed documents. The results of the statistical classifier are then refined by the rule-based classifier, which filters out incorrect descriptors and adds missing ones. The rules can be created manually or derived automatically from already indexed documents. The hybrid approach is particularly useful when a descriptor cannot be trained successfully by the statistical classifier. In that case, the system can be fine-tuned by adding specific rules for that descriptor.
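
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the system from the paper: the abstract does not specify term weighting, the similarity measure, thresholds, or the rule syntax, so raw term counts, cosine similarity, the `threshold` value, the `(action, trigger_term, descriptor)` rule format, and all function names below are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Simplistic whitespace tokenizer (assumption; a real system would
    # likely add stemming, stop-word removal, and phrase handling).
    return text.lower().split()


def train_profiles(indexed_docs):
    """Build one term-count profile per descriptor from indexed documents.

    indexed_docs: iterable of (text, set_of_descriptors) pairs.
    """
    profiles = defaultdict(Counter)
    for text, descriptors in indexed_docs:
        tokens = tokenize(text)
        for descriptor in descriptors:
            profiles[descriptor].update(tokens)
    return dict(profiles)


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def statistical_classify(text, profiles, threshold=0.5):
    # Vector space step: propose every descriptor whose profile is
    # sufficiently similar to the document vector.
    vector = Counter(tokenize(text))
    return {d for d, p in profiles.items() if cosine(vector, p) >= threshold}


def apply_rules(text, candidates, rules):
    # Rule-based step: each hypothetical rule is a tuple
    # (action, trigger_term, descriptor) with action "add" or "remove".
    tokens = set(tokenize(text))
    result = set(candidates)
    for action, trigger_term, descriptor in rules:
        if trigger_term in tokens:
            if action == "add":
                result.add(descriptor)
            elif action == "remove":
                result.discard(descriptor)
    return result


def hybrid_index(text, profiles, rules, threshold=0.5):
    # Statistical proposal first, then rule-based correction.
    candidates = statistical_classify(text, profiles, threshold)
    return apply_rules(text, candidates, rules)


# Toy data, invented purely for illustration.
training = [
    ("concrete bridge reinforcement", {"Bridges", "Concrete"}),
    ("timber roof truss", {"Roofs", "Timber"}),
]
rules = [
    ("add", "asbestos", "Hazardous materials"),  # descriptor with no training data
    ("remove", "bridge", "Concrete"),            # arbitrary filter rule
]
profiles = train_profiles(training)
print(sorted(hybrid_index("concrete bridge with asbestos", profiles, rules)))
```

In this sketch the descriptor "Hazardous materials" never occurs in the training data, so the statistical step cannot propose it; the added rule covers that gap, which mirrors the fine-tuning scenario the abstract mentions.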

- Affiliated institution(s) at KIT: Institut für Informationswirtschaft und Marketing (IISM)
- Publication type: Journal article
- Publication year: 2018
- Language: English
- Identifier: ISSN: 2363-9881, KITopen-ID: 1000100817
- Published in: Archives of Data Science, Series A (Online First), Volume 4, Issue 1, Pages A15, 17 pp., online