A Hybrid Approach to Assignment of Library of Congress Subject Headings

Wartena, Christian; Franke-Maier, Michael

doi:10.5445/KSP/1000085951/22

Abstract:

Library of Congress Subject Headings (LCSH) are popular for indexing library records. We studied the possibility of assigning LCSH automatically by training classifiers for terms used frequently in a large collection of abstracts of the literature on hand and by extracting headings from those abstracts. The resulting classifiers reach an acceptable level of precision but fall short on recall, partly because we could only train classifiers for a small number of LCSH. Extraction, i.e., the matching of headings in the text, produces better recall but extremely low precision. We found that combining both methods leads to a significant improvement in recall and a slight improvement in F1 score, with only a small decrease in precision.

Publisher's edition

Published on 17 January 2020

Affiliated institution(s) at KIT	Fakultät für Wirtschaftswissenschaften – Institut für Informationswirtschaft und Marketing (IISM)
Publication type	Journal article
Publication year	2018
Language	English
Identifier	ISSN: 2363-9881; KITopen-ID: 1000105121
Published in	Archives of Data Science, Series A (Online First)
Volume	4
Issue	1
Pages	A22, 13 pp. online
