
Domain Adaptation in Statistical Machine Translation using Factored Translation Models

Niehues, Jan; Waibel, Alexander

Abstract:

In recent years, the performance of statistical machine translation (SMT) has improved in domains with enough training data. Under real-world conditions, however, it is often not possible to collect enough parallel data. We propose an approach to adapt an SMT system using small amounts of parallel in-domain data by introducing the corpus identifier (corpus id) as an additional target factor. We then add features that model the generation of these tags and features that judge a sequence of tags. Using this approach, we improved translation performance in two domains by up to 1 BLEU point when translating from German to English.
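The idea of treating the corpus id as an additional target factor can be illustrated with a minimal sketch. It assumes the pipe-separated `word|factor` notation common in factored translation toolkits; the function names, the `IN`/`OUT` labels, and the counting feature are hypothetical illustrations and are not taken from the paper.

```python
from typing import List


def add_corpus_id_factor(sentence: str, corpus_id: str, sep: str = "|") -> str:
    """Attach corpus_id as a second factor to every whitespace-separated token."""
    return " ".join(f"{token}{sep}{corpus_id}" for token in sentence.split())


def tag_sequence(factored: str, sep: str = "|") -> List[str]:
    """Recover the sequence of corpus-id tags from a factored sentence."""
    return [token.rsplit(sep, 1)[1] for token in factored.split()]


def count_tag(factored: str, tag: str) -> int:
    """Hypothetical feature: how many target words carry the given tag."""
    return sum(1 for t in tag_sequence(factored) if t == tag)


# Assumed labels: "IN" for the small in-domain corpus, "OUT" for everything else.
in_domain = add_corpus_id_factor("the lecture starts now", "IN")
out_domain = add_corpus_id_factor("the parliament votes today", "OUT")

print(in_domain)
print(tag_sequence(out_domain))
print(count_tag(in_domain, "IN"))
```

Once every target word in the training data carries such a tag, a translation hypothesis exposes a tag sequence that additional features can score, for example by rewarding words that originate from in-domain data. How the paper defines its features is not stated in the abstract.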


DOI: 10.5445/IR/1000030254
Published on 13.06.2025
Scopus citations: 14
Affiliated institution(s) at KIT: Fakultät für Informatik – Institut für Anthropomatik (IFA)
Publication type: Proceedings paper
Publication year: 2010
Language: English
Identifier: KITopen-ID: 1000030254
Published in: Proceedings of the 14th Annual Conference of the European Association for Machine Translation (EAMT'10), Saint-Raphaël, France, May 27-28 2010. Ed.: F. Yvon
Event: 14th Annual Conference of the European Association for Machine Translation (EAMT 2010), St. Raphael, France, 27.05.2010 – 28.05.2010
Publisher: European Association for Machine Translation (EAMT)
Pages: 7 pp.
Indexed in Scopus