A Semi-supervised Word Alignment Algorithm with Partial Manual Alignments

Gao, Qin; Bach, Nguyen; Vogel, Stephan

Abstract:

We present a word alignment framework that can incorporate partial manual alignments. The core of the approach is a novel semi-supervised algorithm that extends the widely used IBM Models with a constrained EM algorithm. The partial manual alignments can be obtained by human labelling or generated automatically by high-precision, low-recall heuristics. We demonstrate both methods: selecting alignment links from a manually aligned corpus, and applying links generated from a bilingual dictionary to unlabelled data. For the first method, we conduct controlled experiments on Chinese-English and Arabic-English translation tasks to compare the quality of word alignment and to measure the effects of two different methods of selecting alignment links from the manually aligned corpus. For the second method, we experiment with a moderate-scale Chinese-English translation task. The experimental results show an average improvement of 0.33 BLEU points across 8 test sets.
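
The idea of a constrained EM step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain IBM Model 1 without a NULL word, and it assumes that a manual link simply clamps the alignment posterior of the linked word, while all unlabelled words are treated as in ordinary EM. The function name, the corpus layout, and the constraint format (one dict per sentence pair, mapping a position on the generated side to a position on the conditioning side) are hypothetical.

```python
def constrained_model1(corpus, constraints, iterations=10):
    """Toy IBM Model 1 training with partial manual alignments.

    corpus:      list of (e_words, f_words) pairs; f_words are generated from e_words.
    constraints: one dict per sentence pair, {f_position: e_position},
                 holding the manual links (hypothetical format).
    Returns a dict t[(f, e)] approximating p(f | e).
    """
    # Uniform start over all co-occurring word pairs.
    t = {(f, e): 1.0 for e_words, f_words in corpus for f in f_words for e in e_words}

    for _ in range(iterations):
        count = {pair: 0.0 for pair in t}
        total = {e: 0.0 for _, e in t}

        # E-step: collect expected counts, respecting the manual links.
        for (e_words, f_words), fixed in zip(corpus, constraints):
            for j, f in enumerate(f_words):
                if j in fixed:
                    # Assumption: a manual link puts all posterior mass on one word.
                    posteriors = {fixed[j]: 1.0}
                else:
                    scores = [t[(f, e)] for e in e_words]
                    z = sum(scores)
                    posteriors = {i: s / z for i, s in enumerate(scores)}
                for i, p in posteriors.items():
                    count[(f, e_words[i])] += p
                    total[e_words[i]] += p

        # M-step: renormalise; words that received no mass keep probability 0.
        t = {(f, e): (count[(f, e)] / total[e] if total[e] > 0 else 0.0)
             for (f, e) in t}
    return t


# Two sentence pairs; only "house" -> "haus" is manually labelled.
corpus = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"])]
constraints = [{1: 1}, {}]
t = constrained_model1(corpus, constraints)
```

In this toy setting the single manual link rules out the competing link between "house" and "das", which in turn pushes "the" towards "das" in later iterations. The higher IBM Models mentioned in the abstract would need a more careful treatment of the constraints than this sketch provides.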


Publisher's version
DOI: 10.5445/IR/1000030253
Published on 13.06.2025
Affiliated institution(s) at KIT Fakultät für Informatik – Institut für Anthropomatik (IFA)
Publication type Proceedings paper
Publication year 2010
Language English
Identifier ISBN: 978-1-617-38823-1
KITopen-ID: 1000030253
Published in Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (WMT'10), Uppsala, Sweden, July 15-16, 2010
Event Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (WMT 2010), Uppsala, Sweden, 15.07.2010 – 16.07.2010
Publisher Curran
Pages 1-10
Published online in advance on 15.07.2010