
Wider Pipelines: N-Best Alignments and Parses in MT Training

Venugopal, Ashish; Zollmann, Andreas; Smith, Noah; Vogel, Stephan

Abstract:

State-of-the-art statistical machine translation systems use hypotheses from several maximum a posteriori inference steps, including word alignments and parse trees, to identify translational structure and estimate the parameters of translation models. While this approach leads to a modular pipeline of independently developed components, errors made in these “single-best” hypotheses can propagate to downstream estimation steps that treat these inputs as clean, trustworthy training data. In this work we integrate N-best alignments and parses by using a probability distribution over these alternatives to generate posterior fractional counts for use in downstream estimation. Using these fractional counts in a DOP-inspired syntax-based translation system, we show significant improvements in translation quality over a single-best trained baseline.
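The idea of turning an N-best list into posterior fractional counts can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the distribution over alternatives is obtained, so the sketch assumes each hypothesis comes with a log score that is renormalized over the N-best list. The function names `nbest_posteriors` and `fractional_counts`, the representation of events as tuples, and the toy scores are all hypothetical.

```python
import math
from collections import defaultdict


def nbest_posteriors(log_scores):
    """Renormalize log scores over the N-best list into a distribution.

    Assumption: the model scores are log-probabilities (or unnormalized
    log scores); the maximum is subtracted for numerical stability.
    """
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]


def fractional_counts(nbest):
    """Accumulate posterior-weighted counts of events over an N-best list.

    nbest: list of (log_score, events) pairs, where events are the hashable
    items read off one hypothesis (e.g. alignment links or tree fragments).
    Each distinct event is counted once per hypothesis, weighted by that
    hypothesis' posterior, so an event shared by all hypotheses gets 1.0.
    """
    posteriors = nbest_posteriors([score for score, _ in nbest])
    counts = defaultdict(float)
    for p, (_, events) in zip(posteriors, nbest):
        for event in set(events):
            counts[event] += p
    return dict(counts)


# Toy 2-best list of word alignments for one sentence pair (made-up scores).
nbest = [
    (math.log(0.75), [("la", "the"), ("maison", "house")]),
    (math.log(0.25), [("la", "the"), ("maison", "the")]),
]
counts = fractional_counts(nbest)
```

With a single-best pipeline, only the links of the first alignment would be counted, each with weight 1. Here the link shared by both alignments keeps full weight, while the links on which the alternatives disagree split the mass according to the renormalized scores.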


Publisher's version
DOI: 10.5445/IR/1000166368
Published on 19.02.2024
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings paper
Publication year: 2008
Language: English
Identifier: KITopen-ID: 1000166368
Published in: Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Event: 8th Conference of the Association for Machine Translation in the Americas (AMTA 2008), Waikiki, HI, USA, 21.10.2008 – 25.10.2008
Publisher: Association for Machine Translation in the Americas (AMTA)
Pages: 192–201