Improving spoken language translation by automatic disfluency removal: evidence from conversational speech transcripts

Rao, S.; Lane, I. R.; Schultz, T.

Abstract:

Machine translation of spoken language has made significant progress in recent years; however, translation quality is still limited by
the specific idiosyncrasies of spoken language, including the lack of well-formed sentences and the presence of disfluencies. In this
paper, we investigate the effect of disfluencies on Statistical Machine Translation (SMT) and introduce an Automatic Disfluency
Removal scheme as a pre-processing step prior to translation. On Broadcast Conversation (BC) transcripts, we demonstrate that
Automatic Disfluency Removal yields up to 8% relative improvement in BLEU. Furthermore, we show
that the detrimental effect of disfluencies on SMT differs across disfluency types.


Publisher's version
DOI: 10.5445/IR/1000009545
Published on: 18 June 2025
Affiliated institution(s) at KIT: Institut für Theoretische Informatik (ITI)
Publication type: Proceedings paper
Publication year: 2007
Language: English
Identifier: ISBN 978-87-90708-16-0
KITopen-ID: 1000009545
Published in: Proceedings / Machine Translation Summit XI, 10-14 September 2007, Copenhagen, Denmark. Ed.: B. Maegaard
Event: 11th Machine Translation Summit (MTS 2007), Copenhagen, Denmark, 10-14 September 2007
Publisher: European Association for Machine Translation (EAMT)
Pages: 385-389