Letter N-Gram-based Input Encoding for Continuous Space Language Models

Sperr, H.; Niehues, J.; Waibel, A.

Abstract:

We present a letter-based encoding for words in continuous space language models. Instead of using the word index, we represent each word entirely by its letter n-grams. As a result, similar words automatically receive similar representations. With this, we aim to generalize better to unknown or rare words and to capture morphological information. We show the influence of this encoding on machine translation using continuous space language models based on restricted Boltzmann machines. We evaluate translation quality as well as training time on a German-to-English translation task of TED talks and university lectures, and on the news translation task from English to German. Using our new approach, a gain of up to 0.4 BLEU points can be achieved.
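The central idea of the abstract is to represent each word by the letter n-grams it contains rather than by a vocabulary index. The minimal Python sketch below illustrates it. It assumes letter trigrams with a `#` word-boundary marker and a binary (presence, not count) input vector. These choices and all function names (`letter_ngrams`, `build_vocab`, `encode`, `overlap`) are illustrative assumptions and not necessarily the paper's exact configuration. The sketch shows why a morphological variant shares most of its input units with its base form, and why a word never seen in training still activates known units.

```python
from collections import Counter


def letter_ngrams(word: str, n: int = 3, boundary: str = "#") -> Counter:
    """Bag of letter n-grams for one word, padded with boundary markers.

    Assumption for illustration: trigrams with '#' marking word start/end.
    """
    padded = f"{boundary}{word}{boundary}"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def build_vocab(words, n: int = 3) -> dict:
    """Map every letter n-gram seen in the training words to an input unit index."""
    grams = sorted({g for w in words for g in letter_ngrams(w, n)})
    return {g: i for i, g in enumerate(grams)}


def encode(word: str, vocab: dict, n: int = 3) -> list:
    """Binary input vector: 1 for each known n-gram present in the word.

    N-grams absent from the vocabulary are ignored, so an unseen word
    still activates every unit it shares with training words.
    """
    vec = [0] * len(vocab)
    for g in letter_ngrams(word, n):
        if g in vocab:
            vec[vocab[g]] = 1
    return vec


def overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the letter n-gram sets of two words."""
    sa, sb = set(letter_ngrams(a, n)), set(letter_ngrams(b, n))
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    vocab = build_vocab(["house", "mouse"])
    print(overlap("house", "houses"))    # shared: #ho, hou, ous, use
    print(overlap("house", "table"))     # no shared trigrams
    print(sum(encode("houses", vocab)))  # unseen word, 4 known units active
```

For example, `overlap("house", "houses")` is 4/7 because the two words share the trigrams `#ho`, `hou`, `ous` and `use`, while `overlap("house", "table")` is 0. With a word-index encoding, both pairs would be equally dissimilar.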

Publisher's version

DOI: 10.5445/IR/1000037718

Published on 12 June 2025

Statistics

Page views: 111
since 9 September 2018

Downloads: 32
since 30 July 2025

Associated institution(s) at KIT	Department of Informatics – Institute for Anthropomatics (IFA); Institute for Anthropomatics and Robotics (IAR)
Publication type	Proceedings paper
Publication year	2013
Language	English
Identifier	ISBN: 978-1-937284-67-1; KITopen-ID: 1000037718
Published in	ACL 2013: 51st Annual Meeting of the Association for Computational Linguistics: Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, August 9, 2013, Sofia, Bulgaria
Event	8th ACL Workshop on Statistical Machine Translation (WMT 2013), Sofia, Bulgaria, 8–9 August 2013
Publisher	Association for Computational Linguistics (ACL)
Pages	30–39

Repository KITopen