
Bilingual Word Spectral Clustering for Statistical Machine Translation

Zhao, Bing; Xing, Eric P.; Waibel, Alex

Abstract:

This paper proposes a variant of a spectral clustering algorithm for bilingual word clustering. The algorithm efficiently generates two sets of clusters, one per language, with high semantic correlation within each monolingual cluster and high translation quality between the clusters of the two languages. Each cluster-level translation is treated as a bilingual concept that generalizes the words in the paired bilingual clusters. This scheme improves the robustness of statistical machine translation models. Two HMM-based translation models are tested with these bilingual clusters. The experiments show improved perplexity, word alignment accuracy, and translation quality.
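The abstract does not spell out the algorithm, so the following is only a minimal sketch of generic bipartite spectral partitioning applied to a word co-occurrence matrix, not the variant proposed in the paper. The function name `bilingual_spectral_bipartition`, the word lists, and the toy counts are all hypothetical and invented for illustration. It assumes every word has a nonzero total count and splits each vocabulary into just two clusters.

```python
import math

def bilingual_spectral_bipartition(counts, iters=200):
    """Split source and target words into two aligned clusters.

    counts[i][j] is a (hypothetical) co-occurrence count between
    source word i and target word j. Every row and column sum is
    assumed to be nonzero.
    """
    n_src, n_tgt = len(counts), len(counts[0])
    d_src = [sum(row) for row in counts]
    d_tgt = [sum(counts[i][j] for i in range(n_src)) for j in range(n_tgt)]

    # Degree-normalized matrix: a[i][j] = counts[i][j] / sqrt(d_src[i] * d_tgt[j]).
    a = [[counts[i][j] / math.sqrt(d_src[i] * d_tgt[j]) for j in range(n_tgt)]
         for i in range(n_src)]

    # The top right singular vector is known in closed form (sqrt of the
    # column sums, normalized); it carries no cluster information.
    norm = math.sqrt(sum(d_tgt))
    v1 = [math.sqrt(d) / norm for d in d_tgt]

    def deflate(v):
        # Remove the component along the trivial singular vector.
        dot = sum(x * y for x, y in zip(v, v1))
        return [x - dot * y for x, y in zip(v, v1)]

    def times_a(v):
        return [sum(a[i][j] * v[j] for j in range(n_tgt)) for i in range(n_src)]

    def times_a_transposed(u):
        return [sum(a[i][j] * u[i] for i in range(n_src)) for j in range(n_tgt)]

    # Power iteration on a^T a, restricted to the space orthogonal to v1,
    # converges to the second right singular vector.
    v = [float(j + 1) for j in range(n_tgt)]  # fixed, deterministic start
    for _ in range(iters):
        v = times_a_transposed(times_a(deflate(v)))
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
    v = deflate(v)
    u = times_a(v)  # matching left singular direction (unnormalized)

    # Rescale by the degrees and split by sign. With more than two
    # clusters, k-means on several such vectors would replace this step.
    src_labels = [0 if u[i] / math.sqrt(d_src[i]) >= 0 else 1 for i in range(n_src)]
    tgt_labels = [0 if v[j] / math.sqrt(d_tgt[j]) >= 0 else 1 for j in range(n_tgt)]
    return src_labels, tgt_labels

# Toy data (invented): rows are English words, columns are German words.
src_words = ["dog", "cat", "buy", "sell"]
tgt_words = ["Hund", "Katze", "kaufen", "verkaufen"]
counts = [
    [8, 6, 1, 0],
    [7, 9, 0, 1],
    [1, 0, 9, 7],
    [0, 1, 6, 8],
]

src_labels, tgt_labels = bilingual_spectral_bipartition(counts)
for label in (0, 1):
    members = [w for w, l in zip(src_words + tgt_words, src_labels + tgt_labels) if l == label]
    print(label, members)
```

Which cluster receives label 0 depends on the sign of the singular vector, so only co-membership is meaningful. Words from both languages that land in the same cluster play the role of one "bilingual concept" in the sense of the abstract.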


Publisher's version
DOI: 10.5445/IR/1000166414
Published on: 4 March 2024
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings paper
Publication year: 2005
Language: English
Identifier: KITopen-ID: 1000166414
Published in: Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond. Proceedings of the Workshop, 29-30 June 2005, University of Michigan, Ann Arbor, Michigan, USA. Ed.: P. Koehn, J. Martin, R. Mihalcea, C. Monz, T. Pedersen
Event: Workshop on Building and Using Parallel Texts (2005), Ann Arbor, MI, USA, 29–30 June 2005
Publisher: Association for Computational Linguistics (ACL)
Pages: 25–32