Multi-view Representation Learning for Unifying Languages, Knowledge and Vision

Mogadala, Aditya

Abstract (English):
The growth of content on the web has raised various challenges, yet it has also provided numerous opportunities. Content exists in varied forms, such as text in different languages, entity-relationship graphs represented as structured knowledge, and visual embodiments such as images and videos. These forms are often referred to as modalities. In many instances, different combinations of modalities co-exist to complement each other or to provide consensus, making the content either heterogeneous or homogeneous. Having an additional point of view for each instance in the content is beneficial for data-driven learning and intelligent content processing. However, despite the availability of such content, most advances in data-driven learning (i.e., machine learning) have been made by solving tasks separately for a single modality. A similar effort has not been made for challenges that require input from all modalities or a subset of them.

In this dissertation, we develop models and techniques that can leverage multiple views of heterogeneous or homogeneous content and build a shared representation for aiding several applications wh ...

DOI: 10.5445/IR/1000091950
Published on: 12 March 2019
Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Thesis
Year: 2019
Language: English
Identifier: urn:nbn:de:swb:90-919508
KITopen-ID: 1000091950
Publisher: KIT, Karlsruhe
Extent: IX, 163 pp.
Degree type: Dissertation
Faculty: Fakultät für Wirtschaftswissenschaften (WIWI)
Institute: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Examination date: 7 November 2018
Examiner/Supervisor: PD Dr. A. Rettinger
Keywords: multi-view representation learning, cross-lingual word embeddings, cross-modal retrieval, image caption generation
