In recent years, the ever-increasing number of entities in large knowledge bases on the Web, such as DBpedia, Freebase and YAGO, has posed new challenges but at the same time opened up new opportunities for intelligent information access. These knowledge bases (KBs) have become valuable resources in many research areas, such as natural language processing (NLP) and information retrieval (IR). Recently, almost every major commercial Web search engine has incorporated entities into its search process, for example through Google’s Knowledge Graph, Yahoo!’s Web of Objects and Microsoft’s Satori Graph/Bing Snapshots. The goal is to bridge the semantic gap between natural language text and formalized knowledge.
Within the context of globalization, multilingual and cross-lingual access to information has emerged as an issue of major interest. More and more people from different countries are connecting to the Internet, in particular the Web, and many users can understand more than one language. While the diversity of languages on the Web has been growing, for most people there is still very little content in their native language. As a consequence of their ability to understand more than one language, users are also interested in Web content in languages other than their mother tongue. There is a pressing need for technologies that can help overcome the language barrier for multilingual and cross-lingual information access. In this thesis, we address the overall research question of how to allow for semantic-aware and cross-lingual processing of Web documents and user queries by leveraging knowledge bases.
To address this complex problem, we provide the following solutions: (1) semantic annotation for addressing the semantic gap between Web documents and knowledge; (2) semantic search for coping with the semantic gap between keyword queries and knowledge; (3) the exploitation of cross-lingual semantics for overcoming the language barrier between natural language expressions (i.e., keyword queries and Web documents) and knowledge, thereby enabling cross-lingual semantic annotation and search. We evaluated these solutions, and the results showed advances beyond the state of the art. In addition, we implemented a framework for cross-lingual semantic annotation and search, which has been widely used for cross-lingual processing of media content in the context of our research projects.