UWN / MENTA: Towards a Universal Multilingual Wordnet
Overview
UWN is an automatically constructed multilingual lexical knowledge base based on WordNet.
The English language represents a constantly decreasing fraction of the Web. China and the EU each have greatly surpassed the U.S. in the number of Internet users, and other regions are expected to follow. Multilingual knowledge bases address this development by providing labels in multiple languages and making the semantic connections between words and names in different languages explicit.
For over 1,500,000 words in over 200 languages, UWN provides a corresponding list of meanings and shows how such meanings are semantically related. Additionally, the new MENTA extension adds a large-scale hierarchical taxonomy of named entities and their classes, drawing on over 200 different language editions of Wikipedia. This leads to a knowledge base with over 15 million words and names in different languages.
Example
For instance, a word like "board" could refer to a wooden panel, to a committee, to a blackboard, or, as a verb, to the act of getting on a vehicle (e.g. "to board a plane"). For each of these meanings one can obtain the corresponding words in different languages, e.g. the committee sense of "board" corresponds to комитет in Russian and 委員会 in Japanese. Additionally, meanings are connected to related meanings, e.g. the committee meaning is linked to its generalizations "administrative unit", "social group", etc., and for each of these meanings one can again obtain corresponding words in different languages.
Query
We offer a user interface that allows you to search and browse UWN.
People
- de Melo, Gerard
- Weikum, Gerhard
Publications
- Towards a Universal Wordnet by Learning from Combined Evidence
Gerard de Melo, Gerhard Weikum (2009)
In: Proc. 18th ACM Conference on Information and Knowledge Management (CIKM 2009), Hong Kong, China.
- MENTA: Inducing Multilingual Taxonomies from Wikipedia
Gerard de Melo, Gerhard Weikum (2010)
In: Proc. 19th ACM Conference on Information and Knowledge Management (CIKM 2010), Toronto, Canada. ACM, New York, USA.
- Untangling the Cross-Lingual Link Structure of Wikipedia
Gerard de Melo, Gerhard Weikum (2010)
In: Proc. 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden.
- More Publications
Downloads
Java Library
We provide a small Java library that can be used with one or more large plugins, which provide the complete data for offline use (i.e., no need to connect to our servers). More information (and new versions) will follow soon. In the meantime, please contact Gerard de Melo if you have any questions.
Important: A new version of uwnapi.zip was released on 2012-11-23. This version allows you to obtain statement weights and fixes a character encoding issue (thanks to Aya Zoghby for reporting this). Please upgrade!
- UWN library (2012-11-23)
- Princeton WordNet plugin
- UWN Core plugin
- MENTA plugin -- Coming soon
- Etymological Wordnet (2012-02-26) plugin
(more information)
How to use the library
Take a look at the example code.
Other programming languages
- Third-party Ruby wrapper
Raw Dump
Alternatively, you can work with a raw dump of the UWN Core. We provide a gzip-compressed TSV file; for best performance, decompress it on the fly while reading. Each line contains four tab-separated fields: subject, predicate, object, and weight.
- UWN Core. License: CC-BY-NC-SA 3.0.
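As an illustration of reading the dump on the fly, here is a minimal Python sketch. It is not part of the UWN distribution. It assumes the file is UTF-8 encoded and that the weight field parses as a decimal number; neither detail is stated above, so adjust if the actual dump differs. The names `Statement` and `read_statements` are made up for this example.

```python
import gzip
from typing import Iterator, NamedTuple


class Statement(NamedTuple):
    """One line of the dump (hypothetical helper type, not part of UWN)."""
    subject: str
    predicate: str
    object: str
    weight: float


def read_statements(path: str) -> Iterator[Statement]:
    """Yield one Statement per well-formed line of a gzip-compressed TSV file.

    The file is decompressed while it is being read, so the uncompressed
    data is never written to disk or held in memory as a whole.
    """
    # Assumption: the dump is UTF-8 encoded.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != 4:
                continue  # skip blank or malformed lines
            subject, predicate, obj, weight = fields
            try:
                # Assumption: the weight is a decimal number.
                value = float(weight)
            except ValueError:
                continue  # skip lines whose weight is not numeric
            yield Statement(subject, predicate, obj, value)
```

Because `read_statements` is a generator, a caller can filter or aggregate statements in a single pass, for example `for s in read_statements("uwn.tsv.gz"): ...`, where `uwn.tsv.gz` is a placeholder for wherever the downloaded file is stored.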