WordNet is a large lexical database for the English language, a combination of dictionary and thesaurus. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. WordNet is accessible to human users via a web browser, but its primary use is in automatic natural language processing and artificial intelligence applications.
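The idea of synsets linked by semantic relations can be illustrated with a small sketch. This is not WordNet's file format or any official API: the `Synset` class, the `hypernym_path` helper and the heavily simplified sample data are assumptions made for illustration only (the real hierarchy has more intermediate levels).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Synset:
    """Toy synset: a group of synonymous words plus links to broader concepts."""
    name: str
    lemmas: List[str]
    gloss: str
    hypernyms: List["Synset"] = field(default_factory=list)


def hypernym_path(synset: Synset) -> List[str]:
    """Follow the first hypernym ("is a kind of") link upward to the root."""
    path = [synset.name]
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        path.append(synset.name)
    return path


# Simplified, hand-written sample data (not copied from the WordNet database).
entity = Synset("entity", ["entity"], "something that exists")
person = Synset("person", ["person", "individual"], "a human being", [entity])
traveler = Synset("traveler", ["traveler", "traveller"],
                  "a person who changes location", [person])
pedestrian = Synset("pedestrian", ["pedestrian", "walker"],
                    "a person who travels by foot", [traveler])

print(hypernym_path(pedestrian))
# ['pedestrian', 'traveler', 'person', 'entity']
```

Each word in `lemmas` paired with its synset corresponds to one word-sense pair; a word with several meanings would simply appear in several synsets.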
The database (lexicographer files) and software tools (a compiler called grind and a reverse morphology program called morphy) have been released under a BSD-style license and are freely available for download from the WordNet website. The database contains about 160,000 words, organized in about 120,000 synsets, for a total of about 200,000 word-sense pairs (see detailed statistics). The current version 3.1 has a size of about 12 MB in compressed form.
WordNet was created in the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George Armitage Miller, starting in 1985, and has been directed in recent years by Christiane Fellbaum.
Christiane Fellbaum, together with Piek Vossen, founded the Global WordNet Association in 2000.
Global WordNet Association
GWA (Global WordNet Association) is a free, public and non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world. A list of wordnets in other languages is published on the GWA website. The wordnets of Luxembourg's neighbouring countries are listed below:
- German: GermaNet, Universität Tübingen, open for academic use
- French: WoNeF, CEA-LIST, open
- French: EuroWordNet, Université d’Avignon, Memodata, restricted
- French: WOLF, Université Paris Diderot, open
- Dutch: EuroWordNet, University of Amsterdam, restricted
- Dutch: Cornetto, University of Amsterdam, restricted
- English: EuroWordNet, University of Sheffield, restricted
The first GWA conference (GWC2002) was organized in January 2002 in Mysore, India. The most recent conference (GWC2014) was organized in Tartu, Estonia.
A major project of the GWA is the creation of a completely free worldwide wordnet grid, built around a shared set of concepts, such as the Common Base Concepts and the Suggested Upper Merged Ontology (SUMO), which is owned by the IEEE.
The Suggested Upper Merged Ontology (SUMO) and its domain ontologies form the largest formal public ontology in existence today. They are being used for research and applications in search, linguistics and reasoning. SUMO is the only formal ontology that has been mapped to all of the WordNet lexicon. The technical editor of SUMO is Adam Pease.
Verena Heinrich from the University of Tübingen created a few images for GermaNet which visualize examples of WordNet relations. These copyrighted pictures are used here with permission.
WordNet Search Results
The following figures show the results of WordNet searches for the term:
pedestrian = piéton = Fussgänger
ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images (an average of over five hundred images per node).
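How images can be attached to the nodes of a noun hierarchy may be sketched as follows. The tree, the placeholder URLs and the helper names `images_in_subtree` and `average_images_per_node` are hypothetical and do not reflect ImageNet's actual data layout or download format.

```python
from typing import Dict, List

# Hypothetical miniature hierarchy: parent synset -> child synsets.
CHILDREN: Dict[str, List[str]] = {
    "person": ["traveler"],
    "traveler": ["pedestrian"],
    "pedestrian": [],
}

# Hypothetical image lists per synset (placeholder URLs, not real ImageNet data).
IMAGES: Dict[str, List[str]] = {
    "person": ["http://example.org/person_1.jpg"],
    "traveler": ["http://example.org/traveler_1.jpg",
                 "http://example.org/traveler_2.jpg"],
    "pedestrian": ["http://example.org/pedestrian_1.jpg",
                   "http://example.org/pedestrian_2.jpg",
                   "http://example.org/pedestrian_3.jpg"],
}


def images_in_subtree(node: str) -> List[str]:
    """Collect the images of a node and of all its descendants."""
    urls = list(IMAGES.get(node, []))
    for child in CHILDREN.get(node, []):
        urls.extend(images_in_subtree(child))
    return urls


def average_images_per_node() -> float:
    """Average number of images attached directly to each node."""
    return sum(len(urls) for urls in IMAGES.values()) / len(IMAGES)


print(len(images_in_subtree("person")))  # 6
print(average_images_per_node())         # 2.0
```

Because every pedestrian is also a traveler and a person, a query for a broad concept can draw on the images of all the narrower concepts below it.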
ImageNet does not own the copyright of the images. ImageNet only provides thumbnails and URLs of images, in a way similar to what image search engines do, by compiling an accurate list of web images for each synset of WordNet. The list is freely available.
ImageNet offers downloads of SIFT (Scale-Invariant Feature Transform) features, of object bounding boxes for about 1 million pictures, and of object attributes. The bounding boxes and attributes are annotated and verified through Amazon Mechanical Turk.
ImageNet is managed by a research team from the universities of Stanford, Princeton, Michigan and North Carolina. The project is sponsored by the Stanford Vision Lab, Stanford University, Princeton University, Google Research and A9, a subsidiary of Amazon.com based in Palo Alto, California, that develops search and advertising technology.
The following figure shows the results of the search for pedestrian in the ImageNet database.
For comparison, the results of a Google Image Search for the same term pedestrian are shown below:
Since 2010 (ILSVRC2010), the ImageNet team has organized an annual challenge to measure improvements in the state of machine vision technology.
Large Scale Visual Recognition Challenge
The Large Scale Visual Recognition Challenge evaluates pattern recognition software that can be trained to recognize objects in digital images. It is made possible by the ImageNet database.
In 2012 (ILSVRC2012) the contest was won by Geoffrey E. Hinton, a cognitive scientist at the University of Toronto, and his students Alex Krizhevsky and Ilya Sutskever. All three joined Google in 2013.
In 2014 (ILSVRC2014), the challenge drew 38 entrants from 13 countries. The groups used advanced software, in most cases modeled loosely on biological vision systems, to detect, locate and classify a huge set of images taken from Internet sources. Contestants ran their recognition programs on high-performance computers, based in many cases on specialized processors called GPUs (graphics processing units). All of the entrants used a variant of an approach known as a convolutional neural network, a technique first refined in 1998 by Yann LeCun, a French computer scientist who recently became director of artificial intelligence research at Facebook.
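The core operation of a convolutional neural network can be sketched in a few lines. The following is a minimal pure-Python illustration, not the code of any contest entry: the function names `conv2d` and `relu`, the tiny image and the hand-picked kernel are all assumptions. Real networks learn their kernels from training data and stack many such layers, usually running on GPUs.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1) and
    return the weighted sum computed at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


def relu(feature_map: Matrix) -> Matrix:
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, value) for value in row] for row in feature_map]


# A 4x4 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

print(relu(conv2d(image, kernel)))
# [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero column in the output marks where the edge lies in the image. The same small kernel is reused at every position, which is why such networks need far fewer parameters than fully connected ones.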
The results of the 2014 challenge have been published at the ImageNet website.