Deirdre Carr, BA.Mod, M.Sc

Phone: (01) 704 5618

Word Sense Disambiguation in Machine Translation using WordNet

Supervisors: Andy Way, Josef van Genabith

Description: Lexical word sense disambiguation (WSD) has long been a major stumbling block in machine translation. WSD involves selecting the correct meaning of a word or phrase when several are available. In machine translation the problem arises when a source-language word has more than one meaning and the correct target-language equivalent must be chosen. Take, for example, the word "run", one of the most polysemous verbs in the English language. In the sentence "John runs the business", "run" means to manage and is translated into German by the verb führen. However, in the sentence "John runs the distance to work every day", "run" means to move swiftly and is translated as rennen.

WordNet, an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory, can help to resolve these ambiguities in a number of ways. Firstly, the sheer volume of lexical information in the database is highly beneficial, given that building any large-scale machine translation system hinges on the availability and efficiency of a lexicon with sufficient coverage. Secondly, and most importantly for the problem of WSD, WordNet has a number of features which, when properly employed and augmented, can help in selecting the correct interpretation. These include verb sentence frames, which convey syntactic and some semantic information about how each verb in the lexicon should be used. However, the semantic information is limited at present to distinguishing between animate and inanimate ("somebody" versus "something"). This shortfall can be overcome by using details about the context in which the verb occurs, obtained via the hyponym and meronym relations (is-a and part-of). These relations supply information about the roles participating in an action and are crucial for disambiguation.
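The role of the is-a relation can be illustrated with a minimal Python sketch. It assumes a small hand-written hierarchy and sense inventory rather than WordNet itself: the names IS_A, RUN_SENSES, ancestors and choose_sense, and the object classes attached to each sense, are hypothetical and serve only to show how the class of a verb's direct object might select between führen and rennen.

```python
# Toy is-a (hypernym) links, hand-written for illustration.
# These are assumptions, not entries copied from WordNet.
IS_A = {
    "business": "organisation",
    "company": "organisation",
    "organisation": "group",
    "distance": "measure",
    "mile": "distance",
    "marathon": "race",
    "race": "contest",
}

# Hypothetical sense inventory for "run": each sense names the class that
# its direct object is expected to belong to, plus a German translation.
RUN_SENSES = [
    {"gloss": "manage", "object_class": "organisation", "german": "führen"},
    {"gloss": "move swiftly", "object_class": "measure", "german": "rennen"},
]


def ancestors(noun):
    """Return the noun followed by its chain of is-a ancestors."""
    chain = [noun]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def choose_sense(direct_object, senses=RUN_SENSES):
    """Return the first sense whose object class is the object itself
    or one of its is-a ancestors; return None if no sense matches."""
    classes = ancestors(direct_object)
    for sense in senses:
        if sense["object_class"] in classes:
            return sense
    return None
```

With these toy tables, choose_sense("business") selects the "manage" sense (führen) and choose_sense("distance") selects the "move swiftly" sense (rennen), while an object outside both classes, such as "marathon", yields no match. A real system would draw the hierarchy from WordNet and would need a fallback for unmatched objects.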

This research involves extracting the necessary information from WordNet and using it in conjunction with HPSG (Head-driven Phrase Structure Grammar) to construct a transfer-based machine translation system that concentrates on the translation of ambiguous verbs. The translation will be from English to German. The expected availability of a German version of WordNet (GermaNet) should allow the system to be made bidirectional.

Degree Sought: PhD.

Funding Source: School of Computer Applications

Expected Completion Date: March 1999


You can reach me by email at: dcarr@compapp.dcu.ie

Last Updated November 1997