|
Polibits 2011
Low Cost Construction of a Multilingual Lexicon from Bilingual Lists
Keywords: lexical resources, multilingual lexicon, under-resourced languages.
Abstract: Manually constructing multilingual translation lexicons can be very costly, both in terms of time and human effort. Although there have been many efforts at (semi-)automatically merging bilingual machine-readable dictionaries to produce a multilingual lexicon, most of these approaches place quite specific requirements on the input bilingual resources. Unfortunately, not all bilingual dictionaries fulfil these criteria, especially in the case of under-resourced language pairs. We describe a low-cost method for constructing a multilingual lexicon using only simple lists of bilingual translation mappings. The method is especially suitable for under-resourced language pairs, as such bilingual resources are often freely available and easily obtainable from the Internet, or digitised from simple, conventional paper-based dictionaries. The precision of random samples of the resultant multilingual lexicon is around 0.70-0.82, while coverage for each language, precision and recall can be controlled by varying threshold values. Given the very simple input resources, our results are encouraging, especially in incorporating under-resourced languages into multilingual lexical resources.
|