Multilingual modeling of cross-lingual spelling variants

Translations of technical terms are important for cross-lingual information retrieval. In many languages, new technical terms share a common origin but are rendered with different spellings of the underlying sounds; such terms are known as cross-lingual spelling variants (CLSV). To find the best CLSV in a text database index, we formulate the problem in a probabilistic framework and implement it as an instance of the general edit distance using weighted finite-state transducers. Estimating the costs for the general edit distance requires some training data. We demonstrate that, after some basic training, our new multilingual model is robust and requires little or no adaptation to cover additional languages, because the model takes advantage of language-independent transliteration patterns. We train the model on medical terms in seven languages and test it on terms from varied domains in six languages. Two of the test languages are not in the training data. Against a large text databa...
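To make the idea of a general edit distance with weighted costs concrete, here is a minimal sketch in Python. It is not the author's implementation: the abstract describes weighted finite-state transducers with costs estimated from training data, whereas this sketch uses a plain dynamic-programming table, single-character operations only, and a hand-picked cost table. The function names `weighted_edit_distance` and `rank_variants`, the cost values, and the sample strings are all hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Maps (source_char, target_char) to a cost. An empty string on the
# target side means deletion; on the source side it means insertion.
Costs = Dict[Tuple[str, str], float]


def weighted_edit_distance(source: str, target: str, costs: Costs,
                           default_cost: float = 1.0) -> float:
    """Cheapest sequence of weighted single-character edits from source to target."""
    m, n = len(source), len(target)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]

    # First column: delete every source character.
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + costs.get((source[i - 1], ""), default_cost)
    # First row: insert every target character.
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + costs.get(("", target[j - 1]), default_cost)

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = source[i - 1], target[j - 1]
            # Identical characters are free; other pairs use the cost table.
            substitute = 0.0 if a == b else costs.get((a, b), default_cost)
            dist[i][j] = min(
                dist[i - 1][j - 1] + substitute,
                dist[i - 1][j] + costs.get((a, ""), default_cost),
                dist[i][j - 1] + costs.get(("", b), default_cost),
            )
    return dist[m][n]


def rank_variants(query: str, index_terms: Iterable[str],
                  costs: Costs) -> List[Tuple[float, str]]:
    """Sort candidate index terms by their weighted distance to the query."""
    return sorted((weighted_edit_distance(query, term, costs), term)
                  for term in index_terms)


# Hypothetical costs: these substitutions are made cheap by hand here,
# standing in for costs that would be estimated from training data.
example_costs: Costs = {("c", "k"): 0.2, ("c", "s"): 0.3}
ranked = rank_variants("calcium", ["calculus", "kalzium", "kalsium"], example_costs)
best_distance, best_term = ranked[0]
```

With these assumed costs, a candidate that differs from the query only by cheap substitutions ranks ahead of one that needs default-cost edits, which is the behaviour a trained cost table is meant to produce for genuine spelling variants.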
Krister Lindén
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2006
Where IR
Authors Krister Lindén