Cross-Language Information Retrieval: Theories and Technologies

Pages: 19-32

Multilinguality is one of the major characteristics of the networked society, and the trend toward information globalization has brought new challenges for information management. On the one hand, valuable resources on the web often need to be shared with users of different languages. On the other hand, users also need to draw on knowledge presented in foreign languages. Cross-language information retrieval (CLIR), which lets users search with a query in one language for documents written in another, is a core technology of multilingual information management and has been one of the most active research topics in recent years. This paper introduces the theories and technologies related to CLIR. It first presents the three basic approaches: query translation, document translation, and no translation. It then discusses advanced topics, including translation ambiguity, target polysemy, and transliteration of proper names. Because performance evaluation is indispensable for technical progress, the paper closes by introducing the three major worldwide CLIR evaluation campaigns: TREC, CLEF, and NTCIR.
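As a minimal illustration of query translation and of the translation-ambiguity problem the abstract mentions, the Python sketch below replaces each query term with its entries from a bilingual dictionary. The toy dictionary `TOY_DICT` and the helpers `translate_query` and `first_translation` are hypothetical and serve only as illustration. The two strategies shown, keeping every candidate translation or keeping only the first listed one, are simple baselines and are not presented as the paper's own method.

```python
from typing import Dict, List

# Hypothetical toy English-to-Chinese dictionary, for illustration only.
TOY_DICT: Dict[str, List[str]] = {
    "bank": ["銀行", "河岸"],      # ambiguous: financial institution vs. river bank
    "interest": ["利息", "興趣"],  # ambiguous: finance vs. hobby
    "rate": ["利率", "比率"],      # ambiguous: interest rate vs. ratio
}


def translate_query(query: str, dictionary: Dict[str, List[str]]) -> List[List[str]]:
    """Replace each source-language term with all of its dictionary translations.

    Terms missing from the dictionary (e.g. unseen proper names) are kept
    unchanged; a real system might try transliteration for them instead.
    """
    return [dictionary.get(term, [term]) for term in query.lower().split()]


def first_translation(candidates: List[List[str]]) -> List[str]:
    """Naive disambiguation baseline: keep only the first listed translation."""
    return [options[0] for options in candidates]


if __name__ == "__main__":
    cands = translate_query("bank interest rate", TOY_DICT)
    print(cands)                     # every candidate translation per term
    print(first_translation(cands))  # one translation per term
```

Keeping all candidates preserves the intended sense but adds noise (for example 河岸, "river bank", in a finance query), while keeping a single translation removes noise but can select the wrong sense. Translation-disambiguation techniques aim to do better than either baseline.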


