Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues
Hua Xu, Marianthi Markatou, Rositsa Dimova, Hongfang Liu, Carol Friedman
BMC Bioinformatics, 2006, DOI: 10.1186/1471-2105-7-334
Abstract: Experiments were designed to measure the effect of "sample size" (i.e. size of the datasets), "sense distribution" (i.e. the distribution of the different meanings of the ambiguous word) and "degree of difficulty" (i.e. the measure of the distances between the meanings of the senses of an ambiguous word) on the performance of WSD classifiers. Support Vector Machine (SVM) classifiers were applied to an automatically generated data set containing four ambiguous biomedical abbreviations: BPD, BSA, PCA, and RSV, which were chosen because of varying degrees of differences in their respective senses. Results showed that: 1) increasing the sample size generally reduced the error rate, but this was limited mainly to well-separated senses (i.e. cases where the distances between the senses were large); in difficult cases an unusually large increase in sample size was needed to increase performance slightly, which was impractical, 2) the sense distribution did not have an effect on performance when the senses were separable, 3) when there was a majority sense of over 90%, the WSD classifier was not better than use of the simple majority sense, 4) error rates were proportional to the similarity of senses, and 5) there was no statistical difference between results when using a 5-fold or 10-fold cross-validation method. Other issues that impact performance are also enumerated. Several different independent aspects affect performance when using ML techniques for WSD. We found that combining them into one single result obscures understanding of the underlying methods. Although we studied only four abbreviations, we utilized a well-established statistical method that guarantees the results are likely to be generalizable for abbreviations with similar characteristics.
The results of our experiments show that in order to understand the performance of these ML methods it is critical that papers report on the baseline performance, the distribution and sample size of the senses in the dat
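The majority-sense comparison mentioned in finding 3 of this abstract can be illustrated with a minimal sketch. The function name and the toy labels below are hypothetical and are not taken from the paper; the sketch only shows what "use of the simple majority sense" means as a baseline that a trained classifier has to beat.

```python
from collections import Counter


def majority_sense_baseline(train_senses, test_senses):
    """Always predict the most frequent training sense; return (sense, accuracy)."""
    majority, _ = Counter(train_senses).most_common(1)[0]
    correct = sum(1 for sense in test_senses if sense == majority)
    return majority, correct / len(test_senses)


# Hypothetical sense labels for one ambiguous abbreviation (not real data).
train = ["sense_a"] * 92 + ["sense_b"] * 8
test = ["sense_a"] * 9 + ["sense_b"] * 1

majority, accuracy = majority_sense_baseline(train, test)
print(majority, accuracy)
```

With a sense distribution this skewed, the baseline is already hard to beat, which is why reporting it alongside classifier accuracy matters.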
Research on Unsupervised Word Sense Disambiguation
WANG Rui-Qin, KONG Fan-Sheng
软件学报 (Journal of Software), 2009
Abstract: The goal of this paper is to give a brief summary of the current unsupervised word sense disambiguation techniques in order to facilitate future research. First of all, the significance of unsupervised word sense disambiguation study is introduced. Then, key techniques of various unsupervised word sense disambiguation studies at home and abroad are reviewed, including data sources, disambiguation methods, evaluation system and the achieved performance. Finally, 14 novel unsupervised word sense disambiguation methods are summarized, and the existing research and possible direction for the development of unsupervised word sense disambiguation study are pointed out.
Logarithm Model Based Word Sense Disambiguation
ZHU Jing bo, LI Heng, ZHANG Yue, YAO Tian shun
软件学报 (Journal of Software), 2001
Abstract: In this paper, a method for automatic word sense disambiguation based on a logarithm model (LM) is discussed, and a word sense disambiguation system, LM_WSD, is implemented. In the experiments, four models are used for word sense disambiguation. The experiments show the effect of high-frequency senses, salient words, specialized fields and general usage on noun and verb word sense disambiguation. The LM_WSD system has been applied in a word-based English-Chinese machine translation system for the car fittings field, where it improved the performance of the system.
Word Sense Disambiguation in Information Retrieval
Francis de la C. Fernández REYES, Exiquio C. Pérez LEYVA, Rogelio Lau FERNÁNDEZ
Intelligent Information Management (IIM), 2009, DOI: 10.4236/iim.2009.12018
Abstract: Natural language processing comprises a set of phases that progress from lexical text analysis to pragmatic analysis, in which the author's intentions are revealed. The ambiguity problem appears in all of these tasks. Previous work has attempted word sense disambiguation, the process of assigning a sense to a word in a specific context, by creating algorithms under a supervised or unsupervised approach, meaning that those algorithms do or do not use an external lexical resource. This paper presents an approximate approach that combines unsupervised algorithms through a set of classifiers; the result is a learning algorithm based on unsupervised methods for word sense disambiguation. It begins with an introduction to word sense disambiguation concepts, then analyzes some unsupervised algorithms in order to extract the best of them, and combines them under a supervised approach using several classifiers.
Corpus-Based Word Sense Disambiguation
Atsushi Fujii
Computer Science, 1998
Abstract: Resolution of lexical ambiguity, commonly termed "word sense disambiguation", is expected to improve the analytical accuracy for tasks which are sensitive to lexical semantics. Such tasks include machine translation, information retrieval, parsing, natural language understanding and lexicography. Reflecting the growth in utilization of machine readable texts, word sense disambiguation techniques have been explored variously in the context of corpus-based approaches. Within one corpus-based framework, that is the similarity-based method, systems use a database, in which example sentences are manually annotated with correct word senses. Given an input, systems search the database for the most similar example to the input. The lexical ambiguity of a word contained in the input is resolved by selecting the sense annotation of the retrieved example. In this research, we apply this method to the resolution of verbal polysemy, in which the similarity between two examples is computed as the weighted average of the similarity between complements governed by a target polysemous verb. We explore similarity-based verb sense disambiguation focusing on the following three methods. First, we propose a weighting schema for each verb complement in the similarity computation. Second, in similarity-based techniques, the overhead for manual supervision and searching the large-sized database can be prohibitive. To resolve this problem, we propose a method to select a small number of effective examples for system usage. Finally, the efficiency of our system is highly dependent on the similarity computation used. To maximize efficiency, we propose a method which integrates the advantages of previous methods for similarity computation.
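The similarity-based procedure described in this abstract (score each annotated example by a weighted average of complement similarities, then return the sense of the best match) can be sketched as follows. Everything concrete here is an assumption made for illustration: the slot names, the weights, the toy category table, and the `toy_word_similarity` function are hypothetical and do not reproduce the author's weighting schema or similarity measure.

```python
def example_similarity(input_complements, example_complements, weights, word_sim):
    """Weighted average of per-complement similarities over the shared slots."""
    shared = [slot for slot in input_complements
              if slot in example_complements and slot in weights]
    total_weight = sum(weights[slot] for slot in shared)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        weights[slot] * word_sim(input_complements[slot], example_complements[slot])
        for slot in shared
    )
    return weighted_sum / total_weight


def disambiguate(input_complements, database, weights, word_sim):
    """Return the sense annotation of the most similar database example."""
    best = max(
        database,
        key=lambda ex: example_similarity(
            input_complements, ex["complements"], weights, word_sim
        ),
    )
    return best["sense"]


# Hypothetical coarse categories standing in for a real thesaurus.
CATEGORY = {"bus": "vehicle", "taxi": "vehicle", "medicine": "substance",
            "man": "human", "woman": "human"}


def toy_word_similarity(a, b):
    """1.0 for identical words, 0.5 for the same toy category, else 0.0."""
    if a == b:
        return 1.0
    if a in CATEGORY and CATEGORY.get(a) == CATEGORY.get(b):
        return 0.5
    return 0.0


# Hypothetical annotated examples for one polysemous verb.
database = [
    {"sense": "ride", "complements": {"subject": "man", "object": "bus"}},
    {"sense": "ingest", "complements": {"subject": "woman", "object": "medicine"}},
]
weights = {"subject": 0.2, "object": 0.8}  # assumed per-complement weights
query = {"subject": "woman", "object": "taxi"}

print(disambiguate(query, database, weights, toy_word_similarity))
```

Giving the object slot a larger weight than the subject slot is only one possible choice; it is meant to show how the weighting changes which example wins.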
Exemplar-Based Word Sense Disambiguation: Some Recent Improvements
Hwee Tou Ng
Computer Science, 1997
Abstract: In this paper, we report recent improvements to the exemplar-based learning approach for word sense disambiguation that have achieved higher disambiguation accuracy. By using a larger value of $k$, the number of nearest neighbors to use for determining the class of a test example, and through 10-fold cross validation to automatically determine the best $k$, we have obtained improved disambiguation accuracy on a large sense-tagged corpus first used in \cite{ng96}. The accuracy achieved by our improved exemplar-based classifier is comparable to the accuracy on the same data set obtained by the Naive-Bayes algorithm, which was reported in \cite{mooney96} to have the highest disambiguation accuracy among seven state-of-the-art machine learning algorithms.
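The idea of choosing $k$ by 10-fold cross validation can be illustrated with a small, self-contained sketch. The overlap-based similarity, the interleaved fold assignment, and the toy data are all assumptions made for this example; the abstract does not specify the distance metric or feature set.

```python
from collections import Counter


def knn_predict(train, features, k):
    """Majority label among the k training examples sharing the most features."""
    neighbours = sorted(train, key=lambda ex: len(ex[0] & features), reverse=True)[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def cross_val_accuracy(data, k, folds=10):
    """Accuracy of k-NN under a simple interleaved n-fold split (an assumption)."""
    correct = 0
    for fold in range(folds):
        test = data[fold::folds]
        train = [ex for i, ex in enumerate(data) if i % folds != fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def best_k(data, candidates, folds=10):
    """Return the candidate k with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: cross_val_accuracy(data, k, folds))


# Hypothetical sense-tagged contexts for the word "bank" (not real corpus data).
base = [
    (frozenset({"money", "loan"}), "finance"),
    (frozenset({"money", "deposit"}), "finance"),
    (frozenset({"river", "water"}), "shore"),
    (frozenset({"river", "mud"}), "shore"),
]
data = base * 5

print(best_k(data, [1, 3, 5]))
```

On real data the candidates would typically cover a much wider range of $k$, and ties between candidates would need an explicit tie-breaking rule; here `max` simply keeps the first best candidate.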
Applying Deep Belief Networks to Word Sense Disambiguation
Peratham Wiriyathammabhum, Boonserm Kijsirikul, Hiroya Takamura, Manabu Okumura
Computer Science, 2012
Abstract: In this paper, we applied a novel learning algorithm, namely Deep Belief Networks (DBN), to word sense disambiguation (WSD). DBN is a probabilistic generative model composed of multiple layers of hidden units. DBN uses Restricted Boltzmann Machines (RBMs) to greedily train one layer at a time as a pretraining step. Then, a separate fine-tuning step is employed to improve the discriminative power. We compared DBN with various state-of-the-art supervised learning algorithms in WSD such as Support Vector Machine (SVM), Maximum Entropy model (MaxEnt), Naive Bayes classifier (NB) and Kernel Principal Component Analysis (KPCA). We used all words in the given paragraph, surrounding context words and part-of-speech of surrounding words as our knowledge sources. We conducted our experiment on the SENSEVAL-2 data set. We observed that DBN outperformed all other learning algorithms.
Word Sense Disambiguation: An Empirical Survey
J. Sreedhar, S. Viswanadha Raju, A. Vinaya Babu, Amjan Shaik
International Journal of Soft Computing & Engineering, 2012
Abstract: Word Sense Disambiguation (WSD) is a vital and widely useful research area. Many WSD algorithms are available in the literature; we have chosen to focus on optimal and portable WSD algorithms. We discuss the supervised, unsupervised, and knowledge-based approaches to WSD. This paper also gives an overview of a few WSD algorithms and their performance, comparing them and assessing the need for word sense disambiguation.
What is word sense disambiguation good for?
Adam Kilgarriff
Computer Science, 1997
Abstract: Word sense disambiguation has developed as a sub-area of natural language processing, as if, like parsing, it was a well-defined task which was a pre-requisite to a wide range of language-understanding applications. First, I review earlier work which shows that a set of senses for a word is only ever defined relative to a particular human purpose, and that a view of word senses as part of the linguistic furniture lacks theoretical underpinnings. Then, I investigate whether and how word sense ambiguity is in fact a problem for different varieties of NLP application.
Word Sense Disambiguation by Web Mining for Word Co-occurrence Probabilities
Peter D. Turney
Computer Science, 2004
Abstract: This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill's rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.
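To make the notion of a feature "based on word co-occurrence probabilities" concrete, here is a minimal sketch that estimates co-occurrence from document counts and turns it into a pointwise mutual information (PMI) score. The abstract does not give the formula used by the NRC system, so PMI, the document-level counting, and the toy corpus are all assumptions of this example; it does not use the Waterloo MultiText System.

```python
import math


def pmi(word, context_word, documents):
    """PMI (log base 2) of two words, from document-level co-occurrence counts."""
    n = len(documents)
    has_word = sum(1 for doc in documents if word in doc)
    has_context = sum(1 for doc in documents if context_word in doc)
    has_both = sum(1 for doc in documents if word in doc and context_word in doc)
    if has_both == 0:
        # The words never co-occur in this corpus.
        return float("-inf")
    p_both = has_both / n
    p_word = has_word / n
    p_context = has_context / n
    return math.log2(p_both / (p_word * p_context))


# Hypothetical unlabeled corpus; each document is reduced to a set of words.
documents = [
    {"bank", "money"},
    {"bank", "river"},
    {"money", "loan"},
    {"river", "water"},
]

print(pmi("money", "loan", documents))
```

A score like this, computed between a candidate sense's cue words and the words around the head word, is one way such probabilities could be turned into numeric features for a supervised learner.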