Home OALib Journal OALib PrePrints Submit Ranking News My Lib FAQ About Us Follow Us+
 Title Keywords Abstract Author All
Search Results: 1 - 10 of 100 matches for " "
 Page 1 /100 Display every page 5 10 20 Item
 Computer Science , 1996, Abstract: Multiplicative weight-updating algorithms such as Winnow have been studied extensively in the COLT literature, but only recently have people started to use them in applications. In this paper, we apply a Winnow-based algorithm to a task in natural language: context-sensitive spelling correction. This is the task of fixing spelling errors that happen to result in valid words, such as substituting {\it to\/} for {\it too}, {\it casual\/} for {\it causal}, and so on. Previous approaches to this problem have been statistics-based; we compare Winnow to one of the more successful such approaches, which uses Bayesian classifiers. We find that: (1)~When the standard (heavily-pruned) set of features is used to describe problem instances, Winnow performs comparably to the Bayesian method; (2)~When the full (unpruned) set of features is used, Winnow is able to exploit the new features and convincingly outperform Bayes; and (3)~When a test set is encountered that is dissimilar to the training set, Winnow is better than Bayes at adapting to the unfamiliar test set, using a strategy we will present for combining learning on the training set with unsupervised learning on the (noisy) test set.
 Andrew R. Golding Computer Science , 1996, Abstract: Two classes of methods have been shown to be useful for resolving lexical ambiguity. The first relies on the presence of particular words within some distance of the ambiguous target word; the second uses the pattern of words and part-of-speech tags around the target word. These methods have complementary coverage: the former captures the lexical atmosphere'' (discourse topic, tense, etc.), while the latter captures local syntax. Yarowsky has exploited this complementarity by combining the two methods using decision lists. The idea is to pool the evidence provided by the component methods, and to then solve a target problem by applying the single strongest piece of evidence, whatever type it happens to be. This paper takes Yarowsky's work as a starting point, applying decision lists to the problem of context-sensitive spelling correction. Decision lists are found, by and large, to outperform either component method. However, it is found that further improvements can be obtained by taking into account not just the single strongest piece of evidence, but ALL the available evidence. A new hybrid method, based on Bayesian classifiers, is presented for doing this, and its performance improvements are demonstrated.
 Computer and Information Science , 2012, DOI: 10.5539/cis.v5n3p37 Abstract: In computing, spell checking is the process of detecting and sometimes providing spelling suggestions for incorrectly spelled words in a text. Basically, a spell checker is a computer program that uses a dictionary of words to perform spell checking. The bigger the dictionary is, the higher is the error detection rate. The fact that spell checkers are based on regular dictionaries, they suffer from data sparseness problem as they cannot capture large vocabulary of words including proper names, domain-specific terms, technical jargons, special acronyms, and terminologies. As a result, they exhibit low error detection rate and often fail to catch major errors in the text. This paper proposes a new context-sensitive spelling correction method for detecting and correcting non-word and real-word errors in digital text documents. The approach hinges around data statistics from Google Web 1T 5-gram data set which consists of a big volume of n-gram word sequences, extracted from the World Wide Web. Fundamentally, the proposed method comprises an error detector that detects misspellings, a candidate spellings generator based on a character 2-gram model that generates correction suggestions, and an error corrector that performs contextual error correction. Experiments conducted on a set of text documents from different domains and containing misspellings, showed an outstanding spelling error correction rate and a drastic reduction of both non-word and real-word errors. In a further study, the proposed algorithm is to be parallelized so as to lower the computational cost of the error detection and correction processes.
 Computer Science , 1996, Abstract: This paper addresses the problem of correcting spelling errors that result in valid, though unintended words (such as peace'' and piece'', or quiet'' and quite'') and also the problem of correcting particular word usage errors (such as amount'' and number'', or among'' and between''). Such corrections require contextual information and are not handled by conventional spelling programs such as Unix `spell'. First, we introduce a method called Trigrams that uses part-of-speech trigrams to encode the context. This method uses a small number of parameters compared to previous methods based on word trigrams. However, it is effectively unable to distinguish among words that have the same part of speech. For this case, an alternative feature-based method called Bayes performs better; but Bayes is less effective than Trigrams when the distinction among words depends on syntactic constraints. A hybrid method called Tribayes is then introduced that combines the best of the previous two methods. The improvement in performance of Tribayes over its components is verified experimentally. Tribayes is also compared with the grammar checker in Microsoft Word, and is found to have substantially higher performance.
 计算机科学技术学报 , 2002, Abstract: The researches on spelling correction aiming at detecting errors in texts tend to focus on context-sensitive spelling error correction, which is more difficult than traditional isolated-word error correction. A novel and efficient algorithm for the system of Chinese spelling error correction, CInsunSpell, is presented. In this system, the work of correction includes two parts: checking phase and correcting phase. At the first phase, a Trigram algorithm within one fixed-size window is designed to locate potential errors in local area. The second phase employs a new method of automatically and dynamically distributing weights among the characters in the confusion set as well as in the Bayesian language model. The tactics used above exhibits good performances.
 Computer Science , 1998, Abstract: The study presented here relies on the integrated use of different kinds of knowledge in order to improve first-guess accuracy in non-word context-sensitive correction for general unrestricted texts. State of the art spelling correction systems, e.g. ispell, apart from detecting spelling errors, also assist the user by offering a set of candidate corrections that are close to the misspelled word. Based on the correction proposals of ispell, we built several guessers, which were combined in different ways. Firstly, we evaluated all possibilities and selected the best ones in a corpus with artificially generated typing errors. Secondly, the best combinations were tested on texts with genuine spelling errors. The results for the latter suggest that we can expect automatic non-word correction for all the errors in a free running text with 80% precision and a single proposal 98% of the times (1.02 proposals on average).
 Computer Science , 2012, Abstract: At the present time, computers are employed to solve complex tasks and problems ranging from simple calculations to intensive digital image processing and intricate algorithmic optimization problems to computationally-demanding weather forecasting problems. ASR short for Automatic Speech Recognition is yet another type of computational problem whose purpose is to recognize human spoken speech and convert it into text that can be processed by a computer. Despite that ASR has many versatile and pervasive real-world applications,it is still relatively erroneous and not perfectly solved as it is prone to produce spelling errors in the recognized text, especially if the ASR system is operating in a noisy environment, its vocabulary size is limited, and its input speech is of bad or low quality. This paper proposes a post-editing ASR error correction method based on MicrosoftN-Gram dataset for detecting and correcting spelling errors generated by ASR systems. The proposed method comprises an error detection algorithm for detecting word errors; a candidate corrections generation algorithm for generating correction suggestions for the detected word errors; and a context-sensitive error correction algorithm for selecting the best candidate for correction. The virtue of using the Microsoft N-Gram dataset is that it contains real-world data and word sequences extracted from the web which canmimica comprehensive dictionary of words having a large and all-inclusive vocabulary. Experiments conducted on numerous speeches, performed by different speakers, showed a remarkable reduction in ASR errors. Future research can improve upon the proposed algorithm so much so that it can be parallelized to take advantage of multiprocessor and distributed systems.
 Kemal Oflazer Computer Science , 1994, Abstract: This paper presents an approach to spelling correction in agglutinative languages that is based on two-level morphology and a dynamic programming based search algorithm. Spelling correction in agglutinative languages is significantly different than in languages like English. The concept of a word in such languages is much wider that the entries found in a dictionary, owing to {}~productive word formation by derivational and inflectional affixations. After an overview of certain issues and relevant mathematical preliminaries, we formally present the problem and our solution. We then present results from our experiments with spelling correction in Turkish, a Ural--Altaic agglutinative language. Our results indicate that we can find the intended correct word in 95\% of the cases and offer it as the first candidate in 74\% of the cases, when the edit distance is 1.
 Dan Roth Computer Science , 1998, Abstract: We analyze a few of the commonly used statistics based and machine learning algorithms for natural language disambiguation tasks and observe that they can be re-cast as learning linear separators in the feature space. Each of the methods makes a priori assumptions, which it employs, given the data, when searching for its hypothesis. Nevertheless, as we show, it searches a space that is as rich as the space of all linear separators. We use this to build an argument for a data driven approach which merely searches for a good linear separator in the feature space, without further assumptions on the domain or a specific problem. We present such an approach - a sparse network of linear separators, utilizing the Winnow learning algorithm - and show how to use it in a variety of ambiguity resolution problems. The learning approach presented is attribute-efficient and, therefore, appropriate for domains having very large number of attributes. In particular, we present an extensive experimental comparison of our approach with other methods on several well studied lexical disambiguation tasks such as context-sensitive spelling correction, prepositional phrase attachment and part of speech tagging. In all cases we show that our approach either outperforms other methods tried for these tasks or performs comparably to the best.
 L. Amber Wilcox-O'Hearn Computer Science , 2014, Abstract: Real-word spelling correction differs from non-word spelling correction in its aims and its challenges. Here we show that the central problem in real-word spelling correction is detection. Methods from non-word spelling correction, which focus instead on selection among candidate corrections, do not address detection adequately, because detection is either assumed in advance or heavily constrained. As we demonstrate in this paper, merely discriminating between the intended word and a random close variation of it within the context of a sentence is a task that can be performed with high accuracy using straightforward models. Trigram models are sufficient in almost all cases. The difficulty comes when every word in the sentence is a potential error, with a large set of possible candidate corrections. Despite their strengths, trigram models cannot reliably find true errors without introducing many more, at least not when used in the obvious sequential way without added structure. The detection task exposes weakness not visible in the selection task.
 Page 1 /100 Display every page 5 10 20 Item