WORD PREDICTOR USING NATURAL LANGUAGE GRAMMAR INDUCTION TECHNIQUE

Keywords: Natural Language Grammatical Inference, K-Means Clustering, Support Vector Machines

Abstract: Language is a unique phenomenon that distinguishes humans from other animals. It is our primary means of communicating with each other, yet very little is understood about how language is acquired in infancy. A greater understanding in this area could improve human-machine communication. The problem addressed in this paper is that of programming a computer to play the Shannon Game. To play the Shannon Game, one must predict which words are most likely to follow a given segment of English text. Word prediction would be most useful for writers with physical disabilities or severe spelling problems. The aim of this paper is to improve on existing results by writing a program that automatically infers a grammar from a natural language corpus and applies it to the Shannon Game. For this task, a stochastic grammar approximating the target language must be inferred from a text sample; as the quality of this grammar improves, so does the quality of the predictor that uses it. The proposed algorithm uses a Support Vector Machine (SVM) to perform part-of-speech tagging, which produces 97.6% correct predictions.
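To make the Shannon Game task concrete, the sketch below uses a deliberately simple stand-in: a word bigram model. It is not the grammar-induction and SVM tagging approach proposed in this paper. The function names (`train_bigrams`, `predict`, `guesses_needed`) and the toy corpus are illustrative assumptions. The model ranks candidate next words by how often they followed the previous word in a training text, and it scores a prediction the way the Shannon Game does, by the number of guesses needed to reach the word that actually comes next.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each word follows each preceding word (hypothetical baseline)."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows


def predict(follows, prev, k=3):
    """Return up to k candidate next words after `prev`, most frequent first."""
    # .get avoids inserting an empty entry for words never seen in training.
    return [w for w, _ in follows.get(prev, Counter()).most_common(k)]


def guesses_needed(follows, prev, actual):
    """Shannon Game score: 1-based rank of the actual next word, or None if it is never guessed."""
    ranked = [w for w, _ in follows.get(prev, Counter()).most_common()]
    return ranked.index(actual) + 1 if actual in ranked else None


if __name__ == "__main__":
    # Toy corpus, assumed purely for illustration.
    model = train_bigrams("the cat sat on the mat the cat ate the fish".split())
    print(predict(model, "the", k=1))             # ['cat']
    print(guesses_needed(model, "the", "cat"))    # 1
```

A better model of the language (for example, one informed by an inferred grammar and part-of-speech tags) would place the correct word earlier in the ranking, lowering the number of guesses; that is the sense in which predictor quality tracks grammar quality.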