Computer Science (计算机科学), 2006
Text Classification Based on Maximal Association Rule
Abstract:
We propose a novel association-based method, SAT-MOD, for text classification. Previous methods mainly mined frequently co-occurring words (frequent itemsets) at the document level, but the basic semantic unit of a document is the sentence: words within the same sentence are typically more closely related semantically than words that merely appear in the same document. SAT-MOD therefore treats each sentence, rather than each document, as a transaction. Extensive experiments on popular benchmark text collections demonstrate the effectiveness of the method.
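
To make the sentence-as-transaction idea concrete, the following is a minimal Python sketch, not the SAT-MOD implementation itself. It assumes a naive punctuation-based sentence splitter, lowercase alphabetic tokens, and simple pair counting in place of a full frequent-itemset miner. The function names `sentence_transactions` and `frequent_pairs` and the sample text are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_transactions(document):
    """Turn each sentence into one transaction (a set of lowercase words).

    Assumption: sentences end at '.', '!' or '?'. Real text needs a
    proper sentence segmenter.
    """
    transactions = []
    for sentence in re.split(r"[.!?]+", document):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words:
            transactions.append(words)
    return transactions


def frequent_pairs(transactions, min_support):
    """Return word pairs that co-occur in at least min_support transactions."""
    counts = Counter()
    for words in transactions:
        # Sort so each pair has one canonical (alphabetical) form.
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


doc = "Stocks fell sharply. Stocks fell again today. The weather was mild."

# Sentence-level view: one transaction per sentence.
sentence_level = sentence_transactions(doc)

# Document-level view: the whole document is a single transaction.
document_level = [set().union(*sentence_level)]

sentence_pairs = frequent_pairs(sentence_level, min_support=1)
document_pairs = frequent_pairs(document_level, min_support=1)
```

In this toy example the document-level view pairs unrelated words such as "stocks" and "weather", because they share a document, whereas the sentence-level view only pairs words that share a sentence. That is the intuition the abstract appeals to; how SAT-MOD actually mines and ranks rules is not specified here.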