Computer Science (计算机科学), 2005
SAT-FOIL+: Sentence-Level Association Based Text Classification
Abstract:
Previous association-based methods mainly mined frequently co-occurring words (frequent itemsets) at the document level, but the basic semantic unit of a document is the sentence. Words within the same sentence are typically more closely related semantically than words that merely appear in the same document. SAT-FOIL therefore treats a sentence, rather than a document, as a transaction. In this paper we propose new score models that yield the improved algorithm SAT-FOIL+. Extensive experiments on the popular Reuters benchmark text collection show that SAT-FOIL+ not only outperforms our former algorithm SAT-FOIL but is also comparable to well-known alternatives and much better than previous document-level association-based methods. In addition, the classification rules acquired by SAT-FOIL are inherently readable and refinable.
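To illustrate the difference between document-level and sentence-level transactions, the following minimal Python sketch counts co-occurring word pairs under both views. It is not the authors' implementation: the function names `sentence_transactions` and `pair_counts`, the naive punctuation-based sentence splitter, and the sample text are all assumptions made for illustration, and the sketch covers only transaction construction, not the FOIL-style rule learning or the score models.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_transactions(document):
    """Return one word set per sentence (hypothetical helper, naive splitter)."""
    transactions = []
    for sentence in re.split(r"[.!?]+", document):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words:  # skip empty fragments left by the split
            transactions.append(words)
    return transactions


def pair_counts(transactions):
    """Count how many transactions contain each unordered word pair."""
    counts = Counter()
    for words in transactions:
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts


# Illustrative two-sentence document (made-up text, not from Reuters).
doc = "Oil prices rose sharply. The central bank cut rates."

# Sentence-level view: each sentence is a transaction.
sentence_level = pair_counts(sentence_transactions(doc))

# Document-level view: the whole document is a single transaction.
document_level = pair_counts([set(re.findall(r"[a-z]+", doc.lower()))])
```

In this sketch, "oil" and "prices" co-occur under both views, whereas "bank" and "oil" co-occur only at the document level, because they appear in different sentences. That is the kind of weakly related pair that sentence-level transactions exclude.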