|
A Multi-Phase Feature Selection Approach for the Detection of SPAM. Keywords: component, detection, feature selection, Naïve Bayesian classifiers. Abstract: In the past few years, the Naïve Bayesian (NB) classifier has been trained automatically to detect spam (unsolicited bulk e-mail). This paper introduces a simple feature selection algorithm that constructs the feature vector on which the classifier is built. We conduct an experiment on the SpamAssassin public email corpus to compare the performance of an NB classifier built on the feature vector constructed by the introduced algorithm with that of an NB classifier built on the feature vector constructed by the Mutual Information algorithm, which is widely used in the literature. We also investigate the effect of the stop-list and the phrases-list on classifier performance. The results of the experiment show that the introduced algorithm outperforms the Mutual Information algorithm.
|