|
A FUZZY SIMILARITY APPROACH FOR AUTOMATED SPAM FILTERING AND NAÏVE BAYES CLASSIFIER

Keywords: Spam detection, e-mail abstraction, near-duplicate matching.

Abstract: E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In recent years, the notion of a collaborative spam detection system combining an e-mail abstraction scheme with a near-duplicate matching scheme has been widely discussed. The primary idea of similarity matching for spam detection is to maintain a database of known spam. To achieve efficient similarity matching and reduce storage utilization, prior works mainly represent each e-mail by a succinct abstraction derived from the e-mail's content text. However, these abstractions cannot fully capture the evolving nature of spam, and are thus not effective enough in near-duplicate detection. In this paper, we propose a novel e-mail abstraction scheme that uses the e-mail's layout structure to represent it. We present a procedure to generate the e-mail abstraction from the HTML content of e-mails (IMAP, POP3), and this newly devised abstraction can more effectively capture the near-duplicate phenomenon of spam.
|
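The idea of a layout-based abstraction matched against a known spam database can be illustrated with a minimal sketch. It is not the procedure proposed in the paper: it assumes the abstraction is simply the ordered sequence of HTML opening tags, and it swaps in `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure. The names `abstract_email`, `layout_similarity`, `is_near_duplicate`, and the `threshold` value of 0.9 are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequenceParser(HTMLParser):
    """Collects the names of opening tags in document order, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes and text content are deliberately dropped, so that
        # reworded spam with the same layout yields the same abstraction.
        self.tags.append(tag)


def abstract_email(html_body):
    """Hypothetical abstraction: the tuple of opening-tag names."""
    parser = TagSequenceParser()
    parser.feed(html_body)
    parser.close()
    return tuple(parser.tags)


def layout_similarity(a, b):
    """Stand-in similarity in [0, 1] between two tag sequences."""
    return SequenceMatcher(None, a, b).ratio()


def is_near_duplicate(html_body, known_spam_abstractions, threshold=0.9):
    """True if the e-mail's layout is close to any stored spam abstraction.

    The threshold is an assumed value, not one taken from the paper.
    """
    candidate = abstract_email(html_body)
    if not candidate:
        # No HTML structure to compare (e.g. a plain-text message).
        return False
    return any(
        layout_similarity(candidate, known) >= threshold
        for known in known_spam_abstractions
    )


if __name__ == "__main__":
    spam = '<html><body><table><tr><td><a href="http://a.example">Buy now</a></td></tr></table></body></html>'
    variant = '<html><body><table><tr><td><a href="http://b.example">Cheap deals</a></td></tr></table></body></html>'
    ham = "<html><body><p>Hi</p><p>See you tomorrow</p></body></html>"

    database = [abstract_email(spam)]
    print(is_near_duplicate(variant, database))
    print(is_near_duplicate(ham, database))
```

Because only the tag sequence is stored, the database stays small and a spam message whose wording or links have changed still maps to the same abstraction, which is the property the abstract attributes to layout-based representations.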