Acta Automatica Sinica (自动化学报), 2012

Research on a Collaborative Filtering Mechanism for Email Networks

DOI: 10.3724/SP.J.1004.2012.00399, pp. 399-411

Yang Zhen, Lai Ying-Xu, Duan Li-Juan, Li Yu-Jian, Xu Xin (杨震, 赖英旭, 段立娟, 李玉鑑, 许昕)

Keywords: text classification, email filtering, email network, collaborative filtering


Abstract:

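Using the Enron email corpus, this paper explores a real email network and shows that it exhibits scale-free and limited small-world properties. Building on this, a collaborative spam filtering mechanism is designed around the interaction strength between users. By adjusting the parameter λ, a user can decide whether filtering relies mainly on the user's own filter or on collaboration with other users. Even when the filter has not been sufficiently trained on a user's personal reading habits, the algorithm can still filter well through network collaboration based on interaction strength. To address the lack of labels in the Enron dataset, and under the assumption that the training sample set W and the test sample set T are independent and identically distributed, an improved EM (expectation-maximization) algorithm is used to minimize the risk function over W∪T, which yields a good labeling of the unknown samples. Experiments on real data show that, compared with standalone filtering and ensemble filtering, collaborative filtering improves the average filtering accuracy and is simple to implement.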
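The role of λ described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formula, so the convex combination below, the function name `collaborative_spam_score`, and all parameter names are assumptions made for illustration only: a user's own spam score is blended with an average of neighbors' scores, weighted by interaction strength.

```python
from typing import Mapping


def collaborative_spam_score(
    own_score: float,
    neighbor_scores: Mapping[str, float],
    interaction_strength: Mapping[str, float],
    lam: float,
) -> float:
    """Blend a user's own spam score with neighbors' scores (hypothetical form).

    lam = 1.0 relies only on the user's own filter;
    lam = 0.0 relies only on the neighbors' filters.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    # Keep only neighbors that reported a score and have positive strength.
    weights = {
        user: interaction_strength.get(user, 0.0) for user in neighbor_scores
    }
    positive = {user: w for user, w in weights.items() if w > 0.0}
    total = sum(positive.values())

    # With no usable neighbors, fall back to the user's own filter.
    if total == 0.0:
        return own_score

    # Interaction-strength-weighted average of the neighbors' scores.
    neighbor_avg = sum(w * neighbor_scores[user] for user, w in positive.items()) / total
    return lam * own_score + (1.0 - lam) * neighbor_avg
```

Under this assumed form, a poorly trained personal filter can be compensated for by choosing a smaller λ, so that users who interact more strongly with the recipient contribute more to the decision.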

