Binarized Weighted Transformation Algorithm for Non-Co-occurrence Data

pp. 584-591

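Keywords: non-co-occurrence data, feature weight, information bottleneck, sequential information bottleneck algorithm for categorical data (CD-sIB), binarized transformation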

Abstract:

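The sequential information bottleneck algorithm for categorical data (CD-sIB) assumes that every attribute of the data contributes equally to the binarized transformation, and this assumption degrades the quality of the transformation. This paper proposes a binarized weighted transformation method that captures the characteristics of non-co-occurrence data. The method emphasizes the representative attributes of the non-co-occurrence data and suppresses the non-representative (redundant) ones, thereby obtaining the best co-occurrence representation. Two principles for weighting non-co-occurrence data are proposed: the weighting must be applicable to randomly distributed data, and its computation must be unsupervised. Based on the concept of weighting granularity, a binarized weighted transformation algorithm is then constructed. Experimental results show that the proposed algorithm achieves higher clustering accuracy than the other algorithms compared.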
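To make the setting of the abstract concrete, here is a minimal sketch of how a binarized transformation can turn categorical records into the co-occurrence form p(x, y) that information bottleneck methods operate on, and how per-attribute weights reshape that distribution. Everything here is an illustrative assumption: the one-hot encoding, the hypothetical function names `binarize`, `joint_distribution` and `conditionals`, and the example weights are not taken from the paper, and the paper's unsupervised weighting scheme based on weighting granularity is not reproduced. Equal weights correspond to the equal-contribution assumption the abstract attributes to CD-sIB.

```python
def binarize(records):
    """One-hot (binarized) transformation of categorical records.

    Hypothetical helper: each distinct (attribute index, value) pair becomes
    one binary feature y; each record x becomes a 0/1 row over those features.
    """
    features = sorted({(j, v) for row in records for j, v in enumerate(row)})
    index = {f: k for k, f in enumerate(features)}
    matrix = []
    for row in records:
        bits = [0] * len(features)
        for j, v in enumerate(row):
            bits[index[(j, v)]] = 1
        matrix.append(bits)
    return features, matrix


def joint_distribution(matrix, features, weights):
    """Scale each binary feature by the weight of its attribute, then
    normalize so that all entries of p(x, y) sum to 1.

    `weights[j]` is an assumed per-attribute weight; equal weights mimic the
    equal-contribution assumption, unequal weights emphasize some attributes.
    """
    scaled = [
        [bit * weights[features[k][0]] for k, bit in enumerate(row)]
        for row in matrix
    ]
    total = sum(sum(row) for row in scaled)
    return [[v / total for v in row] for row in scaled]


def conditionals(p):
    """Return the marginal p(x) and the conditionals p(y|x) of a joint table."""
    px = [sum(row) for row in p]
    py_given_x = [[v / px_i for v in row] for row, px_i in zip(p, px)]
    return px, py_given_x


if __name__ == "__main__":
    records = [("red", "small"), ("red", "large"), ("blue", "small")]
    features, matrix = binarize(records)
    for w in ([1.0, 1.0], [3.0, 1.0]):
        px, pyx = conditionals(joint_distribution(matrix, features, w))
        print(w, pyx[0])
```

With equal weights, the first record spreads its conditional mass evenly (0.5 and 0.5) over its two active attribute values; giving the first attribute weight 3 shifts that split to 0.75 and 0.25. This is the kind of emphasis on representative attributes that the abstract describes, although how the paper actually chooses the weights is not shown here.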
