Unsupervised Automatic Segmentation and Unsupervised Feature Selection for Uyghur Text

pp. 845-852

Keywords: Uyghur segmentation, mutual information, difference of t-test, adjacent pair entropy, unsupervised feature selection


Abstract:

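Common segmentation methods for Uyghur text produce a large number of word features that are semantically abstract or even ambiguous, which makes it difficult for learning algorithms to discover the structure hidden in high-dimensional data. This paper proposes an unsupervised segmentation method, dme-TS, and an unsupervised feature selection method, UMRMR-UFS. dme-TS automatically acquires word bi-grams and their contextual information from a large-scale raw corpus. It linearly combines the difference of t-test between adjacent words, their mutual information, and the adjacent pair entropy of the two-word context into a single combined statistic (dme) that measures how strongly adjacent words bind together. The text is thereby segmented into a feature set of semantically concrete, independent linguistic units. UMRMR-UFS evaluates the importance of each feature with an unsupervised feature selection criterion (UMRMR) that jointly considers maximum relevance and minimum redundancy, and moves the most important features into the feature subset one at a time. Experimental results show that dme-TS effectively controls the size of the original feature set and improves the quality of the features themselves, and that the learning algorithm achieves its best performance when texts are represented by the output of UMRMR-UFS.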

