Science & Technology Review (科技导报), 2014

Log Mining and Personalized Improvement of Mobile Search for Government Websites

DOI: 10.3981/j.issn.1000-7857.2014.36.018, PP. 110-116

Keywords: personalized search, personalized recommendation, cluster analysis, MapReduce


Abstract:

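To make full use of the characteristics of mobile search and government websites, and to exploit Hadoop's strengths in processing big data, a log mining and personalization system was designed and developed. Flume and HDFS are used to aggregate and store massive volumes of logs, providing the data source and access interfaces for log mining. MapReduce is used to analyze the logs efficiently, and the tags and navigation of search-result pages are used to build a vector space model of web pages and a user interest model. Based on the user interest model, the K-means algorithm from cluster analysis groups users with similar interests into interest groups. The distance from a search-result page to the user's interest group is then computed to judge whether the user is interested in that page, and the ranking of search results is adjusted accordingly, enabling personalized search and push functions.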
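The clustering and re-ranking steps described in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's implementation, which runs on MapReduce. The three-dimensional vectors, the page names, the function names `kmeans` and `rerank`, the use of Euclidean distance, and the choice of the first k users as initial centroids are all assumptions made for illustration; the abstract only states that K-means and a page-to-group distance are used.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain K-means. Initial centroids are the first k points (an assumption
    chosen so the sketch is deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def rerank(results: List[Tuple[str, Vector]], centroid: Vector) -> List[Tuple[str, Vector]]:
    """Sort search results so pages closest to the interest-group centroid come first."""
    return sorted(results, key=lambda item: euclidean(item[1], centroid))


# Hypothetical user interest vectors over three made-up categories:
# (tax, health, transport).
users = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.8, 0.2, 0.0],
    [0.0, 0.9, 0.1],
]
centroids, assignment = kmeans(users, k=2)

# Hypothetical search results, each with a page vector in the same space.
results = [
    ("health-notice", [0.0, 1.0, 0.0]),
    ("tax-guide", [1.0, 0.0, 0.0]),
    ("bus-routes", [0.0, 0.0, 1.0]),
]

# Re-rank the results for the first user using that user's interest group.
ranked = rerank(results, centroids[assignment[0]])
ranked_names = [name for name, _ in ranked]
print(assignment)
print(ranked_names)
```

In this sketch the first user, whose interests lean toward the tax category, sees the tax page promoted ahead of the others. A distance threshold could be added to decide whether a page counts as "interesting" for push notifications, but the abstract does not specify one.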

