- 2016
An Online User Identity Attribute Correlation Method Using a Username Similarity Propagation Model
Abstract:
To address the difficulty of fusing and monitoring users' complex and varied behaviors across multiple online applications, a method for correlating online user identity attributes is proposed, based on a propagation model of username similarities. Considering the characteristics of usernames in Chinese social networks, the Chinese and English characters in each username are separated, and a greedy algorithm extracts the longest common substrings of the Chinese strings and of the English strings, respectively, between different usernames; from these, the similarity of usernames containing both Chinese and English characters is calculated. Combined with the users' online friend network structure, the matching degree of a user pair is computed using only the username similarities of first-order neighbors. This both propagates username similarity quickly along the network structure and greatly reduces the computational complexity of the matching algorithm. Experimental results on user identity attribute data collected from Sina Microblog and Renren show that the proposed string matching algorithm improves username matching accuracy by nearly 30%, and that the propagation model greatly reduces the computation required for username matching. The results enable the correlation and fusion of user behaviors across online applications and provide a useful reference for the management of online public opinion and the supervision of online behavior.
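The two steps described in the abstract (script-separated string matching, then a match score built from first-order neighbors) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the greedy procedure or the scoring formula, so the sketch substitutes a standard dynamic-programming longest common substring. The CJK range used to detect Chinese characters, the normalization by the longer username, the weight `alpha`, and all function names are assumptions introduced here for illustration only.

```python
import re

# Assumption: "Chinese characters" are those in the CJK Unified Ideographs block.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def split_scripts(username):
    """Return (Chinese part, lower-cased non-Chinese part), keeping character order."""
    chinese = "".join(ch for ch in username if _CJK.match(ch))
    other = "".join(ch for ch in username if not _CJK.match(ch)).lower()
    return chinese, other


def longest_common_substring(a, b):
    """Length of the longest contiguous run shared by a and b.

    Plain dynamic programming, used here as a stand-in for the
    greedy procedure mentioned in the abstract.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def username_similarity(u, v):
    """Matched characters (per script) divided by the longer username's length.

    The normalization is a hypothetical choice, not taken from the paper.
    """
    u_cn, u_en = split_scripts(u)
    v_cn, v_en = split_scripts(v)
    matched = (longest_common_substring(u_cn, v_cn)
               + longest_common_substring(u_en, v_en))
    longest = max(len(u_cn) + len(u_en), len(v_cn) + len(v_en))
    return matched / longest if longest else 0.0


def match_score(u, v, friends_a, friends_b, alpha=0.5):
    """Blend the pair's own similarity with support from first-order neighbors.

    friends_a / friends_b map a username to its friend list in each network.
    alpha and the averaging rule are illustrative assumptions.
    """
    own = username_similarity(u, v)
    nbrs_a = friends_a.get(u, [])
    nbrs_b = friends_b.get(v, [])
    if not nbrs_a or not nbrs_b:
        return own
    # For each friend of u, take its best-matching friend of v, then average.
    support = sum(
        max(username_similarity(x, y) for y in nbrs_b) for x in nbrs_a
    ) / len(nbrs_a)
    return alpha * own + (1 - alpha) * support


if __name__ == "__main__":
    friends_a = {"小明tom": ["李雷"]}
    friends_b = {"小明tommy": ["李雷", "韩梅梅"]}
    print(username_similarity("小明tom", "小明tommy"))
    print(match_score("小明tom", "小明tommy", friends_a, friends_b))
```

Because only the friends of the two candidate accounts are compared, scoring one pair costs on the order of the product of the two friend-list sizes rather than a comparison against every username in both networks, which is the kind of saving the abstract attributes to the propagation model.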