|
- 2015
Research on Weibo Bot-User Identification Based on Random Forest Classification
|
Abstract:
Bot users on social networks spread rumors and false information at scale, mislead public opinion, and seriously degrade the online environment. This study takes bot users on Weibo as its subject. Given their high degree of automation, strong ability to disguise themselves, and targeted posting, it analyzes bot-user characteristics along four dimensions: behavior patterns, post content, user relationships, and publishing platform. Eight indicators (information entropy, content repetition rate, reputation, mutual, mention ratio, comment ratio, message, and numofplatform) form the feature vector of each Weibo user, and a random forest classifier built on these features serves as the identification model. Validated on a real Sina Weibo dataset, the model identifies bot users with 96.7% accuracy, indicating that it can effectively distinguish bot users from ordinary users.
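Two of the eight indicators, information entropy and content repetition rate, can be illustrated with a short sketch. The abstract does not give their exact definitions, so everything below is an assumption made for illustration. Entropy is taken here as the Shannon entropy of binned gaps between consecutive posts, and the repetition rate as the share of posts that duplicate an earlier post. The function names and the `bin_seconds` parameter are hypothetical.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of the binned gaps between consecutive posts.

    Assumed definition: a perfectly regular posting schedule, as a scheduled
    bot might produce, puts every gap in one bin and yields zero entropy.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    counts = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def content_repetition_rate(posts):
    """Fraction of posts whose text exactly duplicates another post.

    Assumed definition: 1 minus the ratio of distinct texts to all texts.
    """
    if not posts:
        return 0.0
    return 1.0 - len(set(posts)) / len(posts)


# Hypothetical accounts: posting times in seconds, plus post texts.
regular_times = [0, 600, 1200, 1800]   # one post every 10 minutes
irregular_times = [0, 60, 300, 1000]   # uneven gaps
repeated_posts = ["buy now", "buy now", "buy now", "hello"]

features = [interval_entropy(regular_times), content_repetition_rate(repeated_posts)]
```

Under these assumptions, a low interval entropy combined with a high repetition rate would point toward automated posting. A full feature vector would add the remaining six indicators before being passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`.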