全部 标题 作者
关键词 摘要

OALib Journal期刊
ISSN: 2333-9721
费用:99美元

查看量下载量

相关文章

更多...
-  2018 

大规模网络图中4节点子图数量快速估计算法
A Fast Mining Algorithm for 4??Node Graphlets in Large Graphs

DOI: 10.7652/xjtuxb201812009

Keywords: 网络图,采样算法,子图数量估计
1. Ministry of Education Key Lab for Intelligent Networks and Network Security
,Xi’an Jiaotong University,Xi’an 710049,China,2. Center of Network,Xi’an Jiaotong University,Xi’an 710049,China,3. School of Electronic and Information Engineering,Xi’an Jiaotong University,Xi’an 710049,China

Full-Text   Cite this paper   Add to My Lib

Abstract:

针对大规模网络图中存在的大量4节点子图数量难以精确统计分析的问题,提出了一种大规模网络图中4节点子图数量快速估计算法(SmartMoss)。该算法通过随机变量方差分析技术对比3路径采样算法(3PS)和中心3路径采样算法(C3PS)两种前沿算法的估计误差得出其各自不同的适用范围,进而通过计算被测网络图权重密度分布与误差实时选择使用3PS算法或C3PS算法对网络图中4节点子图进行快速采样,通过采样比例混合3PS算法与C3PS算法的估计结果实现对网络图中各4节点子图出现数量的快速估计。实验结果表明,在同等估计误差下提出的SmartMoss算法比已有3PS算法和C3PS算法快10倍以上。SmartMoss算法可以实现对大规模网络图中4节点子图数量进行快速准确的估计,同时为网络社团演化和恶意代码检测等实际应用提供一定的理论参考。
A fast method (SmartMoss) for efficiently estimating the frequencies of 4??node graph??lets in large??scale networks is proposed to address the challenge of exactly counting the frequencies of 4??node graphlets occurring in large??scale networks. SmartMoss compares the estimation errors of the two state??of??the??art algorithms, 3PS (3??Path Sampling) and C3PS (Center 3??Path Sampling), through random variables variance analysis to obtain their different application ranges, and then selects either 3PS algorithm or C3PS algorithm for 4??node graphlets sampling by calculating the distribution of network weight densities and real??time errors, to achieve quick estimation of the frequencies of 4??node graphlets by combing the estimation results of 3PS and C3PS based on the sampling proportions. Experimental results show that SmartMoss is more than 10 times faster than the state??of??the??art algorithms 3PS and C3PS under the same estimation error. SmartMoss efficiently and accurately estimates the frequencies of 4??node graph??lets for large network graphs, and provides theoretical reference for practical applications such as network community evolution and malicious code detection

References

[1]  YAN Yuliang, DONG Yihong, HE Xianmang, et al. FSMBUS: a frequent subgraph mining algorithm in single large??scale graph using spark [J]. Journal of Computer Research and Development, 2015, 52(8): 1768??1783.
[2]  [8]栾华, 周明全, 付艳. 多核处理器上的频繁图挖掘方法 [J]. 计算机研究与发展, 2015, 52(12): 2844??2856.
[3]  [9]PINAR Y, VISHWANATHAN S V N. Deep graph kernels [C]∥Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, USA: ACM, 2015: 1365??1374.
[4]  [10]CHEN K, WANG P, LEE Y, et al. Finding unknown malice in 10 seconds: mass vetting for new threats at the Google??play scale [C]∥Proceedings of the Usenix Conference on Security Symposium. New York, USA: USENIX Association, 2015: 659??674.
[5]  [13]WANG Pinghui, LUI J C S, RIBEIRO B, et al. Efficiently estimating motif statistics of large networks [J]. ACM Transactions on Knowledge Discovery from Data, 2014, 9(2): 1??27.
[6]  [1]SHAI S, RON M, SHMOOLIK M, et al. Network motifs in the transcriptional regulation network of escherichia coli [J]. Nature Genetics, 2002, 31(1): 64??68.
[7]  [2]谢莹, 吴建国, 李炜, 等. 基于gSpan算法的未知化合物毒性预测 [J]. 合肥工业大学学报(自然科学版), 2007, 30(10): 1278??1280.
[8]  XIE Ying, WU Jianguo, LI Wei, et al. Predicting the toxicity of unknown chemicals based on gSpan algorithm [J]. Journal of Hefei University of Technology (Natural Science), 2007, 30(10): 1278??1280.
[9]  [3]CHUN H, KWAK H, EOM Y H, et al. Comparison of online social relations in terms of volume vs interaction: a case study of Cyworld [C]∥Proceedings of the ACM SIGCOMM Conference on Internet Measurement. New York, USA: ACM, 2008: 57??70.
[10]  [4]KUNEGIS J, LOMMATZSCH A, BAUCKHAGE C. The slashdot zoo: mining a social network with negative edges [C]∥Proceedings of the 18th International Conference on World Wide Web. New York, USA: ACM, 2009: 741??750.
[11]  [5]ZHAO Junzhou, LUI J C S, TOWSLEY D, et al. Empirical analysis of the evolution of follower network: a case study on Douban [C]∥Proceedings of the 2011 IEEE Conference on Computer Communications Workshops. Piscataway, NJ, USA: IEEE, 2011: 941??946.
[12]  [6]UGANDER J, BACKSTROM L, KLEINBERG J. Subgraph frequencies: mapping the empirical and extremal geography of large graph collections [C]∥Proceedings of the 22nd Inlernational Conference on World Wide Web. New York, USA: ACM, 2013: 1307??1318.
[13]  [7]严玉良, 董一鸿, 何贤芒, 等. Fsmbus: 一种基于spark的大规模频繁子图挖掘算法 [J]. 计算机研究与发展, 2015, 52(8): 1768??1783.
[14]  LUAN Hua, ZHOU Mingquan, FU Yan. Frequent graph mining on multi??core processor [J]. Journal of Computer Research and Development, 2015, 52(12): 2844??2856.
[15]  [11]GASCON H, YAMAGUCHI F, ARP D, et al. Structural detection of android malware using embedded call graphs [C]∥Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security. New York, USA: ACM, 2013: 45??54.
[16]  [12]SHEN Tong, ZHONGYANG Yibing, XIN Zhi, et al. Detect android malware variants using component based topology graph [C]∥Proceedings of the 2014 13th IEEE International Conference on Trust, Security and Privacy in Computing and Communications. New York, USA: ACM, 2014: 406??413.
[17]  [14]JHA M, SESHADHRI C, PINAR A. Path sampling: a fast and provable method for estimating 4??vertex subgraph counts [C]∥Proceedings of the 24th International Conference on World Wide Web. New York, USA: ACM, 2015: 495??505.

Full-Text

Contact Us

service@oalib.com

QQ:3279437679

WhatsApp +8615387084133