-  2018 

Incremental analysis of open source repositories

DOI: 10.16511/j.cnki.qhdxxb.2018.25.029

Keywords: open source, program analysis, incremental parsing, code repository


Abstract:

Tracing the origin of code is a common practice when reusing open source software, and it relies on efficient program analysis. Existing program analysis methods mainly identify complete syntactic structures, so their analysis time depends on the total code size; they lack incremental analysis capability and struggle to analyze large open source code repositories efficiently. Exploiting the high similarity between adjacent snapshots in open source repositories, this paper proposes an incremental analysis method that analyzes only the code changed in each snapshot, which reduces the amount of code to analyze. The method first parses file snapshots to obtain the content modified in each revision, then uses a mapping algorithm to map those modifications onto complete, analyzable functions, and finally converts these functions into fingerprints for function comparison. Compared with traditional analysis methods, this method reduces the analysis scale for open source code repositories and speeds up function comparison, better supporting code origin tracing and other open source reuse needs.
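The three steps named in the abstract (extract the modifications, map them to complete functions, fingerprint the functions) can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the change between two snapshots is available as a single-file unified diff, it uses Python's `ast` module to find function boundaries, and it substitutes a whitespace-normalized SHA-256 hash for the paper's fingerprint, whose design the abstract does not describe. All function names here are hypothetical.

```python
import ast
import difflib
import hashlib
import re

# Matches a unified-diff hunk header and captures the start line in the new file.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_new_lines(unified_diff):
    """Line numbers in the new snapshot touched by a single-file unified diff."""
    touched, new_line, in_hunk = set(), 0, False
    for line in unified_diff.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            new_line, in_hunk = int(header.group(1)), True
        elif not in_hunk or line.startswith("\\"):
            continue  # file headers and "\ No newline at end of file"
        elif line.startswith("+"):
            touched.add(new_line)
            new_line += 1
        elif line.startswith("-"):
            # Approximation (assumption): attribute a deletion to the line
            # that now occupies its position in the new snapshot.
            touched.add(new_line)
        else:
            new_line += 1  # unchanged context line
    return touched


def changed_functions(new_source, touched):
    """Map touched lines to the complete functions that enclose them."""
    result = {}
    for node in ast.walk(ast.parse(new_source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if any(node.lineno <= n <= node.end_lineno for n in touched):
                result[node.name] = ast.get_source_segment(new_source, node)
    return result


def fingerprint(function_source):
    """Stand-in fingerprint: SHA-256 over whitespace-collapsed source text."""
    normalized = " ".join(function_source.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two adjacent snapshots that differ only inside `sub`.
OLD = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
NEW = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return b - a\n"

diff_text = "".join(
    difflib.unified_diff(OLD.splitlines(keepends=True), NEW.splitlines(keepends=True))
)
touched = changed_new_lines(diff_text)
functions = changed_functions(NEW, touched)
fingerprints = {name: fingerprint(src) for name, src in functions.items()}
```

Only `sub` is re-analyzed and re-fingerprinted in this example; `add` is never parsed for comparison, which is the source of the savings when adjacent snapshots are nearly identical. A production system would likely replace the exact hash with a similarity-preserving fingerprint so that near-duplicate functions can still be matched.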

