
2018

Efficient multiple sets intersection using SIMD instructions

DOI: 10.6040/j.issn.1671-9352.1.2017.040

Keywords: intersection algorithm, parallel processing, inverted index, performance evaluation, vectorized processing, set intersection



Abstract: Conjunctive Boolean queries are a fundamental operation in document retrieval and are widely used in information systems and databases. In its most basic and popular form, a conjunctive query reduces to intersecting multiple sorted integer sequences, and improving the efficiency of this intersection is an active research focus. Building on traditional intersection algorithms, this paper targets their core search step and proposes two skip-search algorithms based on single instruction, multiple data (SIMD) instructions. The optimized search algorithms improve performance and can be adopted in traditional intersection methods over multiple inverted lists. Experiments show that the optimized algorithms perform much better than their non-SIMD counterparts and even outperform SIMD-optimized pairwise intersection algorithms, with a performance improvement of up to 37.3%.
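To make the problem setting concrete, the following is a minimal scalar sketch of intersecting several sorted lists with a block-wise skip search. It is not the paper's algorithm: the function names, the block size of 4 (standing in for four 32-bit lanes of a 128-bit register), and the shortest-list-first ordering are all assumptions made for illustration, and each input list is assumed to be strictly increasing. In a real SIMD version, the per-block comparison would presumably be a single vector compare rather than a Python loop.

```python
BLOCK = 4  # hypothetical lane count (e.g. four 32-bit integers per 128-bit register)


def block_skip_search(lst, target, start=0):
    """Return the first index i >= start with lst[i] >= target, or len(lst)."""
    i = start
    n = len(lst)
    # Skip whole blocks whose last element is still smaller than the target.
    while i + BLOCK <= n and lst[i + BLOCK - 1] < target:
        i += BLOCK
    # Inside the candidate block, count the elements below the target.
    # A SIMD implementation could do this with one compare plus a mask popcount.
    block = lst[i:i + BLOCK]
    return i + sum(1 for x in block if x < target)


def intersect_two(short, long):
    """Intersect two strictly increasing lists, probing `long` for each element of `short`."""
    out = []
    pos = 0
    for x in short:
        pos = block_skip_search(long, x, pos)
        if pos == len(long):
            break  # every remaining element of `short` is larger than all of `long`
        if long[pos] == x:
            out.append(x)
    return out


def intersect_many(lists):
    """Intersect any number of sorted lists, starting from the shortest one."""
    if not lists:
        return []
    ordered = sorted(lists, key=len)
    result = ordered[0]
    for other in ordered[1:]:
        result = intersect_two(result, other)
        if not result:
            break  # the intersection is already empty
    return result
```

Starting from the shortest list keeps the intermediate result small, so each later list is probed only for the surviving candidates; the search position in the longer list never moves backward because both lists are sorted.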



