Efficient multiple sets intersection using SIMD instructions

DOI: 10.6040/j.issn.1671-9352.1.2017.040

Keywords: set intersection, inverted index, vectorized (parallel) processing, performance evaluation

Abstract: Conjunctive Boolean queries are a fundamental operation in document retrieval and are widely used in information systems and databases. In its most basic form, a conjunctive query reduces to intersecting multiple sorted integer sequences, and improving the efficiency of this intersection is a current research focus. Building on traditional intersection algorithms, this paper targets their core search step and proposes two skip (jump) search algorithms based on SIMD (single instruction, multiple data) instructions. The optimized search algorithms improve performance and can be adopted directly into conventional multi-list intersection algorithms over inverted lists. Experiments show that the optimized algorithms perform substantially better than their non-SIMD counterparts and even outperform recent SIMD-optimized pairwise intersection algorithms, with a performance improvement of up to 37.3%.
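The abstract describes the SIMD search step only at a high level, so the following C code is a minimal sketch of the general idea under stated assumptions, not the authors' implementation of either of their two algorithms. It assumes strictly increasing 32-bit posting lists, a skip phase that jumps in fixed strides of four elements (one 128-bit SSE2 register), and a single vector compare to locate the answer inside the final block. The names `simd_skip_search`, `intersect_many` and `MAX_LISTS`, the stride of four, and the element-at-a-time driver that probes every other list for each element of the shortest list are all hypothetical choices. Note that the skip phase itself is scalar here; SIMD is used only for the in-block search. The SSE2 path relies on the GCC/Clang builtin `__builtin_popcount`, and other targets fall back to a scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: 4 x int32 per 128-bit register */
#endif

#define MAX_LISTS 16  /* hypothetical cap on the number of input lists */

/* Hypothetical skip search: return the index of the first element >= key
 * in the sorted array a[lo..n), or n if there is none. */
static size_t simd_skip_search(const int32_t *a, size_t n, size_t lo, int32_t key)
{
    assert(lo <= n);
    size_t i = lo;
    /* Skip phase: jump over whole 4-element blocks whose last element < key. */
    while (i + 4 <= n && a[i + 3] < key)
        i += 4;
    if (i + 4 <= n) {
        /* The block a[i..i+3] holds the answer, because a[i+3] >= key. */
#if defined(__SSE2__)
        __m128i block = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i lt = _mm_cmplt_epi32(block, _mm_set1_epi32(key));
        /* One mask bit per lane; in a sorted block the set bits form a prefix,
         * so their count is the offset of the first element >= key. */
        int mask = _mm_movemask_ps(_mm_castsi128_ps(lt));
        return i + (size_t)__builtin_popcount((unsigned)mask);
#else
        while (a[i] < key)  /* scalar fallback for non-SSE2 targets */
            i++;
        return i;
#endif
    }
    /* Tail shorter than one block: plain linear scan. */
    while (i < n && a[i] < key)
        i++;
    return i;
}

/* Hypothetical element-at-a-time intersection of k strictly increasing lists.
 * lists[0] drives the loop, so it should be the shortest list.
 * Writes matches into out (capacity >= lens[0]) and returns their count. */
static size_t intersect_many(const int32_t *const *lists, const size_t *lens,
                             size_t k, int32_t *out)
{
    size_t pos[MAX_LISTS] = {0};  /* per-list cursors; they only move forward */
    size_t count = 0;
    if (k == 0 || k > MAX_LISTS)
        return 0;
    for (size_t e = 0; e < lens[0]; e++) {
        int32_t key = lists[0][e];
        int found = 1;
        for (size_t j = 1; j < k; j++) {
            pos[j] = simd_skip_search(lists[j], lens[j], pos[j], key);
            if (pos[j] == lens[j])
                return count;  /* list j exhausted: no later key can match */
            if (lists[j][pos[j]] != key) {
                found = 0;
                break;
            }
        }
        if (found)
            out[count++] = key;
    }
    return count;
}
```

For example, intersecting {3, 9, 15, 27, 40} (as the driving list) with {1, 3, 4, 9, 10, 15, 20, 27, 33, 40, 41} and {2, 3, 5, 9, 15, 16, 17, 18, 19, 40} returns 4 and writes {3, 9, 15, 40} to `out`. Because each cursor only moves forward, every list is traversed at most once, and the vector compare replaces up to four scalar comparisons inside the final block.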


