%0 Journal Article %T 基于LSH技术的试题相似度检测方法
The Application of LSH Technology in Similar Question Detection %A 陈瑞 %A 王松 %A 梅莹 %A 杨云源 %J Computer Science and Application %P 741-748 %@ 2161-881X %D 2020 %I Hans Publishing %R 10.12677/CSA.2020.104077 %X
The repetition rate of test questions is an important indicator of the quality of a question bank and of the test papers drawn from it. To find similar questions in a question bank quickly, this paper studies a detection method that combines Jaccard similarity over K-shingles, MinHash, and LSH. The method first segments each question stem into Chinese words and, after appropriate preprocessing, converts it into a K-shingle set; it then computes a MinHash signature for each set. Finally, LSH is used to quickly find candidate pairs of similar questions, and the Jaccard similarity of each candidate pair is computed. If that similarity exceeds a given threshold, the pair is reported as similar questions. Use of the method in a question bank system verified its feasibility and gave good results.
%K Examination Checking %K LSH Algorithm %K Jaccard Similarity %K K-Shingle %U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=35242
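
The pipeline summarized in the abstract (K-shingle sets, MinHash signatures, LSH candidate pairs, then an exact Jaccard check against a threshold) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each question stem has already been segmented into a list of words, and every name and parameter in it (`shingles`, `find_similar`, `k=2`, 20 bands of 2 rows, the CRC32-based shingle IDs, the linear hash family, the 0.5 threshold) is a hypothetical choice made only for illustration.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = 2**61 - 1  # Mersenne prime used as the modulus of the hash family


def shingles(tokens, k=2):
    """Return the set of k-shingles (runs of k consecutive tokens)."""
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(shingle_set, params):
    """MinHash signature of a non-empty shingle set."""
    # Map each shingle to an integer ID (assumed scheme: CRC32 of its text).
    ids = [zlib.crc32("\x1f".join(s).encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def lsh_candidates(signatures, bands, rows):
    """Pairs of IDs whose signatures agree on every row of at least one band."""
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for qid, sig in signatures.items():
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets[key].append(qid)
        for ids in buckets.values():
            candidates.update(combinations(sorted(ids), 2))
    return candidates


def find_similar(questions, k=2, bands=20, rows=2, threshold=0.5):
    """Return (id_a, id_b, jaccard) for candidate pairs above the threshold.

    `questions` maps a question ID to its list of segmented words.
    """
    sets = {qid: shingles(toks, k) for qid, toks in questions.items()}
    sets = {qid: s for qid, s in sets.items() if s}  # skip empty stems
    params = make_hash_params(bands * rows)
    sigs = {qid: minhash_signature(s, params) for qid, s in sets.items()}
    result = []
    for a, b in sorted(lsh_candidates(sigs, bands, rows)):
        sim = jaccard(sets[a], sets[b])  # exact check on candidates only
        if sim > threshold:
            result.append((a, b, sim))
    return result
```

LSH only narrows the search: the exact Jaccard similarity is computed for candidate pairs alone, so a pair that never shares a bucket is never compared. The number of bands and rows controls that trade-off, and the values above are placeholders rather than settings taken from the paper.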