Pattern Recognition and Artificial Intelligence, 2011

Efficient Chinese Speech Retrieval Based on Query Expansion

pp. 561-566

李伟吴,吕萍

Keywords: Chinese speech retrieval, word segmentation, query expansion, finite state automaton, token-based search


Abstract:

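A Chinese speech retrieval system locates user queries in Chinese speech documents quickly and accurately. A typical implementation runs speech recognition on the documents and builds an index, then segments the query string into words and searches with the segmentation result. When the query segmentation does not match the recognition output, system performance suffers. To address this, multiple segmentations of the query are generated, and each is expanded with prefixes and suffixes before retrieval. The expansion in turn produces too much content to search and long search times, so a finite state automaton (FSA) is introduced to compress the expansions, and a token-based search algorithm is designed on top of it for efficient retrieval. Experiments show that multiple segmentation combined with prefix and suffix expansion of the query improves the retrieval equal error rate (EER) by a relative 50% to 70%, and that introducing the FSA compresses the search space and speeds up retrieval nearly 30-fold.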
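The abstract does not give the details of the FSA construction or of the token-based search, so the following is only a minimal sketch of the general idea under stated assumptions. It enumerates every segmentation of a query against a toy lexicon, merges the segmentations into a prefix tree (a simple acyclic automaton that shares common prefixes, not a minimized FSA), and scans a recognized word sequence once with tokens that each carry an automaton state and a start position. The function names, the lexicon, and the transcript are all hypothetical, and prefix/suffix expansion is left out because the abstract does not define it.

```python
from typing import Dict, List, Set, Tuple


def all_segmentations(query: str, lexicon: Set[str]) -> List[Tuple[str, ...]]:
    """Enumerate every way to split `query` into words from `lexicon`."""
    if not query:
        return [()]
    results: List[Tuple[str, ...]] = []
    for end in range(1, len(query) + 1):
        head = query[:end]
        if head in lexicon:
            for tail in all_segmentations(query[end:], lexicon):
                results.append((head,) + tail)
    return results


def build_trie(sequences: List[Tuple[str, ...]]) -> Tuple[List[Dict[str, int]], Set[int]]:
    """Merge word sequences into a prefix tree; state 0 is the start state."""
    transitions: List[Dict[str, int]] = [{}]
    final: Set[int] = set()
    for seq in sequences:
        state = 0
        for word in seq:
            if word not in transitions[state]:
                transitions.append({})
                transitions[state][word] = len(transitions) - 1
            state = transitions[state][word]
        final.add(state)
    return transitions, final


def token_search(words: List[str],
                 transitions: List[Dict[str, int]],
                 final: Set[int]) -> List[Tuple[int, int]]:
    """Scan `words` once; return (start, end) spans accepted by the automaton."""
    hits: List[Tuple[int, int]] = []
    tokens: List[Tuple[int, int]] = []  # each token is (state, start index)
    for i, word in enumerate(words):
        tokens.append((0, i))  # a match may begin at any position
        survivors: List[Tuple[int, int]] = []
        for state, start in tokens:
            nxt = transitions[state].get(word)
            if nxt is None:
                continue  # token dies: no arc for this word
            if nxt in final:
                hits.append((start, i + 1))
            survivors.append((nxt, start))
        tokens = survivors
    return hits


# Toy data, assumed for illustration only.
lexicon = {"中", "文", "中文", "语", "音", "语音", "检", "索", "检索"}
segmentations = all_segmentations("中文语音", lexicon)
transitions, final = build_trie(segmentations)

# Hypothetical recognizer output, already split into words.
transcript = ["我们", "研究", "中文", "语音", "检索"]
hits = token_search(transcript, transitions, final)

total_words = sum(len(seg) for seg in segmentations)
total_arcs = sum(len(arcs) for arcs in transitions)
print(len(segmentations), total_words, total_arcs, hits)
```

In this toy case the query has four segmentations, and the match is found whichever of them the recognizer happened to produce. Sharing prefixes stores fewer arcs than the segmentations hold words, and the gap widens as the number of alternatives grows; a single pass with tokens replaces one separate search per alternative.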

