Journal of Image and Graphics (中国图象图形学报), 2015

A Survey of Unsupervised Spoken Keyword Detection for Low-Resource Languages

DOI: 10.11834/jig.20150207

Yang Peng, Xie Lei, Zhang Yanning (杨鹏, 谢磊, 张艳宁)

Keywords: spoken keyword detection; low-resource; dynamic time warping


Abstract:

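Objective: Unsupervised keyword detection for low-resource languages has attracted wide research interest in recent years. Because low-resource languages lack sufficient annotated data and the related expert knowledge, conventional keyword detection techniques built on large-vocabulary speech recognition systems cannot be applied to them. Researchers have therefore sought unsupervised techniques for spoken keyword detection in low-resource languages.

Method: We first describe the problems and challenges this technique currently faces. We then introduce the mainstream algorithmic framework, which is based on dynamic time warping (DTW), and review the main research results of recent years along several key dimensions: feature representation, template matching methods, and efficiency improvements. Finally, we present the evaluation criteria commonly used for this task and the performance currently achievable, and discuss possible directions for future research.

Result: Research on this task has produced many results, but it remains at the laboratory stage. Multi-system fusion strategies make systems large, and no good indexing method exists yet, so detection takes too long. Much research work remains to be done on keyword detection for low-resource speech.

Conclusion: By providing a comprehensive survey of current unsupervised keyword detection techniques for low-resource languages, we hope to make researchers' work easier.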
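The DTW-based matching framework described above can be illustrated with a minimal sketch of query-by-example matching via subsequence DTW. It assumes every speech frame has already been converted into a posteriorgram vector (for example, posteriors from a Gaussian mixture model trained without labels) and uses the negative log inner product as the frame distance, a common choice in posteriorgram-based work. Neither choice is taken from this survey, and the names `frame_distance` and `subsequence_dtw` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: one frame = a posterior probability vector (entries sum to ~1).
Frame = Sequence[float]


def frame_distance(q: Frame, x: Frame, eps: float = 1e-10) -> float:
    """Negative log inner product of two posterior vectors.

    It is 0 when both frames are the same one-hot vector; the inner product
    is floored at eps so that orthogonal frames do not produce log(0).
    """
    dot = sum(a * b for a, b in zip(q, x))
    return -math.log(max(dot, eps))


def subsequence_dtw(query: List[Frame], utterance: List[Frame]) -> Tuple[float, int, int]:
    """Find the utterance region that best matches the whole query.

    Returns (accumulated cost, start frame, end frame). The match may start and
    end anywhere in the utterance, but every query frame must be aligned.
    """
    n, m = len(query), len(utterance)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]  # utterance frame where each path began
    for i in range(n):
        for j in range(m):
            d = frame_distance(query[i], utterance[j])
            if i == 0:
                # Free start: the first query frame may align with any utterance frame.
                cost[i][j], start[i][j] = d, j
                continue
            # Pick the cheapest predecessor: vertical, then horizontal, then diagonal.
            best, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0 and cost[i][j - 1] < best:
                best, best_start = cost[i][j - 1], start[i][j - 1]
            if j > 0 and cost[i - 1][j - 1] < best:
                best, best_start = cost[i - 1][j - 1], start[i - 1][j - 1]
            cost[i][j], start[i][j] = d + best, best_start
    # Free end: the path may finish at any utterance frame.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end


if __name__ == "__main__":
    # Toy one-hot "posteriorgrams" over 3 hypothetical acoustic units.
    e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # The best region spans utterance frames 2 to 3.
    print(subsequence_dtw([e[0], e[1]], [e[2], e[2], e[0], e[1], e[2]]))
```

A full system would typically normalize the accumulated cost (for instance by alignment path length), compute it for every utterance in the collection, and then rank or threshold the results. Because this search costs O(nm) per utterance for a query of n frames and an utterance of m frames, it also shows why efficiency improvements and the missing indexing methods noted above matter in practice.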
