Saraclar M, Sproat R. Lattice-based search for spoken utterance retrieval[C]//Proceedings of the 2004 Conference of the North American Chapter of the Association for Computational Linguistics. Boston, USA:Association for Computational Linguistics, 2004: 129-136.
[2]
Glass J. Towards unsupervised speech processing[C]//Proceedings of the 11th International Conference on Information Sciences, Signal Processing and their Applications. Montreal, Canada:IEEE, 2012: 1-4.
[3]
Metze F, Rajput N, Anguera X, et al. The spoken web search task at MediaEval 2011[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto, Japan:IEEE, 2012: 5165-5168.
[4]
Metze F, Anguera X, Barnard E, et al. The spoken web search task at MediaEval 2012[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada:IEEE, 2013: 8121-8125.
[5]
Rodriguez-Fuentes L J, Varona A, Penagarikano M, et al. High-performance query-by-example spoken term detection on the SWS 2013 evaluation[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Florence, Italy:IEEE, 2014: 7869-7873.
[6]
Abad A, Rodriguez-Fuentes L J, Penagarikano M, et al. On the calibration and fusion of heterogeneous spoken term detection systems[C]//Proceedings of the 14th Annual Conference of the International Speech Communication Association. Lyon, France:ISCA, 2013.
[7]
Wang H, Lee T, Leung C C, et al. Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada:IEEE, 2013: 8545-8549.
[8]
Wang H, Leung C C, Lee T, et al. An acoustic segment modeling approach to query-by-example spoken term detection[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto, Japan:IEEE, 2012: 5157-5160.
[9]
Müller M. Dynamic time warping[M]//Information Retrieval for Music and Motion. Berlin, Germany:Springer, 2007: 69-84.
[10]
Hazen T J, Shen W, White C. Query-by-example spoken term detection using phonetic posteriorgram templates[C]//Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding. Merano, Italy:IEEE, 2009: 421-426.
[11]
Zhang Y, Glass J R. Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams[C]//Proceedings of the IEEE International Workshop on Automatic Speech Recognition & Understanding. Merano, Italy:IEEE, 2009: 398-403.
[12]
Lee C, Glass J. A nonparametric Bayesian approach to acoustic model discovery[C]//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. Jeju Island, Korea:ACL, 2012: 40-49.
[13]
Yang P, Xie L. Mandarin speech pattern discovery using segmental dynamic time warping and posteriorgram features[J]. Journal of Tsinghua University (Sci. & Tech.), 2013, 53(6): 903-907. (in Chinese)
[14]
Jansen A, Niyogi P. Intrinsic spectral analysis[J]. IEEE Transactions on Signal Processing, 2013, 61(7): 1698-1710.
[15]
Yang P, Xie L, Leung C C, et al. Intrinsic spectral analysis based on temporal context features for query by example spoken term detection[C]//Proceedings of the 15th Annual Conference of the International Speech Communication Association. Singapore:ISCA, 2014: 1722-1726.
[16]
[17]
Anguera X, Ferrarons M. Memory efficient subsequence DTW for query-by-example spoken term detection[C]//Proceedings of the IEEE International Conference on Multimedia Association. Singapore:IEEE, 2011: 1909-1912.
[18]
Chelba C, Hazen T J, Saraçlar M. Retrieval and browsing of spoken content[J]. IEEE Signal Processing Magazine, 2008, 25(3): 39-49.
[19]
Zhang Y, Salakhutdinov R, Chang H A, et al. Resource configurable spoken query detection using deep Boltzmann machines[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto, Japan:IEEE, 2012: 5161-5164.
[20]
Wilpon J, Juang B, Rabiner L. An investigation on the use of acoustic sub-word units for automatic speech recognition[C]//Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Dallas, USA:IEEE, 1987: 821-824.
[21]
Qiao Y, Shimomura N, Minematsu N. Unsupervised optimal phoneme segmentation: objectives, algorithm and comparisons[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Las Vegas, USA:IEEE, 2008: 3989-3992.
[22]
Gales M J F. Maximum likelihood linear transformations for HMM-based speech recognition[J]. Computer Speech & Language, 1998, 12(2): 75-98.
[23]
Zhang Y, Adl K, Glass J. Fast spoken query detection using lower-bound dynamic time warping on graphical processing units[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Kyoto, Japan:IEEE, 2012: 5173-5176.
[24]
Yang P, Xie L, Luan Q, et al. A tighter lower bound estimate for dynamic time warping[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada:IEEE, 2013: 8525-8529.
[25]
Mantena G, Achanta S, Prahallad K. Query-by-example spoken term detection using frequency domain linear prediction and non-segmental dynamic time warping[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 22(5): 946-955.
[26]
Chan C, Lee L. Integrating frame-based and segment-based dynamic time warping for unsupervised spoken term detection with spoken queries[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Prague, Czech Republic:IEEE, 2011: 5652-5655.
[27]
Mantena G, Anguera X. Speed improvements to information retrieval-based dynamic time warping using hierarchical k-means clustering[C]//Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, Canada:IEEE, 2013: