Direct Recovery of Clean Speech Using a Hybrid Noise Suppression Algorithm for Robust Speech Recognition System

DOI: 10.5402/2012/306305


Abstract:

A new log-power domain feature enhancement algorithm named NLPS is developed. It consists of two parts: direct solution of a nonlinear system model and log-power subtraction. In contrast to other methods, the proposed algorithm does not need a prior speech/noise statistical model. Instead, it works by directly solving the nonlinear function derived from the speech recognition system. Log-power subtraction, the second part of the algorithm, is then applied as a separate step to refine the accuracy of the estimated cepstrum. The proposed algorithm solves the speech probability distribution function (PDF) discontinuity problem caused by traditional spectral-subtraction-type algorithms. The effectiveness of the proposed filter is evaluated extensively on the standard AURORA2 database. The results show that incorporating the proposed algorithm yields a significant improvement: it reaches a recognition rate of over 86% for noisy speech (averaged over SNRs from 0 dB to 20 dB), which means a 48% error reduction over the baseline Mel-frequency cepstral coefficient (MFCC) system.

1. Introduction

The main objective of speech recognition research is to achieve a higher recognition rate. However, many factors tend to degrade the performance of an automatic speech recognition (ASR) system, such as environmental noise, channel distortion, and speaker variability [1, 2]. Generally, an ASR system consists of two parts, feature extraction and pattern matching. Methods that aim to improve the performance of ASR systems can therefore be divided into two main categories, the “model” approach and the “feature” approach. The “model” approach focuses on improving the speech recognizer, where the speech features are classified into different patterns developed from the statistical properties of speech. The “feature” approach puts the emphasis on improving the robustness of the speech features. The method proposed in this paper belongs to the “feature” category.

Noise reduction, or clean speech estimation, is a straightforward “feature” approach to improving the performance of ASR systems. There are different ways to obtain the estimate, and minimum mean square error (MMSE) estimation is one of the most important. Ephraim and Malah derived the short-time spectral amplitude (STSA) estimator using MMSE in 1984 [3], and it has become a standard approach for clean speech estimation in speech processing. The advantage of the MMSE estimator is clear. It is mathematically optimized, which theoretically can get a good estimation of the
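The PDF discontinuity that the abstract attributes to traditional spectral subtraction can be illustrated with a small simulation. The sketch below is not the NLPS algorithm. It is a minimal power spectral subtraction in the spirit of [8], and it assumes exponentially distributed clean and noise powers, additive power with no cross-term, a perfectly known mean noise power, and a fixed absolute floor; the names `spectral_subtract`, `simulate`, and `FLOOR` are illustrative only.

```python
import math
import random

FLOOR = 1e-3  # assumed absolute power floor (hypothetical value)


def spectral_subtract(noisy_power, noise_estimate, floor=FLOOR):
    """Subtract a noise power estimate and clip the result at a floor.

    Clipping keeps the power positive so that the logarithm is defined.
    """
    return [max(y - noise_estimate, floor) for y in noisy_power]


def simulate(num_bins=10000, seed=0):
    """Return enhanced log-powers for synthetic noisy bins.

    Assumption: clean and noise powers are independent exponential
    variables with mean 1, and the noisy power is simply their sum.
    """
    rng = random.Random(seed)
    clean = [rng.expovariate(1.0) for _ in range(num_bins)]
    noise = [rng.expovariate(1.0) for _ in range(num_bins)]
    noisy = [x + n for x, n in zip(clean, noise)]
    enhanced = spectral_subtract(noisy, noise_estimate=1.0)
    return [math.log(p) for p in enhanced]


log_power = simulate()
log_floor = math.log(FLOOR)

# Fraction of bins that were clipped and therefore sit exactly at the floor.
at_floor = sum(1 for v in log_power if math.isclose(v, log_floor)) / len(log_power)
print(f"fraction of bins at the floor: {at_floor:.3f}")
```

Under these assumptions roughly a quarter of the bins end up at exactly `log(FLOOR)`, so the distribution of the enhanced log-power has a point mass at the floor, separated from the continuous part above it. This is one plausible reading of the discontinuity the abstract refers to; the paper's own formulation may differ.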

References

[1]  L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, USA, 1993.
[2]  B. Gold and N. Morgan, Speech and Audio Signal Processing—Processing and Perception of Speech and Music, John Wiley & Sons, 2000.
[3]  Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 6, pp. 1109–1121, 1984.
[4]  D. Yu, L. Deng, J. Droppo, J. Wu, Y. Gong, and A. Acero, “Robust speech recognition using a cepstral minimum-mean-square-error-motivated noise suppressor,” IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 5, pp. 1061–1070, 2008.
[5]  K. M. Indrebo, R. J. Povinelli, and M. T. Johnson, “Minimum mean-squared error estimation of mel-frequency cepstral coefficients using a novel distortion model,” IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 8, pp. 1654–1661, 2008.
[6]  J. Chen, J. Benesty, Y. Huang, and S. Doclo, “New insights into the noise reduction Wiener filter,” IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1218–1233, 2006.
[7]  L. Deng, J. Droppo, and A. Acero, “Estimating cepstrum of speech under the presence of noise using a joint prior of static and dynamic features,” IEEE Transactions on Speech and Audio Processing, vol. 12, no. 3, pp. 218–233, 2004.
[8]  S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113–120, 1979.
[9]  European Telecommunications Standards Institute (ETSI), ETSI ES 202 050 V1.1.5, 2007.
[10]  C.-P. Chen and J. A. Bilmes, “MVA processing of speech features,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 1, pp. 257–270, 2007.
[11]  A. Acero, Acoustical and environmental robustness in automatic speech recognition [Ph.D. thesis], Department of Electrical and Computer Engineering, Carnegie Mellon University, 1990.
[12]  K. Ishizuka, T. Nakatani, M. Fujimoto, and N. Miyazaki, “Noise robust voice activity detection based on periodic to aperiodic component ratio,” Speech Communication, vol. 52, no. 1, pp. 41–60, 2010.
[13]  R. Martin, “Noise power spectral density estimation based on optimal smoothing and minimum statistics,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504–512, 2001.
[14]  H. Hirsch and D. Pearce, “The Aurora experimental framework for the performance evaluations of speech recognition system under noisy conditions,” in Proceedings of the 7th international conference on Information, communications and signal processing (ICICS '09), Paris, France, 2000.
[15]  ITU-T, Recommendation G.712. Transmission Performance Characteristics for Pulse Code Modulation Channels, Geneva, Switzerland, 1996.
[16]  R. G. Leonard, “A database for speaker independent digit recognition,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '84), vol. 3, pp. 42–53, 1984.
[17]  M. Brookes, Voicebox, http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html.
[18]  J. Droppo, A. Acero, and L. Deng, “Evaluation of the SPLICE algorithm on the Aurora 2 database,” in Proceedings of the Eurospeech Conference, International Speech Communication Association, Aalborg, Denmark, September 2001.
[19]  R. Martin, “Spectral subtraction based on minimum statistics,” in Proceedings of the European Signal Processing Conference (EUSIPCO '94), pp. 1182–1185, 1994.
