Advances in Acoustics and Vibration 2014

Subband DCT and EMD Based Hybrid Soft Thresholding for Speech Enhancement

DOI: 10.1155/2014/765454

Erhan Deger, Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu, Md. Kamrul Hasan

Abstract:

This paper presents a two-stage soft thresholding algorithm based on the discrete cosine transform (DCT) and empirical mode decomposition (EMD). In the first stage, noisy speech is decomposed into eight frequency bands, and a band-specific noise variance is calculated for each one. Based on this variance, each band is denoised using soft thresholding in the DCT domain. The remaining noise is eliminated in the second stage through a time-domain soft thresholding strategy adapted to the intrinsic mode functions (IMFs) obtained by applying EMD to the signal produced by the first stage. Significantly better SNR improvement and perceptual speech quality results for different noise types demonstrate the superiority of the proposed algorithm over recently reported techniques.

1. Introduction

In many speech-related systems, the desired signal is not available directly; rather, it is usually contaminated by interference sources. These background noise signals degrade the quality and intelligibility of the original speech, resulting in a severe drop in the performance of subsequent applications. Speech enhancement aims at improving the perceptual quality and intelligibility of speech signals degraded in noisy environments, mainly through noise reduction algorithms [1]. Because of its importance in today's information technology, many methods have been developed for this purpose.

A major problem in most algorithms is that the enhanced speech signal is distorted relative to the original one, which results in the loss of some speech details. Residual noise is another problem, and it affects the performance of postprocessing systems. Soft thresholding is a powerful technique for removing noise components: a constant value is subtracted from the coefficients of the noisy speech signal obtained by the analyzing transformation. However, this kind of direct subtraction also degrades the speech components.
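The conventional constant-value rule described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' algorithm: the function names (`soft_threshold`, `dct_matrix`, `denoise_frame`), the single-frame processing, and the threshold value `thr` are assumptions made for the example, and the orthonormal DCT-II is built directly with NumPy so that the block has no other dependencies.

```python
import numpy as np


def soft_threshold(coeffs, thr):
    """Subtract thr from each coefficient magnitude; zero anything below thr."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (its inverse is its transpose)."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    m = np.arange(n)[None, :]   # time index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)   # scale the DC row to make the matrix orthonormal
    return mat


def denoise_frame(frame, thr):
    """Hypothetical helper: DCT -> soft threshold -> inverse DCT for one frame."""
    c = dct_matrix(len(frame))
    coeffs = c @ frame
    return c.T @ soft_threshold(coeffs, thr)


# Small coefficients are zeroed, but large (speech-like) ones shrink by thr too.
example = soft_threshold(np.array([3.0, -0.5, -2.0, 0.2]), 1.0)
```

In this example the coefficients 3.0 and -2.0 survive but are reduced to 2.0 and -1.0, which is the degradation of speech components mentioned in the text: the same constant is removed from every coefficient regardless of whether it carries speech or noise.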
Unlike the conventional constant noise-level subtraction rule [2, 3], a new soft thresholding strategy based on frequency frames was proposed in [4]. The latter is able to remove the noise components while doing significantly less damage to the speech signal. This enables even signals with high SNRs to be processed effectively. However, because of the thresholding criteria, a noticeable amount of noise still remains in the enhanced signal. Another disadvantage is the algorithm's lack of robustness to different noise types. The empirical mode decomposition (EMD), recently pioneered by Huang et al. [5] as a new and powerful data

References

[1]	J. R. Deller, J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals, IEEE Press, New York, NY, USA, 2000.
[2]	D. L. Donoho, “De-noising by soft-thresholding,” IEEE Transactions on Information Theory, vol. 41, no. 3, pp. 613–627, 1995.
[3]	M. Bahoura and J. Rouat, “Wavelet speech enhancement based on the Teager energy operator,” IEEE Signal Processing Letters, vol. 8, no. 1, pp. 10–12, 2001.
[4]	S. Salahuddin, S. Z. Al Islam, M. K. Hasan, and M. R. Khan, “Soft thresholding for DCT speech enhancement,” Electronics Letters, vol. 38, no. 24, pp. 1605–1607, 2002.
[5]	N. E. Huang, Z. Shen, S. R. Long, et al., “The empirical mode decomposition and Hilbert spectrum for non-linear and non-stationary time series analysis,” Proceedings of the Royal Society A, vol. 454, pp. 903–995, 1998.
[6]	P. Flandrin, G. Rilling, and P. Gonçalvès, “Empirical mode decomposition as a filter bank,” IEEE Signal Processing Letters, vol. 11, no. 2, pp. 112–114, 2004.
[7]	M. C. Ivan and G. B. Richard, “Empirical mode decomposition based frequency attributes,” in Proceedings of the 69th SEG Meeting, Houston, Tex, USA, 1999.
[8]	Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition: a noise-assisted data analysis method,” Advances in Adaptive Data Analysis, vol. 1, no. 1, pp. 1–41, 2009.
[9]	D. P. Mandic, N. U. Rehman, Z. Wu, and N. E. Huang, “Empirical mode decomposition based time-frequency analysis of multivariate signals: the power of adaptive data analysis,” IEEE Signal Processing Magazine, vol. 30, no. 6, pp. 74–86, 2013.
[10]	N. U. Rehman, C. Park, N. E. Huang, and D. P. Mandic, “EMD via MEMD: multivariate noise-aided computation of standard EMD,” Advances in Adaptive Data Analysis, vol. 5, no. 2, pp. 1–25, 2013.
[11]	M. K. Hasan, M. S. A. Zilany, and M. R. Khan, “DCT speech enhancement with hard and soft thresholding criteria,” Electronics Letters, vol. 38, no. 13, pp. 669–670, 2002.
[12]	A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, “Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. 749–752, May 2001.
