%0 Journal Article
%T Subband DCT and EMD Based Hybrid Soft Thresholding for Speech Enhancement
%A Erhan Deger
%A Md. Khademul Islam Molla
%A Keikichi Hirose
%A Nobuaki Minematsu
%A Md. Kamrul Hasan
%J Advances in Acoustics and Vibration
%D 2014
%I Hindawi Publishing Corporation
%R 10.1155/2014/765454
%X This paper presents a two-stage soft thresholding algorithm based on the discrete cosine transform (DCT) and empirical mode decomposition (EMD). In the first stage, noisy speech is decomposed into eight frequency bands and a specific noise variance is calculated for each one. Based on this variance, each band is denoised using soft thresholding in the DCT domain. The remaining noise is eliminated in the second stage through a time-domain soft thresholding strategy adapted to the intrinsic mode functions (IMFs) derived by applying EMD to the signal obtained from the first-stage processing. Significantly better SNR improvement and perceptual speech quality results for different noise types demonstrate the superiority of the proposed algorithm over recently reported techniques.
1. Introduction
In many speech-related systems, the desired signal is not available directly; rather, it is usually contaminated by interfering sources. This background noise degrades the quality and intelligibility of the original speech, causing a severe drop in the performance of downstream applications. Speech enhancement aims to improve the perceptual quality and intelligibility of speech signals degraded in noisy environments, mainly through noise reduction algorithms [1]. Because of its importance in today’s information technology, many methods have been developed for this purpose. A major problem in most algorithms is that the enhanced speech signal is distorted relative to the original, which results in the loss of some speech details. Residual noise is another problem, and it affects the performance of the postprocessing systems. Soft thresholding is a powerful technique for removing noise components by subtracting a constant value from the coefficients of the noisy speech signal obtained by the analyzing transformation. However, this kind of direct subtraction also degrades the speech components.
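As a rough illustration of the conventional constant-value soft thresholding described above, the following Python sketch shrinks the DCT coefficients of a single frame toward zero. It is not the two-stage algorithm of this paper: the function names, the orthonormal DCT-II used as the analyzing transformation, and the fixed `threshold` argument are all assumptions made for this example.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a sequence (assumed analyzing transform)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_ii(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        s += sum(
            math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def soft_threshold(coeffs, threshold):
    """Subtract a constant from each coefficient magnitude, clipping at zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise_frame(frame, threshold):
    """Hypothetical helper: transform, soft-threshold, and transform back."""
    return idct_ii(soft_threshold(dct_ii(frame), threshold))


print(soft_threshold([3.0, -2.0, 0.5], 1.0))  # -> [2.0, -1.0, 0.0]
```

Every coefficient that survives is still reduced by `threshold`, which is one way to see why direct subtraction also degrades the speech components.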
Unlike the conventional constant noise-level subtraction rule [2, 3], a new soft thresholding strategy based on frequency frames was proposed in [4]. The latter is able to remove the noise components while causing significantly less damage to the speech signal. This enables even signals with high SNRs to be processed effectively. However, because of the thresholding criteria, a noticeable amount of noise still remains in the enhanced signal. Another disadvantage is the algorithm's lack of robustness to different noise types. The empirical mode decomposition (EMD), recently pioneered by Huang et al. [5] as a new and powerful data
%U http://www.hindawi.com/journals/aav/2014/765454/