%0 Journal Article
%T Complex Cepstrum Based Voice Conversion Using Radial Basis Function
%A Jagannath Nirmal
%A Suprava Patnaik
%A Mukesh Zaveri
%A Pramod Kachare
%J ISRN Signal Processing
%D 2014
%R 10.1155/2014/357048
%X The complex cepstrum vocoder is used to modify the speaker-specific characteristics of the source speaker's speech to match those of the target speaker. Low-time and high-time liftering are used to split the computed cepstrum into vocal tract and source excitation parameters. The resulting mixed-phase vocal tract and source excitation parameters, with finite impulse responses, preserve the phase properties of the resynthesized speech frame. A radial basis function (RBF) network is explored to capture the nonlinear mapping function that modifies the real and imaginary components of the complex-cepstrum-based vocal tract and source excitation of the speech signal. In the baseline model, the state-of-the-art Mel cepstrum envelope and the fundamental frequency (F0) represent the vocal tract and the source excitation of the speech frame, respectively. A radial basis function network is used to capture and formulate the nonlinear relations between the Mel cepstrum envelopes of the source and target speakers. A mean and standard deviation approach is employed to modify F0. The Mel log spectral approximation (MLSA) filter is used to reconstruct the speech signal from the modified Mel cepstrum envelope and F0. The proposed complex cepstrum based model is compared with the state-of-the-art Mel cepstrum envelope based voice conversion model using objective and subjective evaluations. The evaluation measures reveal that the proposed complex cepstrum based voice conversion system approximates the converted speech signal more accurately than the Mel cepstrum envelope based model. 1. Introduction A voice conversion (VC) system extracts features from the source and target speakers' speech and formulates a mapping function that modifies the source speaker's features so that the resynthesized speech sounds as if it were spoken by the target speaker [1].
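The mean and standard deviation approach to F0 modification can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the statistics are computed on log F0 over voiced frames only, that unvoiced frames are marked with F0 = 0 and left unvoiced, and that the helper names `log_f0_stats` and `convert_f0` are hypothetical. Each voiced source value is normalized with the source speaker's log-F0 mean and standard deviation and then rescaled with the target speaker's statistics.

```python
import numpy as np


def log_f0_stats(f0):
    """Return (mean, std) of log F0 over voiced frames (F0 > 0 assumed voiced)."""
    f0 = np.asarray(f0, dtype=float)
    log_f0 = np.log(f0[f0 > 0])
    return log_f0.mean(), log_f0.std()


def convert_f0(f0_src, src_stats, tgt_stats):
    """Map source F0 (Hz) to the target speaker using log-domain mean/std matching.

    Unvoiced frames (F0 == 0) stay at 0; voiced frames are z-normalized with the
    source statistics and de-normalized with the target statistics.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    f0_conv = np.zeros_like(f0_src)
    voiced = f0_src > 0
    z = (np.log(f0_src[voiced]) - mu_s) / sd_s
    f0_conv[voiced] = np.exp(mu_t + sd_t * z)
    return f0_conv


# Usage with made-up training contours (0 = unvoiced frame).
src_stats = log_f0_stats([100.0, 0.0, 110.0, 120.0, 0.0, 130.0])
tgt_stats = log_f0_stats([200.0, 220.0, 0.0, 240.0, 260.0])
converted = convert_f0([110.0, 0.0, 120.0], src_stats, tgt_stats)
```

Working in the log domain keeps converted values positive and treats F0 ratios symmetrically; a linear-domain variant would replace `np.log`/`np.exp` with the identity.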
Applications of VC include the personalization of text-to-speech, the design of multispeaker speech synthesis systems, audio dubbing, karaoke applications, security-related systems, the design of speaking aids for speech-impaired patients, broadcasting, and multimedia applications [2–4]. VC involves transforming speaker-specific characteristics, such as vocal tract parameters, source excitation, and long-term prosodic parameters, into those of the desired speaker [5]. The vocal tract parameters are relatively more prominent than the source excitation for identifying a speaker's uniqueness [5]. Several methods have been
%U http://www.hindawi.com/journals/isrn.signal.processing/2014/357048/