A New Method of Voiced/Unvoiced Classification Based on Clustering

DOI: 10.4236/jsip.2011.24048, pp. 336-347

Keywords: Speech, Voiced, Unvoiced, Clustering, Cepstrum, Autocorrelation, Zero crossing

Abstract:

This paper develops a new method for making the voiced/unvoiced (v/uv) decision. The method is a multi-feature classification algorithm that applies clustering to three features computed on short-time segments of the speech signal: the cepstral peak, the zero-crossing rate, and the peak of the autocorrelation function (ACF). The authors report that the classifier achieved excellent results in identifying voiced and unvoiced segments of speech.
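The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names only "some clustering methods", so plain 2-means is assumed here, and the function names, the 60-400 Hz pitch range, the z-score scaling, the rule for naming the voiced cluster, and the synthetic frames are all hypothetical choices made for illustration.

```python
import cmath
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    pairs = zip(frame, frame[1:])
    return sum((a >= 0) != (b >= 0) for a, b in pairs) / (len(frame) - 1)


def acf_peak(frame, min_lag, max_lag):
    """Largest energy-normalized autocorrelation value over the pitch lags."""
    energy = sum(x * x for x in frame) or 1.0
    return max(
        sum(a * b for a, b in zip(frame, frame[lag:])) / energy
        for lag in range(min_lag, max_lag + 1)
    )


def cepstral_peak(frame, min_lag, max_lag):
    """Largest real-cepstrum value over the pitch quefrencies (naive DFT)."""
    n = len(frame)
    log_mag = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        log_mag.append(math.log(abs(coeff) + 1e-12))  # floor avoids log(0)
    # The log-magnitude spectrum of a real frame is even, so its inverse
    # DFT reduces to a cosine sum.
    return max(
        sum(v * math.cos(2 * math.pi * k * q / n)
            for k, v in enumerate(log_mag)) / n
        for q in range(min_lag, max_lag + 1)
    )


def two_means(points, iters=20):
    """Plain 2-means (Lloyd's algorithm) with a deterministic start."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Start from the first point and the point farthest from it.
    centers = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    labels = []
    for _ in range(iters):
        labels = [int(dist2(p, centers[1]) < dist2(p, centers[0]))
                  for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def classify_frames(frames, fs, f_min=60.0, f_max=400.0):
    """Return True (voiced) or False (unvoiced) for each frame."""
    min_lag, max_lag = int(fs / f_max), int(fs / f_min)
    feats = [[cepstral_peak(f, min_lag, max_lag),
              zero_crossing_rate(f),
              acf_peak(f, min_lag, max_lag)] for f in frames]
    # Z-score each feature so that no single scale dominates the distance.
    scaled_cols = []
    for col in zip(*feats):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        scaled_cols.append([(v - mean) / std for v in col])
    labels = two_means([list(p) for p in zip(*scaled_cols)])

    # Assumption: the cluster with the higher mean ACF peak is the voiced one.
    def mean_acf(c):
        vals = [f[2] for f, lab in zip(feats, labels) if lab == c]
        return sum(vals) / len(vals) if vals else float("-inf")

    voiced = 0 if mean_acf(0) >= mean_acf(1) else 1
    return [lab == voiced for lab in labels]


def demo_frames(fs=8000, n=320, seed=0):
    """Synthetic data: four harmonic (voiced-like) and four noise frames."""
    rng = random.Random(seed)
    voiced = [[sum(math.sin(2 * math.pi * h * f0 * t / fs) / h
                   for h in range(1, 6)) for t in range(n)]
              for f0 in (100, 125, 200, 250)]
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(4)]
    return voiced + noise


if __name__ == "__main__":
    print(classify_frames(demo_frames(), 8000))
```

The sketch uses only the standard library, so the DFT is the slow O(N^2) form. Real speech would also need framing and windowing, and an FFT routine would be the practical choice for the cepstrum.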

