
Superwideband Bandwidth Extension Using Normalized MDCT Coefficients for Scalable Speech and Audio Coding

DOI: 10.1155/2013/909124



A bandwidth extension (BWE) algorithm from wideband to superwideband (SWB) is proposed for a scalable speech/audio codec that uses modified discrete cosine transform (MDCT) coefficients as spectral parameters. In the proposed BWE algorithm, the superwideband is first split into several subbands, each represented by a gain parameter and normalized MDCT coefficients. We then estimate which normalized MDCT coefficients of the wideband should be fetched for the superwideband and quantize the fetch indices. After that, we quantize the gain parameters using relative ratios between adjacent subbands. The proposed BWE algorithm is embedded into a standard superwideband codec, the SWB extension of G.729.1 Annex E, and its bitrate and quality are compared with those of the BWE algorithm already employed in that codec. The comparison shows that the proposed BWE algorithm reduces the bitrate by around 19% while delivering better quality than the BWE algorithm in the SWB extension of G.729.1 Annex E.

1. Introduction

In early speech communication services, narrowband codecs with a bandwidth of around 3.4 kHz were commonly used because the available network bandwidth was quite limited. These services provided sufficient quality for comprehension, but it was generally agreed that they did not satisfy users' increasing expectations for higher sound quality. Thanks to advances in network technologies, however, the transmission bandwidth has recently increased [1–3]. Thus, a great deal of research has focused on further extending the bandwidth of speech and/or audio signals from narrowband to wideband, superwideband, and the full audio band [4–6]. There are two kinds of approaches to extending the bandwidth, distinguished by whether or not side information is available, as shown in Figure 1. As depicted in Figure 1(a), bandwidth extension is usually realized by using side information transmitted from the encoder.
On the other hand, it is also possible to extend the bandwidth only at the decoder, without any side information [7], as shown in Figure 1(b). In other words, instead of relying on transmitted side information, artificial bandwidth extension estimates the higher-band signal from the lower-band signal by using a pattern recognition technique such as hidden Markov models (HMMs) [8], Gaussian mixture models (GMMs) [9], or related approaches [10–15]. While artificial bandwidth extension algorithms do not require any additional bits for sending side information, their performance is limited by the accuracy of this estimation.
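The subband parameterization summarized in the abstract can be sketched in a few lines of code. The following is an illustrative sketch only, not the standardized G.729.1 Annex E implementation: the high-band MDCT spectrum is split into subbands, each subband is represented by a gain and a gain-normalized coefficient vector, and the gains are coded differentially as log-domain ratios between adjacent subbands. The subband count, subband size, and RMS gain definition are all assumptions made for illustration.

```python
# Illustrative sketch of the subband gain / normalized-MDCT parameterization
# described above (NOT the standardized G.729.1 Annex E code).
# NUM_SUBBANDS, SUBBAND_SIZE, and the RMS gain definition are assumptions.
import math

NUM_SUBBANDS = 8    # assumed number of SWB subbands
SUBBAND_SIZE = 20   # assumed MDCT coefficients per subband

def split_into_subbands(mdct_coeffs):
    """Split the high-band MDCT coefficients into equal-width subbands."""
    return [mdct_coeffs[i * SUBBAND_SIZE:(i + 1) * SUBBAND_SIZE]
            for i in range(NUM_SUBBANDS)]

def gain_and_normalize(subband):
    """Return the RMS gain of a subband and its gain-normalized coefficients."""
    gain = math.sqrt(sum(c * c for c in subband) / len(subband)) or 1e-12
    return gain, [c / gain for c in subband]

def gains_to_relative_ratios(gains):
    """Code the gains as a base gain plus log-ratios between adjacent subbands."""
    ratios = [math.log2(gains[k] / gains[k - 1]) for k in range(1, len(gains))]
    return gains[0], ratios

def relative_ratios_to_gains(base, ratios):
    """Reconstruct all subband gains from the base gain and the log-ratios."""
    gains = [base]
    for r in ratios:
        gains.append(gains[-1] * (2.0 ** r))
    return gains
```

Coding the gains as ratios between adjacent subbands exploits the correlation of neighboring subband energies: the ratios have a much smaller dynamic range than the absolute gains, so they can be quantized with fewer bits, which is one source of the bitrate reduction the paper reports.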


[1]  C. Lamblin, “Recent audio/speech coding developments in ITU-T and future trends,” in Proceedings of the European Signal Processing Conference (EUSIPCO '08), Plenary Lecture, Lausanne, Switzerland, 2008.
[2]  J. A. Kang and H. K. Kim, “An adaptive packet loss recovery method based on real-time speech quality assessment and redundant speech transmission,” International Journal of Innovative Computing, Information and Control, vol. 7, no. 12, pp. 6773–6783, 2011.
[3]  J. A. Kang and H. K. Kim, “Adaptive redundant speech transmission over wireless multimedia sensor networks based on estimation of perceived speech quality,” Sensors, vol. 11, no. 9, pp. 8469–8484, 2011.
[4]  “ITU-T Temporary Document 298 R1,” Report of Q23/16 Rapporteur’s Meeting, 2008.
[5]  N. I. Park and H. K. Kim, “Artificial bandwidth extension of narrowband speech applied to CELP-type speech coding,” Information-International Interdisciplinary Journal, vol. 16, no. 3(B), pp. 3153–3164, 2013.
[6]  Y. R. Oh, Y. G. Kim, M. Kim, H. K. Kim, M. S. Lee, and H. J. Bae, “Phonetically balanced text corpus design using a similarity measure for a stereo super-wideband speech database,” IEICE Transactions on Information and Systems, vol. E94-D, no. 7, pp. 1459–1466, 2011.
[7]  P. Vary and R. Martin, Digital Speech Transmission: Enhancement, Coding and Error Concealment, Wiley, Chichester, UK, 2006.
[8]  P. Jax and P. Vary, “On artificial bandwidth extension of telephone speech,” Signal Processing, vol. 83, no. 8, pp. 1707–1719, 2003.
[9]  G.-B. Song and P. Martynovich, “A study of HMM-based bandwidth extension of speech signals,” Signal Processing, vol. 89, no. 10, pp. 2036–2044, 2009.
[10]  U. Kornagel, “Techniques for artificial bandwidth extension of telephone speech,” Signal Processing, vol. 86, no. 6, pp. 1296–1306, 2006.
[11]  H. Pulakka, L. Laaksonen, M. Vainio, J. Pohjalainen, and P. Alku, “Evaluation of an artificial speech bandwidth extension method in three languages,” IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 6, pp. 1124–1137, 2008.
[12]  K.-T. Kim, M.-K. Lee, and H.-G. Kang, “Speech bandwidth extension using temporal envelope modeling,” IEEE Signal Processing Letters, vol. 15, pp. 429–432, 2008.
[13]  J. H. Park, H. K. Kim, M. B. Kim, and S. R. Kim, “A user voice reduction algorithm based on binaural signal separation for portable digital imaging devices,” IEEE Transactions on Consumer Electronics, vol. 58, no. 2, pp. 679–684, 2012.
[14]  J. A. Kang, C. J. Chun, H. K. Kim, M. B. Kim, and S. R. Kim, “A smart background music mixing algorithm for portable digital imaging devices,” IEEE Transactions on Consumer Electronics, vol. 57, no. 3, pp. 1258–1263, 2011.
[15]  Y. R. Oh, J. S. Yoon, H. K. Kim, M. B. Kim, and S. R. Kim, “A voice-driven scene-mode recommendation service for portable digital imaging devices,” IEEE Transactions on Consumer Electronics, vol. 55, no. 4, pp. 1739–1747, 2009.
[16]  J. Herre and M. Dietz, “MPEG-4 high-efficiency AAC coding,” IEEE Signal Processing Magazine, vol. 25, no. 3, pp. 137–142, 2008.
[17]  M. Tammi, L. Laaksonen, A. Rämö, and H. Toukomaa, “Scalable superwideband extension for wideband coding,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '09), pp. 161–164, Taipei, Taiwan, April 2009.
[18]  B. Geiser, P. Jax, P. Vary et al., “Bandwidth extension for hierarchical speech and audio coding in ITU-T Rec. G.729.1,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 8, pp. 2496–2509, 2007.
[19]  H. Ehara, T. Morii, and K. Yoshida, “Predictive vector quantization of wideband LSF using narrowband LSF for bandwidth scalable coders,” Speech Communication, vol. 49, no. 6, pp. 490–500, 2007.
[20]  Y. H. Lee, H. K. Kim, M. S. Lee, and D. Y. Kim, “Bandwidth extension of a narrowband speech coder for music delivery over IP,” Lecture Notes in Artificial Intelligence, vol. 4413, pp. 198–208, 2007.
[21]  T. Painter and A. Spanias, “Perceptual coding of digital audio,” Proceedings of the IEEE, vol. 88, no. 4, pp. 451–515, 2000.
[22]  EBU Tech Document 3253, Sound Quality Assessment Material (SQAM), 1988.
[23]  ITU-T WP3/16, Processing Test Plan for the ITU-T Joint (G.718/G.729.1) SWB/Stereo Extension Optimisation/Characterization Phase, 2008.

