Research on Convolutional Neural Network Structures for Sound Source Localization
Abstract:
Sound source localization is an important research topic in acoustic signal processing. Traditional methods are easily degraded by noise and reverberation. Motivated by the success of deep learning in many fields, this paper investigates deep learning for sound source localization. We analyze the performance of a convolutional neural network (CNN) that localizes sound sources from microphone signals and, through simulation experiments under identical room and source conditions, examine how the number of convolutional layers and the number of convolution kernels affect localization performance. The experiments show that when generalized cross-correlation with phase transform (GCC-PHAT) features of the microphone signals are used as the CNN input, the method achieves the highest localization accuracy among the compared methods under typical room conditions, with a signal-to-noise ratio (SNR) of 10 dB to 40 dB and a reverberation time of 200 ms to 600 ms. Furthermore, a network with 6 convolutional layers and 4 kernels in the first convolutional layer achieves a good balance between localization accuracy and computational efficiency.
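The abstract names GCC-PHAT features as the CNN input but does not give their computation. The following is a minimal sketch, assuming Python with NumPy (the paper does not state its implementation), of GCC-PHAT for a single microphone pair, plus a helper that adds white Gaussian noise at a target SNR to mimic the 10 dB to 40 dB range. The function names `gcc_phat` and `add_noise_at_snr`, the 16 kHz sampling rate, the 0.2 m microphone spacing, and restricting lags to ±`max_tau` are hypothetical illustration choices; reverberation is not modeled here.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so the noisy signal has the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def gcc_phat(x1, x2, fs, max_tau=None, interp=1):
    """Return (tau, cc): the delay of x2 relative to x1 in seconds and the GCC-PHAT curve.

    A positive tau means x2 lags x1. cc is ordered from lag -max_shift to +max_shift.
    """
    n = len(x1) + len(x2)            # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)         # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=interp * n)
    max_shift = interp * n // 2
    if max_tau is not None:
        # Only lags allowed by the microphone spacing are physically meaningful
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # Rearrange so that the middle element corresponds to zero lag
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs), cc


# Usage: a white-noise source reaches microphone 2 five samples later, 20 dB SNR.
rng = np.random.default_rng(0)
fs = 16000
s = rng.standard_normal(4096)
d = 5
x1 = add_noise_at_snr(s, 20, rng)
x2 = add_noise_at_snr(np.concatenate((np.zeros(d), s[:-d])), 20, rng)
tau, cc = gcc_phat(x1, x2, fs, max_tau=0.2 / 343.0)  # 0.2 m spacing, c = 343 m/s
print(round(tau * fs), len(cc))
```

The lag-limited vector `cc` is the kind of per-pair feature that could be stacked across all microphone pairs into a 2-D array for a CNN; how the paper actually arranges its input is not specified in this section, so that arrangement is an assumption.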