Computer Science and Application 2025

Research on Environmental Sound Classification Based on Convolutional Neural Network and Attention Mechanism under Multiple Features

DOI: 10.12677/csa.2025.153070, PP. 180-188

赵乾曜, 田益民, 李纪元, 孙兆永

Keywords: Noise Classification, Hybrid Features, Convolutional Networks, Attention Mechanisms

Abstract:

Traditional urban noise classification suffers from two problems: scarce training data leads to poor generalization and weak robustness, and conventional noise features lose key information, which lowers model accuracy. To address both, this paper proposes a dual-branch convolutional model based on hybrid MFCC + GFCC features and noise spectrogram features. The model first applies MFCC, GFCC, and spectrogram transforms to the noise data to extract features. In the first branch, the MFCC and GFCC features are each compressed by convolution, then combined and classified. In the second branch, the spectrogram features are convolved, an attention module weights each channel, and the weighted features are classified. The classification results of the two branches are then combined by Bayesian numerical fusion to classify the urban noise. Experimental results show that, on large-sample datasets, recognition accuracy improves by about 8% or more over traditional network models.
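The abstract does not give the exact form of the attention module or of the Bayesian fusion step, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (a) a simplified channel attention in which each channel receives one sigmoid gate computed from its global average, and (b) a product-rule fusion that treats the two branches as conditionally independent given the class. The function names, the `scale` and `bias` parameters, and the toy numbers are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def channel_gates(feature_maps, scale, bias):
    """Return one sigmoid gate per channel from its global average.

    Assumption: a simplified stand-in for the attention module; the
    abstract only states that each channel is weighted.
    """
    gates = []
    for channel, s, b in zip(feature_maps, scale, bias):
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        gates.append(1.0 / (1.0 + math.exp(-(s * mean + b))))
    return gates


def apply_gates(feature_maps, gates):
    """Multiply every value of each channel by that channel's gate."""
    return [[[v * g for v in row] for row in channel]
            for channel, g in zip(feature_maps, gates)]


def bayesian_fusion(p_a, p_b, prior=None):
    """Fuse two class-posterior lists with the product rule.

    Assumption: branches are conditionally independent given the class,
    so p(c | a, b) is proportional to p(c | a) * p(c | b) / p(c).
    """
    if prior is None:
        prior = [1.0 / len(p_a)] * len(p_a)  # uniform class prior
    fused = [a * b / p for a, b, p in zip(p_a, p_b, prior)]
    total = sum(fused)
    return [f / total for f in fused]


# Toy spectrogram-branch features: 2 channels of 2 x 2 values.
feature_maps = [[[1.0, 3.0], [2.0, 2.0]],
                [[-1.0, 0.0], [0.0, -1.0]]]
gates = channel_gates(feature_maps, scale=[1.0, 1.0], bias=[0.0, 0.0])
weighted = apply_gates(feature_maps, gates)

# Toy per-branch class scores for 3 hypothetical noise classes.
p_cepstral = softmax([2.0, 1.0, 0.1])      # MFCC + GFCC branch
p_spectrogram = softmax([1.5, 1.8, 0.2])   # spectrogram + attention branch
p_fused = bayesian_fusion(p_cepstral, p_spectrogram)
predicted_class = max(range(len(p_fused)), key=p_fused.__getitem__)
```

In this toy case the two branches disagree on the most likely class, and the fused posterior favors whichever class has the largest product of the two branch probabilities. In a real model, the gate parameters would be learned and the class prior would come from the training label frequencies.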

