%0 Journal Article
%T Research on Audio Scene Recognition Based on Lightweight Convolutional Neural Network
%A 毛柯翔
%A 谢颖华
%J Computer Science and Application
%P 995-1005
%@ 2161-881X
%D 2023
%I Hans Publishing
%R 10.12677/CSA.2023.135097
%X To improve the feature extraction ability and performance of low-complexity neural networks for audio scene recognition, this paper studies audio scene classification methods built mainly on convolutional neural networks (CNNs). A separate attention mapping layer is added to the conventional CNN structure and refined, and two attention mechanisms suitable for lightweight convolutional networks are improved and compared. Depthwise separable convolution is used in some convolutional layers to reduce the parameter count of the overall network. The original convolutions are replaced with lower-cost grouped strip convolutions, and the overall convolution is designed with a time-frequency separation method, yielding the proposed SFAC (Sequence Frequency Attention CNN) network model. SFAC is compared with several VGG-based baseline convolutional network models on multi-class audio scene datasets (TAU Urban Acoustic Scenes, UrbanSound8K). The results show that the proposed network achieves higher accuracy than the baseline models while maintaining low complexity.
%K Audio Scene Recognition
%K Convolutional Neural Network
%K Attention Convolution
%K Special-Shaped Convolution
%K Channel Attention
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=65450