%0 Journal Article
%T Target Optimization of Deep Neural Networks in Speech Recognition
%A 陈梦喆
%A 张晴晴
%A 潘接林
%A 颜永红
%J 工程科学与技术
%D 2016
%R 10.15961/j.jsuese.2016.01.025
%X When training deep neural network (DNN) acoustic models, the targets obtained by forced alignment cannot accurately represent the actual characteristics of speech. To address this problem, a method is proposed that uses the forward-backward algorithm to obtain targets that are not 0-1 distributions. Targets from forced alignment are unreasonable because the model used for alignment may not fully match the utterances being processed, and because the continuity of articulation makes transition boundaries hard to separate. The new targets express the probability that a frame belongs to each of several adjacent states, so they describe the transitions between modeling units in more detail, reflect the nature of speech more faithfully, and improve the robustness of the model. To balance model robustness against the discrimination among modeling units, the targets obtained by the forward-backward algorithm are windowed. Experiments were carried out on Mandarin conversational speech recognition in the customer service question-answering domain. A small training set was first used to verify that the targets have a large influence on training and to select the window length. The training data were then increased to 60, 80, and 100 hours. The model trained with the proposed targets achieved consistent improvements in recognition performance, with relative character error rate reductions ranging from 1.10% to 3.65%. The experiments show that the proposed target optimization is effective for model training and remains effective as the amount of training data increases.
%K speech recognition
%K deep neural network
%K forward-backward algorithm
%K target optimization
%U http://jsuese.ijournals.cn/jsuese_cn/ch/reader/view_abstract.aspx?file_no=201500339&flag=1