%0 Journal Article
%T Long short-term memory with attention and multitask learning for distant speech recognition
%A 张宇
%A 张鹏远
%A 颜永红
%J 清华大学学报(自然科学版)
%D 2018
%R 10.16511/j.cnki.qhdxxb.2018.25.016
%X Distant speech recognition remains a challenging task owing to background noise, reverberation, and competing acoustic sources. This work describes a long short-term memory (LSTM) acoustic model with an attention mechanism and a multitask learning framework for distant speech recognition. The attention mechanism, embedded in the acoustic model, learns to automatically adjust how much attention is paid to the spliced context input, which significantly improves the model's ability to model distant speech. To further improve robustness, a multitask learning framework trains the model to jointly predict the acoustic states and the clean features. Experiments on the AMI meeting corpus show that, compared with the baseline model, the LSTM model with attention and multitask learning achieves an absolute word error rate (WER) reduction of 1.5%.
%K speech recognition
%K long short-term memory
%K acoustic model
%K attention mechanism
%K multitask learning
%U http://jst.tsinghuajournals.com/CN/Y2018/V58/I3/249