%0 Journal Article
%T Automatic speech recognition by a Kinect sensor for a robot under ego noises
%A 王建荣
%A 高永春
%A 张句
%A 魏建国
%A 党建武
%J Journal of Tsinghua University (Science and Technology)
%D 2017
%R 10.16511/j.cnki.qhdxxb.2017.26.041
%X Audio-visual integration can improve automatic speech recognition (ASR) for robots in noisy environments. However, head rotation, differences in lip size, varying distance between the speaker and the camera, and lighting changes prevent lip information from being represented fully and effectively. This paper proposes a multi-modal system that combines a robot with a Kinect sensor. The Kinect provides 3-D data and visual information, and the 3-D data are used to reconstruct the lip profile to supplement the audio-visual information. Results from a series of feature-fusion and decision-fusion methods show that the proposed multi-modal system outperforms single-stream and dual-stream audio-visual speech recognition systems and can assist the robot's speech recognition under its own ego noise.
%K humanoid robot
%K ego noises
%K automatic speech recognition
%K Kinect multi-sensor
%K multi-modal system
%U http://jst.tsinghuajournals.com/CN/Y2017/V57/I9/921