|
Noise Suppression in Tele-Lectures using Bi-Modal Feature Extraction

Keywords: ASR, Audio-visual automatic speech recognition, Feature extraction, Multi-stream HMM

Abstract: Automatic Speech Recognition (ASR) is an essential component of many Human-Computer Interaction systems. Many ASR applications have reached high performance levels, but only in controlled environments. In this project, we reduce noise in video lectures using bi-modal feature extraction. When acoustic noise is heavy, audio features need to be complemented by additional sources of information. Visual information extracted from the speaker's mouth region is a promising way to boost audio-only recognition. Lip/mouth detection and tracking, combined with traditional image processing methods, offer a variety of options for constructing the visual front end. Fusing the audio and visual streams is even more challenging and is crucial to designing an efficient audio-visual recognizer. We investigate problems in Audio-Visual Automatic Speech Recognition (AV-ASR) concerning visual feature extraction and audio-visual integration, with the aim of reducing noise in video lectures.
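The abstract does not say how the audio and visual streams are combined. As a rough illustration only, the sketch below assumes the stream-weighted scoring commonly used with multi-stream HMMs, where each candidate's audio and visual log-likelihoods are mixed with weights that sum to one. The function names, the weight parameter, and the scores are hypothetical and are not taken from the project.

```python
def fuse_stream_scores(audio_loglik, visual_loglik, audio_weight):
    """Mix per-class log-likelihoods from two streams.

    Assumed rule: fused = w * audio + (1 - w) * visual, with w in [0, 1].
    A lower w shifts trust from the (noisy) audio stream to the visual one.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if len(audio_loglik) != len(visual_loglik):
        raise ValueError("both streams must score the same classes")
    visual_weight = 1.0 - audio_weight
    return [audio_weight * a + visual_weight * v
            for a, v in zip(audio_loglik, visual_loglik)]


def best_class(scores):
    """Return the index of the highest fused score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up scores for two candidate classes: noisy audio slightly prefers
# class 1, while the mouth-region features clearly prefer class 0.
audio_scores = [-4.0, -3.5]
visual_scores = [-1.0, -6.0]

audio_only = best_class(fuse_stream_scores(audio_scores, visual_scores, 1.0))
audio_visual = best_class(fuse_stream_scores(audio_scores, visual_scores, 0.3))
```

With these illustrative numbers, the audio-only decision picks class 1, and lowering the audio weight to 0.3 flips the decision to class 0. In practice the weight would be tuned to the noise level of the recording.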
|