Search Results: 1 - 10 of 100 matches
All listed articles are free for downloading (OA Articles)
A Robust Front-End Processor combining Mel Frequency Cepstral Coefficient and Sub-band Spectral Centroid Histogram methods for Automatic Speech Recognition  [PDF]
R. Thangarajan,A.M. Natarajan
International Journal of Signal Processing, Image Processing and Pattern Recognition , 2009,
Abstract: Environmental robustness is an important area of research in speech recognition. Mismatch between trained speech models and the actual speech to be recognized, caused by factors such as background noise, can severely degrade the accuracy of recognizers based on commonly used features like mel-frequency cepstral coefficients (MFCC) and linear predictive coding (LPC). It is well understood that auditory-based feature extraction methods perform extremely well in terms of robustness because of the dominant-frequency information they capture, but they suffer from high computational cost. Another method, sub-band spectral centroid histograms (SSCH), integrates dominant-frequency information with sub-band power information. It is based on sub-band spectral centroids (SSC), which are closely related to spectral peaks for both clean and noisy speech. Since SSC can be computed efficiently from a short-term estimate of the speech power spectrum, the SSCH method is quite robust to additive background noise at a lower computational cost. It has been noted that MFCC outperforms SSCH on clean speech; however, on speech with additive noise, MFCC degrades substantially. In this paper, both MFCC and SSCH feature extraction have been implemented in Carnegie Mellon University (CMU) Sphinx 4.0 and trained and tested on the AN4 database for clean and noisy speech. Finally, a robust speech recognizer is suggested that automatically selects either MFCC or SSCH feature extraction based on the variance of the short-term power of the input utterance.
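As a rough illustration of the sub-band spectral centroids (SSC) underlying the SSCH features described above, the following numpy sketch computes one power-weighted centroid per band of a short-term power spectrum. It uses simple rectangular bands; the band layout, names, and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def subband_spectral_centroids(power_spectrum, n_subbands=8):
    """One spectral centroid per sub-band of a short-term power
    spectrum, using simple rectangular bands."""
    freqs = np.arange(len(power_spectrum), dtype=float)
    edges = np.linspace(0, len(power_spectrum), n_subbands + 1).astype(int)
    centroids = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = power_spectrum[lo:hi]
        # centroid = power-weighted mean frequency bin within the band
        centroids.append(float(freqs[lo:hi] @ band / (band.sum() + 1e-12)))
    return np.array(centroids)

# a toy flat spectrum with one strong peak at bin 40: the centroid of
# the band containing the peak is pulled toward the peak
spectrum = np.ones(256)
spectrum[40] = 100.0
ssc = subband_spectral_centroids(spectrum)
```

Because the centroid tracks spectral peaks rather than absolute power, it changes little when broadband noise is added, which is the robustness property the abstract exploits.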
Parameter Compensation for Mel-LP based Noisy Speech Recognition
Md. Mahfuzur Rahman,Md. Robiul Hoque,M. Babul Islam
Research Journal of Information Technology , 2012,
Abstract: This study deals with a noise-robust distributed speech recognizer for real-world applications by deploying a feature parameter compensation technique. To realize this objective, Mel-LP based speech analysis has been used in speech coding on the linear frequency scale by applying a first-order all-pass filter instead of a unit delay. To minimize the mismatch between training and test phases, Cepstral Mean Normalization (CMN) and Blind Equalization (BEQ) have been applied to enhance the Mel-LP cepstral coefficients, as an effort to reduce the effect of additive noise and channel distortion. The performance of the proposed system has been evaluated on the Aurora-2 database, a subset of the TIDigits database contaminated by additive noises and channel effects. The baseline performance, that is, the average word accuracy for Mel-LPC on test set A, was found to be 59.05%. By applying CMN and BEQ to the Mel-LP cepstral coefficients, the performance improved to 68.02% and 65.65%, respectively.
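Cepstral Mean Normalization, one of the two compensation techniques evaluated above, is simple enough to sketch in full: subtracting the per-coefficient mean over an utterance cancels any stationary channel offset (a convolutional distortion becomes additive in the cepstral domain). The array shapes below are illustrative.

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the per-coefficient mean over the utterance, removing
    any stationary (channel) offset from the cepstral trajectory.
    cepstra: (n_frames, n_coeffs) array, e.g. Mel-LP cepstra."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# a constant channel offset added to every frame is removed exactly
rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 13))     # 50 frames x 13 cepstra
offset = np.full(13, 2.5)                 # simulated channel bias
normalized = cepstral_mean_normalization(clean + offset)
```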
Comparative Study Of Mfcc And Lpc For Marathi Isolated Word Recognition System
International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering , 2013,
Abstract: This paper presents a Marathi database and an isolated word recognition system using Mel-frequency cepstrum coefficients (MFCC) and the vector quantization (VQ) technique. It also compares the performance of MFCC and LPC features in a VQ setting. The Marathi speech database was recorded in a noisy environment, aiming at a language-learning tool as an application. The database consists of simple Marathi words starting with both vowels and consonants. Each word was repeated 10 times by one male and one female speaker. The paper presents comparative plots of MFCC and LPC features.
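In a VQ-based recognizer like the one described above, each word gets a codebook trained on its feature frames, and a test utterance is assigned to the word whose codebook yields the lowest average quantization distortion. A minimal stand-in for LBG codebook training is plain k-means (Lloyd's algorithm); all names and parameters below are illustrative.

```python
import numpy as np

def train_codebook(features, n_codewords=8, n_iters=20, seed=0):
    """Plain k-means over feature frames (a minimal stand-in for
    LBG codebook training). features: (n_frames, n_dims)."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), n_codewords, replace=False)]
    for _ in range(n_iters):
        # assign each frame to its nearest codeword, then re-estimate
        d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_codewords):
            if np.any(labels == k):
                codebook[k] = features[labels == k].mean(axis=0)
    return codebook

def vq_distortion(features, codebook):
    """Average distance to the nearest codeword; at recognition time
    the word model with the lowest distortion wins."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

# two well-separated synthetic "phone" clusters in 13-dim feature space
rng = np.random.default_rng(1)
frames = np.vstack([rng.normal(0, 0.1, (40, 13)), rng.normal(5, 0.1, (40, 13))])
cb = train_codebook(frames, n_codewords=2)
```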
On the Use of Different Feature Extraction Methods for Linear and Non Linear kernels  [PDF]
Imen Trabelsi,Dorra Ben Ayed
Computer Science , 2014,
Abstract: Speech feature extraction has been a key focus in robust speech recognition research; it significantly affects recognition performance. In this paper, we first study a set of feature extraction methods such as linear predictive coding (LPC), mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), together with several feature normalization techniques like RASTA filtering and cepstral mean subtraction (CMS). Based on this, a comparative evaluation of these features is performed on the task of text-independent speaker identification using a combination of Gaussian mixture models (GMM) and linear and non-linear kernels based on support vector machines (SVM).
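The linear and non-linear kernels the abstract contrasts can be sketched as plain functions of two feature vectors; the vectors and the `gamma` value below are illustrative, not the paper's settings.

```python
import numpy as np

def linear_kernel(x, y):
    """Linear kernel: a plain inner product between feature vectors."""
    return float(x @ y)

def rbf_kernel(x, y, gamma=0.1):
    """Non-linear (RBF) kernel: similarity decays with squared
    Euclidean distance, letting the SVM draw curved boundaries."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

# two toy utterance-level feature vectors
u = np.array([1.0, 0.0, 2.0])
v = np.array([0.0, 1.0, 2.0])
```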
Development an Automatic Speech to Facial Animation Conversion for Improve Deaf Lives  [cached]
S. Hamidreza Kasaei,S. Mohammadreza Kasaei,S. Alireza Kasaei
Brain. Broad Research in Artificial Intelligence and Neuroscience , 2011,
Abstract: In this paper, we propose the design and an initial implementation of a robust system that automatically translates voice into text and text into sign-language animations. Sign language translation systems could significantly improve deaf lives, especially in communication and the exchange of information, by employing machines to translate conversations from one language to another. Considering these points, it seems necessary to study speech recognition. Voice recognition algorithms usually address three major challenges: the first is extracting features from speech, the second is recognition when only a limited sound gallery is available, and the final challenge is moving from speaker-dependent to speaker-independent voice recognition. Extracting features from speech is an important stage in our method. Different procedures are available for this; one of the most common in speech recognition systems is Mel-Frequency Cepstral Coefficients (MFCCs). The algorithm starts with preprocessing and signal conditioning. Next, features are extracted from the speech using cepstral coefficients. The result of this process is then sent to a segmentation stage. Finally, the recognition stage recognizes the words and converts each recognized word to facial animation. The project is still in progress, and some new methods are described in the current report.
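The "preprocessing and signal conditioning" stage that a typical MFCC front end begins with can be sketched as pre-emphasis followed by framing and windowing; the frame length, hop, and sample rate below are common textbook values, not the paper's.

```python
import numpy as np

def preemphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1],
    boosting high frequencies before spectral analysis."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len=400, hop=160):
    """Split into overlapping frames and apply a Hamming window
    (25 ms frames with a 10 ms hop at 16 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

# 1 s of 16 kHz noise as a stand-in signal
rng = np.random.default_rng(0)
x = preemphasize(rng.standard_normal(16000))
frames = frame_signal(x)
```

Each windowed frame would then go through an FFT, Mel filter bank, log, and DCT to yield the cepstral coefficients the abstract mentions.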
Feature Extraction Using LPC-Residual and Mel Frequency Cepstral Coefficients in Forensic Speaker Recognition  [PDF]
Jose B. Trangol Curipe,Abel Herrera Camacho
International Journal of Computer and Electrical Engineering , 2013, DOI: 10.7763/ijcee.2013.v5.658
Abstract: In this paper, we investigated how to improve recognition performance in the forensic area. We used Linear Predictive Coding (LPC) and its residual, and compared them with Mel Frequency Cepstral Coefficients (MFCC). The classification technique was the Gaussian Mixture Model (GMM). The data collection is in Spanish, using spontaneous speech from 37 male speakers of Mexican Spanish. We have three recordings per speaker, separated by three weeks and one month respectively, which lets us work under realistic conditions: non-contemporaneous recordings and scarce data for training and testing the forensic recognition task. Two conclusions can be drawn from the results: first, MFCC performs better with long recordings; second, LPC-residual performs better with short recordings.
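The LPC-residual features used above come from inverse-filtering the signal with its own predictor. A minimal numpy sketch of the autocorrelation method (Levinson-Durbin recursion) and the residual follows; the AR(1) test signal and all parameters are illustrative, not the paper's data.

```python
import numpy as np

def lpc(signal, order=10):
    """LPC coefficients a (with a[0] = 1) via the autocorrelation
    method and Levinson-Durbin recursion; also returns the final
    prediction-error power."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]   # sum_j a[j] * r[i-j]
        k = -acc / err                         # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
        err *= 1.0 - k * k
    return a, err

def lpc_residual(signal, a):
    """Prediction residual e[n] = sum_k a[k] * x[n-k] (inverse filter)."""
    return np.convolve(signal, a, mode="full")[:len(signal)]

# synthesize an AR(1) signal x[n] = 0.9 x[n-1] + e[n]; an order-1 LPC
# fit should recover a[1] close to -0.9
rng = np.random.default_rng(0)
x = np.zeros(4000)
e = rng.standard_normal(4000)
for n in range(1, 4000):
    x[n] = 0.9 * x[n - 1] + e[n]
a, _ = lpc(x, order=1)
residual = lpc_residual(x, a)
```

The residual has much lower variance than the signal: the predictable (vocal-tract) part has been removed, leaving excitation-like structure that carries speaker-specific information.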
Improved MFCC Feature Extraction Combining Symmetric ICA Algorithm for Robust Speech Recognition  [cached]
Huan Zhao,Kai Zhao,He Liu,Fei Yu
Journal of Multimedia , 2012, DOI: 10.4304/jmm.7.1.74-81
Abstract: Independent component analysis (ICA), instead of the traditional discrete cosine transform (DCT), is often used to project the log Mel spectrum in robust speech feature extraction. This paper proposes using symmetric orthogonalization in ICA to project the log Mel spectrum into a new feature space when extracting speech features. This solves the problems of cumulative error and unequal weighting that deflation orthogonalization brings, improving the robustness of speech recognition systems while also increasing the efficiency of estimation. Furthermore, the paper studies the nonlinearities of the ICA objective function and their coefficients, tests them in a variety of environments, finds that they greatly influence the recognition rate, and applies a new coefficient in the proposed method. Experiments based on HMM and the Aurora-2 speech corpus suggest that the new method is superior to deflation-based ICA and MFCC.
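The symmetric orthogonalization step at the core of the method above replaces one-row-at-a-time (deflation) orthogonalization with a single whitening of the whole unmixing matrix, W ← (W Wᵀ)^(-1/2) W, so no component is estimated before the others and errors do not accumulate. A minimal sketch (matrix sizes are illustrative):

```python
import numpy as np

def symmetric_orthogonalize(W):
    """Symmetric orthogonalization W <- (W W^T)^(-1/2) W: all rows are
    orthogonalized simultaneously, so no component is privileged and
    deflation's cumulative estimation error is avoided."""
    d, E = np.linalg.eigh(W @ W.T)                 # eigendecomposition
    inv_sqrt = E @ np.diag(1.0 / np.sqrt(d)) @ E.T # (W W^T)^(-1/2)
    return inv_sqrt @ W

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 10))   # toy unmixing estimate, 4 components
W_orth = symmetric_orthogonalize(W)
```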
Modified Mel Filter Bank to Compute MFCC of Subsampled Speech  [PDF]
Kiran Kumar Bhuvanagiri,Sunil Kumar Kopparapu
Computer Science , 2014,
Abstract: Mel Frequency Cepstral Coefficients (MFCCs) are the most widely used speech features in speech and speaker recognition applications. In this work, we propose a modified Mel filter bank to extract MFCCs from subsampled speech. We also propose a stronger metric which effectively captures the correlation between the MFCCs of the original speech and the MFCCs of the resampled speech. It is found that the proposed filter bank construction performs distinguishably well and gives recognition performance on resampled speech close to the recognition accuracy on the original speech.
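For reference, a standard triangular Mel filter bank can be built as below; capping `f_max` below the Nyquist frequency crudely mimics restricting the bank for subsampled speech, though the paper's modified construction differs in its details. All parameter values are common defaults, not the paper's.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=512, sr=16000, f_max=None):
    """Triangular filters spaced evenly on the Mel scale over the
    positive-frequency FFT bins."""
    f_max = f_max or sr / 2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(f_max), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fb

fb = mel_filterbank()
```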
PCA-Based Speech Enhancement for Distorted Speech Recognition  [cached]
Tetsuya Takiguchi,Yasuo Ariki
Journal of Multimedia , 2007, DOI: 10.4304/jmm.2.5.13-18
Abstract: We investigated a robust speech feature extraction method using kernel PCA (Principal Component Analysis) for distorted speech recognition. Kernel PCA has been suggested for various image processing tasks requiring an image model, such as denoising, where a noise-free image is constructed from a noisy input image. Much research on robust speech feature extraction has been done, but it remains difficult to completely remove additive or convolutional noise (distortion). The most commonly used noise-removal techniques are based on spectral-domain operations, after which, for speech recognition, the MFCC (Mel Frequency Cepstral Coefficient) is computed by applying the DCT (Discrete Cosine Transform) to the mel-scale filter bank output. This paper describes a new PCA-based speech enhancement algorithm using kernel PCA instead of the DCT, where the main speech element is projected onto low-order features while the noise or distortion element is projected onto high-order features. Its effectiveness is confirmed by word recognition experiments on distorted speech.
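The core idea — keep low-order components that carry the dominant speech structure, discard high-order components that carry more of the noise — can be sketched with ordinary linear PCA on log-Mel frames (a simplification of the paper's kernel PCA, which additionally maps the data nonlinearly before the projection). Names and sizes below are illustrative.

```python
import numpy as np

def pca_lowrank_features(logmel, n_keep=12):
    """Project log-Mel frames onto the top principal components and
    reconstruct. Low-order components keep the dominant speech
    structure; the discarded high-order components carry much of the
    noise. (Linear-PCA simplification of the paper's kernel PCA.)"""
    mu = logmel.mean(axis=0)
    X = logmel - mu
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    coeffs = X @ Vt[:n_keep].T               # low-order features
    denoised = coeffs @ Vt[:n_keep] + mu     # rank-limited reconstruction
    return coeffs, denoised

rng = np.random.default_rng(0)
logmel = rng.standard_normal((100, 26))      # 100 frames x 26 Mel bands
coeffs, denoised = pca_lowrank_features(logmel, n_keep=12)
```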
Omesh Wadhwani
Journal of Global Research in Computer Science , 2011,
Abstract: Vernacular languages spoken in various countries create a limitation for speech recognition software. This paper is an attempt to overcome that problem. The suggested work makes use of linear predictive techniques for better interpretation of spoken words, and the rule-based structure of fuzzy logic suits the variability of vernacular speech well. In this paper we study the feasibility of speech recognition for discrete words using fuzzy neural networks. Different technical methods are used for speech recognition, most of which are based on transforming the speech signal into the phonemes and syllables of the words. We use the expression "word recognition" because in our proposed method there is no need to capture the phonemes of words. In our method, LPC coefficients for discrete spoken words are used to compact and learn the data, and the output is then sent to a fuzzy system and an expert system for classifying the conclusion. The experimental results show good precision: the recognition precision of our proposed method with fuzzy conclusions is around 90 percent.
Keywords: Vernacular, Word Recognition, Linear Predictive Coding, Feature Extraction, Automatic Speech Recognition, LPC Coefficients, Word Error Rate
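The fuzzy classification stage described above typically maps a crisp score (for example, a normalized distance between LPC coefficient vectors) onto a membership degree in sets like "close match". A classic triangular membership function illustrates the idea; the set boundaries below are hypothetical, not from the paper.

```python
import numpy as np

def triangular_membership(x, a, b, c):
    """Triangular fuzzy membership: 0 at x <= a and x >= c, rising
    linearly to 1 at x = b."""
    return float(np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0))

# membership of a normalized LPC-distance score in a hypothetical
# "close match" set with boundaries (0, 5, 10)
m = triangular_membership(2.5, 0, 5, 10)
```

A rule base then combines such memberships (e.g. "if distance is close and energy is high then word matches") to reach the fuzzy conclusion the abstract mentions.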

Copyright © 2008-2017 Open Access Library. All rights reserved.