The Effect of Listener Accent Background on Accent Perception and Comprehension
Ayako Ikeno, John H. L. Hansen
EURASIP Journal on Audio, Speech, and Music Processing, 2007, DOI: 10.1155/2007/76030
Abstract: Variability of speaker accent is a challenge for effective human communication as well as speech technology including automatic speech recognition and accent identification. The motivation of this study is to contribute to a deeper understanding of accent variation across speakers from a cognitive perspective. The goal is to provide perceptual assessment of accent variation in native and English. The main focus is to investigate how listener's accent background affects accent perception and comprehensibility. The results from perceptual experiments show that the listeners' accent background impacts their ability to categorize accents. Speaker accent type affects perceptual accent classification. The interaction between listener accent background and speaker accent type is significant for both accent perception and speech comprehension. In addition, the results indicate that the comprehensibility of the speech contributes to accent perception. The outcomes point to the complex nature of accent perception, and provide a foundation for further investigation on the involvement of cognitive processing for accent perception. These findings contribute to a richer understanding of the cognitive aspects of accent variation, and its application for speech technology.
FPGA Implementation for GMM-Based Speaker Identification
Phaklen EhKan, Timothy Allen, Steven F. Quigley
International Journal of Reconfigurable Computing, 2011, DOI: 10.1155/2011/420369
Abstract: In today's society, highly accurate personal identification systems are required. Passwords or pin numbers can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, biometrics, is becoming widely accepted as the next level for security systems. Biometric-based speaker identification is a method of identifying persons from their voice. Speaker-specific characteristics exist in speech signals due to different speakers having different resonances of the vocal tract. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling process, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes the hardware implementation for classification of a text-independent GMM-based speaker identification system. The aim was to produce a system that can perform simultaneous identification of large numbers of voice streams in real time. This has important potential applications in security and in automated call centre applications. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.
1. Introduction
Speaker recognition is an important branch of speech processing. It is the process of automatically recognizing who is speaking by using speaker-specific information included in the speech waveform. It is receiving increasing attention due to its practical value and has applications ranging from police work to automation of call centers. Speaker recognition can be classified into speaker identification (discovering identity) and speaker verification (authenticating a claim of identity).
A closed-set speaker identification system selects the speaker in the training set who best matches the unknown speaker. Open-set speaker identification allows for the possibility that the unknown speaker may not exist in the training set; thus, an additional decision alternative is required for the unknown speaker who does not match any of the models in the training set [1]. Reconfigurable computing systems use reconfigurable hardware to augment a CPU-based system. The application is decomposed into parts running on the CPU and parts running on the reconfigurable hardware, which is used to form a custom hardware accelerator for the portions of the algorithm that are capable of
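The closed-set classification step described in the entry above (score the unknown speaker's feature frames against every enrolled GMM and pick the best match) can be sketched in software as follows. This is a minimal illustration, not the paper's FPGA design: it assumes diagonal-covariance mixtures, and the names `gmm_log_likelihood`, `identify_speaker`, and `speaker_models` are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    frames: (T, D) feature vectors (e.g. MFCCs, assumed already extracted)
    weights: (M,) mixture weights; means, variances: (M, D)
    """
    diff = frames[:, None, :] - means[None, :, :]                       # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)   # (T, M)
    log_weighted = np.log(weights) + log_comp
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(per_frame.sum())

def identify_speaker(frames, speaker_models):
    """Closed-set decision: return the enrolled speaker with the highest score.

    speaker_models maps a speaker name to a (weights, means, variances) tuple.
    """
    scores = {name: gmm_log_likelihood(frames, *params)
              for name, params in speaker_models.items()}
    return max(scores, key=scores.get), scores
```

An open-set variant would additionally compare the best score against a threshold and reject the speaker as unknown when it falls below; the threshold choice is not specified in the text above.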
Speaker Identification Using MFCC-Domain Support Vector Machine
S.M. Kamruzzaman,A.N.M. Rezaul Karim,Saiful Islam,Emdadul Haque
International Journal of Electrical and Power Engineering, 2012
Abstract: Speech recognition and speaker identification are important for authentication and verification in security applications, but they are difficult to achieve. Speaker identification methods can be divided into text-independent and text-dependent. This study presents a technique for text-dependent speaker identification using an MFCC-domain Support Vector Machine (SVM). In this research, Mel-Frequency Cepstrum Coefficients (MFCCs) and their statistical distribution properties are used as features, which will be inputs to the neural network. This research first uses the Sequential Minimal Optimization (SMO) learning technique for the SVM, which improves performance over the traditional Chunking and Osuna techniques. The cepstrum coefficients representing the speaker characteristics of a speech segment are computed by nonlinear filter bank analysis and discrete cosine transform. The speaker identification ability and convergence speed of the SVMs are investigated for different combinations of features. Extensive experimental results on several samples show the effectiveness of the proposed approach.
Speaker Identification using MFCC-Domain Support Vector Machine
S. M. Kamruzzaman, A. N. M. Rezaul Karim, Md. Saiful Islam, Md. Emdadul Haque
Computer Science, 2010, DOI: 10.3923/ijepe.2007.274.278
Abstract: Speech recognition and speaker identification are important for authentication and verification in security applications, but they are difficult to achieve. Speaker identification methods can be divided into text-independent and text-dependent. This paper presents a technique for text-dependent speaker identification using an MFCC-domain support vector machine (SVM). In this work, mel-frequency cepstrum coefficients (MFCCs) and their statistical distribution properties are used as features, which will be inputs to the neural network. This work first uses the sequential minimal optimization (SMO) learning technique for the SVM, which improves performance over the traditional Chunking and Osuna techniques. The cepstrum coefficients representing the speaker characteristics of a speech segment are computed by nonlinear filter bank analysis and discrete cosine transform. The speaker identification ability and convergence speed of the SVMs are investigated for different combinations of features. Extensive experimental results on several samples show the effectiveness of the proposed approach.
Robust Support Vector Machines for Speaker Verification Task
Kawthar Yasmine Zergat, Abderrahmane Amrouche
International Journal of Computer Science Issues, 2013
Abstract: An important step in speaker verification is extracting features that best characterize the speaker's voice. This paper investigates a front-end processing scheme that aims at improving the performance of speaker verification based on an SVM classifier in text-independent mode. This approach combines features based on conventional Mel-cepstral Coefficients (MFCCs) and Line Spectral Frequencies (LSFs) to constitute robust multivariate feature vectors. To reduce the high dimensionality required for training these feature vectors, we use a dimension reduction method called principal component analysis (PCA). In order to evaluate the robustness of these systems, different noisy environments have been used. The results obtained on the TIMIT database show that combining these spectral cues leads to a significant improvement in verification accuracy, especially with PCA reduction in noisy environments with a low signal-to-noise ratio.
Robust Support Vector Machines for Speaker Verification Task
Kawthar Yasmine Zergat, Abderrahmane Amrouche
Computer Science, 2013
Abstract: An important step in speaker verification is extracting features that best characterize the speaker's voice. This paper investigates a front-end processing scheme that aims at improving the performance of speaker verification based on an SVM classifier in text-independent mode. This approach combines features based on conventional Mel-cepstral Coefficients (MFCCs) and Line Spectral Frequencies (LSFs) to constitute robust multivariate feature vectors. To reduce the high dimensionality required for training these feature vectors, we use a dimension reduction method called principal component analysis (PCA). In order to evaluate the robustness of these systems, different noisy environments have been used. The results obtained on the TIMIT database show that combining these spectral cues leads to a significant improvement in verification accuracy, especially with PCA reduction in noisy environments with a low signal-to-noise ratio.
Speaker Identification in Network Environment
Dr. V. RADHA
International Journal of Engineering Science and Technology, 2010
Abstract: The main objective of this paper is to identify a speaker in a networking environment. Automatic speaker identification is a fundamental task in speech processing. The proposed work is conducted in three phases: (1) speech detection, (2) clustering, and (3) speaker identification. The process of capturing speech from a remote computer in an intranet environment involves recording a huge volume of data or frames of speech. This is done by continuously recording the sound signals from the remote computer. The unwanted data in the recorded speech is removed and the resulting data is passed on for identification. The MFCC technique is used for feature extraction and vector quantization is used for quantizing the extracted features. To reduce the search space while performing the matching process between the speaker models and the input signal, a modified K-Means algorithm is used. After identifying the cluster that most closely matches the input data, the present research work uses an innovative multi-step approach involving four tests for speaker identification: cross-correlation, frequency multiplication, frequency cross-correlation, and peak signal comparison. The Neyman-Pearson likelihood ratio test was used to combine the results of the four tests. The result of these tests ensures that the correct speaker is identified. The proposed system reduced computation time by 50-90% without affecting the identification process.
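The vector-quantization matching mentioned in the entry above can be illustrated with a plain nearest-codeword distortion measure. This is only a sketch of standard VQ matching, assuming each speaker model is a codebook of feature centroids; it does not reproduce the paper's modified K-Means clustering or its four-test procedure, and the names `average_distortion` and `closest_codebook` are hypothetical.

```python
import numpy as np

def average_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword.

    frames: (T, D) feature vectors; codebook: (K, D) centroids.
    """
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (T, K)
    return float(d2.min(axis=1).mean())

def closest_codebook(frames, codebooks):
    """Return the name of the codebook with the lowest average distortion."""
    distortions = {name: average_distortion(frames, cb)
                   for name, cb in codebooks.items()}
    return min(distortions, key=distortions.get), distortions
```

Grouping codebooks into clusters beforehand, as the abstract describes, would let a system compute this distortion only for the speakers in the best-matching cluster instead of for every enrolled speaker.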
Wavelet-Based Mel-Frequency Cepstral Coefficients for Speaker Identification using Hidden Markov Models
Mahmoud I. Abdalla, Hanaa S. Ali
Computer Science, 2010
Abstract: To improve the performance of speaker identification systems, an effective and robust method is proposed to extract speech features, capable of operating in noisy environments. Based on the time-frequency multi-resolution property of the wavelet transform, the input speech signal is decomposed into various frequency channels. To capture the characteristics of the signal, the Mel-Frequency Cepstral Coefficients (MFCCs) of the wavelet channels are calculated. Hidden Markov Models (HMMs) were used for the recognition stage as they give better recognition of the speaker's features than Dynamic Time Warping (DTW). Comparison of the proposed approach with the conventional MFCC feature extraction method shows that the proposed method not only effectively reduces the influence of noise, but also improves recognition. A recognition rate of 99.3% was obtained using the proposed feature extraction technique compared to 98.7% using the MFCCs. When the test patterns were corrupted by additive white Gaussian noise at a 20 dB signal-to-noise ratio, the recognition rate was 97.3% using the proposed method compared to 93.3% using the MFCCs.
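The noise condition in the entry above (additive white Gaussian noise at a 20 dB signal-to-noise ratio) can be reproduced for test signals with a few lines of NumPy. This is a generic sketch under the usual definition SNR = 10 log10(P_signal / P_noise), not the authors' code; the function name `add_white_noise` is hypothetical.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Corrupt a signal with white Gaussian noise at the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    # Invert SNR_dB = 10 * log10(signal_power / noise_power).
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

A typical use would be `noisy = add_white_noise(clean, 20.0, np.random.default_rng(0))`, where `clean` is a one-dimensional array of speech samples.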
A FORGOTTEN ACCENT
Dr. Nicolae GEORGESCU
Diversité et Identité Culturelle en Europe (DICE), 2010
Abstract: Unlike its European sisters, French, Italian or Spanish, the Romanian language remains the only Romance language in which the graphic accent is not marked. Of course there are studies attempting to establish accentuation rules for a series of words, but the rules are few, the exceptions many, and the words of a language are very many, on the order of hundreds of thousands. Romanian remains in the situation of English or Russian, where the accent is a matter of habit, or it can simply shift through the speakers' desire or ignorance. We are talking about Romanian words which have different meanings according to where we put the accent. Concerning verse 84 of Epigonii ("Epigones") by M. Eminescu, where the present form /voi/ mergeți is considered by certain editors to be an imperfect form, /voi/ mergeați, we compare the imperfect forms stressed by M. Eminescu himself with the same forms left unstressed by the poet, and we conclude that the poet stressed only for rhythm, with a prosodic aim, generating what he called "the ethic accent", i.e. the word is not stressed so as to underline its relevance in the context. To sum up, the accented forms must be kept distinct from the unstressed ones, as this is the author's personal writing system, which has its poetic meaning and must be understood.