Search Results: 1 - 10 of 100 matches for " "
All listed articles are free for downloading (OA Articles)
Technical Evaluation Report 4: Internet Audio Products  [cached]
Duane Weaver,Deborah Guspie,Nolan Cox,Jon Baggaley
International Review of Research in Open and Distance Learning , 2002,
Abstract: Online audio methods have evolved as a means of providing unlimited and inexpensive or free international audio communication. They are becoming popular in distance education (DE) as an alternative to asynchronous conferencing methods (Report 3 in this series). Current types of Internet audio connectivity provide: (a) direct connections between individuals (Internet telephony); (b) shared places or forums on the Internet where groups can meet (audio-conferencing); and (c) a variety of PC-to-PC and PC-to-phone methods. However, products differ in terms of latency, voice quality, and stability of service and connection. The report compares current online audio packages in terms of their technical features and reliability.
Feature Fusion Based Audio-Visual Speaker Identification Using Hidden Markov Model under Different Lighting Variations  [PDF]
Md. Rabiul Islam,Md. Abdus Sobhan
Applied Computational Intelligence and Soft Computing , 2014, DOI: 10.1155/2014/831830
Abstract: The aim of the paper is to propose a feature-fusion-based Audio-Visual Speaker Identification (AVSI) system under varied illumination conditions. Among the different fusion strategies, feature-level fusion is used for the proposed AVSI system, with a Hidden Markov Model (HMM) used for learning and classification. Since the feature set contains richer information about the raw biometric data than any other level, integration at the feature level is expected to provide better authentication results. In this paper, Mel-Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Cepstral Coefficients (LPCCs) are combined to form the audio feature vectors, and Active Shape Model (ASM) based appearance and shape facial features are concatenated to form the visual feature vectors. These combined audio and visual features are used for the feature fusion. To reduce the dimension of the audio and visual feature vectors, the Principal Component Analysis (PCA) method is used. The VALID audio-visual database, with four different illumination levels, is used to measure the performance of the proposed system. Experimental results focus on the significance of the proposed audio-visual speaker identification system with various combinations of audio and visual features. 1. Introduction Human speaker identification is bimodal in nature [1, 2]. In a face-to-face conversation, we listen to what others say and at the same time observe their lip movements, facial expressions, and gestures. Especially if we have a problem in listening due to environmental noise, the visual information plays an important role in speech understanding [3]. Even in a clean environment, speech recognition performance improves when the talking face is visible [4]. Generally, an audio-only speaker identification system is not adequate to meet the variety of user requirements for person identification.
The AVSI system promises to alleviate some of the drawbacks encountered by audio-only identification. Visual speech information can play an important role in the improvement of natural and robust human-computer interaction [5, 6]. Indeed, various important human-computer components, such as speaker identification, verification [7], localization [8], speech event detection [9], speech signal separation [10], coding [11], video indexing and retrieval [12], and text-to-speech [13], have been shown to benefit from the visual channel [14]. Audio-visual identification system can significantly improve the performance of
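The feature-level fusion and PCA reduction described in the abstract above can be sketched in a few lines. This is a minimal illustration with randomly generated placeholder features (real MFCC/LPCC and ASM extraction is assumed to happen elsewhere); the frame count and dimensions are arbitrary choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for real extractors: in the paper, audio features
# are MFCCs + LPCCs and visual features are ASM shape/appearance parameters.
n_frames = 200
mfcc = rng.normal(size=(n_frames, 13))    # 13 MFCCs per frame (placeholder)
lpcc = rng.normal(size=(n_frames, 12))    # 12 LPCCs per frame (placeholder)
visual = rng.normal(size=(n_frames, 20))  # ASM parameters per frame (placeholder)

# Feature-level fusion: concatenate per-frame audio and visual vectors.
audio = np.hstack([mfcc, lpcc])           # (200, 25)
fused = np.hstack([audio, visual])        # (200, 45)

def pca_reduce(x, k):
    """Project x onto its top-k principal components."""
    centered = x - x.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:k]]
    return centered @ top

reduced = pca_reduce(fused, k=10)
print(reduced.shape)  # (200, 10)
```

The reduced vectors would then be the per-frame observations fed to the HMM.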
Dynamic Audio-Visual Client Recognition modelling  [PDF]
Tijjani Adam Shuwa, U. Hashim
International Journal of Computer Science and Security , 2011,
Abstract: This paper reports on an audio-visual client recognition system, implemented in Matlab, that identifies five clients and can be extended to identify as many clients as it is trained on; the system was successfully implemented. The visual recognition component was implemented first, using Principal Component Analysis, Linear Discriminant Analysis, and a Nearest Neighbor classifier. The second part, audio recognition, was implemented using Mel-Frequency Cepstral Coefficients, Linear Discriminant Analysis, and a Nearest Neighbor classifier. The system was tested with images and sounds it had not been trained on, to see whether it could detect an intruder; this yielded very successful results, with a precise response to intruders. An alternative implementation of the visual recognition section using a neural network was also explored. The visual recognition system was converted into a Simulink block set, which was then implemented in Signal Wave.
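The nearest-neighbor recognition described above can be sketched on synthetic client data. This simplified illustration omits the PCA/LDA projection and the real MFCC front end; the cluster layout, distance threshold, and the threshold-based intruder rejection rule are all assumptions for illustration (the abstract does not detail the rejection mechanism).

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "clients": client i's feature vectors cluster around a mean of 10*i in
# every dimension. Placeholder data; the paper's features come from MFCC
# (audio) or PCA/LDA-projected images (visual).
n_clients, per_client, dim = 5, 10, 8
means = (10.0 * np.arange(n_clients))[:, None] * np.ones(dim)
train_x = np.vstack([m + rng.normal(size=(per_client, dim)) for m in means])
train_y = np.repeat(np.arange(n_clients), per_client)

def nearest_neighbor(x, ref_x, ref_y):
    """1-NN classifier: return the label of the closest training vector."""
    dists = np.linalg.norm(ref_x - x, axis=1)
    return ref_y[np.argmin(dists)]

def is_intruder(x, ref_x, threshold):
    """Flag an intruder when even the closest training vector is far away."""
    return np.linalg.norm(ref_x - x, axis=1).min() > threshold

probe = means[2] + rng.normal(size=dim)  # noisy sample of client 2
print(nearest_neighbor(probe, train_x, train_y))  # -> 2
```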
Multimodal Transfer Deep Learning for Audio Visual Recognition  [PDF]
Seungwhan Moon,Suyoun Kim,Haohan Wang
Computer Science , 2014,
Abstract: We propose a multimodal deep learning framework that can transfer the knowledge obtained from a single-modal neural network to a network with a different modality. For instance, we show that we can leverage speech data to fine-tune a network trained for video recognition, given an initial audio-video parallel dataset within the same semantics. Our approach learns analogy-preserving embeddings between the abstract representations learned by each network, allowing for semantics-level transfer or reconstruction of data among different modalities. Our method is thus especially useful when one modality has scarcer labeled data than the others. While we mainly focus on applying transfer learning to the audio-visual recognition task as an application of our approach, our framework is flexible and can work with any multimodal dataset. In this work-in-progress report, we show our preliminary results on the AV-Letters dataset.
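A heavily simplified sketch of cross-modal embedding transfer: here the analogy-preserving mapping between the two modalities' embedding spaces is reduced to a linear map fit by least squares on paired data. The paper learns this with neural networks; the dimensions and the synthetic embeddings below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical parallel audio-video embeddings: in the paper these come from
# two single-modal networks. Here video embeddings are a fixed linear
# transform of audio embeddings plus noise, so a linear map can recover it.
n, d_audio, d_video = 500, 16, 12
audio_emb = rng.normal(size=(n, d_audio))
true_map = rng.normal(size=(d_audio, d_video))
video_emb = audio_emb @ true_map + 0.01 * rng.normal(size=(n, d_video))

# Fit the cross-modal mapping by least squares on the paired data.
learned_map, *_ = np.linalg.lstsq(audio_emb, video_emb, rcond=None)

# Transfer: project a new audio embedding into the video embedding space.
new_audio = rng.normal(size=(1, d_audio))
predicted_video = new_audio @ learned_map
print(predicted_video.shape)  # (1, 12)
```

With abundant paired data and low noise, the learned map closely recovers the true transform; scarcity of labels in one modality is exactly the setting where such a mapping is useful.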
Dynamic Bayesian Networks for Audio-Visual Speech Recognition  [cached]
Kevin Murphy,Xiaoxing Liu,Xiaobo Pi,Luhong Liang
EURASIP Journal on Advances in Signal Processing , 2002, DOI: 10.1155/s1687617202206083
Abstract: The use of visual features in audio-visual speech recognition (AVSR) is justified both by the speech generation mechanism, which is essentially bimodal in its audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with existing models used in speaker-dependent audio-visual isolated word recognition. The statistical properties of both the CHMM and the FHMM make it possible to model the state asynchrony of the audio and visual observation sequences while preserving their natural correlation over time. In our experiments, the CHMM performs best overall, outperforming all the existing models and the FHMM.
Audio-Visual Authentication System over the Internet Protocol
Yee Wan Wong,Kah Phooi Seng,Li Minn Ang
Lecture Notes in Engineering and Computer Science , 2009,
Significance Of Audio-visual Aids In Teaching English
Shivkumar Rautrao
Indian Streams Research Journal , 2012,
Abstract: Audio-visual aids are instructional materials and devices through which teaching and learning in colleges are done more effectively. Examples of learning aids include visual aids, audio-visual aids, real objects, and many others. Visual aids are designed materials that may be locally made or commercially produced; they come in the form of wall charts, illustrated pictures, pictorial materials, and other two-dimensional objects. There are also audio-visual aids: teaching machines such as radio, television, mobile devices, and all sorts of projectors with sound attributes. Audio-visual aids can change the teaching-learning situation if various types of visual aids are employed in teaching English. They may be described as aids that facilitate the understanding of the written and the spoken word in a teaching-learning situation. The use of audio-visual aids would upgrade the teaching of English and provide students with learning experiences involving active participation in all phases of learning activities. This research paper is designed to show how such tools and techniques make teaching the English language more effective.
Technical Evaluation Report 31: Internet Audio Products (3/ 3)
Linda Schwartz,Adrienne de Schutter,Patricia Fahrni,Jim Rudolph
International Review of Research in Open and Distance Learning , 2004,
Abstract: Two contrasting additions to the online audio market are reviewed: iVocalize, a browser-based audio-conferencing software, and Skype, a PC-to-PC Internet telephone tool. These products are selected for review on the basis of their success in gaining rapid popular attention and usage during 2003-04. The iVocalize review emphasizes the product’s role in the development of a series of successful online audio communities – notably several serving visually impaired users. The Skype review stresses the ease with which the product may be used for simultaneous PC-to-PC communication among up to five users. Editor’s Note: This paper serves as an introduction to reports about online community building, and reviews of online products for disabled persons, in the next ten reports in this series. JPB, Series Ed.
Audio-Visual Speech Recognition Using MPEG-4 Compliant Visual Features  [cached]
Aggelos K. Katsaggelos,Zhilin Wu,Jay J. Williams,Petar S. Aleksic
EURASIP Journal on Advances in Signal Processing , 2002, DOI: 10.1155/s1687617202206162
Abstract: We describe an audio-visual automatic continuous speech recognition system, which significantly improves speech recognition performance over a wide range of acoustic noise levels, as well as under clean audio conditions. The system utilizes facial animation parameters (FAPs) supported by the MPEG-4 standard for the visual representation of speech. We also describe a robust and automatic algorithm we have developed to extract FAPs from visual data, which does not require hand labeling or extensive training procedures. Principal component analysis (PCA) was performed on the FAPs in order to decrease the dimensionality of the visual feature vectors, and the derived projection weights were used as visual features in the audio-visual automatic speech recognition (ASR) experiments. Both single-stream and multistream hidden Markov models (HMMs) were used to model the ASR system, integrate audio and visual information, and perform relatively large-vocabulary (approximately 1000 words) speech recognition experiments. The experiments use clean audio data and audio data corrupted by stationary white Gaussian noise at various SNRs. The proposed system reduces the word error rate (WER) by 20% to 23% relative to audio-only speech recognition WERs at various SNRs (0–30 dB) with additive white Gaussian noise, and by 19% relative to the audio-only speech recognition WER under clean audio conditions.
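Relative WER reductions like those reported above translate into absolute numbers with simple arithmetic. The baseline WER below is an illustrative assumption, not a figure from the paper; only the 20% relative reduction comes from the abstract.

```python
def relative_wer_reduction(baseline_wer, av_wer):
    """Relative WER reduction of the audio-visual system over audio-only."""
    return (baseline_wer - av_wer) / baseline_wer

# Illustrative numbers only: a 20% relative reduction from an assumed 10%
# audio-only baseline means the audio-visual WER is 8% absolute.
baseline = 0.10
av = baseline * (1 - 0.20)
print(round(av, 4))                                    # 0.08
print(round(relative_wer_reduction(baseline, av), 4))  # 0.2
```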
Deo Brat Ojha
Journal of Global Research in Computer Science , 2011,
Abstract: Human beings are always in search of processes through which the transmission of audio/visual content between two communicators becomes authentic, secure, speedy, compact, integrated, and error free. In this paper, the specific requirement is to obtain an error-free message with the help of an error-correction function, while preserving the qualities listed above.
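As a concrete instance of such an error-correction function, a Hamming(7,4) code corrects any single-bit error in a 7-bit codeword. This is a generic textbook sketch, not the scheme used in the paper.

```python
def hamming74_encode(d):
    """Encode 4 data bits as the Hamming(7,4) codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity check over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity check over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity check over positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based index of the flipped bit; 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[5] ^= 1                           # simulate a single-bit channel error
print(hamming74_decode(code) == word)  # True
```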
Copyright © 2008-2017 Open Access Library. All rights reserved.