%0 Journal Article
%T Environmental Sound Recognition Using Time-Frequency Intersection Patterns
%A Xuan Guo
%A Yoshiyuki Toyoda
%A Huankang Li
%A Jie Huang
%A Shuxue Ding
%A Yong Liu
%J Applied Computational Intelligence and Soft Computing
%D 2012
%I Hindawi Publishing Corporation
%R 10.1155/2012/650818
%X Environmental sound recognition is an important function of robots and intelligent computer systems. In this research, we use a multistage perceptron neural network system for environmental sound recognition. The input data is a combination of the time-variance pattern of instantaneous power and the frequency-variance pattern of the instantaneous spectrum at the power peak, referred to as a time-frequency intersection pattern. The spectra of many environmental sounds change more slowly than those of speech or voice, so the time-frequency intersection pattern preserves the major features of environmental sounds with drastically reduced data requirements. Two experiments were conducted using an original database and an open database created by the RWCP project. The recognition rate for 20 kinds of environmental sounds was 92%. The recognition rate of the new method was about 12% higher than that of methods using only an instantaneous spectrum. The results are also comparable to those of HMM-based methods, although those methods need more complicated computations to treat the time variance of an input vector series. 1. Introduction Understanding environmental sounds is an essential function of human hearing. For example, people can recognize the beginning of a rain shower by the sound of the rain, become cautious when they hear footsteps coming from behind at night, and open the door to welcome visitors after hearing a knock. Environmental sound recognition is also important for intelligent robots and computer systems. An intelligent robot can become aware of its environment through audition and use its hearing to complement its vision [1]. In recent years, environmental sound recognition has received increasing attention, and some pioneering research has appeared in this field. An environmental sound database (RWCP-DB) has been created for research use [2]. The sounds in the database were recorded in an anechoic environment with durations of 250 to 500 ms.
In total, there are 105 instances, with each instance including 100 samples. We reclassified this database into 12 types and 45 kinds, as listed in Table 1. For many sounds, there are multiple instances with similar but not identical materials. Table 1: The RWCP environmental sound database. An environmental sound recognition method using the instantaneous spectrum at the power peak was proposed in [3]. The reported recognition rate was about 80% for 20 instances of environmental sounds. In this research, the target sounds are limited to impact sounds that have a single power peak followed by
%U http://www.hindawi.com/journals/acisc/2012/650818/