%0 Journal Article
%T A Very Fast Decision Tree Algorithm for Real-Time Data Mining of Imperfect Data Streams in a Distributed Wireless Sensor Network
%A Hang Yang
%A Simon Fong
%A Guangmin Sun
%A Raymond Wong
%J International Journal of Distributed Sensor Networks
%D 2012
%I Hindawi Publishing Corporation
%R 10.1155/2012/863545
%X Wireless sensor networks (WSNs) are a rapidly emerging technology with great potential in many ubiquitous applications. Although these sensors can be inexpensive, they are often relatively unreliable when deployed in harsh environments characterized by a vast amount of noisy and uncertain data, such as urban traffic control, earthquake zones, and battlefields. The data gathered by distributed sensors, which serve as the eyes and ears of the system, are delivered to a decision center or a gateway sensor node that interprets situational information from the data streams. Although many other machine learning techniques have been extensively studied, real-time data mining of high-speed and nonstationary data streams represents one of the most promising WSN solutions. This paper proposes a novel stream mining algorithm with a programmable mechanism for handling missing data. Experimental results from both synthetic and real-life data show that the new model is superior to standard algorithms.
1. Introduction
It is anticipated that wireless sensor networks (WSNs) will enable the technology of today to be employed in future applications ranging from tracking, monitoring, and spying systems to various other technologies likely to improve aspects of everyday life. WSNs offer an inexpensive way to collect data over a distributed environment that may be harsh in nature, such as biochemical contamination sites, seismic zones, terrain subject to extreme weather, and battlegrounds.
The sensors employed in WSNs, which are miniature embedded computing devices, continue to produce large volumes of streaming data obtained from their environment until the end of their lifetime. It is known that when the battery power in such sensors is exhausted, the likelihood of erroneous data being generated grows rapidly [1]. Both uncertain environmental factors and the low cost of the sensors may contribute to intermittent transmission loss and inaccurate measurements. Even when they seldom occur, errors and noise in data streams sensed by a large number of sensors may be misinterpreted as outliers; they frequently trigger false alarms that might either lead to undesirable consequences in critical applications or reduce measurement sensitivity. Data classification is a popular data mining technique used to determine the predefined classes (verdicts) to which unseen data freshly obtained from a WSN map, thereby providing situational information about current events in an environment covered by a dense network of sensors. At the core of the classification technique is a decision
%U http://www.hindawi.com/journals/ijdsn/2012/863545/