Data streams are continuous and constantly evolving, which makes them
difficult to handle with simple, static strategies. Data streams pose four
main challenges to researchers: infinite length, concept-drift,
concept-evolution and feature-evolution.
Infinite length arises because the amount of data is unbounded.
Concept-drift is caused by gradual changes in the underlying concept of the
stream. Concept-evolution occurs when previously unknown classes appear in the
data. Feature-evolution arises because new features keep appearing in the
stream while older ones disappear. Before any analysis can be performed on such
data, it must first be converted into a form suitable for analysis, and the
challenges above must be handled. Various strategies have been proposed to
tackle these difficulties.
However, most of them focus only on the problems of infinite length and
concept-drift. In this paper, we propose a string-based strategy that
handles infinite length, concept-evolution and concept-drift.
Cite this paper
Singh, R. and Chandak, M. B. (2015). Classification and Novel Class Detection in Data Streams Using Strings. Open Access Library Journal, 2, e1507. doi: http://dx.doi.org/10.4236/oalib.1101507.