Sensors, 2013

Seeding and Harvest: A Framework for Unsupervised Feature Selection Problems

DOI: 10.3390/s130100292

Keywords: feature selection, seeding and harvest, noise injection


Abstract:

Feature selection, also known as attribute selection, is the technique of selecting a subset of relevant features for building robust object models. It is becoming increasingly important for large-scale sensor applications with AI capabilities. The core idea of this paper derives from a straightforward and intuitive principle: the more representative a feature subset (pattern) is, the more self-organized it should be, and consequently the less sensitive it should be to artificially seeded noise points. Building on this heuristic, we establish a set of theoretical principles and, based on them, propose a two-stage framework for evaluating the relative importance of feature subsets, called seeding and harvest (S&H for short). In the first stage, we inject a number of artificial noise points into the original dataset; in the second stage, we apply an outlier detector to identify them under various feature patterns. The more precisely the seeded points can be extracted under a particular feature pattern, the more valuable and important that feature pattern should be. We also compared our method with several state-of-the-art feature selection methods on a number of real-life datasets. The experimental results confirm that our method accomplishes feature reduction tasks with high accuracy and low computational complexity.
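
To make the two-stage procedure concrete, the Python sketch below scores a candidate feature pattern by seeding uniform noise inside the data's bounding box and then harvesting it with a local-outlier-factor detector. This is a minimal illustration of the idea in the abstract, not the authors' implementation: the uniform noise model, the choice of scikit-learn's LocalOutlierFactor as the detector, the function names (sh_score, select_pattern), and the exhaustive subset search are all assumptions made for the sake of the example.

import numpy as np
from itertools import combinations
from sklearn.neighbors import LocalOutlierFactor

def sh_score(X, feature_subset, n_seeds=20, seed=0):
    """Seeding: inject uniform noise points into the chosen feature pattern.
    Harvest: flag the most outlying points with an outlier detector and
    score the pattern by the fraction of injected points it recovers."""
    rng = np.random.default_rng(seed)
    Xs = X[:, feature_subset]
    lo, hi = Xs.min(axis=0), Xs.max(axis=0)
    noise = rng.uniform(lo, hi, size=(n_seeds, Xs.shape[1]))  # artificial noise points
    data = np.vstack([Xs, noise])             # seeds occupy the last n_seeds rows
    lof = LocalOutlierFactor(n_neighbors=20)  # assumes len(data) > 20
    lof.fit(data)
    outlyingness = -lof.negative_outlier_factor_    # higher = more anomalous
    flagged = np.argsort(outlyingness)[-n_seeds:]   # top-n_seeds suspected outliers
    return np.mean(flagged >= len(Xs))              # recovery precision in [0, 1]

def select_pattern(X, k, **kwargs):
    """Return the k-feature pattern with the highest S&H score
    (exhaustive search; feasible only for small d, illustrative only)."""
    d = X.shape[1]
    return max((list(c) for c in combinations(range(d), k)),
               key=lambda s: sh_score(X, s, **kwargs))

For instance, select_pattern(X, 3) would return the 3-feature pattern under which the injected points are recovered most precisely; in practice the exhaustive search would be replaced by a more efficient search strategy, and the detector could be any outlier detection method.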

