We propose a two-step preprocessing method to improve the performance of Principal Component Analysis (PCA) in classification problems. In the first step, the weight of each feature is computed with a feature weighting method, and the features whose weights exceed a predefined threshold are selected. The selected relevant features are then passed to the second step, in which the variances of the features are rescaled so that they correspond to the features' importance. Because the second step reveals the class structure, we expect PCA to perform better on classification problems. Results confirm the effectiveness of the proposed method.

1. Introduction

In many real-world applications, we face databases with a large set of features. Unfortunately, in high-dimensional spaces, data become extremely sparse and far apart from each other. Experiments show that, in this situation, as the number of features increases linearly, the number of examples required for learning increases exponentially. This phenomenon is commonly known as the curse of dimensionality. Dimensionality reduction is an effective solution to the curse of dimensionality [1, 2]. Dimensionality reduction extracts or selects a subset of features to describe the target concept: selection finds a relevant subset of the original features, whereas extraction generates a new feature space through a transformation [1, 3]. A properly designed selection or extraction process improves both the complexity and the performance of learning algorithms [4].

Feature selection represents the data by a small subset of its features in their original form [5]. Its role is critical, especially in applications involving many irrelevant features. Given a criterion function, feature selection reduces to a search problem [4, 6]. When the number of features is too large, exhaustive search is infeasible and heuristic search can be employed instead. Such algorithms, for example sequential forward and/or backward selection [7, 8], have shown successful results in practical applications; however, none of them can guarantee optimality. This problem can be alleviated by feature weighting, which assigns a real-valued number to each feature to indicate its relevance to the learning problem [6]. Among existing feature weighting algorithms, ReliefF [5] is considered one of the most successful, owing to its simplicity and effectiveness [9].
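As a concrete illustration of the two-step preprocessing described above, the following Python sketch combines a Relief-style weighting step (step 1) with a variance-rescaling step (step 2) before ordinary PCA. It is a minimal sketch under our own assumptions, not the authors' implementation: the function names (relief_weights, preprocess_then_pca), the L1 nearest-hit/nearest-miss weighting, the zero threshold, and the choice to set each selected feature's variance equal to its weight are all illustrative.

import numpy as np

def relief_weights(X, y):
    # Relief-style feature weighting (after Kira and Rendell): for every sample,
    # each feature's weight is increased by its distance to the nearest miss
    # (closest sample of a different class) and decreased by its distance to
    # the nearest hit (closest sample of the same class).
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for i in range(n_samples):
        dist = np.abs(X - X[i]).sum(axis=1)        # L1 distances to all samples
        same = (y == y[i])
        same[i] = False                            # exclude the sample itself
        diff = (y != y[i])
        hit = X[same][np.argmin(dist[same])]       # nearest hit
        miss = X[diff][np.argmin(dist[diff])]      # nearest miss
        w += np.abs(X[i] - miss) - np.abs(X[i] - hit)
    return w / n_samples

def preprocess_then_pca(X, y, threshold=0.0, n_components=2):
    # Step 1: keep only the features whose weight exceeds the threshold.
    w = relief_weights(X, y)
    keep = w > threshold
    X_sel, w_sel = X[:, keep], w[keep]

    # Step 2: rescale each selected feature so that its variance equals its
    # weight, making PCA's variance criterion reflect feature importance.
    X_c = X_sel - X_sel.mean(axis=0)
    std = X_c.std(axis=0)
    std[std == 0] = 1.0
    X_w = (X_c / std) * np.sqrt(w_sel)

    # Ordinary PCA on the reweighted data, computed via SVD.
    _, _, Vt = np.linalg.svd(X_w, full_matrices=False)
    return X_w @ Vt[:n_components].T

# Tiny usage example on synthetic two-class data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)                      # class depends on feature 0 only
X[:, 0] *= 0.1                                     # hide it from variance-based PCA
Z = preprocess_then_pca(X, y, threshold=0.0, n_components=2)
print(Z.shape)

Any feature weighting method could replace relief_weights in step 1; the essential point is that, after the rescaling in step 2, PCA's variance-maximization criterion favors the directions that the weights mark as important for classification.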
References
[1] M. Dash and H. Liu, “Dimensionality reduction,” in Encyclopedia of Computer Science and Engineering, B. W. Wah, Ed., vol. 2, pp. 958–966, John Wiley & Sons, Hoboken, NJ, USA, 2009.
[2] H. Liu and H. Motoda, Computational Methods of Feature Selection, Taylor & Francis Group, 2008.
[3] M. Yang, F. Wang, and P. Yang, “A novel feature selection algorithm based on hypothesis-margin,” Journal of Computers, vol. 3, no. 12, pp. 27–34, 2008.
[4] Y. Sun and D. Wu, “A RELIEF based feature extraction algorithm,” in Proceedings of the 8th SIAM International Conference on Data Mining, pp. 188–195, April 2008.
[5] K. Kira and L. A. Rendell, “A practical approach to feature selection,” in Proceedings of the 9th International Conference on Machine Learning, pp. 249–256, 1992.
[6] Y. Sun, “Iterative RELIEF for feature weighting: algorithms, theories, and applications,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, pp. 1035–1051, 2007.
[7] S. C. Yusta, “Different metaheuristic strategies to solve the feature selection problem,” Pattern Recognition Letters, vol. 30, no. 5, pp. 525–534, 2009.
[8] P. Pudil and J. Novovičová, “Novel methods for subset selection with respect to problem knowledge,” IEEE Intelligent Systems and Their Applications, vol. 13, no. 2, pp. 66–74, 1998.
[9] T. G. Dietterich, “Machine-learning research: four current directions,” AI Magazine, vol. 18, no. 4, pp. 97–136, 1997.
[10] D. Wettschereck, D. W. Aha, and T. Mohri, “A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms,” Artificial Intelligence Review, vol. 11, no. 1–5, pp. 273–314, 1997.
[11] J. Yan, B. Zhang, N. Liu et al., “Effective and efficient dimensionality reduction for large-scale and streaming data preprocessing,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, pp. 320–332, 2006.
[12] R. Jensen and Q. Shen, Computational Intelligence and Feature Selection: Rough and Fuzzy Approaches, IEEE Press Series on Computational Intelligence, John Wiley & Sons, 2008.
[13] I. T. Jolliffe, Principal Component Analysis, Wiley, 2nd edition, 2002.
[14] M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
[15] H. Murase, F. Kimura, M. Yoshimura, and Y. Miyake, “An improvement of the autocorrelation matrix in pattern matching method and its application to handprinted ‘HIRAGANA’,” IECE Transactions, vol. 64, no. 3, pp. 276–283, 1981.
[16] H. Murase and S. K. Nayar, “Visual learning and recognition of 3-D objects from appearance,” International Journal of Computer Vision, vol. 14, no. 1, pp. 5–24, 1995.
[17] S. K. Nayar, S. A. Nene, and H. Murase, “Subspace methods for robot vision,” IEEE Transactions on Robotics and Automation, vol. 12, no. 5, pp. 750–758, 1996.
[18] S. Chen and Y. Zhu, “Subpattern-based principle component analysis,” Pattern Recognition, vol. 37, no. 5, pp. 1081–1083, 2004.
[19] J. F. Pinto Da Costa, H. Alonso, and L. Roque, “A weighted principal component analysis and its application to gene expression data,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 1, pp. 246–252, 2011.
[20] K. Honda, A. Notsu, and H. Ichihashi, “Variable weighting in PCA-guided k-means and its connection with information summarization,” Journal of Advanced Computational Intelligence and Intelligent Informatics, vol. 15, no. 1, pp. 83–89, 2011.
[21] A. M. Martinez and A. C. Kak, “PCA versus LDA,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228–233, 2001.
[22] I. Kononenko, “Estimating attributes: analysis and extensions of RELIEF,” in Proceedings of the European Conference on Machine Learning (ECML '94), pp. 171–182, 1994.
[23] R. Gilad-Bachrach, A. Navot, and N. Tishby, “Margin based feature selection—theory and algorithms,” in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 337–344, July 2004.
[24] J. Z. Huang, M. K. Ng, H. Rong, and Z. Li, “Automated variable weighting in k-means type clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 657–668, 2005.
[25] M. Dash, H. Liu, and J. Yao, “Dimensionality reduction of unsupervised data,” in Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI '97), pp. 532–539, November 1997.