%0 Journal Article
%T RPCA: A Novel Preprocessing Method for PCA
%A Samaneh Yazdani
%A Jamshid Shanbehzadeh
%A Mohammad Taghi Manzuri Shalmani
%J Advances in Artificial Intelligence
%D 2012
%I Hindawi Publishing Corporation
%R 10.1155/2012/484595
%X We propose a two-step preprocessing method to improve the performance of Principal Component Analysis (PCA) on classification problems. In the first step, the weight of each feature is computed using a feature weighting method, and the features with weights larger than a predefined threshold are selected. The selected relevant features are then passed to the second step, in which the variances of the features are rescaled so that they correspond to the features' importance. Because step 2 helps reveal the class structure, we expect the performance of PCA on classification problems to improve. Results confirm the effectiveness of the proposed method. 1. Introduction In many real-world applications, we face databases with a large set of features. Unfortunately, in high-dimensional spaces data become extremely sparse and far apart from one another. Experiments show that, in this situation, as the number of features increases linearly, the number of examples required for learning increases exponentially. This phenomenon is commonly known as the curse of dimensionality. Dimensionality reduction is an effective solution to the curse of dimensionality [1, 2]. Dimensionality reduction extracts or selects a subset of features to describe the target concept: selection finds a relevant subset of the original features, whereas extraction generates a new feature space through transformation [1, 3]. A properly designed selection or extraction process improves the complexity and the performance of learning algorithms [4]. Feature selection represents the data by a small subset of its features in their original format [5]. The role of feature selection is critical, especially in applications involving many irrelevant features. Given a criterion function, feature selection reduces to a search problem [4, 6]. Exhaustive search is infeasible when the number of features is too large, so heuristic search can be employed. Such algorithms, for example sequential forward and/or backward selection [7, 8], have shown successful results in practical applications; however, none of them can guarantee optimality. This problem can be alleviated by feature weighting, which assigns a real-valued number to each feature to indicate its relevance to the learning problem [6]. Among existing feature weighting algorithms, ReliefF [5] is considered one of the most successful owing to its simplicity and effectiveness [9].
%U http://www.hindawi.com/journals/aai/2012/484595/
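
The abstract describes the two preprocessing steps only in prose. The following is a minimal Python/NumPy sketch of how they might be realized; the function name rpca_preprocess, the strict-threshold selection rule, and the choice of setting each retained feature's variance equal to its weight are illustrative assumptions, not the paper's exact formulation.

    import numpy as np

    def rpca_preprocess(X, weights, threshold):
        # Step 1: keep only features whose weight exceeds the predefined
        # threshold (a strict inequality is assumed here).
        keep = weights > threshold
        X_sel = X[:, keep]
        w_sel = weights[keep]

        # Step 2: rescale each retained feature so its variance matches
        # its importance: standardize to unit variance, then multiply by
        # sqrt(weight) so the resulting variance equals the weight.
        std = X_sel.std(axis=0)
        std[std == 0] = 1.0  # guard against constant features
        return (X_sel - X_sel.mean(axis=0)) / std * np.sqrt(w_sel)

PCA applied to the returned matrix then favors directions aligned with highly weighted features, which is the effect the abstract attributes to step 2.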
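
The introduction singles out ReliefF as a successful feature-weighting method but does not reproduce the algorithm. The sketch below (reusing the NumPy import above) implements the simpler two-class Relief scheme that ReliefF generalizes; ReliefF additionally averages over k nearest hits and misses and handles multiple classes. The sample count and the assumption that features are pre-scaled to [0, 1] are illustrative.

    def relief_weights(X, y, n_samples=100, seed=0):
        # Simplified two-class Relief: a feature's weight increases when
        # it differs on the nearest miss (other class) and decreases when
        # it differs on the nearest hit (same class). Features are assumed
        # scaled to [0, 1] so per-feature differences are comparable.
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        w = np.zeros(X.shape[1])
        for _ in range(n_samples):
            i = rng.integers(n)
            dists = np.abs(X - X[i]).sum(axis=1)  # L1 distance to every row
            dists[i] = np.inf                     # exclude the instance itself
            same = y == y[i]
            hit = np.argmin(np.where(same, dists, np.inf))
            miss = np.argmin(np.where(~same, dists, np.inf))
            w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
        return w / n_samples

Weights produced this way could, for example, feed the rpca_preprocess sketch above, completing the two-step pipeline the abstract outlines.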