This paper discusses and proposes a rough set model for an incomplete information system, which defines an extended tolerance relation using frequency of attribute values in such a system. It first discusses some rough set extensions in incomplete information systems. Next, “probability of matching” is defined from data in information systems and then measures the degree of tolerance. Consequently, a rough set model is developed using a tolerance relation defined with a threshold. The paper discusses the mathematical properties of the newly developed rough set model and also introduces a method to derive reducts and the core. 1. Introduction Rough set theory [1, 2] was first proposed by Pawlak as a means to analyze vague descriptions of items. The original rough sets approach presupposes that all objects in an information system have precise attribute values. Problems arise when some of the values are unknown, which sometimes happens in the real world. Therefore, it is necessary to develop a theory which enables classifications of objects even if there is only partial information available. The rough set model proposed by Kryszkiewicz [3, 4], for example, introduced indiscernibility based on tolerance relation to deal with missing values in the information system. In these approaches, a missing value was considered as a special value that may take any possible value. However, tolerance relation sometimes leads to a poor result with respect to approximation. Stefanowski and Tsoukiàs [5, 6] discussed the limitation and introduced similarity relation to refine the results obtained by using tolerance relation approach. Wang [7] gave some examples to prove that similarity relation may results in lost information and proposed limited tolerance relation. Yang et al. [8] also generalized a reasonable and flexible classification in incomplete information system by “new binary relation.” In fact, there is an array of methods to handle incomplete objects [9, 10]. Some approaches replace missing values with the most common value [11], while the other considers “unknown” itself as a new value for the attribute and treats it in the same way as ordinary values [10]. Actually, the method of handling missing values should be chosen depending on the characteristics and requirements of applications. In general, approaches deal with unavailable values based on one of the following two interpretations [12]. The first is “lost value” in which unknown values of attributes are already lost. Similarity relation [5] is one example of this semantics. The second is “do not care,”
References
[1]
Z. Pawlak, “Rough sets,” International Journal of Computer & Information Sciences, vol. 11, no. 5, pp. 341–356, 1982.
[2]
Z. Pawlak, Rough Sets. Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, 1991.
[3]
M. Kryszkiewicz, “Rough set approach to incomplete information systems,” Information Sciences, vol. 112, no. 1–4, pp. 39–49, 1998.
[4]
M. Kryszkiewicz, “Rules in incomplete information systems,” Information Sciences, vol. 113, no. 3-4, pp. 271–292, 1999.
[5]
J. Stefanowski and A. Tsoukias, On the Extension Ofrough Sets under Incomplete Information, vol. 1711 of Lecture Notes in ArtificialIntelligence, 1999.
[6]
J. Stefanowski and A. Tsoukiàs, “Incomplete information tables and rough classification,” Computational Intelligence, vol. 17, no. 3, pp. 545–566, 2001.
[7]
G. Y. Wang, “Extension of rough set under incomplete information systems,” in Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE '02), pp. 1098–1103, 2002.
[8]
X. Yang, X. Song, and X. Hu, “Generalisation of rough set for rule induction in incomplete system,” International Journal of Granular Computing, Rough Sets and Intelligent Systems, vol. 2, article 1, pp. 37–50, 2011.
[9]
J. W. Grzymala-Busse and M. Hu, “A comparison of several approaches to missing attribute values in data mining,” in Proceedings of the International Conference on Rough Sets and Current Trends in Computing (RSCTC '00), vol. 2005 of Lecture Notes in Computer Science, pp. 378–385, Springer, 2000.
[10]
J. W. Grzymala-Busse and W. J. Grzymala-Busse, “Handling missing attribute values,” in The Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach, Eds., pp. 37–57, Springer, Berlin, Germany, 2005.
[11]
P. Clark and T. Niblett, “The CN2 induction algorithm,” Machine Learning, vol. 3, no. 4, pp. 261–283, 1989.
[12]
J. W. Grzymala-Busse, “On the unknown attribute values in learning from examples,” in Proceedings of the 6th International Symposium on Methodologies for Intelligent Systems (ISMIS '91), vol. 542 of Lecture Notes in Artificial Intelligence, pp. 368–377, Springer, Charlotte, North Carolina, October 1991.
[13]
J. W. Grzymala-Busse, “Rough set strategies to data with missing attribute values,” in Proceedings of the Workshop on Foundations and New Directions in Data Mining, associated with the third IEEE International Conference on Data Mining, 2003.
[14]
J. W. Grzymala-Busse, “Data with missing attribute values: generalization of indiscernibility relation and rule induction,” in Transactions on Rough Sets, vol. 3100 of Lecture Notes in Computer Science, pp. 78–95, Springer, Berlin, Germany, 2004.
[15]
J. W. Grzymala-Busse, “Characteristic relations for incomplete data: a generalization of the indiscernibility relation,” in Proceedings of the 4th International Conference on Rough Sets and Current Trends in Computing (RSCTC '04), pp. 244–253, Uppsala, Sweden, 2004.
[16]
J. W. Grzymala-Busse, “Incomplete data and generalization of indiscernibility relation, definability, and approximations,” in Proceedings of the 10th International Conference on Rough Sets, Fuzzy Sets, data Mining, and Granular Computing (RSFDGrC '05), pp. 244–253, Springer, 2005.
[17]
J. W. Grzymala-Busse, “A rough set approach to data with missing attribute values,” in Proceedings of the 1st International Conference on Rough Sets and Knowledge Technology (RSKT '06), vol. 4062 of Lecture Notes in Computer Science, pp. 58–67, Springer, 2006.
[18]
E. A. Rady, M. M. E. Abd El-Monsef, and W. A. Adb El-Latif , “A modified rough sets approach to incomplete information systems,” Journal of Applied Mathematics and Decision Sciences, vol. 2007, Article ID 058248, 13 pages, 2007.
[19]
J. W. Grzymala-Busse and Wojciech Rzasa, “Definability and other properties of appoximations for generalized indiscernibility relations,” in Transactions on Rough Sets XI, vol. 5946 of Lecture Notes in Computer Science, pp. 14–39, Springer, Berlin, Germany, 2010.
[20]
Y. Y. Yao, “On generalizing rough set theory,” in Proceedings of the 9th International Conference of Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC '03), vol. 2639 of Lecture Notes in Computer Science, pp. 44–51, 2003.
[21]
Y. Y. Yao, “Relational interpretations of neighborhood operators and rough set approximation operators,” Information Sciences, vol. 111, no. 1–4, pp. 239–259, 1998.
[22]
Y. Y. Yao, “Rough Sets: neighborhood systems, and granular computing,” in Proceedings of the IEEE Canadian Conference on Electrical and Computer Engineering, pp. 1553–1558, 1999.