This paper deals with transactions that are assigned classes. The classes represent differences in the conditions under which the data was collected. This paper redefines two kinds of support: characteristic support and possible support. The former is based on the specific classes assigned to specific patterns. The latter is based on the minimum class among the classes. This paper proposes a new method that uses possible supports to efficiently discover patterns whose characteristic supports are larger than or equal to a predefined minimum support. This paper also verifies the effect of the method through numerical experiments based on data registered in the UCI machine learning repository and on RFID (radio frequency identification) data collected from two apparel shops.

1. Introduction

Owing to progress in computing and network environments, we can easily collect large amounts of data and store it cheaply. We believe that the data includes useful knowledge that can help our decision making. Many researchers have tackled the discovery of such knowledge from data since the mid-1990s. Various discovery tasks have been studied in order to deal with various kinds of data. The discovery of frequent patterns composed of items from transactions is one of these tasks. Each transaction is composed of an item set. In the retail field, a receipt corresponds to a transaction and a sales item corresponds to an item. Among the early studies, [1] proposes a method that efficiently generates candidate patterns and discovers frequent patterns by using the Apriori property. The property states that the frequency of a pattern never increases as items are added to the pattern. Reference [2] proposes a bitmap index, [3] proposes a vertical ID list, and [4] proposes a frequent pattern tree (FP-tree) in order to access the transactions quickly and calculate the frequencies of the patterns efficiently.
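The Apriori property described above can be illustrated with a minimal level-wise search. This is only a sketch under stated assumptions, not the algorithm of [1] as published and not the method proposed in this paper: the function names `support` and `apriori`, the sample receipts, and the threshold are all hypothetical, and the support of each pattern is counted by a plain scan rather than by a bitmap index, vertical ID list, or FP-tree.

```python
from itertools import combinations


def support(pattern, transactions):
    """Count the transactions that contain every item of the pattern."""
    return sum(1 for t in transactions if pattern <= t)


def apriori(transactions, min_support):
    """Level-wise search returning {pattern: support} for frequent patterns."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})

    # Level 1: keep the single items that reach the minimum support.
    level = {}
    for item in items:
        pattern = frozenset([item])
        count = support(pattern, transactions)
        if count >= min_support:
            level[pattern] = count

    frequent = dict(level)
    k = 2
    while level:
        # Join step: unite pairs of frequent (k-1)-patterns into k-patterns.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        next_level = {}
        for candidate in candidates:
            # Prune step (Apriori property): a pattern can be frequent only
            # if every one of its (k-1)-item subsets is frequent.
            subsets = combinations(candidate, k - 1)
            if all(frozenset(s) in level for s in subsets):
                count = support(candidate, transactions)
                if count >= min_support:
                    next_level[candidate] = count
        frequent.update(next_level)
        level = next_level
        k += 1
    return frequent


# Hypothetical receipts: each set is one transaction.
receipts = [
    {"shirt", "tie"},
    {"shirt", "tie", "belt"},
    {"shirt", "belt"},
    {"tie"},
]
result = apriori(receipts, min_support=2)
for pattern, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), count)
```

In this toy data the three-item candidate is never counted, because one of its two-item subsets is already infrequent. That pruning is the saving the Apriori property provides.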
These improvements make it possible to discover frequent patterns quickly. However, the discovered frequent patterns are not always the ones that are attractive to analysts. Researchers have therefore tried to discover patterns with other features. For example, [5] tries to discover patterns whose frequency-based rank is higher than a predefined rank. Reference [6] targets closed patterns, which represent many frequent patterns. Reference [7] targets long patterns, which include many items. Reference [8] targets patterns that reflect the weights of items. Reference [9] targets patterns that reflect hierarchical relationships among items. It is anticipated that more
References
[1]
R. Agrawal and R. Srikant, “Fast algorithms for mining association rules in large databases,” in Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499, Santiago, Chile, 1994.
[2]
T. Morzy and M. Zakrzewicz, “Group Bitmap Index: a structure for association rules retrieval,” in Proceedings of the International Conference on Knowledge Discovery and Data Mining, pp. 284–288, New York, NY, USA, 1998.
[3]
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li, “New algorithms for fast discovery of association rules,” in Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 283–286, Newport Beach, Calif, USA, 1997.
[4]
J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 1–12, Dallas, Tex, USA, June 2000.
[5]
Z. H. Deng and G. D. Fang, “Mining top-rank-K frequent patterns,” in Proceedings of the 6th International Conference on Machine Learning and Cybernetics (ICMLC '07), pp. 851–856, Hong Kong, August 2007.
[6]
X. Yan, J. Han, and R. Afshar, “CloSpan: mining closed sequential patterns in large datasets,” in Proceedings of the SIAM International Conference on Data Mining, pp. 166–177, San Francisco, Calif, USA, 2003.
[7]
R. J. Bayardo Jr., “Efficiently mining long patterns from databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 85–93, Seattle, Wash, USA, June 1998.
[8]
C. H. Cai, A. W. C. Fu, C. H. Cheng, and W. W. Kwong, “Mining association rules with weighted items,” in Proceedings of the International Database Engineering and Applications Symposium, pp. 68–77, Cardiff, UK, 1998.
[9]
M. Pater and D. E. Popescu, “Multi-level database mining using AFOPT data structure and adaptive support constrains,” International Journal of Computers, Communications & Control, vol. 3, pp. 437–441, 2008.
[10]
S. Sakurai and K. Mori, “Discovery of characteristic patterns from tabular structured data including missing values,” International Journal of Business Intelligence and Data Mining, vol. 5, no. 3, pp. 213–230, 2010.
[11]
A. Ragel and B. Crémilleux, “Treatment of missing values for association rules,” in Proceedings of the 2nd Pacific-Asia Conference on Research and Development in Knowledge Discovery and Data Mining, pp. 258–270, Melbourne, Australia, 1998.
[12]
T. Calders, B. Goethals, and M. Mampaey, “Mining itemsets in the presence of missing values,” in Proceedings of the ACM Symposium on Applied Computing, pp. 404–408, Seoul, Korea, March 2007.
[13]
University of California Irvine, UCI Machine Learning Repository, 2011, http://archive.ics.uci.edu/ml/.
[14]
S. Sakurai, “Prediction of sales volume based on the RFID data collected from apparel shops,” International Journal of Space-Based and Situated Computing, vol. 1, no. 2-3, pp. 174–182, 2011.