This paper improves a method that predicts whether evaluation objects, such as companies and products, will become attractive in the near future. Attractiveness is evaluated with trend rules, which represent relationships among evaluation objects, keywords, and numerical changes related to the evaluation objects. The rules are inductively acquired from text sequential data and numerical sequential data. The method assigns evaluation objects to the text sequential data by using a topic dictionary, which describes keywords representing the numerical change. Doing so can expand the amount of training data, and the expansion is expected to lead to the acquisition of more valid trend rules. This paper applies the method to the task of predicting attractive stock brands from news headlines and stock price sequences. Numerical experiments show that the method improves the detection performance for evaluation objects.

1. Introduction

Recently, various kinds of sequential data have become easy and cheap to collect from both the real world and the virtual world. Such data are expected to contain knowledge that can make our lives smarter. Many studies have therefore actively tackled the task of discovering knowledge from such data [1–5]. However, the knowledge discovery task depends on the features of the data and the types of knowledge sought. No single method can deal with all features and all types, so it is indispensable to develop a discovery method that reflects the target features and types.

We try to develop a method that predicts whether evaluation objects, such as companies and products, will become attractive in the near future. We chose this task because the target data are easily collected from the Internet and because prediction accuracy is easy to evaluate quantitatively. The method deals with both text sequential data and numerical sequential data related to evaluation objects, and it discovers trend rules from them.
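To make the notion of a trend rule concrete, the following minimal Python sketch stores a rule as a triple of evaluation object, keyword, and numerical change, and matches such rules against headlines that already carry an evaluation object. It is an illustration under stated assumptions, not the representation or matching procedure used by the method: the names TrendRule and predict_changes, the "up"/"down" encoding of a numerical change, the substring matching, and the sample companies and headlines are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrendRule:
    """Hypothetical encoding of one trend rule as a triple."""
    evaluation_object: str  # e.g. a company or product name
    keyword: str            # lowercase keyword expected in the text
    change: str             # assumed encoding of the numerical change: "up" or "down"


def predict_changes(rules, headlines):
    """Map each evaluation object to the changes suggested by matching rules.

    `headlines` is a list of (evaluation_object, text) pairs, i.e. the
    evaluation object is assumed to be already assigned to each text.
    """
    predicted = defaultdict(set)
    for obj, text in headlines:
        lowered = text.lower()
        for rule in rules:
            # A rule fires when both the object and the keyword match.
            if rule.evaluation_object == obj and rule.keyword in lowered:
                predicted[obj].add(rule.change)
    return dict(predicted)


# Made-up rules and headlines, for illustration only.
rules = [
    TrendRule("AcmeCorp", "recall", "down"),
    TrendRule("AcmeCorp", "record profit", "up"),
]
headlines = [
    ("AcmeCorp", "AcmeCorp announces recall of heaters"),
    ("BetaInc", "BetaInc posts record profit"),
]
result = predict_changes(rules, headlines)
print(result)
```

In this sketch, the BetaInc headline matches no rule because the only "record profit" rule belongs to a different evaluation object. The sketch does not cover how rules are acquired or how the topic dictionary assigns evaluation objects to texts.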
Each trend rule represents a relationship among evaluation objects, keywords, and numerical changes. The method applies the trend rules to text sequential data collected in a designated period and predicts the attractive evaluation objects in the next period. It regards evaluation objects whose trends change as attractive. This paper aims to discover more valid trend rules in order to improve detection performance in the prediction. It focuses on expanding the training data because many machine learning studies show that more training data brings about better learning results. This
References
[1]
R. Agrawal and R. Srikant, “Mining sequential patterns,” in Proceedings of the 11th International Conference on Data Engineering, pp. 3–14, March 1995.
[2]
J. Pei, J. Han, B. Mortazavi-Asl et al., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” in Proceedings of the 17th International Conference on Data Engineering, pp. 215–224, April 2001.
[3]
S. Sakurai and R. Orihara, “Discovery of important threads from bulletin board sites,” International Journal of Information Technology and Intelligent Computing, vol. 1, no. 1, pp. 217–228, 2006.
[4]
S. Sakurai and K. Ueno, “Analysis of daily business reports based on sequential text mining method,” in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC '04), vol. 4, pp. 3279–3284, October 2004.
[5]
S. Yen, “Mining interesting sequential patterns for intelligent systems,” International Journal of Intelligent Systems, vol. 20, no. 1, pp. 73–87, 2005.
[6]
S. Sakurai, K. Makino, and S. Matsumoto, “A discovery method of trend rules from complex sequential data,” in Proceedings of the 26th IEEE International Conference on Advanced Information Networking and Applications Workshops (AINA '12), pp. 950–955, 2012.
[7]
W. Antweiler and M. Z. Frank, “Is all that talk just noise? The information content of Internet stock message boards,” Journal of Finance, vol. 59, no. 3, pp. 1259–1294, 2004.
[8]
J. Bollen, H. Mao, and X. Zeng, “Twitter mood predicts the stock market,” October 2010, http://arxiv.org/PS_cache/arxiv/pdf/1010/1010.3003v1.pdf.
[9]
X. Zhang, H. Fuehres, and P. A. Gloor, “Predicting Stock Market Indicators through Twitter, ‘I hope it is not as bad as I fear’,” Procedia, vol. 26, pp. 55–62, 2011.
[10]
G. P. C. Fung, J. X. Yu, and W. Lam, “News sensitive stock trend prediction,” in Proceedings of the 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 481–493, 2002.
[11]
M. Mittermayer and G. F. Knolmayer, “NewsCATS: a news categorization and trading system,” in Proceedings of the 6th International Conference on Data Mining, pp. 1002–1007, December 2006.
[12]
D. Peramunetilleke and R. K. Wong, “Currency exchange rate forecasting from news headlines,” in Proceedings of the 13th Australasian Database Conference, vol. 5, pp. 131–139, 2002.
[13]
M. de Choudhury, H. Sundaram, A. John, and D. D. Seligmann, “Can blog communication dynamics be correlated with stock market activity?” in Proceedings of the 19th ACM Conference on Hypertext and Hypermedia (HT '08), pp. 55–60, June 2008.
[14]
Y. Seo, J. A. Giampapa, and K. P. Sycara, “Financial news analysis for intelligent portfolio management,” Tech. Report CMU-RI-TR-04-04, Robotics Institute, Carnegie Mellon University, 2004.
[15]
S. Sakurai, K. Makino, H. Suzuki, and Y. Masaoka, “Ranking of evaluation targets based on complex sequential data,” in Proceedings of the 25th Annual Conference of the Japanese Society for Artificial Intelligence, 2G2-01, 2011, (Japanese).
[16]
J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, vol. 29, no. 2, pp. 1–12, 2000.