
Applied Computational Intelligence and Soft Computing 2012

An Application of Improved Gap-BIDE Algorithm for Discovering Access Patterns

DOI: 10.1155/2012/593147

Xiuming Yu, Meijing Li, Taewook Kim, Seon-phil Jeong, Keun Ho Ryu


Abstract:

Discovering access patterns from web log data is a typical sequential pattern mining application, and many access pattern mining algorithms have been proposed. In this paper, we propose an improved version of the Gap-BIDE algorithm to extract user access patterns from web log data. Compared with the original Gap-BIDE algorithm, the proposed algorithm adds a step that builds a large event set: before generating candidate patterns, it identifies the frequent events by discarding infrequent events that do not occur continuously within an access period. In the experiments, we compare the previous access pattern mining algorithm with the proposed one, and the results show that our approach is very efficient at discovering access patterns in large databases.

1. Introduction

The web has become an important channel for conducting business transactions and e-commerce. It also provides a convenient means for people worldwide to communicate with each other. With the rapid development of web technology, the web has become an important and preferred platform for distributing and acquiring information. The data collected automatically by web servers and application servers record the navigational behavior of web users; such data are called web log data. Web mining is the technology of discovering and extracting useful information from web log data. Because of the tremendous growth of information sources, the increasing interest of various research communities, and the recent interest in e-commerce, web mining has become a broad and increasingly interesting field. It deals with data related to the web, such as data hidden in web contents, data presented on web pages, and data stored on web servers. Based on the kind of data involved, web mining falls into three categories: web content mining, web structure mining, and web usage mining [1]. Web usage data include data from web server access logs, proxy server logs, and browser logs. 
Such data capture what are known as web access patterns. Web usage mining tries to discover access patterns from web log files. Web access tracking can be defined as web page history [2], and the mining task is the process of extracting interesting patterns from web access logs. Many techniques exist for mining web usage data, including statistical analysis [3], association rules [4], sequential patterns [5–7], classification [8–10], and clustering [11–13]. Access pattern mining is a popular application of sequential pattern mining, which extracts frequent subsequences from a sequence database [14]. Further, discovering access patterns is an
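The two ideas above, building a large event set before generating candidates and mining frequent subsequences under a gap constraint (the setting of Gap-BIDE [19]), can be illustrated with a small sketch. It is not the authors' algorithm. It assumes that each web session is a list of page identifiers, that an event is "large" when it appears in at least `min_sup` sessions (a stand-in for the paper's rule about events that do not occur continuously within an access period, whose exact definition is not given here), and that the gap is the number of pages skipped between consecutive pattern items. The function names and sample sessions are hypothetical, and the sketch omits Gap-BIDE's closure checking, so it reports every frequent gap-constrained pattern rather than only closed ones.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

# A pattern is an ordered tuple of page identifiers (events).
Pattern = Tuple[str, ...]


def large_event_set(db: Sequence[Sequence[str]], min_sup: int) -> Set[str]:
    """Hypothetical filter: keep events that occur in at least `min_sup` sessions.

    An event outside this set cannot belong to any frequent pattern,
    so candidates are only ever built from these events.
    """
    counts = Counter(e for session in db for e in set(session))
    return {e for e, c in counts.items() if c >= min_sup}


def contains(session: Sequence[str], pat: Pattern, min_gap: int, max_gap: int) -> bool:
    """True if `pat` occurs in `session` with between `min_gap` and `max_gap`
    events skipped between consecutive pattern items."""
    # All positions where a match of the pattern prefix seen so far can end;
    # tracking every end (not just the first) keeps the max_gap check exact.
    ends = {i for i, e in enumerate(session) if e == pat[0]}
    for item in pat[1:]:
        ends = {
            j
            for i in ends
            for j in range(i + 1 + min_gap, min(len(session), i + 2 + max_gap))
            if session[j] == item
        }
        if not ends:
            return False
    return bool(ends)


def support(db: Sequence[Sequence[str]], pat: Pattern, min_gap: int, max_gap: int) -> int:
    """Number of sessions containing at least one gap-constrained occurrence."""
    return sum(contains(s, pat, min_gap, max_gap) for s in db)


def mine(db: Sequence[Sequence[str]], min_sup: int,
         min_gap: int = 0, max_gap: int = 1) -> Dict[Pattern, int]:
    """Level-wise enumeration of all frequent gap-constrained patterns
    (no closure checking, unlike Gap-BIDE)."""
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    if min_gap < 0 or max_gap < min_gap:
        raise ValueError("need 0 <= min_gap <= max_gap")
    large = sorted(large_event_set(db, min_sup))
    result: Dict[Pattern, int] = {}
    level: List[Pattern] = []
    for e in large:
        result[(e,)] = support(db, (e,), min_gap, max_gap)
        level.append((e,))
    # Extend frequent patterns one large event at a time. A pattern's support
    # never exceeds its prefix's support, so infrequent prefixes are not extended.
    while level:
        next_level: List[Pattern] = []
        for pat in level:
            for e in large:
                cand = pat + (e,)
                sup = support(db, cand, min_gap, max_gap)
                if sup >= min_sup:
                    result[cand] = sup
                    next_level.append(cand)
        level = next_level
    return result


if __name__ == "__main__":
    # Hypothetical sessions: each list is one user's sequence of visited pages.
    sessions = [
        ["home", "news", "sports", "home"],
        ["home", "sports", "news"],
        ["home", "news", "weather", "sports"],
        ["login", "home", "news"],
    ]
    for pat, sup in mine(sessions, min_sup=3, min_gap=0, max_gap=1).items():
        print(" -> ".join(pat), sup)
```

With `min_sup=3` and at most one page skipped, the large-event filter drops `weather` and `login`, and the run prints `home 4`, `news 4`, `sports 3`, and `home -> news 4`. `home -> sports` is not reported because it occurs within the gap limit in only two sessions.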

References

[1]	L. K. J. Grace, V. Maheswari, and D. Nagamalai, “Analysis of web logs and web user in web mining,” International Journal of Network Security & Its Applications, vol. 3, no. 1, 2011.
[2]	K. Saxena and R. Shukla, “Significant interval and frequent pattern discovery in web log data,” International Journal of Computer Science Issues, vol. 7, no. 1, 2010.
[3]	K. Suresh and S. Paul, “Distributed linear programming for weblog data using mining techniques in distributed environment,” International Journal of Computer Applications (0975–8887), vol. 11, no. 7, 2010.
[4]	Y. Wang, J. Le, and D. Huang, “A method for privacy preserving mining of association rules based on web usage mining,” in International Conference on Web Information Systems and Mining (WISM '10), vol. 1, pp. 33–37, IEEE Computer Society, Washington, DC, USA, 2010.
[5]	C. Wei, W. Sen, Z. Yuan, and L. C. Chang, “Algorithm of mining sequential patterns for web personalization services,” ACM SIGMIS Database, vol. 40, no. 2, pp. 57–66, 2009.
[6]	J. Zhu, H. Wu, and G. Gao, “An efficient method of web sequential pattern mining based on session filter and transaction identification,” Journal of Networks, vol. 5, no. 9, pp. 1017–1024, 2010.
[7]	X. Yu, M. Li, and H. Kim, “Mining access patterns using temporal interval relational rules from web logs,” in Proceedings of the 4th International Conference (FITAT/DBMI '11), pp. 80–83, 2011.
[8]	M. Santini, “Cross-testing a genre classification model for the web,” Genres on the Web, vol. 42, Part 3, pp. 87–128, 2011.
[9]	J. J. Rho, B. J. Moon, Y. J. Kim, and D. H. Yang, “Internet customer segmentation using web log data,” Journal of Business & Economics Research, vol. 2, no. 11, 2004.
[10]	N. Kejžar, S. K. Černe, and V. Batagelj, “Network analysis of works on clustering and classification from web of science,” in Proceedings of the 11th Conference of the International Federation of Classification Societies (IFCS '10), Part 3, pp. 525–536, 2010.
[11]	G. Xu, Y. Zong, and P. Dolog, “Co-clustering analysis of weblogs using bipartite spectral projection approach,” in Proceedings of the 14th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES '10), vol. 6278, pp. 398–407, 2010.
[12]	A. A. O. Makanju, A. N. Zincir-Heywood, and E. E. Milios, “Clustering event logs using iterative partitioning,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '09), pp. 1255–1263, July 2009.
[13]	J. Wang, Y. Mo, B. Huang, and J. Wen, “Web search results clustering based on a novel suffix tree structure,” in Proceedings of the 5th International Conference on Autonomic and Trusted Computing (ATC '08), vol. 5060, pp. 540–554, 2008.
[14]	J. Chen and T. Cook, “Mining contiguous sequential patterns from web logs,” in Proceedings of the 16th International World Wide Web Conference (WWW '07), pp. 1177–1178, May 2007.
[15]	M. Saravanan and B. Valaramathi, “Generalization of web log datas using WUM technique,” in Proceedings of the 12th International Conference on Networking, VLSI and signal processing (ICNVS '10), pp. 157–165, 2010.
[16]	N. R. Mabroukeh and C. I. Ezeife, “A taxonomy of sequential pattern mining algorithms,” ACM Computing Surveys, vol. 43, no. 1, article 3, 2010.
[17]	R. Srikant and R. Agrawal, “Mining sequential patterns: generalizations and performance improvements,” Lecture Notes in Computer Science, vol. 1057, pp. 3–17, 1996.
[18]	J. Wang, J. Han, and C. Li, “Frequent closed sequence mining without candidate maintenance,” IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 8, pp. 1042–1056, 2007.
[19]	C. Li and J. Wang, “Efficiently mining closed subsequences with gap constraints,” in Proceedings of the SIAM International Conference on Data Mining (SDM '08), April 2008.
[20]	X. Yu, M. Li, D. G. Lee, K. D. Kim, and K. H. Ryu, “Application of closed gap-constrained sequential pattern mining in web log data,” in Proceedings of the 2nd International Conference of Electrical and Electronics Engineering (ICEEE '11), pp. 649–657, 2011.
[21]	X. Yu, M. Li, H. Kim, D. G. Lee, and K. H. Ryu, “A novel approach to mining access patterns,” in Proceedings of the 3rd International Conference on Awareness Science and Technology, pp. 346–352, 2011.
