
Journal of Computer Science and Technology, 2005

Multi-Scaling Sampling: An Adaptive Sampling Method for Discovering Approximate Association Rules

Cai-Yan Jia, Xie-Ping Gao

Keywords: data mining, association rule, frequent itemset, sample error, multi-scaling sampling


Abstract:

One obstacle to efficient association rule mining is the explosive growth of data sets: scanning a large database is costly or even impossible, especially when multiple passes are required. A popular way to improve the speed and scalability of association rule mining is to run the algorithm on a random sample instead of the entire database. However, how to define the error in the algorithm's outcome effectively, how to estimate it efficiently, and how to determine the required sample size have remained open problems. In this paper, an effective and efficient algorithm based on PAC (Probably Approximately Correct) learning theory is given to measure and estimate sample error. Then a new adaptive, on-line, fast sampling strategy, multi-scaling sampling, inspired by MRA (Multi-Resolution Analysis) and the Shannon sampling theorem, is presented for quickly obtaining acceptably approximate association rules at an appropriate sample size. Both theoretical analysis and empirical study show that the sampling strategy achieves a very good speed-accuracy trade-off.
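The abstract gives no formulas or pseudocode, so the following Python sketch is not the authors' multi-scaling sampling algorithm. It only illustrates the general idea of adaptive sampling under stated assumptions: a standard two-sided Hoeffding bound stands in for the PAC-style sample-size estimate, and the sample is simply doubled until the mined itemsets stabilise. The function names, the doubling schedule, the stopping rule, and the brute-force miner are all hypothetical choices made for illustration.

```python
import math
import random
from itertools import combinations


def hoeffding_sample_size(eps, delta):
    """Sample size at which one itemset's sampled support is within eps of
    its true support with probability >= 1 - delta (Hoeffding bound).
    Assumption: this stands in for the paper's PAC-based estimate."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def frequent_itemsets(transactions, min_support, max_len=2):
    """Brute-force frequent itemsets of up to max_len items (illustration only)."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def progressive_sample_mine(db, min_support, eps, delta, n0=100, seed=0):
    """Double the sample size until the frequent itemsets stop changing,
    or until the Hoeffding sample size (or the whole database) is reached.
    Assumption: doubling is a hypothetical schedule, not the paper's."""
    rng = random.Random(seed)
    order = list(range(len(db)))
    rng.shuffle(order)  # a prefix of a shuffled order is a uniform random sample
    cap = min(len(db), hoeffding_sample_size(eps, delta))
    n = min(n0, cap)
    prev = None
    while True:
        sample = [db[i] for i in order[:n]]
        cur = frequent_itemsets(sample, min_support)
        if prev is not None:
            same_sets = set(cur) == set(prev)
            max_shift = max(
                (abs(cur[s] - prev[s]) for s in cur if s in prev), default=0.0
            )
            if same_sets and max_shift <= eps:
                return cur, n  # stable between two consecutive scales
        if n >= cap:
            return cur, n  # bound reached; stop regardless of stability
        prev = cur
        n = min(2 * n, cap)


if __name__ == "__main__":
    # Synthetic database: every transaction contains 'a', half also contain 'b'.
    db = [["a", "b"] if i % 2 == 0 else ["a", "c"] for i in range(5000)]
    itemsets, used = progressive_sample_mine(db, min_support=0.3, eps=0.05, delta=0.05)
    print(used, sorted(itemsets))
```

In this sketch the sample never grows beyond the Hoeffding size, so the loop trades a possibly earlier stop against a bound that holds for a single itemset only; covering all itemsets at once would require a stricter bound, which is left out here.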
