|
- 2016
Research on Optimizing Uncertain Data Mining Based on Disjunctive Rules
|
Abstract:
In commercial, medical, and similar data analysis, data made up of items or events whose presence or absence is not known for certain is called uncertain data. Each item's occurrence is modeled as a discrete random variable and is therefore described by a probability distribution. Mining algorithms for uncertain data are an important research direction in big data analysis, and association rule mining from uncertain databases is one of its active problems. To exploit the random-variable nature of uncertain data, raise the confidence of the mined results, and reduce running time, this paper proposes DRUD, an algorithm for mining disjunctive association rules. DRUD first uses a fuzzy-set method to select frequent 2-itemsets (candidate pairs), then compares them against the minimum support threshold to extract the valid disjunctive rules. Simulations on a large number of different uncertain databases show that, compared with the similar algorithms UApriori and PFCIM, DRUD produces rules with higher confidence and runs more efficiently, which makes it better suited to mining large-scale uncertain data.
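To make the terms in the abstract concrete, the following is a minimal sketch of expected support and of the confidence of a disjunctive rule over an uncertain database. It is not the DRUD algorithm, whose details the abstract does not give. It assumes the common expected-support model, in which each item has an independent existence probability in each transaction; the function names `expected_support` and `disjunctive_confidence` and the sample data are hypothetical.

```python
from math import prod

# Assumption: an uncertain database is a list of transactions, each mapping
# an item to its existence probability; items are treated as independent.

def expected_support(db, itemset):
    """Expected number of transactions that contain every item in itemset."""
    return sum(prod(t.get(i, 0.0) for i in itemset) for t in db)

def disjunctive_confidence(db, antecedent, alternatives):
    """Confidence of the rule: antecedent -> (at least one of alternatives)."""
    joint = 0.0
    for t in db:
        p_ante = prod(t.get(i, 0.0) for i in antecedent)
        # Probability that none of the alternative items is present.
        p_none = prod(1.0 - t.get(i, 0.0) for i in alternatives)
        joint += p_ante * (1.0 - p_none)
    base = expected_support(db, antecedent)
    return joint / base if base > 0 else 0.0

# Hypothetical sample data, for illustration only.
db = [
    {"a": 0.9, "b": 0.5, "c": 0.4},
    {"a": 0.8, "b": 0.2},
    {"a": 0.5, "c": 1.0},
]

conf_b = expected_support(db, ["a", "b"]) / expected_support(db, ["a"])
conf_b_or_c = disjunctive_confidence(db, ["a"], ["b", "c"])
```

Under these assumptions, the disjunctive rule a -> (b or c) is at least as confident as a -> b alone, because the disjunction is satisfied whenever b is present. This illustrates why disjunctive rules can reach higher confidence than single-consequent rules.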