Association rule mining method based on double threshold Apriori algorithm and infrequent itemsets

Cited by: 19

Authors: Ruan Mengli [1]; Wu Lei [2]

Affiliations: [1] School of Information Engineering, Shandong Management University, Jinan 250357, China; [2] School of Information Science & Engineering, Shandong Normal University, Jinan 250358, China

Source: Application Research of Computers (《计算机应用研究》), 2018, No. 12, pp. 3579-3583 (5 pages)

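Funding: National Natural Science Foundation of China (61602287); Shandong Province Social Science Planning Research Project (17CQXJ11); Shandong Province Higher Education Science and Technology Program (J16LN70)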

Abstract: To address the problem of mining positive and negative association rules from a dataset, this paper proposes a mining method based on a double-threshold Apriori algorithm and infrequent itemsets. First, the items (itemsets) in the corpus are weighted by inverse document frequency (IDF), and the top N% of itemsets are retained. Next, the proposed double-support-threshold Apriori algorithm extracts the frequent itemsets and the infrequent itemsets, which reduces the number of infrequent itemsets. Finally, positive and negative association rules are mined from the frequent itemsets and from the infrequent itemsets by testing them against confidence and lift thresholds. The novel aspect of the method is its use of infrequent itemsets to mine positive and negative association rules. Experimental results on a medical text dataset show that the proposed method mines positive and negative association rules effectively and greatly reduces the number of itemsets and rules.
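The three steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact threshold semantics, so the code assumes that an itemset with support at or above `ms_high` is frequent, one with support between `ms_low` and `ms_high` is a retained infrequent itemset, and anything below `ms_low` is discarded. It also assumes that lift above 1 marks a positive rule A ⇒ B and lift below 1 marks a negative rule A ⇒ ¬B. All function names, parameter names, and the toy data are hypothetical.

```python
import math
from itertools import combinations


def idf_weights(docs):
    """IDF of each item: log(N / number of documents containing it)."""
    n = len(docs)
    df = {}
    for doc in docs:
        for item in set(doc):
            df[item] = df.get(item, 0) + 1
    return {item: math.log(n / count) for item, count in df.items()}


def top_fraction_items(docs, fraction):
    """Items in the top `fraction` when ranked by IDF (ties broken by name)."""
    weights = idf_weights(docs)
    ranked = sorted(weights, key=lambda item: (-weights[item], item))
    return set(ranked[:max(1, round(fraction * len(ranked)))])


def support(itemset, docs):
    """Fraction of documents that contain every item of `itemset`."""
    return sum(1 for doc in docs if itemset <= doc) / len(docs)


def split_itemsets(docs, ms_low, ms_high, max_size=2):
    """Level-wise (Apriori-style) search with two support thresholds.

    Assumed semantics, not taken from the paper:
    support >= ms_high           -> frequent
    ms_low <= support < ms_high  -> infrequent but retained
    support < ms_low             -> discarded
    """
    frequent, infrequent = {}, {}
    candidates = {frozenset([item]) for doc in docs for item in doc}
    for size in range(1, max_size + 1):
        survivors = []
        for cand in candidates:
            s = support(cand, docs)
            if s >= ms_high:
                frequent[cand] = s
            elif s >= ms_low:
                infrequent[cand] = s
            else:
                continue  # below both thresholds
            survivors.append(cand)
        # Join surviving itemsets into candidates one item larger.
        candidates = {a | b for a, b in combinations(survivors, 2)
                      if len(a | b) == size + 1}
    return frequent, infrequent


def classify_rule(a, b, docs, min_conf):
    """Label A => B as 'positive', A => not B as 'negative', else None.

    Assumes support(a) > 0 and support(b) > 0.
    """
    s_a, s_b, s_ab = support(a, docs), support(b, docs), support(a | b, docs)
    conf = s_ab / s_a          # confidence of A => B
    lift = conf / s_b          # lift of A => B
    if lift > 1 and conf >= min_conf:
        return "positive"
    if lift < 1 and (1 - conf) >= min_conf:  # confidence of A => not B
        return "negative"
    return None


if __name__ == "__main__":
    # Toy transactions, invented for illustration only.
    docs = [
        {"fever", "cough"}, {"fever", "cough"}, {"fever", "cough", "rash"},
        {"fever"}, {"rash"}, {"rash", "itch"}, {"cough"}, {"rash", "itch"},
    ]
    frequent, infrequent = split_itemsets(docs, ms_low=0.1, ms_high=0.3)
    print(sorted(map(sorted, frequent)))
    print(sorted(map(sorted, infrequent)))
    print(classify_rule(frozenset({"fever"}), frozenset({"cough"}), docs, 0.6))
    print(classify_rule(frozenset({"fever"}), frozenset({"rash"}), docs, 0.6))
```

In this toy data, {fever, cough} clears the upper threshold and yields a positive rule, while {fever, rash} falls between the two thresholds and yields a negative rule. The lower threshold is what keeps the set of infrequent itemsets small, since itemsets that almost never occur are dropped before rule generation.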

Keywords: positive and negative association rule mining; double support threshold; Apriori algorithm; infrequent itemsets; IDF weighting

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
