Authors: KONG Ming (孔明); WEI Dong (魏东); RAN Yi-bin (冉义兵); BI Guo-peng (毕国鹏)
Affiliations: [1] School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; [2] Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing Municipal Science and Technology Commission, Beijing 100044, China; [3] Beijing Telesound Electronics Company, Beijing 100094, China
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2023, No. 2, pp. 239-247 (9 pages)
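Funding: Supported by the National Natural Science Foundation of China (61871020); the High-Level Innovation Team Construction Program of Beijing Municipal Universities (IDHT20190506); and the Key Science and Technology Plan Project of the Beijing Municipal Education Commission (KZ201810016019).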
Abstract: The large volumes of transaction log data generated by information systems contain latent co-occurrence patterns (CPs), where a CP is a group of objects that frequently appear together in space and time. Traditional sliding window algorithms and the FP-Growth algorithm run on a single thread, so the time needed to mine CPs rises sharply as the data grows. This paper proposes a CP mining framework based on the Fork/Join parallelism technique, which moves the computation from a single thread to multiple concurrent threads and takes full advantage of multi-core architectures. The framework consists of three phases: delineating the CP dataset, mining frequent itemsets, and mining association rules. First, a Fork/Join based multi-core parallel sliding window algorithm shortens the time needed to delineate the CP dataset from the transaction log. Second, a Fork/Join based multi-core parallel FP-Growth algorithm mines the frequent itemsets in the CP dataset in parallel, which allows fast construction and information retrieval. Finally, three parameters (support, confidence and lift) are introduced to mine and evaluate the association rules among the objects in the CPs. Experiments on real access control card data show that, compared with traditional algorithms, the proposed framework mines more CPs from transaction logs with higher mining efficiency.
Keywords: transaction log; co-occurrence pattern; Fork/Join framework; sliding window; FP-Growth algorithm
Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
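The abstract names two ideas that are easy to illustrate in isolation: splitting window delineation across cores with Fork/Join, and scoring a rule by support, confidence and lift. The following is a minimal sketch, not the authors' implementation. It assumes Java's standard `ForkJoinPool` and `RecursiveTask`; the class `CoOccurrenceSketch`, the `Event` type, the `delineate` helper, the `THRESHOLD` value and the window parameters (`start`, `width`, `step`) are all hypothetical names chosen for illustration. The FP-Growth phase is omitted, and each window rescans the whole log for simplicity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class CoOccurrenceSketch {
    /** Hypothetical log record: an object id (e.g. a card) seen at a time in seconds. */
    public static final class Event {
        final String objectId;
        final long timeSec;
        public Event(String objectId, long timeSec) {
            this.objectId = objectId;
            this.timeSec = timeSec;
        }
    }

    /** Fork/Join task that builds one item set per window index in [lo, hi). */
    static final class WindowTask extends RecursiveTask<List<Set<String>>> {
        static final int THRESHOLD = 2; // assumed cut-off for sequential processing
        final List<Event> events;
        final long start, width, step;
        final int lo, hi;

        WindowTask(List<Event> events, long start, long width, long step, int lo, int hi) {
            this.events = events; this.start = start; this.width = width;
            this.step = step; this.lo = lo; this.hi = hi;
        }

        @Override
        protected List<Set<String>> compute() {
            if (hi - lo <= THRESHOLD) {
                List<Set<String>> out = new ArrayList<>();
                for (int w = lo; w < hi; w++) {
                    long from = start + (long) w * step; // window covers [from, to)
                    long to = from + width;
                    Set<String> items = new TreeSet<>();
                    for (Event e : events) {
                        if (e.timeSec >= from && e.timeSec < to) items.add(e.objectId);
                    }
                    if (items.size() > 1) out.add(items); // keep only real co-occurrences
                }
                return out;
            }
            int mid = (lo + hi) >>> 1;
            WindowTask left = new WindowTask(events, start, width, step, lo, mid);
            WindowTask right = new WindowTask(events, start, width, step, mid, hi);
            left.fork();                               // run the left half asynchronously
            List<Set<String>> r = right.compute();     // compute the right half in this thread
            List<Set<String>> l = left.join();
            l.addAll(r);                               // preserves window order
            return l;
        }
    }

    /** Delineates the co-occurrence dataset for `windows` sliding windows. */
    public static List<Set<String>> delineate(List<Event> events, long start,
                                              long width, long step, int windows) {
        return ForkJoinPool.commonPool()
                .invoke(new WindowTask(events, start, width, step, 0, windows));
    }

    /** Fraction of transactions that contain every item in `items`. */
    public static double support(List<Set<String>> txns, Set<String> items) {
        if (txns.isEmpty()) return 0.0;
        long hits = txns.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / txns.size();
    }

    /** confidence(A -> B) = support(A and B) / support(A); NaN if A never occurs. */
    public static double confidence(List<Set<String>> txns, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<>(a);
        ab.addAll(b);
        return support(txns, ab) / support(txns, a);
    }

    /** lift(A -> B) = confidence(A -> B) / support(B); values above 1 suggest association. */
    public static double lift(List<Set<String>> txns, Set<String> a, Set<String> b) {
        return confidence(txns, a, b) / support(txns, b);
    }

    public static void main(String[] args) {
        List<Event> log = List.of(new Event("A", 0), new Event("B", 1),
                new Event("A", 10), new Event("B", 11), new Event("C", 12));
        List<Set<String>> txns = delineate(log, 0, 5, 10, 2);
        System.out.println(txns + " lift(A->B)=" + lift(txns, Set.of("A"), Set.of("B")));
    }
}
```

In this sketch each window becomes one transaction, so the three measures are computed over windows rather than raw log rows. A production version would likely index events by time instead of rescanning the log per window.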