Distributed Parallel Construction Algorithm for Triadic Concepts

Authors: LI Jinhai, WANG Kun [1,2], CHEN Qiangqiang

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; [2] Data Science Research Center, Kunming University of Science and Technology, Kunming 650500; [3] Faculty of Science, Kunming University of Science and Technology, Kunming 650500

Source: Pattern Recognition and Artificial Intelligence, 2024, No. 10, pp. 873-886 (14 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 62476114) and the Basic Research Program of Yunnan Province (No. 202401AV070009).

Abstract: As an extension of formal concept analysis, triadic concept analysis has achieved notable results in both the theory and the applications of high-dimensional data. However, as data volumes grow rapidly, the time complexity of triadic concept generation algorithms grows exponentially, which poses a major challenge in practical applications and calls for parallel algorithms. This paper therefore proposes a distributed parallel construction algorithm for triadic concepts that is suited to large-scale data. First, the theory of object-attribute triadic concepts and attribute-condition triadic concepts is presented, and it is proved that all triadic concepts can be generated by merging these two types of intermediate concepts. Second, a two-stage aggregation strategy is used to improve the resilient distributed dataset (RDD) operators of the Spark framework, which effectively resolves the data skew problem and noticeably improves the running efficiency of the algorithm. Finally, experiments on multiple public datasets show that the proposed algorithm generates triadic concepts efficiently on massive data.
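To make the object being computed concrete, here is a brute-force sketch. It assumes the standard Lehmann-Wille definition: a triadic context (G, M, B, Y) has objects G, attributes M, conditions B and a ternary relation Y ⊆ G × M × B, and a triadic concept is a triple (A1, A2, A3) in which each component equals the derivation of the other two (equivalently, A1 × A2 × A3 ⊆ Y and the triple is maximal). This is not the paper's algorithm: it builds no object-attribute or attribute-condition intermediate concepts and is not distributed. It simply enumerates every pair of attribute and condition subsets, which makes the exponential cost mentioned above tangible. The names `triadic_concepts`, `powerset` and the toy context are hypothetical.

```python
from itertools import chain, combinations
from typing import FrozenSet, Iterable, Iterator, List, Set, Tuple

Concept = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def powerset(items: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """All subsets of items, smallest first."""
    pool = sorted(items)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(pool, r) for r in range(len(pool) + 1)))


def triadic_concepts(G: Set[str], M: Set[str], B: Set[str],
                     Y: Set[Tuple[str, str, str]]) -> List[Concept]:
    """Brute force: keep (A1, A2, A3) when each part equals the derivation of the other two."""
    def d1(A2, A3):  # objects related to every (attribute, condition) pair in A2 x A3
        return frozenset(g for g in G if all((g, m, b) in Y for m in A2 for b in A3))

    def d2(A1, A3):  # attributes related to every (object, condition) pair in A1 x A3
        return frozenset(m for m in M if all((g, m, b) in Y for g in A1 for b in A3))

    def d3(A1, A2):  # conditions related to every (object, attribute) pair in A1 x A2
        return frozenset(b for b in B if all((g, m, b) in Y for g in A1 for m in A2))

    concepts: List[Concept] = []
    # 2^|M| * 2^|B| candidate pairs: the exponential blow-up that motivates parallelism.
    for A2 in powerset(M):
        for A3 in powerset(B):
            A1 = d1(A2, A3)  # A1 is fully determined by (A2, A3), so no duplicates arise
            if d2(A1, A3) == A2 and d3(A1, A2) == A3:
                concepts.append((A1, A2, A3))
    return concepts


# Hypothetical toy context: 2 objects, 2 attributes, 1 condition.
G = {"g1", "g2"}
M = {"m1", "m2"}
B = {"b1"}
Y = {("g1", "m1", "b1"), ("g1", "m2", "b1"), ("g2", "m1", "b1")}
for concept in triadic_concepts(G, M, B, Y):
    print(tuple(sorted(part) for part in concept))
```

On this toy context the enumeration yields three triadic concepts, including the degenerate ({g1, g2}, {m1, m2}, ∅), which the fixed-point condition admits because no condition holds for every object-attribute pair.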
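The two-stage aggregation idea can be illustrated without a cluster. The sketch below assumes the common "salting" form of the technique. In stage 1 each key is prefixed with a random salt, so the records of a heavily used (skewed) key are split across several groups and partially aggregated in parallel. In stage 2 the salt is stripped and the partial results are combined per original key. In Spark this would roughly correspond to a `map` that adds the salt, a `reduceByKey`, a `map` that removes it and a second `reduceByKey`; here both stages are simulated with plain dictionaries. It is a sketch of the general technique only, not the modified RDD operator described in the paper, and the function names, the `num_salts` parameter and the set-union payload are hypothetical. The combining function must be associative and commutative, otherwise splitting a key's records would change the result.

```python
import random
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]], fn: Callable[[V, V], V]) -> Dict[K, V]:
    """Single-machine stand-in for a reduceByKey step: fold all values sharing a key."""
    out: Dict[K, V] = {}
    for key, value in pairs:
        out[key] = fn(out[key], value) if key in out else value
    return out


def two_stage_reduce(
    pairs: Iterable[Tuple[K, V]],
    fn: Callable[[V, V], V],
    num_salts: int,
    seed: int = 0,
) -> Dict[K, V]:
    """Salted two-stage aggregation; fn must be associative and commutative."""
    rng = random.Random(seed)
    # Stage 1: attach a random salt in [0, num_salts) so that one hot key becomes
    # up to num_salts distinct keys, each partially aggregated on its own.
    salted = (((rng.randrange(num_salts), key), value) for key, value in pairs)
    partial = reduce_by_key(salted, fn)
    # Stage 2: strip the salt and merge the (at most num_salts) partial results per key.
    unsalted = ((key, value) for (_salt, key), value in partial.items())
    return reduce_by_key(unsalted, fn)


def union(a: frozenset, b: frozenset) -> frozenset:
    """Associative, commutative merge: a valid combiner for two-stage aggregation."""
    return a | b


# Hypothetical skewed input: key "hot" carries 1000 records, key "cold" only one.
pairs = [("hot", frozenset({i % 7})) for i in range(1000)] + [("cold", frozenset({99}))]
result = two_stage_reduce(pairs, union, num_salts=8)
print(result == reduce_by_key(pairs, union))  # prints True
```

The final result matches a single-stage reduction, but in stage 1 the 1000 "hot" records are spread over up to eight salted keys instead of landing on one reducer, which is the load-balancing effect the strategy relies on.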

Keywords: formal concept; triadic concept; distributed parallelism; two-stage aggregation; data skew

CLC number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
