Multi-source data parallel preprocessing method based on similar connection  Cited by: 12


Authors: GUO Fangfang [1]; CHAO Luomeng; ZHU Jianwen

Affiliation: [1] School of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China

Source: Journal of Computer Applications (《计算机应用》), 2019, No. 1, pp. 57-60 (4 pages)

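Funding: National Science and Technology Major Project (2016ZX03001023-005); National Industry-University-Research Cooperation Project (2016ZTE01-03-06); Fundamental Research Funds for the Central Universities (HEUCF100601)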

Abstract: The growth of large-scale network environments and big-data technologies poses new challenges for traditional data fusion analysis. To address the poor flexibility and low processing efficiency of current multi-source data fusion analysis, a multi-source data parallel preprocessing method based on similar connection (similarity join) is proposed, built on divide-and-conquer and parallel processing. First, the preprocessing step improves flexibility by unifying the semantics that the sources share while retaining the semantics specific to each source. Second, an improved parallel MapReduce framework is proposed to make the similar connection more efficient. Experimental results show that the proposed method reduces the total data volume by 32% while preserving data integrity. Compared with the traditional MapReduce framework, the improved framework reduces time consumption by 43.91%, so the method can effectively improve the efficiency of multi-source data fusion analysis.
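The two steps named in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation and does not reproduce their improved MapReduce framework. The field mapping `FIELD_MAP`, the token scheme, the Jaccard measure, and the threshold are all assumptions chosen for illustration. Shared fields are renamed to one vocabulary, source-specific fields are kept under `extra`, and a map/group/reduce pass then finds pairs of similar records.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mapping from source-specific field names to unified names.
FIELD_MAP = {
    "src_ip": "source_ip", "sip": "source_ip",
    "dst_ip": "dest_ip", "dip": "dest_ip",
}

def normalize(record):
    """Unify shared fields; keep unmapped (source-specific) fields under 'extra'."""
    unified, extra = {}, {}
    for key, value in record.items():
        if key in FIELD_MAP:
            unified[FIELD_MAP[key]] = value
        else:
            extra[key] = value
    unified["extra"] = extra
    return unified

def tokens(record):
    """Represent a normalized record as a set of 'field=value' tokens."""
    return {f"{k}={v}" for k, v in record.items() if k != "extra"}

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def map_phase(records):
    """Map: emit (token, record_id) so records sharing a token are grouped together."""
    for rid, rec in enumerate(records):
        for token in tokens(rec):
            yield token, rid

def similarity_join(records, threshold):
    """Group by token (shuffle), then verify candidate pairs (reduce)."""
    groups = defaultdict(set)
    for token, rid in map_phase(records):
        groups[token].add(rid)
    candidates = set()
    for rids in groups.values():
        candidates.update(combinations(sorted(rids), 2))
    token_sets = [tokens(r) for r in records]
    return sorted(
        (i, j) for i, j in candidates
        if jaccard(token_sets[i], token_sets[j]) >= threshold
    )

raw = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "alert": "scan"},
    {"sip": "10.0.0.1", "dip": "10.0.0.2", "bytes": 120},
    {"sip": "10.0.0.9", "dip": "10.0.0.2"},
]
normalized = [normalize(r) for r in raw]
pairs = similarity_join(normalized, threshold=0.8)
```

In a real MapReduce job the grouping step would be handled by the framework's shuffle, and each token group would be verified in its own reducer in parallel. Here everything runs in one process to keep the example self-contained.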

Keywords: network security; multi-source data; data preprocessing; similar connection; MapReduce

CLC number: TP274 [Automation and Computer Technology: Detection Technology and Automation Devices]

 
