Sequential pattern mining system of radar data based on Hadoop and Spark  (Cited by: 5)

Authors: LUO Zubing; YANG Xiaomin [1]; YAN Binyu [1] (College of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan 610065, China)

Affiliation: [1] College of Electronics and Information Engineering, Sichuan University

Source: Journal of Computer Applications (《计算机应用》), 2019, Issue S02, pp. 169-174 (6 pages)

Abstract: Traditional single-machine data mining systems have difficulty processing large-scale radar data. To address this problem, a sequential pattern mining system for radar data based on the distributed computing frameworks Hadoop and Spark is proposed. First, the simulated raw radar data are pre-processed in several steps, including density-based denoising, symbolization based on the first-order difference of pulse amplitude, and data segmentation, to obtain clean data suitable for subsequent mining. Second, the pre-processed radar data are stored in the Hadoop Distributed File System (HDFS), and the Spark-based prefix-projected sequential pattern mining algorithm (PrefixSpan) is used to mine frequent sequences in the radar data. Finally, the mining results are post-processed: the result sequence set is first filtered preliminarily using regularities found in the mining results, and the remaining result set is then filtered by traversal to obtain the final result sequence set. The experimental results show that, as the data set grows, the processing time of the traditional single-machine mining system increases rapidly and the system soon becomes unable to process the data, whereas the processing time of the proposed radar data mining system increases comparatively slowly, making it suitable for processing massive data.
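The symbolization and mining steps named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the paper uses the Spark-based PrefixSpan over data stored in HDFS, while this sketch is plain Python. The symbol alphabet (`U`, `D`, `S`), the threshold `eps`, the fixed segment length, and all function names are assumptions made for illustration, and the miner only handles sequences whose elements are single items.

```python
from collections import defaultdict


def symbolize(amplitudes, eps=0.5):
    """Map first-order differences of pulse amplitude to symbols.

    'U' = rise, 'D' = fall, 'S' = roughly steady.
    The alphabet and the threshold eps are illustrative assumptions.
    """
    symbols = []
    for prev, cur in zip(amplitudes, amplitudes[1:]):
        diff = cur - prev
        if diff > eps:
            symbols.append("U")
        elif diff < -eps:
            symbols.append("D")
        else:
            symbols.append("S")
    return symbols


def segment(symbols, length):
    """Cut the symbol stream into fixed-length sequences (short tail dropped)."""
    return [symbols[i:i + length]
            for i in range(0, len(symbols) - length + 1, length)]


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} using prefix projection.

    Support is the number of sequences containing the pattern as a
    (not necessarily contiguous) subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs, i.e. the
        # suffixes that remain after matching the current prefix.
        counts = defaultdict(int)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] += 1
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            new_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue  # item absent from this suffix
                new_projected.append((sid, pos + 1))
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    amplitudes = [1.0, 3.0, 3.1, 1.0, 3.0, 3.2, 1.1, 3.0, 3.0]
    sequences = segment(symbolize(amplitudes), 3)
    for pattern, support in sorted(prefixspan(sequences, 2).items()):
        print("".join(pattern), support)
```

In a distributed setting the same projection idea is applied per partition, which is why the approach scales better than a single-machine miner as the data set grows.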

Keywords: radar data; Hadoop; Spark; data mining

Classification number: TP311.1 [Automation and Computer Technology: Computer Software and Theory]

 
