The Application of a Detection Algorithm for Similar and Duplicate Records in an Item Bank (大数据相似重复记录检测算法在试题库中的运用)  Cited by: 1


Authors: HU Xiaoqin (胡小琴); PAN Jinfeng (潘锦锋) (Software Department, Quanzhou University of Information Engineering, Quanzhou 362000, China)

Affiliation: [1] School of Software, Quanzhou University of Information Engineering, Quanzhou, Fujian 362000

Source: Journal of Chengdu Technological University (《成都工业学院学报》), 2023, No. 1, pp. 66-69 (4 pages)

Funding: Fujian Province Education and Research Project for Young and Middle-aged Teachers (JAT190930).

Abstract: To improve the automated detection of duplicate information in item banks, a big data detection algorithm for similar and duplicate records, aimed at item bank construction, is proposed. Big data analysis methods are used to build a distribution model of similar and duplicate records in the item bank data and to obtain the distribution interval of duplicate records in random links. A feature monitoring method based on the in-degree sets of hierarchical relations is used to analyze the feature structure of the similar and duplicate records. Using the statistical feature quantities obtained, the similar and duplicate records are fused with a spatial grid clustering method, and, based on the fusion result, the records are detected in a spatial coordinate system. Simulation results show that the proposed algorithm detects similar and duplicate records in the item bank with a relatively low error rate and a small time overhead.

Keywords: big data similarity; duplicate records; detection algorithm; item bank design; data clustering

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
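
The abstract names grid-based clustering followed by duplicate detection but gives no formulas or parameters. The following is only a minimal illustrative sketch of that general idea, not the authors' algorithm: each record is mapped to a hypothetical 2-D feature point (character count and distinct-bigram count), points are bucketed into grid cells, and only records in the same or adjacent cells are compared using Jaccard similarity over character bigrams. The function names, the feature choice, `cell_size`, and `threshold` are all assumptions made for illustration.

```python
from collections import defaultdict


def char_bigrams(text):
    """Return the set of character bigrams of the lowercased, whitespace-free text."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def features(text):
    """Hypothetical 2-D feature point: (character count, distinct-bigram count)."""
    s = "".join(text.lower().split())
    return (len(s), len(char_bigrams(text)))


def find_similar_pairs(records, cell_size=5, threshold=0.6):
    """Return sorted index pairs (i, j), i < j, of records judged similar."""
    # Step 1: bucket every record into a grid cell by its feature point.
    grid = defaultdict(list)
    for idx, text in enumerate(records):
        x, y = features(text)
        grid[(x // cell_size, y // cell_size)].append(idx)

    # Step 2: compare each record only with records in its own cell
    # and the eight neighbouring cells (assumed candidate-pruning rule).
    pairs = set()
    for (cx, cy), members in grid.items():
        candidates = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                candidates.extend(grid.get((cx + dx, cy + dy), []))
        for i in members:
            for j in candidates:
                if i < j:
                    sim = jaccard(char_bigrams(records[i]), char_bigrams(records[j]))
                    if sim >= threshold:
                        pairs.add((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    items = [
        "What is the capital of France?",
        "What is the capital of  France ?",
        "Define the derivative of a function.",
    ]
    print(find_similar_pairs(items))
```

The grid step serves only to limit the number of pairwise comparisons; a real item bank would likely need richer features and tuned thresholds than the ones assumed here.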
