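Authors: HU Xiaoqin (胡小琴); PAN Jinfeng (潘锦锋) (Software Department, Quanzhou University of Information Engineering, Quanzhou 362000, China)
Affiliation: [1] School of Software, Quanzhou University of Information Engineering, Quanzhou, Fujian 362000, China
Source: Journal of Chengdu Technological University (《成都工业学院学报》), 2023, No. 1, pp. 66-69 (4 pages)
Funding: Fujian Provincial Education and Research Project for Young and Middle-aged Teachers (JAT190930).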
Abstract: To improve the automated detection of duplicate information in an item bank (test question bank), an algorithm for detecting similar duplicate records in big data is proposed for item bank construction. Big data analysis methods are used to build a distribution model of similar duplicate records in the item bank data and to obtain the distribution interval of duplicate records in random links. A feature monitoring method based on the in-degree sets of hierarchical relations is used to analyze the feature structure of the similar duplicate records. Based on the statistical feature quantities obtained, the similar duplicate records are fused using a spatial grid clustering method, and, based on the fusion results, the similar duplicate records are detected in a spatial coordinate system. Simulation results show that the proposed algorithm detects similar duplicate records in the item bank with a relatively low error rate and a small time overhead.
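Keywords: big data similarity; duplicate records; detection algorithm; item bank design; data clustering
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]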
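
The abstract describes grouping records on a spatial grid before detecting similar duplicates, but this record does not give the algorithm itself. The following is only a minimal illustrative sketch of the general idea, not the authors' method: each question is mapped to a coarse 2-D grid cell, and only records in the same or adjacent cells are compared. The two grid features (text length and number of distinct character bigrams), the Jaccard similarity measure, the threshold of 0.6, and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def bigrams(text):
    """Return the set of character bigrams of the text, ignoring whitespace and case."""
    s = "".join(text.split()).lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def grid_cell(text, cell_size=10):
    """Map a record to a coarse 2-D cell (hypothetical features: length, distinct bigrams)."""
    stripped_len = len("".join(text.split()))
    return (stripped_len // cell_size, len(bigrams(text)) // cell_size)


def find_similar_duplicates(records, threshold=0.6, cell_size=10):
    """Return sorted index pairs (i, j), i < j, whose bigram Jaccard similarity >= threshold.

    Only records in the same or an adjacent grid cell are compared, so pairs
    that land in distant cells are never checked (a deliberate trade-off).
    """
    grid = defaultdict(list)
    for idx, text in enumerate(records):
        grid[grid_cell(text, cell_size)].append(idx)

    grams = [bigrams(text) for text in records]
    pairs = set()
    for (cx, cy), members in grid.items():
        # Gather candidates from this cell and its 8 neighbouring cells.
        candidates = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                candidates.extend(grid.get((cx + dx, cy + dy), []))
        for i in members:
            for j in candidates:
                if i < j and jaccard(grams[i], grams[j]) >= threshold:
                    pairs.add((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    questions = [
        "What is the time complexity of binary search?",
        "What is the time complexity of a binary search?",
        "Define a primary key in a relational database.",
    ]
    print(find_similar_duplicates(questions))
```

Restricting comparisons to neighbouring cells avoids comparing every pair of records, at the cost of missing near-duplicates whose lengths differ by more than one cell; a real item bank would likely need richer features and a tuned threshold.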