Authors: Wang Wei (王巍)[1,2,3], Liu Yang (刘阳), Hong Huijun (洪惠君)[1,2], Liang Yajing (梁雅静)
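Affiliations: [1] School of Information & Electrical Engineering, Hebei University of Engineering, Handan 056038, Hebei, China; [2] Hebei Key Laboratory of Security & Protection Information Sensing and Processing, Handan 056038, Hebei, China; [3] School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, Jiangsu, China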
Source: Computer Applications and Software (《计算机应用与软件》), 2023, No. 2, pp. 17-25 (9 pages)
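Funding: National Natural Science Foundation of China (61802107); Ministry of Education-China Mobile Research Fund (MCM20170204); Jiangsu Province Postdoctoral Research Funding Program (1601085C).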
Abstract: Structured data in the security industry contains a large number of approximately duplicate records, and the recognition rate of traditional approximately duplicate record detection algorithms struggles to meet the industry's practical needs. To address this, a convolutional neural network is introduced and two improved models based on LeNet-5 are designed: one takes a word embedding matrix as input, the other takes a similarity matrix. Experiments show that precision and recall both exceed 96% for the model with word embedding matrix input and reach 98% for the model with similarity matrix input. K-fold cross-validation results indicate that both models have strong generalization ability.
Keywords: security industry; data cleaning; approximately duplicate record detection; CNN; LeNet-5
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
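Illustrative note: the abstract mentions a model whose input is a similarity matrix but does not say how that matrix is built. The sketch below is one plausible reading, not the authors' method: it assumes each record is a list of string fields and fills entry (i, j) with a generic string similarity between field i of one record and field j of the other. The function name `similarity_matrix`, the sample records, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity_matrix(record_a, record_b):
    """Build a len(record_a) x len(record_b) matrix of string similarities.

    Entry (i, j) is the SequenceMatcher ratio (a value in [0, 1]) between
    field i of record_a and field j of record_b. This is an assumed
    construction; the paper may use a different similarity measure.
    """
    return [
        [SequenceMatcher(None, field_a, field_b).ratio() for field_b in record_b]
        for field_a in record_a
    ]


# Hypothetical records: two entries that likely describe the same entity.
rec1 = ["Wang Wei", "Handan", "056038"]
rec2 = ["Wang Wei", "Handan, Hebei", "056038"]

matrix = similarity_matrix(rec1, rec2)
for row in matrix:
    print([round(value, 2) for value in row])
```

A matrix like this could then be treated as a single-channel image and passed to a LeNet-5 style classifier that decides whether the two records are duplicates; high values along the diagonal suggest matching fields.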