Authors: PANG Shanchen, YAO Jiamin, LIU Ting, ZHAO Hua, CHEN Hongqi
Affiliations: [1] College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, China; [2] Weihai Science and Technology Bureau, Weihai 264200, China; [3] College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China; [4] College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
Source: Chinese Journal of Electronics (电子学报, English edition), 2020, No. 2, pp. 233-241 (9 pages)
Funding: Supported by the National Natural Science Foundation of China (No. 61572523, No. 61873281, No. 61572522).
Abstract: Text similarity measurement is the basis for quantifying the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on digital fingerprints have the advantage of high detection speed, but they are suitable only for exact-match detection. We propose a Chinese text similarity measurement method based on feature phrase semantics. Natural language processing (NLP) technology is used to pre-process the text, keywords are extracted with the Term Frequency-Inverse Document Frequency (TF-IDF) model, and the feature words are then screened out from them. We obtain the exact meaning of each word and the semantic similarities between words using the HowNet semantic dictionary. We then substitute concepts to obtain the feature phrases, generate a semantic fingerprint, and calculate the similarity. The experimental results indicate that the proposed method outperforms the traditional digital fingerprinting method in similarity detection in terms of accuracy rate, recall rate, and F-value.
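As a rough illustration of the pipeline the abstract outlines (keyword extraction by TF-IDF, then a fingerprint, then a similarity score), the following is a minimal sketch and not the authors' implementation. It assumes the documents are already segmented into words, omits the HowNet concept substitution entirely, and uses a SimHash-style fingerprint as a stand-in because the abstract does not say how the semantic fingerprint is built. The function names `tfidf_keywords`, `fingerprint`, and `similarity` are hypothetical.

```python
import hashlib
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result


def fingerprint(terms, bits=64):
    """SimHash-style fingerprint (an assumed stand-in, not the paper's method)."""
    weights = [0] * bits
    for term in terms:
        # Take the first 8 bytes of the MD5 digest as a 64-bit hash of the term.
        h = int.from_bytes(hashlib.md5(term.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when more terms voted 1 than 0 at that position.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def similarity(fp_a, fp_b, bits=64):
    """Fraction of bit positions on which two fingerprints agree."""
    return 1 - bin(fp_a ^ fp_b).count("1") / bits


if __name__ == "__main__":
    docs = [
        ["text", "similarity", "fingerprint", "text"],
        ["image", "similarity", "retrieval"],
    ]
    keywords = tfidf_keywords(docs, top_k=2)
    fps = [fingerprint(k) for k in keywords]
    print(keywords, similarity(fps[0], fps[1]))
```

In this sketch a term that occurs in every document gets a TF-IDF score of zero, which is why "similarity" is not selected as a keyword for either toy document. A semantic variant along the lines the abstract describes would map each keyword to a dictionary concept before hashing, so that synonyms contribute to the same bits.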
Keywords: Term frequency-Inverse document frequency (TF-IDF) model; Semantic fingerprint; Similarity; Characteristic phrases
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]