A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases (Cited by: 3)

Authors: PANG Shanchen, YAO Jiamin, LIU Ting, ZHAO Hua, CHEN Hongqi

Affiliations: [1] College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, China; [2] Weihai Science and Technology Bureau, Weihai 264200, China; [3] College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China; [4] College of Intelligence and Computing, Tianjin University, Tianjin 300350, China

Source: Chinese Journal of Electronics (电子学报（英文版）), 2020, Issue 2, pp. 233-241 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 61572523, No. 61873281, No. 61572522).

Abstract: Text similarity measurements are the basis for measuring the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on a digital fingerprint have the advantage of high detection speed, but they are only suitable for exact-match detection. We propose a method of Chinese text similarity measurement based on feature phrase semantics. Natural language processing (NLP) technology is used to pre-process the text, extract keywords with the Term frequency-Inverse document frequency (TF-IDF) model, and further screen out the feature words. We obtain the exact meaning of each word and the semantic similarities between words from the HowNet semantic dictionary. We then substitute concepts to obtain the feature phrases, generate a semantic fingerprint, and calculate similarity. The experimental results indicate that the proposed method is superior to the traditional and digital fingerprinting methods in similarity detection in terms of accuracy rate, recall rate, and F-value.
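The pipeline outlined in the abstract (TF-IDF keyword extraction, concept substitution, fingerprint generation, similarity calculation) can be illustrated with a rough sketch. Everything below is an assumption made for illustration and is not taken from the paper: the input is assumed to be already segmented into tokens, the `CONCEPTS` table is a hypothetical stand-in for a HowNet lookup, the fingerprint is a SimHash-style 64-bit signature (the abstract does not say which fingerprint scheme is used), and the function names are invented.

```python
import hashlib
import math
from collections import Counter

# Hypothetical word-to-concept table standing in for a HowNet lookup.
CONCEPTS = {"automobile": "car", "vehicle": "car", "quick": "fast", "rapid": "fast"}


def tfidf_keywords(doc, corpus, top_k=5):
    """Return the top_k (term, weight) pairs of a tokenized doc by TF-IDF."""
    tf = Counter(doc)
    n = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF (assumed variant)
        scores[term] = (count / len(doc)) * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


def to_concepts(weighted_terms):
    """Map each keyword to its concept and merge the weights of synonyms."""
    merged = Counter()
    for term, weight in weighted_terms:
        merged[CONCEPTS.get(term, term)] += weight
    return dict(merged)


def fingerprint(weighted, bits=64):
    """SimHash-style signature: weighted vote per bit over the term hashes."""
    acc = [0.0] * bits
    for term, weight in weighted.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            acc[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if acc[i] > 0)


def similarity(fp_a, fp_b, bits=64):
    """Similarity in [0, 1] as one minus the normalized Hamming distance."""
    return 1.0 - bin(fp_a ^ fp_b).count("1") / bits


if __name__ == "__main__":
    doc_a = ["quick", "automobile"]
    doc_b = ["rapid", "vehicle"]
    corpus = [doc_a, doc_b, ["slow", "train"]]
    fp_a = fingerprint(to_concepts(tfidf_keywords(doc_a, corpus)))
    fp_b = fingerprint(to_concepts(tfidf_keywords(doc_b, corpus)))
    print(similarity(fp_a, fp_b))
```

In this toy setup the two documents share no surface words, yet they map to the same concepts, so their fingerprints coincide. A purely lexical fingerprint would treat them as unrelated, which is the limitation of exact-match detection that the abstract points to.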

Keywords: Term frequency-Inverse document frequency (TF-IDF) model; Semantic fingerprint; Similarity; Characteristic phrases

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
