A Text Similarity Measurement Based on Semantic Fingerprint of Characteristic Phrases (times cited: 3)


Authors: PANG Shanchen, YAO Jiamin, LIU Ting, ZHAO Hua, CHEN Hongqi

Affiliations: [1] College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, China; [2] Weihai Science and Technology Bureau, Weihai 264200, China; [3] College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China; [4] College of Intelligence and Computing, Tianjin University, Tianjin 300350, China

Source: Chinese Journal of Electronics (English edition of 电子学报), 2020, No. 2, pp. 233-241, 9 pages

Funding: Supported by the National Natural Science Foundation of China (No. 61572523, No. 61873281, No. 61572522).

Abstract: Text similarity measurement is the basis for measuring the degree of matching between two or more texts. Traditional large-scale similarity detection methods based on digital fingerprints detect quickly, but they are suitable only for exact-match detection. We propose a method of Chinese text similarity measurement based on the semantics of feature phrases. Natural language processing (NLP) techniques are used to pre-process the text, extract keywords with the term frequency-inverse document frequency (TF-IDF) model, and further screen out the feature words. We obtain the precise meaning of each word, and the semantic similarity between words, from the HowNet semantic dictionary. We then substitute concepts for words to obtain the feature phrases, generate a semantic fingerprint, and calculate similarity. The experimental results indicate that the proposed method outperforms both the traditional method and the digital fingerprinting method in accuracy, recall, and F-value.
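The pipeline summarized in the abstract (TF-IDF keyword selection, concept substitution, fingerprint generation, similarity calculation) can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this record: the `CONCEPTS` table is a hypothetical toy stand-in for a HowNet lookup, the IDF uses a common smoothed variant, and the fingerprint is a standard SimHash compared by Hamming distance, swapped in because the abstract does not specify the paper's fingerprint scheme. It is not the authors' implementation.

```python
import hashlib
import math
from collections import Counter

# HYPOTHETICAL stand-in for HowNet: maps surface words to a shared concept label.
# The method described above consults the HowNet dictionary; this toy table is illustrative only.
CONCEPTS = {"automobile": "car", "vehicle": "car", "physician": "doctor"}


def tf_idf_keywords(doc, corpus, k):
    """Return the k highest-scoring TF-IDF terms of `doc` (a list of tokens)."""
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (assumed variant)
        scores[term] = (count / len(doc)) * idf
    # Highest score first; ties broken alphabetically so the result is deterministic
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def feature_phrases(keywords):
    """Replace each keyword with its concept label (hypothetical HowNet substitution)."""
    return [CONCEPTS.get(w, w) for w in keywords]


def simhash(features, bits=64):
    """SimHash fingerprint (assumed scheme): each feature's hash votes on every bit."""
    votes = [0] * bits
    for f in features:
        h = int.from_bytes(hashlib.md5(f.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A bit is set when the positive votes outnumber the negative ones
    return sum(1 << i for i in range(bits) if votes[i] > 0)


def similarity(fp_a, fp_b, bits=64):
    """1 minus the normalized Hamming distance between two fingerprints."""
    return 1 - bin(fp_a ^ fp_b).count("1") / bits


if __name__ == "__main__":
    kw_a = ["automobile", "physician"]  # toy keywords for text A
    kw_b = ["vehicle", "doctor"]        # toy keywords for text B
    print(similarity(simhash(feature_phrases(kw_a)), simhash(feature_phrases(kw_b))))  # prints 1.0
```

Because both keyword sets map to the same concepts, their fingerprints coincide even though the surface words differ; a fingerprint built on the raw words would not match. This is the intuition behind substituting concepts before fingerprinting.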

Keywords: Term frequency-inverse document frequency (TF-IDF) model; semantic fingerprint; similarity; characteristic phrases

CLC number: TP391.1 [Automation and Computer Technology / Computer Application Technology]

 
