Topic Link Detection Method Based on Semantic Similarity  Cited by: 6

Authors: Zhai Donghai (翟东海)[1,2], Cui Jingjing (崔静静)[1], Nie Hongyu (聂洪玉)[1], Du Jia (杜佳)[2]

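Affiliations: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031; [2] School of Engineering, Tibet University, Lhasa, Tibet 850000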

Source: Journal of Southwest Jiaotong University (《西南交通大学学报》), 2015, No. 3, pp. 517-522 (6 pages)

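Funding: State Language Commission "Twelfth Five-Year Plan" Scientific Research Planning Project (YB125-49); Key Project of Science and Technology Research of the Ministry of Education (212167); Fundamental Research Funds for the Central Universities (SWJTU12CX096); National Undergraduate Innovation and Entrepreneurship Training Program (201210694017)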

Abstract: To judge effectively whether any two stories discuss similar topics, a topic link detection method based on semantic similarity is proposed. First, the relative entropy between feature words in the two stories is computed and used as their semantic similarity. Second, the relevance between each feature word and the other story is obtained as the average semantic similarity. Finally, the relevance degree between the two stories is computed by combining the semantic similarity with the TF-IDF (term frequency-inverse document frequency) weights of the feature words in the corpus, which completes link detection for the story pair. The proposed method was compared with the VSM (vector space model) method and with a method that relies only on average point-wise mutual information, and was evaluated on the TDT4 Chinese corpus. The results show that the minimum DET (detection error tradeoff) cost of the proposed method is reduced by about 3%, indicating that it makes better use of the contextual information in the text and improves the performance of the topic link detection system.
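
The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything specific here is an assumption rather than the authors' method: each word is represented by a smoothed distribution over the tokens of the documents it appears in, relative entropy is symmetrised and mapped to a similarity with 1 / (1 + d), the IDF is smoothed, and the two directed scores are averaged. All function names (`context_distribution`, `word_similarity`, `link_score`, and so on) are hypothetical.

```python
import math
from collections import Counter

def context_distribution(word, docs, vocab, alpha=0.5):
    """Additively smoothed distribution of tokens in documents containing `word` (assumed representation)."""
    counts = Counter()
    for doc in docs:
        if word in doc:
            counts.update(doc)
    total = sum(counts.values()) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}

def relative_entropy(p, q):
    """Relative entropy (KL divergence) D(p || q) over a shared vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)

def word_similarity(p, q):
    """Assumed mapping: symmetrised relative entropy -> similarity in (0, 1]."""
    d = 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))
    return 1.0 / (1.0 + max(d, 0.0))

def tfidf_weights(story, docs):
    """TF-IDF with a smoothed IDF so every weight stays positive (an assumption)."""
    n = len(docs)
    tf = Counter(story)
    return {
        w: (c / len(story)) * (math.log((1 + n) / (1 + sum(w in d for d in docs))) + 1.0)
        for w, c in tf.items()
    }

def directed_relevance(a, b, docs, dists):
    """TF-IDF-weighted mean, over words of `a`, of their average similarity to the words of `b`."""
    weights = tfidf_weights(a, docs)
    targets = sorted(set(b))
    score = 0.0
    for w, weight in weights.items():
        avg = sum(word_similarity(dists[w], dists[v]) for v in targets) / len(targets)
        score += weight * avg
    return score / sum(weights.values())

def link_score(a, b, docs, dists):
    """Symmetric relevance degree between two stories (average of both directions)."""
    return 0.5 * (directed_relevance(a, b, docs, dists) + directed_relevance(b, a, docs, dists))

# Toy corpus of pre-segmented stories (made-up tokens, for illustration only).
docs = [
    ["earthquake", "rescue", "victims", "aid"],
    ["earthquake", "aid", "rescue", "damage"],
    ["football", "match", "goal", "team"],
    ["team", "goal", "coach", "match"],
]
vocab = sorted({t for d in docs for t in d})
dists = {w: context_distribution(w, docs, vocab) for w in vocab}
print(link_score(docs[0], docs[1], docs, dists))  # two stories on the same topic
print(link_score(docs[0], docs[2], docs, dists))  # stories on different topics
```

In this sketch a story pair would be declared linked when `link_score` exceeds a threshold tuned on held-out data; the threshold and the smoothing constant `alpha` are free parameters that the abstract does not specify. Every word in a story must appear in the corpus used to build `dists`.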

Keywords: link detection; semantic similarity; relative entropy; relevance degree

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
