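Authors: ZHAI Donghai[1,2], CUI Jingjing[1], NIE Hongyu[1], DU Jia[2]
Affiliations: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031; [2] School of Engineering, Tibet University, Lhasa, Tibet 850000
Source: Journal of Southwest Jiaotong University (《西南交通大学学报》), 2015, No. 3, pp. 517-522 (6 pages)
Funding: Research project under the 12th Five-Year Scientific Research Plan of the State Language Commission (YB125-49); Key Project of Science and Technology Research of the Ministry of Education (212167); Fundamental Research Funds for the Central Universities (SWJTU12CX096); National Undergraduate Innovation and Entrepreneurship Training Program (201210694017)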
Abstract: To effectively judge the similarity between any two stories, a topic link detection algorithm based on semantic similarity is proposed. First, the relative entropy between feature words is calculated and used as the semantic similarity between the feature words of the two stories. Second, the relevance between a feature word and the other story is obtained by calculating the average semantic similarity. Finally, the relevance between the two stories is calculated by combining the semantic similarity with the TF-IDF (term frequency-inverse document frequency) weights of the feature words in the corpus, which completes link detection for the story pair. The proposed method is compared with the existing VSM (vector space model) method and with a method that relies only on average point-wise mutual information, and is evaluated on the TDT4 Chinese corpus. The results show that the semantic-similarity-based method makes better use of the contextual information in the text and improves the performance of existing detection systems, reducing the minimum DET (detection error tradeoff) cost by about 3%.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
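The three steps named in the abstract can be illustrated with a minimal Python sketch. The record does not give the paper's formulas, so everything concrete here is an assumption: the sentence-level co-occurrence distributions, the additive smoothing constant `alpha`, the mapping `1 / (1 + divergence)` from symmetrised relative entropy to a similarity score, the TF-IDF-weighted average, and all function names are hypothetical choices made for illustration, not the authors' method.

```python
import math
from collections import Counter


def context_distribution(word, sentences, vocab, alpha=0.5):
    """Smoothed distribution over vocab of words co-occurring with `word`.

    Assumption: co-occurrence is counted per sentence, with additive smoothing.
    """
    counts = Counter()
    for sent in sentences:
        if word in sent:
            counts.update(w for w in sent if w != word)
    total = sum(counts[v] for v in vocab) + alpha * len(vocab)
    return {v: (counts[v] + alpha) / total for v in vocab}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)


def semantic_similarity(p, q):
    """Step 1 (assumed form): symmetrised relative entropy mapped into (0, 1]."""
    divergence = 0.5 * (relative_entropy(p, q) + relative_entropy(q, p))
    return 1.0 / (1.0 + divergence)


def word_story_relevance(word, story_words, dists):
    """Step 2: average semantic similarity between a word and a story's words."""
    sims = [semantic_similarity(dists[word], dists[w]) for w in story_words]
    return sum(sims) / len(sims)


def story_relevance(weights_a, weights_b, dists):
    """Step 3 (assumed form): TF-IDF-weighted average of word-story relevance.

    weights_a and weights_b map feature words to precomputed TF-IDF weights.
    The score is averaged over both directions so that it is symmetric.
    """
    def directed(src, dst):
        total = sum(src.values())
        return sum(
            wt * word_story_relevance(w, list(dst), dists)
            for w, wt in src.items()
        ) / total

    return 0.5 * (directed(weights_a, weights_b) + directed(weights_b, weights_a))


# Toy corpus and made-up TF-IDF weights, for illustration only.
sentences = [
    ["earthquake", "rescue", "team"],
    ["earthquake", "relief", "team"],
    ["election", "vote", "result"],
]
vocab = sorted({w for s in sentences for w in s})
dists = {w: context_distribution(w, sentences, vocab) for w in vocab}

story_a = {"earthquake": 0.8, "rescue": 0.5}
story_b = {"relief": 0.6, "team": 0.4}
story_c = {"election": 0.9, "vote": 0.7}

print(story_relevance(story_a, story_b, dists))
print(story_relevance(story_a, story_c, dists))
```

In a link detection setting, a score like this would be compared against a threshold to decide whether two stories discuss the same topic; the threshold itself would have to be tuned on held-out data.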