A calculation method of the sentence similarity in information retrieval


Authors: LIU Yunfang[1], YANG Yan[1], JIA Zhen[1], YIN Hongfeng[1], YANG Yufei[1]

Affiliation: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031, Sichuan, China

Source: Applied Science and Technology (《应用科技》), 2014, No. 4, pp. 41-46 (6 pages)

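Funding: National Natural Science Foundation of China (61170111; 61152001); Open Project of the Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (20110102); Fundamental Research Funds for the Central Universities (SWJTU11ZT08)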

Abstract: To improve the precision of results in information retrieval, a sentence similarity calculation method based on syntactic analysis and weighted path length is proposed. The method first applies word segmentation, part-of-speech tagging, and syntactic analysis to the user question. Based on the results, keywords are extracted from the question, weighted, and expanded with synonyms and near-synonyms. A calculation based on weighted path length is then used to compute the similarity between the user question and the title sentence of each retrieved item, defined as the relative ratio between the weighted path length of the question and that of the title sentence. The retrieved results are re-ranked by this similarity, which improves their precision. Experiments show that the proposed sentence similarity method effectively improves the precision of retrieval results.
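The pipeline described in the abstract can be sketched roughly as follows. The abstract does not give the paper's actual formula, so everything here is an assumption made for illustration: the `Keyword` fields, the definition of weighted path length as the sum of weight times parse-tree depth, and the similarity as the share of the question's weighted path length that is covered by the title. Segmentation, tagging, and parsing are assumed to have been done already, and titles are passed in as token lists.

```python
from dataclasses import dataclass, field


@dataclass
class Keyword:
    """A question keyword; all attributes are hypothetical, for illustration only."""
    term: str
    weight: float   # assumed importance, e.g. derived from the part-of-speech tag
    depth: int      # assumed path length from the root of the parse tree
    synonyms: set = field(default_factory=set)  # synonym / near-synonym expansion

    def matches(self, tokens):
        """True if the term or any of its expansions occurs among the tokens."""
        return self.term in tokens or not self.synonyms.isdisjoint(tokens)


def weighted_path_length(keywords):
    """Assumed definition: sum of weight * depth over the given keywords."""
    return sum(k.weight * k.depth for k in keywords)


def similarity(question, title_tokens):
    """Ratio of the weighted path length matched in the title to that of the question."""
    total = weighted_path_length(question)
    if total == 0:
        return 0.0
    tokens = set(title_tokens)
    matched = [k for k in question if k.matches(tokens)]
    return weighted_path_length(matched) / total


def rerank(question, titles):
    """Sort pre-tokenized titles by descending similarity to the question."""
    return sorted(titles, key=lambda t: similarity(question, t), reverse=True)


if __name__ == "__main__":
    question = [
        Keyword("sentence", 2.0, 1),
        Keyword("similarity", 3.0, 2, {"likeness"}),
        Keyword("method", 1.0, 3),
    ]
    titles = [["a", "new", "method"], ["sentence", "likeness", "measures"]]
    for title in rerank(question, titles):
        print(round(similarity(question, title), 3), title)
```

Under these assumptions a title that covers the heavier, deeper keywords (directly or through an expansion term) is ranked ahead of one that matches only a light keyword, which is the re-ranking effect the abstract describes.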

Keywords: information retrieval; similarity; part-of-speech tagging; syntactic analysis; weighted path length; re-ranking; precision

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]

 
