Answer Extracting for Chinese Question-Answering System Based on Latent Semantic Analysis  Cited by: 44


Authors: Yu Zhengtao (余正涛)[1], Fan Xiaozhong (樊孝忠)[2], Guo Jianyi (郭剑毅)[1], Geng Zengmin (耿增民)[2]

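Affiliations: [1] School of Information Engineering and Automation, Kunming University of Science and Technology; [2] Department of Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081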

Source: Chinese Journal of Computers (《计算机学报》), 2006, No. 10, pp. 1889-1893 (5 pages)

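Funding: Supported by the Doctoral Program Foundation of the Ministry of Education (20050007023), the National Natural Science Foundation of China (60663004), and the Yunnan Province Information Technology Foundation (2002IT03).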

Abstract: In answer extraction for Chinese question-answering systems, synonymy causes correct answers to be missed and polysemy causes wrong answers to be extracted. To address these problems, this paper proposes a method for computing the similarity between a question and a candidate answer sentence based on Latent Semantic Analysis (LSA). The method represents questions and sentences with the vector space model (VSM), applies LSA to statistically analyze a large corpus of question-answer sentences, and constructs a latent word-sentence semantic space, which removes the correlation between words. Question-sentence similarity is then computed in this semantic space, which effectively handles synonymy and polysemy. Finally, combining the question type with the similarity score, an answer-sentence extraction experiment was carried out on Chinese factoid questions. The MRR of answer extraction reached 0.47, clearly better than the plain vector space model, which indicates that the method is effective.
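The pipeline described in the abstract (vector space representation, a latent space built by LSA, similarity computed in that space, and evaluation by MRR) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the raw term-count weighting, the number of latent dimensions `k`, the fold-in formula for the question, and all function names are assumptions introduced here. Word segmentation and the question-type filter mentioned in the abstract are assumed to happen elsewhere and are not shown.

```python
import numpy as np

def build_term_sentence_matrix(sentences, vocab):
    """Rows are terms, columns are sentences; entries are raw term counts.
    (Raw counts are an assumption; the abstract does not state the weighting.)"""
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(sentences)))
    for j, tokens in enumerate(sentences):
        for tok in tokens:
            if tok in index:
                matrix[index[tok], j] += 1.0
    return matrix

def lsa_fit(matrix, k):
    """Truncated SVD keeping the k largest singular values.
    k should not exceed the rank of the matrix, or fold_in divides by zero."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :].T  # U_k, singular values, sentence coordinates

def fold_in(term_vec, u_k, s_k):
    """Project a term-count vector into the latent space: q^T U_k Sigma_k^{-1}."""
    return (term_vec @ u_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_sentences(question_tokens, sentences, vocab, k=2):
    """Return sentence indices sorted by latent-space similarity, plus the scores."""
    matrix = build_term_sentence_matrix(sentences, vocab)
    u_k, s_k, sent_coords = lsa_fit(matrix, k)
    q = build_term_sentence_matrix([question_tokens], vocab)[:, 0]
    q_latent = fold_in(q, u_k, s_k)
    scores = [cosine(q_latent, sent_coords[j]) for j in range(len(sentences))]
    order = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)
    return order, scores

def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the first correct answer per question, None if not found."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)
```

In this sketch each candidate sentence is a list of already-segmented tokens, and `rank_sentences` returns the candidates ordered by cosine similarity to the question in the reduced space. A plain VSM baseline would compute the same cosine directly on the columns of the term-sentence matrix, without the SVD step.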

Keywords: question-answering system; answer extraction; similarity; vector space model; latent semantic analysis

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
