Chinese natural language interface based on paraphrasing

Cited by: 1


Authors: Zhang Junchi (张俊驰)[1], Hu Jie (胡婕)[1], Liu Mengchi (刘梦赤)[2]

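Affiliations: [1] School of Computer Science and Information Engineering, Hubei University, Wuhan 430062; [2] State Key Laboratory of Software Engineering (Wuhan University), Wuhan 430072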

Source: Journal of Computer Applications (《计算机应用》), 2016, No. 5, pp. 1290-1295, 1301 (7 pages in total)

Funding: National Natural Science Foundation of China (61202100)

Abstract: Traditional natural language interfaces to databases (NLIDB) that rely mainly on syntactic parsing recognize the user's intended meaning with limited accuracy and require large amounts of manually annotated training corpora. To address these problems, a Chinese NLIDB method based on paraphrasing is proposed. First, the words in the user's sentence that denote database entities are extracted, and a set of candidate trees is built together with a formal natural language expression for each tree. Next, a paraphrase classifier trained on an online question-answering corpus selects the expression that is semantically closest to the user's sentence. Finally, the corresponding candidate tree is translated into Structured Query Language (SQL). Experiments show that the method reaches F1 scores of 83.4% on a Chinese U.S. geography question corpus (GeoQueries880) and 90% on a restaurant question corpus (RestQueries250), outperforming the syntactic parsing method on both. The comparison indicates that a paraphrase-based NLIDB handles the semantic gap between users and databases better.
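The three-step pipeline in the abstract (entity extraction and candidate trees, paraphrase-based selection, tree-to-SQL translation) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the toy lexicon `LEXICON`, the one-condition `CandidateTree` shape, the English wording of the formal expressions, and especially `similarity`, which uses plain Jaccard token overlap as a stand-in for the paper's paraphrase classifier trained on web question-answering data. It is not the authors' implementation.

```python
import re
from dataclasses import dataclass
from itertools import product

# Hypothetical toy lexicon (assumption): each word may denote several
# database elements, which is what produces several candidate trees.
LEXICON = {
    "population": [("column", "population")],
    "cities": [("table", "city")],
    "texas": [("value", "state_name", "Texas"), ("value", "city_name", "Texas")],
}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


@dataclass(frozen=True)
class CandidateTree:
    """A one-condition SELECT query tree (simplified, hypothetical)."""
    table: str
    column: str
    cond_column: str
    cond_value: str

    def expression(self):
        # Formal natural-language rendering of the tree (assumed template).
        cond = self.cond_column.replace("_", " ")
        return f"{self.column} of {self.table} whose {cond} is {self.cond_value}"

    def to_sql(self):
        return (f"SELECT {self.column} FROM {self.table} "
                f"WHERE {self.cond_column} = '{self.cond_value}'")


def extract_entities(question):
    """Step 1a: collect words that denote database elements."""
    found = {"table": [], "column": [], "value": []}
    for tok in tokenize(question):
        for entry in LEXICON.get(tok, []):
            found[entry[0]].append(entry)
    return found


def candidate_trees(entities):
    """Step 1b: one candidate tree per table/column/value combination."""
    for (_, t), (_, c), (_, vc, v) in product(
            entities["table"], entities["column"], entities["value"]):
        yield CandidateTree(t, c, vc, v)


def similarity(question, expression):
    """Step 2 stand-in: Jaccard token overlap replaces the paper's
    trained paraphrase classifier (assumption)."""
    q, e = set(tokenize(question)), set(tokenize(expression))
    return len(q & e) / len(q | e)


def question_to_sql(question):
    trees = list(candidate_trees(extract_entities(question)))
    if not trees:
        return None
    best = max(trees, key=lambda t: similarity(question, t.expression()))
    return best.to_sql()  # Step 3: translate the chosen tree into SQL


print(question_to_sql("What is the population of cities in the state Texas?"))
# prints: SELECT population FROM city WHERE state_name = 'Texas'
```

Here "texas" is ambiguous between a state name and a city name, so two candidate trees are built; the expression mentioning "state name" shares more words with the question and is chosen. In the paper's setting the scorer is a learned classifier, which can match paraphrases that share few surface words.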

Keywords: natural language interface to databases; word vectors; paraphrase; natural language expression; machine learning

CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
