Recognition and Retrieval of Time-sensitive Question in Chinese QA System (中文问答系统中时间敏感问句的识别和检索)  Cited by: 4


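Authors: Hou Yongshuai (侯永帅), Zhang Yaoyun (张耀允)[1], Wang Xiaolong (王晓龙)[1], Chen Qingcai (陈清财)[1], Wang Yuliang (王宇亮)[1], Hu Baotian (户保田)[1]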

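Affiliation: [1] Key Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, Guangdong 518055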

Source: Journal of Computer Research and Development (《计算机研究与发展》), 2013, No. 12, pp. 2612-2620 (9 pages)

Funding: National Natural Science Foundation of China, General Program (61272383; 61173075)

Abstract: Current question-answering (Q&A) systems such as Baidu Zhidao and SoSo WenWen can find questions semantically relevant to most queries, but they do not take timeliness into account during question retrieval, so for time-sensitive questions their results are much worse than for queries without such a constraint. To address this problem, an automatic recognition and retrieval method for time-sensitive questions is proposed. First, questions are classified by their timeliness requirement to recognize the time-sensitive ones; next, the time range of each time-sensitive question is resolved; finally, the question search results are filtered by the resolved time range so that only results meeting the timeliness requirement remain. To recognize time-sensitive questions, lexical, syntactic, and semantic features are extracted; machine learning methods including decision trees, naive Bayes, and SVM are tested; and the AdaBoost algorithm is adopted to address corpus imbalance. The time range of a question is calculated using constructed time-domain expressions. A prototype question retrieval system, built from question-answer pairs in the financial domain collected from the Web, is used for validation. Experimental results show that, using the C5.0 decision tree algorithm, the precision of time-sensitive question recognition reaches 0.901; the mean average precision (MAP) of the retrieval results for time-sensitive questions is enhanced by 0.0392 compared with SoSo WenWen and by 0.1956 compared with Baidu Zhidao, increases of 74.24% and 197.58% respectively. The average response time of the prototype question retrieval system is 0.6287 s.
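The three steps named in the abstract (recognize, resolve the time range, filter) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword lookup stands in for the trained classifier, and `TIME_CUES`, `QAPair`, `resolve_time_range`, `filter_by_time`, and the sample data are all hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple

# Hypothetical cue words mapped to a look-back window in days.
# These stand in for the paper's classifier and time-domain expressions.
TIME_CUES = {"今天": 0, "本周": 7, "最近": 30}


@dataclass
class QAPair:
    question: str
    answer: str
    posted: date  # date the archived question was posted


def resolve_time_range(query: str, asked_on: date) -> Optional[Tuple[date, date]]:
    """Return (start, end) if the query contains a cue word, else None.

    A None result means the query is treated as not time-sensitive.
    """
    for cue, days in TIME_CUES.items():
        if cue in query:
            return asked_on - timedelta(days=days), asked_on
    return None


def filter_by_time(candidates: List[QAPair],
                   time_range: Optional[Tuple[date, date]]) -> List[QAPair]:
    """Keep only candidates posted inside the resolved time range."""
    if time_range is None:
        return list(candidates)  # no time constraint: keep everything
    start, end = time_range
    return [c for c in candidates if start <= c.posted <= end]


# Made-up archive: same question text, posted on different dates.
archive = [
    QAPair("今天上证指数是多少", "fresh", date(2013, 5, 20)),
    QAPair("今天上证指数是多少", "stale", date(2012, 1, 4)),
]
query = "今天上证指数是多少"
rng = resolve_time_range(query, asked_on=date(2013, 5, 20))
results = filter_by_time(archive, rng)
```

In this sketch both archived questions match the query text equally well, and only the post date separates the entry that satisfies the timeliness requirement from the one that does not; that gap is what the filtering step is meant to close.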

Keywords: time-sensitive question; time resolution; question classification; question retrieval; question answering system

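CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]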

 
