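Authors: Hou Yongshuai, Zhang Yaoyun[1], Wang Xiaolong[1], Chen Qingcai[1], Wang Yuliang[1], Hu Baotian[1]
Affiliation: [1] Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055
Source: Journal of Computer Research and Development (《计算机研究与发展》), 2013, No. 12, pp. 2612-2620 (9 pages)
Funding: National Natural Science Foundation of China, General Program (61272383; 61173075)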
Abstract: Current question-answering (Q&A) systems such as Baidu Zhidao and SoSo WenWen can find questions that are semantically relevant to most queries, but they do not take timeliness into account during question retrieval, so for time-sensitive questions their results are much worse than for questions without a time constraint. To address this problem, an automatic recognition and retrieval method for time-sensitive questions is proposed. First, questions are classified by their timeliness requirement to recognize time-sensitive questions; next, the time range of each time-sensitive question is resolved; finally, the question search results are filtered by the resolved time range. To recognize time-sensitive questions, lexical, syntactic, and semantic features are extracted; machine learning methods including decision trees, naive Bayes, and SVM are tested; and the AdaBoost algorithm is adopted to address corpus imbalance. The time range of a question is calculated with constructed time-domain expressions. For validation, a prototype question retrieval system is built from question-answer pairs in the financial domain collected from the Web. Experimental results show that, using the C5.0 decision tree algorithm, the precision of time-sensitive question recognition reaches 0.901; the mean average precision (MAP) of the retrieval results for time-sensitive questions improves by 0.0392 compared with SoSo WenWen and by 0.1956 compared with Baidu Zhidao, increases of 74.24% and 197.58% respectively. The average response time of the prototype question retrieval system is 0.6287 s.
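The last two stages described in the abstract (resolving a time range, then filtering retrieved results by it) can be illustrated with a minimal sketch. This is not the authors' implementation: the cue-word table `TIME_CUES`, the `QAPair` type, and both function names are hypothetical, and a simple cue-word lookup stands in for both the C5.0 classifier and the paper's time-domain expressions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class QAPair:
    question: str
    answer: str
    posted: date  # date the Q&A pair was published


# Hypothetical cue words mapped to a look-back window in days.
# The paper uses learned classifiers and time-domain expressions instead.
TIME_CUES = {"today": 1, "this week": 7, "latest": 30, "recent": 30}


def resolve_time_range(query: str, now: date) -> Optional[Tuple[date, date]]:
    """Return (start, end) if the query contains a time cue, else None."""
    lowered = query.lower()
    windows = [days for cue, days in TIME_CUES.items() if cue in lowered]
    if not windows:
        return None  # treated as not time-sensitive
    # Use the tightest window when several cues match.
    return now - timedelta(days=min(windows)), now


def filter_by_time(results: List[QAPair],
                   time_range: Optional[Tuple[date, date]]) -> List[QAPair]:
    """Keep only results posted inside the range; pass all through if None."""
    if time_range is None:
        return list(results)
    start, end = time_range
    return [r for r in results if start <= r.posted <= end]
```

In this sketch, a query with no cue word is passed through unfiltered, which mirrors the abstract's point that only time-sensitive questions need the extra filtering step.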
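Keywords: time-sensitive question; time resolution; question classification; question retrieval; question answering system
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]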