Constructing a Classifier of Academic Query Intent Based on Deep Learning Algorithms (基于深度学习算法的学术查询意图分类器构建)  Cited by: 9


Authors: Wang Ruixue (王瑞雪)[1]; Fang Jing (方婧)[1]; Gui Sisi (桂思思); Lu Wei (陆伟)[1,3]; Zhang Xian (张显)

Affiliations: [1] School of Information Management, Wuhan University (武汉大学信息管理学院), Wuhan 430072; [2] College of Information Science & Technology, Nanjing Agricultural University (南京农业大学信息管理系), Nanjing 210095; [3] Institute for Information Retrieval and Knowledge Mining, Wuhan University (武汉大学信息检索与知识挖掘研究所), Wuhan 430072; [4] Baidu Times Network Technology (Beijing) Co., Ltd. (百度时代网络技术(北京)有限公司), Beijing 100085

Source: Library and Information Service (《图书情报工作》), 2021, No. 3, pp. 93-99 (7 pages in total)

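Funding: One of the research outcomes of the National Social Science Fund of China Youth Project "Research on Query Intent for Academic Search" (Project No. 19CTQ023).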

Abstract: [Purpose/significance] To automatically identify academic query intent and thereby improve the efficiency of academic search engines. [Method/process] Drawing on existing query-intent features and the characteristics of academic search, features were constructed for query expressions at four levels: basic information, specific keywords, entities, and occurrence frequency. Four classification algorithms (Naive Bayes, Logistic regression, SVM, and Random Forest) were used in a preliminary experiment on automatic query intent identification, and the precision, recall, and F-measure of each method were calculated. A method is then proposed that extends the recognition results predicted by the Logistic regression algorithm to a large-scale dataset and extracts "keyword type" features, in order to construct a deep-learning-based two-layer classifier for academic query intent recognition. [Result/conclusion] The two-layer classifier achieves a macro-average F1 of 0.651, outperforming the other algorithms, and effectively balances precision and recall across the different academic query intent categories. It performs best on the academic exploration category, with an F1 of 0.783.
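The abstract reports per-class precision, recall, and F-measure, and summarizes them as a macro-average F1. As a minimal sketch of how such a score is usually computed (not the authors' evaluation code), the pure-Python functions below average the per-class F1 values with equal weight per class. The function names and the label strings "exploration" and "lookup" are hypothetical; the record does not list the paper's intent categories beyond academic exploration.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so small classes count equally."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical toy labels, for illustration only.
y_true = ["exploration", "exploration", "lookup", "lookup"]
y_pred = ["exploration", "lookup", "lookup", "lookup"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every class contributes equally to the macro average, a classifier that does well only on the most frequent intent category cannot reach a high score, which is why the measure suits the abstract's claim about balancing performance across categories.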

Keywords: academic query intent; automatic identification; two-layer classifier

Classification No.: G250.2 [Cultural Science - Library Science]

 
