Authors: Wang Ruixue (王瑞雪)[1]; Fang Jing (方婧)[1]; Gui Sisi (桂思思); Lu Wei (陆伟)[1,3]; Zhang Xian (张显)
Affiliations: [1] School of Information Management, Wuhan University, Wuhan 430072; [2] Department of Information Management, College of Information Science & Technology, Nanjing Agricultural University, Nanjing 210095; [3] Institute for Information Retrieval and Knowledge Mining, Wuhan University, Wuhan 430072; [4] Baidu Times Network Technology (Beijing) Co., Ltd., Beijing 100085
Source: Library and Information Service (《图书情报工作》), 2021, No. 3, pp. 93-99 (7 pages)
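Funding: One of the research outputs of the National Social Science Fund of China Youth Project "Research on Query Intent for Academic Search" (Grant No. 19CTQ023).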
Abstract: [Purpose/significance] To automatically identify academic query intent and thereby improve the efficiency of academic search engines. [Method/process] Combining existing query intent features with the characteristics of academic search, we construct features for query expressions at four levels: basic information, specific keywords, entities, and occurrence frequency. In a preliminary experiment, we apply four classification algorithms (Naive Bayes, Logistic regression, SVM, and Random Forest) to automatic query intent identification and compute the precision, recall, and F-measure of each. We then propose a method that extends the identification results predicted by the Logistic regression algorithm to a large-scale dataset and extracts "keyword type" features, and we use it to construct a deep-learning-based two-layer classifier for academic query intent recognition. [Result/conclusion] The macro-average F1 of the two-layer classifier is 0.651, which is better than that of the other algorithms, and the classifier effectively balances precision and recall across the different categories of academic query intent. The two-layer classifier performs best on the academic exploration category, with an F1 of 0.783.
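As an illustration of the macro-average F1 reported above, here is a minimal sketch. It assumes the standard definition (the unweighted mean of per-class F1 scores); the record does not give the paper's exact computation. Apart from "exploration", the category labels, the toy predictions, and the function names `per_class_f1` and `macro_f1` are hypothetical and not taken from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed from per-class precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: four queries, three intent categories.
y_true = ["exploration", "exploration", "navigation", "resource"]
y_pred = ["exploration", "navigation", "navigation", "resource"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))  # 0.778
```

Because each category contributes equally regardless of how many queries it contains, a macro-averaged score exposes weak performance on rare intent categories that a frequency-weighted average would hide.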