Research on Individual Identification in Academic Search Log  (Cited by: 1)

Authors: Zheng Tingting; Chen Chong; Bai Haiyan [2]; Liang Bing [2]

Affiliations: [1] School of Government Management, Beijing Normal University, Beijing 100875; [2] Institute of Scientific and Technical Information of China, Beijing 100038

Source: Journal of Intelligence (《情报杂志》), 2019, No. 11, pp. 175-180 (6 pages)

Abstract: [Purpose/Significance] In an academic search system, a user account may be used exclusively by one individual or shared by several people. To understand users' information needs and keep personalized services accurate, the divergent behaviors produced by groups sharing an account must first be excluded. It is therefore necessary to identify a user's behavioral boundary, that is, whether the visitor behind an account is an individual or a group. [Method/Process] Two kinds of features, behavioral habits and topic preferences, are extracted from the log data of academic users. An individual-user identification model is then built on academic-user small data and random forest classification, and an empirical study is carried out on the website of the National Science and Technology Digital Library. [Results/Conclusions] Experiments show that the proposed method effectively identifies individual users in academic search logs, with a precision of about 92.9%. Topic consistency is the most important feature for distinguishing individual from group academic users. The study can help identify individual and institutional users and optimize user management, and it also offers ideas for recognizing the same user across devices.
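The abstract names topic consistency as the most important feature but does not define it here. As a rough illustration only, the sketch below assumes each log record carries an hour of day and a subject category, and scores consistency as the probability that two records drawn with replacement share a category. The function names, record layout, and the two extra behavioral features are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def topic_consistency(categories):
    """Probability that two records drawn with replacement share a category.

    Assumed definition for illustration; the paper's own measure may differ.
    Returns 1.0 when every record has the same subject category.
    """
    if not categories:
        raise ValueError("need at least one record")
    total = len(categories)
    counts = Counter(categories)
    return sum((n / total) ** 2 for n in counts.values())


def account_features(records):
    """Build a feature dict from the (hour, category) log records of one account."""
    hours = [hour for hour, _ in records]
    categories = [category for _, category in records]
    return {
        "topic_consistency": topic_consistency(categories),
        "distinct_active_hours": len(set(hours)),  # hypothetical behavior feature
        "record_count": len(records),
    }


# Hypothetical log records: (hour of day, subject category of the retrieved paper).
single_user = [(9, "TP391"), (10, "TP391"), (21, "TP391"), (9, "TP391")]
shared_account = [(9, "TP391"), (14, "G252"), (2, "R73"), (20, "O41")]

print(account_features(single_user))
print(account_features(shared_account))
```

Feature dicts of this kind could then be turned into vectors and passed to a random forest classifier, with labelled individual and shared accounts as training data.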

Keywords: academic users; academic search log; small data; individual user identification; random forest classification

Classification: G252.7 [Culture and Science - Library Science]; TP391 [Automation and Computer Technology - Computer Application Technology]

 
