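Author: ZHI Yingying (只莹莹)[1]
Affiliation: [1] National Library of China, Beijing 100081
Source: Journal of Library and Information Science (《图书情报导刊》), 2022, No. 8, pp. 71-77 (7 pages)
Funding: National Key Research and Development Program of China project "Research on the Evaluation of Public Cultural Resource Service Efficiency and the Construction of a Big Data Intelligent Analysis Platform" (Project No. 2019YFC1521404).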
Abstract: As a national public-welfare cultural institution, the National Library of China holds vast collection resources and serves a very large user base. Over the years it has accumulated rich data on resource development and reader behavior, which makes scientific prediction based on these data highly valuable. This paper extracts all features related to reader activity from existing reader-information and borrowing records, uses a CART decision tree to build a model that predicts which readers will be active in the future, and achieves the model's best score through hyperparameter tuning and pruning. The results show that reader ID, reader age, and the classification number and publisher of the borrowed item are the main factors affecting reader activity. The model shows good predictive ability on large samples of more than 50 000 records, and cross-validation stabilizes its average accuracy, so the result does not depend on how the training samples happen to be drawn.
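The abstract names the method (a CART decision tree, tuned and pruned, evaluated by cross-validation) but gives no implementation details. The sketch below is a generic, minimal CART classifier in pure Python, not the author's model: it chooses binary splits by Gini impurity, uses `max_depth` and `min_leaf` as simple pre-pruning, and averages accuracy over k folds to pick a depth. All function names (`gini`, `best_split`, `build_tree`, `predict`, `cross_val_accuracy`) and the toy features (age, loans per year) are hypothetical. In practice, scikit-learn's `DecisionTreeClassifier` (whose `ccp_alpha` parameter provides cost-complexity post-pruning) combined with `cross_val_score` would be the usual choice.

```python
from collections import Counter
import random

def gini(labels):
    """Gini impurity of a list of class labels (CART's split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, min_leaf):
    """Return the (feature, threshold) pair minimizing weighted child Gini, or None."""
    best, best_score, n = None, gini(y), len(y)  # a split must beat the parent node
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3, min_leaf=1):
    """Grow a binary tree; max_depth and min_leaf act as pre-pruning.
    Leaves are scalar class labels (e.g. 0/1); inner nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y, min_leaf)
    if split is None:
        return majority
    j, t = split
    li = [i for i in range(len(y)) if X[i][j] <= t]
    ri = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_leaf),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_leaf))

def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def cross_val_accuracy(X, y, k=5, seed=0, **params):
    """Mean accuracy over k shuffled folds (tree params passed to build_tree)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        tree = build_tree([X[i] for i in train], [y[i] for i in train], **params)
        scores.append(sum(predict(tree, X[i]) == y[i] for i in fold) / len(fold))
    return sum(scores) / k

# Hypothetical synthetic data: [age, loans_per_year] -> active next year (1) or not (0)
X = [[20 + (7 * i) % 40, i % 10] for i in range(50)]
y = [1 if row[1] >= 5 else 0 for row in X]

# "Tuning": choose the depth with the best cross-validated accuracy.
best_depth = max(range(1, 5), key=lambda d: cross_val_accuracy(X, y, max_depth=d))
print("best max_depth:", best_depth)
print("CV accuracy:", cross_val_accuracy(X, y, max_depth=best_depth))
```

Because the toy label depends only on the second feature, a depth-1 tree already separates the full dataset. On real borrowing data, the cross-validated accuracy, rather than training accuracy, is what should guide the choice of depth and pruning strength.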