Research on a Prediction Model for Active Reader Groups Based on the CART Decision Tree

Author: ZHI Yingying (只莹莹) [1]

Affiliation: [1] National Library of China, Beijing 100081

Source: Journal of Library and Information Science (《图书情报导刊》), 2022, No. 8, pp. 71-77 (7 pages)

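Funding: National Key R&D Program of China project "Research on the Evaluation of Public Cultural Resource Service Efficiency and the Construction of a Big Data Intelligent Analysis Platform" (Project No. 2019YFC1521404).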

Abstract: As a national public-welfare cultural institution, the National Library of China holds vast collections and serves a very large user base, and over the years it has accumulated rich data on resource development and reader behavior. Making scientific predictions from these data is of considerable value. This paper extracts all features related to reader activity from existing reader records and borrowing data, builds a prediction model for future active reader groups using a CART decision tree, and obtains the model's highest score through parameter tuning and pruning. The results show that reader ID, reader age, and the classification number and publisher of the borrowed item are the main factors affecting reader activity. The model shows good predictive ability on large samples of more than 50,000 records, and cross-validation stabilizes the model's average accuracy, avoiding the effect of randomness in the training sample.
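The abstract does not say which split criterion, library, or parameters the author used. As a rough illustration only, the sketch below shows the Gini-impurity split search that CART classification trees conventionally perform at each node. The function names (`gini`, `best_split`) and the toy reader ages and activity labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the binary split 'value <= threshold' with the lowest weighted Gini.

    Returns (threshold, weighted_gini); threshold is None if no split
    improves on the impurity of the unsplit node.
    """
    n = len(values)
    best_threshold, best_score = None, gini(labels)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score


# Hypothetical toy data: reader ages and an active (1) / inactive (0) label.
ages = [18, 22, 25, 31, 40, 52, 60, 67]
active = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, score = best_split(ages, active)
print(threshold, score)
```

A full CART model repeats this search over every feature at every node, then limits tree growth with settings such as maximum depth or a pruning penalty. The tuning, pruning, and cross-validation steps named in the abstract would be layered on top of this core step.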

Keywords: library; active readers; CART decision tree; prediction model

Classification code: G252.0 [Cultural Science - Library Science]

 
