Research on a Chinese Book Acquisition Model Based on RoBERTa and LightGBM

Authors: ZHONG Jianfa (钟建法)[1]; MENG Zizheng (孟子正)

Affiliations: [1] Xiamen University Library, Xiamen, Fujian 361005; [2] School of Economics, Xiamen University, Xiamen, Fujian 361005

Source: Journal of Academic Libraries (《大学图书馆学报》), 2025, No. 1, pp. 82-92 (11 pages)

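Funding: One of the research outputs of the Fujian Provincial Social Science Fund project "Construction and Application of a Machine-Learning-Based Collaborative Selection Model for Library Print and Electronic Books" (Grant No. FJ2023B111).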

Abstract: Building on a review of methods for constructing intelligent book selection models and an introduction to the relevant machine learning algorithms, this paper explores the construction of a machine learning model for Chinese book acquisition in university libraries based on RoBERTa and LightGBM. It analyzes the model's construction goals and research framework; describes the feature engineering in detail, covering data sources and cleaning, feature screening and determination, derived feature construction, RoBERTa-based text feature construction, and data encoding; builds and evaluates a LightGBM-based classification model for Chinese book acquisition; and proposes an application strategy and suggestions for follow-up research, with the aim of advancing the application of machine learning models and the intelligent transformation of book acquisition.

Exploring intelligent book selection and model application based on big data and artificial intelligence technology is an important path toward the high-quality development of library collection building. Based on a review of intelligent book selection standards and the construction methods of book selection models, this paper introduces the functions and roles of the RoBERTa model and the LightGBM algorithm, and explores the construction of a machine learning model for Chinese book acquisition in university libraries based on RoBERTa and LightGBM. The purpose is to provide a reliable and effective classification prediction model and a practical selection scheme for university libraries that carry out book acquisition based on Chinese book subscription bibliographies, promoting the research and application of machine learning models and the intelligent transformation of book acquisition. First, the study collected mainland China's Chinese book subscription catalogues from 2017 to 2022 and the collection data of Xiamen University Library, and processed the data according to the requirements of model construction. Second, it selected features based on the factors that influence book selection, then cleaned and standardized those features. Third, it extracted text features using RoBERTa and encoded categorical features into numerical form using label encoding and expert scoring, forming a standardized structured data table. Fourth, it constructed a LightGBM classification model for training and prediction, and used the test set to evaluate the model and analyze the results. Finally, it proposed a model application strategy and suggestions for follow-up research. The experimental results show that by utilizing RoBERTa's text understanding ability and LightGBM's efficient classification performance, it can better address the difficulties encountered in existing intelligent book selection models, such as book text feature extraction, high-dimensional discrete f
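The encoding step in the abstract (label encoding plus expert scoring to turn categorical features into numbers) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the field names `publisher` and `binding`, the sample records, the score table `EXPERT_PUBLISHER_SCORE`, and the fallback `DEFAULT_SCORE`.

```python
# Hypothetical catalogue records; field names and values are illustrative only.
records = [
    {"publisher": "Press A", "binding": "paperback"},
    {"publisher": "Press B", "binding": "hardcover"},
    {"publisher": "Press A", "binding": "hardcover"},
]

# Assumed expert scores (higher = rated more desirable for acquisition).
EXPERT_PUBLISHER_SCORE = {"Press A": 5, "Press B": 3}
DEFAULT_SCORE = 1  # assumed fallback for publishers the experts did not rate


def fit_label_encoder(values):
    """Map each distinct category to an integer code, assigned in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def encode(records):
    """Build numeric rows of the form [publisher_score, binding_code]."""
    binding_codes = fit_label_encoder(r["binding"] for r in records)
    rows = [
        [
            EXPERT_PUBLISHER_SCORE.get(r["publisher"], DEFAULT_SCORE),
            binding_codes[r["binding"]],
        ]
        for r in records
    ]
    return rows, binding_codes


rows, binding_codes = encode(records)
```

In the pipeline the abstract describes, numeric columns like these would sit in the structured data table alongside the RoBERTa-derived text features before the LightGBM classifier is trained. How the authors actually combine them is not stated in this record.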

Keywords: university libraries; book acquisition; machine learning model; RoBERTa; LightGBM

Classification code: G253.1 [Cultural Science: Library Science]

 
