Essential Reference Measurements from the Perspective of Full Text: Concept Definition, Index System, and Identification Model


Authors: Lin Gege, Hou Haiyan [1], Pan Yuxin, Liang Guoqiang, Hu Zhigang

Affiliations: [1] School of Public Administration and Policy, Dalian University of Technology, Dalian 116024; [2] College of Economics and Management, Beijing University of Technology, Beijing 100124; [3] Institute for Science, Technology and Society, South China Normal University, Guangzhou 510006

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2024, No. 10, pp. 1199-1212 (14 pages)

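Funding: Major Project of Philosophy and Social Science Research of the Ministry of Education, "Identification of Disruptive Scientific Research Achievements in Basic Research and Improvement of China's Basic Research Capability" (22JZD021); National Natural Science Foundation of China project, "Construction and Empirical Study of an Academic Evaluation System Based on Citation Behavior" (71974030); Fundamental Research Funds for the Central Universities project, "Assessment of Disruptive Scientific Research Capability in China's Basic Research Fields" (DUT23RW302).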

Abstract: Identifying the essential references within citing documents is an important foundation for in-depth evaluation of scientific achievements. This study therefore explores the measurement of essential references from a full-text perspective, covering concept definition, construction of an indicator system, and optimization of identification models, with the aim of providing a more precise tool for scientific evaluation. First, the definition of essential references is clarified, and an indicator system for identifying them is constructed, comprising two dimensions (bibliographic information and citation information), eight sub-dimensions, and 33 citation feature indicators. Second, several machine learning models (such as random forest, support vector machine, and logistic regression) are used to select and optimize the citation feature indicators: their correlations and information gains are analyzed, 21 important indicators are retained, and the effectiveness of the identification models is validated. The results indicate that feature indicators based on citation information carry greater importance and contribute more to the identification of essential references. The machine learning models perform well in identifying essential references; the random forest, support vector machine, and logistic regression models in particular all achieve AUC (area under the curve) values above 0.85 for the ROC (receiver operating characteristic) curve, which indicates that the models are effective and robust. The essential reference measurement methods and identification models not only provide a more accurate tool for scientific evaluation systems but also lay a solid foundation for further in-depth research on citation analysis.
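The two quantities the abstract relies on, information gain (used to screen feature indicators) and ROC AUC (used to judge the classifiers), can be illustrated with a minimal Python sketch. This is not the authors' implementation: the eight toy references, the binary feature `cited_repeatedly`, and the classifier scores are all invented for illustration, and the functions use only the standard textbook definitions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 1 = essential reference, 0 = not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical binary feature: is the reference cited more than once in the full text?
cited_repeatedly = [1, 1, 0, 0, 0, 1, 0, 0]
# Hypothetical classifier scores for the same eight references.
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.05]

ig = information_gain(cited_repeatedly, labels)
auc = roc_auc(scores, labels)
print(f"information gain = {ig:.3f} bits, AUC = {auc:.3f}")
```

In this toy case 14 of the 15 positive/negative pairs are ranked correctly, so the AUC is 14/15; an AUC above 0.85, as reported in the abstract, means that more than 85% of such pairs are ordered correctly by the model.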

Keywords: essential references; citation information; bibliographic information; machine learning; full-text citation analysis

Classification number: G353.1 [Cultural Science - Information Science]; G203

 
