Identification of "Hidden Treasures" among Massive Literature from the Artificial Intelligence Field Based on Decision Tree and Logistic Regression  Cited by: 9


Authors: CUI Jing-jing; HU Ze-wen; REN Ping (School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044, China)

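Affiliation: [1] School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing, Jiangsu 210044, China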

Source: Information Science (《情报科学》), 2022, No. 5, pp. 90-96 (7 pages)

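Funding: National Social Science Fund of China project "Research on Methods and Applications for Identifying Potential 'Hidden Treasures' in Massive Scientific and Technical Literature" (20CTQ031).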

Abstract: [Purpose/Significance] Massive scientific and technical literature contains many potential "hidden treasure" papers, and how to identify and make use of them is a research question of practical importance. [Method/Process] Using the original-paper and citation data of artificial intelligence literature published from 1990 to 2010 in the Web of Science database, this study constructs an original paper-citation feature vector space for the field, then trains decision tree and logistic regression models on it and tests them on identifying potential "hidden treasure" papers. [Result/Conclusion] The experiments show that adding the feature "citations five years after publication" markedly improves the classification performance of both models: identification accuracy rises by more than 20 percentage points, to above 84% for the decision tree and above 89% for logistic regression. Logistic regression consistently outperforms the decision tree, and tuning the hyperparameters of both models can further improve identification performance. In addition, early research in artificial intelligence was still at the stage of small-team collaboration, and the levels of funding support and open access for the field's literature were low. [Innovation/Limitation] The paper introduces machine learning methods to build and apply models for identifying potential "hidden treasure" papers, but the models still need to be extended to more disciplines.
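The comparison described in the abstract (training both classifiers with and without an early-citation feature, then comparing test accuracy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not list the actual feature set, data, or hyperparameters, so the data here is synthetic, the features `team_size`, `funded` and `citations_5y` are hypothetical, and a depth-1 decision stump stands in for a full decision tree. The citation feature is made informative by construction, so the resulting accuracies say nothing about the paper's reported figures.

```python
import math
import random


def make_synthetic_papers(n=600, seed=0):
    """Hypothetical records: [team_size, funded, citations_5y] and a 0/1 label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        team_size = rng.randint(1, 4)   # noise: unrelated to the label here
        funded = rng.randint(0, 1)      # noise: unrelated to the label here
        # Assumption: the citation feature is informative by construction.
        cites = max(rng.gauss(14.0 if label else 4.0, 3.0), 0.0)
        rows.append([float(team_size), float(funded), cites])
        labels.append(label)
    return rows, labels


def standardize(train, test):
    """Scale both splits using the mean and standard deviation of the training split."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=300):
    """Logistic regression fitted by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def train_stump(X, y):
    """Depth-1 decision tree: the single threshold split with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for above in (0, 1):
                errors = sum((above if x[j] > t else 1 - above) != yi
                             for x, yi in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    _, j, t, above = best
    return lambda x: above if x[j] > t else 1 - above


def accuracy(predict, X, y):
    return sum(predict(x) == yi for x, yi in zip(X, y)) / len(y)


def run_experiment(seed=0):
    """Train both models with and without the citation column; return test accuracies."""
    rows, labels = make_synthetic_papers(seed=seed)
    split = int(0.75 * len(rows))
    results = {}
    for name, cols in (("without_cites", [0, 1]), ("with_cites", [0, 1, 2])):
        X = [[r[c] for c in cols] for r in rows]
        X_train, X_test = standardize(X[:split], X[split:])
        y_train, y_test = labels[:split], labels[split:]
        results[name] = {
            "logreg": accuracy(train_logreg(X_train, y_train), X_test, y_test),
            "stump": accuracy(train_stump(X_train, y_train), X_test, y_test),
        }
    return results


if __name__ == "__main__":
    for name, scores in run_experiment().items():
        print(name, scores)
```

On real data one would more likely use an established library with full decision trees and a hyperparameter search, as the abstract's mention of hyperparameter tuning suggests. The hand-written versions above only keep the sketch free of dependencies.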

Keywords: decision tree; logistic regression model; artificial intelligence; potential "hidden treasure" papers; high-quality papers

Classification code: G250.2 [Cultural Science - Library Science]

 
