Topic Mining and Dynamic Evolution Analysis of Funding Projects: Case Studies of AI Field in NSF Data

Cited by: 15


Authors: Jin Jialin (靳嘉林); Wang Yuefen (王曰芬) [1,2,3]; Ba Zhichao (巴志超); Cen Yonghua (岑咏华)

Affiliations: [1] School of Economics & Management, Nanjing University of Science & Technology, Nanjing 210094; [2] Management School of Tianjin Normal University, Tianjin 300387; [3] Institute for Big Data Science, Tianjin Normal University, Tianjin 300387; [4] Laboratory of Data Intelligence and Interdisciplinary Innovation, Nanjing University, Nanjing 210023

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2022, No. 9, pp. 967-979 (13 pages)

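Funding: Major Project of the National Social Science Fund of China, "Research on Data Science Theories and Methods for Knowledge Innovation Services" (16ZDA224).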

Abstract: This study constructs an intelligence analysis workflow for topic mining and dynamic evolution analysis of funding project data. By jointly modeling and mining the titles, abstracts, and directorates of funded projects, the workflow examines, at the level of project content, the characteristics of topics, the scope and focus of each directorate, the direction of development, and the evolutionary course of a funded field. First, keywords are extracted from project titles and abstracts with the rapid automatic keyword extraction (RAKE) algorithm, and core keywords are obtained through term segmentation. Next, the core keywords are modeled as word vectors with Google's word2vec tool, and the resulting vectors are clustered with the k-means algorithm to mine research topics. Finally, the distribution of topics is analyzed statistically, and the similarity between topics is computed with the word mover's distance (WMD) algorithm to analyze evolutionary trends and identify the main evolutionary path. In an empirical study of the artificial intelligence (AI) field in US National Science Foundation (NSF) data, the workflow identifies multiple topics within AI as well as the topical focus of the different directorates. The topics evolve through a large number of splits and merges, yet the evolutionary paths are clear and their focal points prominent, and the main evolutionary path can be identified from the evolution intensity of the topics. The results indicate that the workflow can reveal how funding integrates and promotes related technologies in a field, and that it can support academic research and government planning.
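
The first step of the workflow described in the abstract, RAKE keyword extraction, can be sketched as follows. This is a minimal, standard-library-only illustration of the general RAKE scoring idea (word score = degree / frequency, phrase score = sum of word scores), not the authors' implementation. The stopword list, the function names `candidate_phrases` and `rake_scores`, and the sample sentence are all assumptions made for illustration. The later steps (word2vec, k-means, WMD) are not shown.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; practical RAKE
# implementations use a much longer list).
STOPWORDS = {
    "a", "an", "and", "are", "by", "for", "from", "in",
    "is", "of", "on", "the", "to", "using", "with",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Punctuation (anything that is not a word character, whitespace,
    # or hyphen) ends a fragment.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # number of occurrences of each word
    degree = defaultdict(int)  # summed length of the phrases containing it
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    scores = {
        " ".join(phrase): sum(word_score[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Sort by score (high to low), breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the NSF data.
    sample = (
        "Deep reinforcement learning for robot control. "
        "Reinforcement learning is used in the design of "
        "intelligent robot systems."
    )
    for phrase, score in rake_scores(sample):
        print(f"{score:5.1f}  {phrase}")
```

In a workflow like the one the abstract outlines, the highest-scoring phrases from each title and abstract would then be segmented into core keywords before word vector modeling.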

Keywords: topic mining; dynamic evolution; word vector modeling; US National Science Foundation; artificial intelligence

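Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; G353.1 [Automation and Computer Technology: Control Science and Engineering]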

 
