Research on Technology Filtering by Integrating S&T and Public Opinion Data Features

(Original title: 融合“科学—技术—舆情”数据特征的技术筛选方法研究)

Authors: Lü Lucheng (吕璐成), Zhou Jian (周健) [3], Zhao Zhanyi (赵展一), Zhao Yajuan (赵亚娟) [1,2], Liu Xiwen (刘细文)

Affiliations: [1] National Science Library, Chinese Academy of Sciences, Beijing 100190; [2] Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; [3] Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

Source: Information Studies: Theory & Application (《情报理论与实践》), 2024, No. 10, pp. 173-182 (10 pages)

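Funding: Supported by the National Natural Science Foundation of China, Young Scientists Fund project "Research on Technology Convergence Patterns, Features, and Prediction from the Perspective of Technological Distance" (Grant No. 72304268), and by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. E2291801).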

Abstract: [Purpose/significance] Using S&T literature for technology monitoring and early warning is an important part of S&T intelligence work. At present, the technical elements extracted from S&T literature with natural language processing techniques are numerous and hard to present. This paper therefore designs a technology filtering method that integrates "science-technology-public opinion" data features to filter the results of technology mining from S&T literature. [Method/process] Technologies are represented by technical terms. A hierarchical structure of technical terms is built using lexical structure analysis and modifier matching. Paper data representing the level of basic research on a technology, patent data representing the level of R&D activity, and public opinion data representing market attention are used to construct four types of features: importance, growth, novelty, and persistence. Machine learning methods are used to train and select the technology filtering model. [Result/conclusion] Comparison with manual filtering results shows that the method filters technologies more effectively. Among the models compared, the one built with all three types of data and all four types of features performs best. The method can provide a basis for technology identification and forecasting work and for the development of technology mining tools. [Limitations] The method has been validated only at the first level of the technical term hierarchy; its domain applicability and the data types it uses need further study.
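The abstract names four feature types (importance, growth, novelty, persistence) computed over three data sources, but does not give their formulas. The sketch below is only an illustration of how such a feature vector might be assembled for one technical term. Every definition in it is an assumption made for the example, not the authors' method: the source keys `papers`, `patents`, and `opinion`, the yearly-count representation, and all four formulas are hypothetical.

```python
from typing import Dict, List

# Hypothetical source names for the three data types in the abstract.
SOURCES = ("papers", "patents", "opinion")


def importance(series: List[int]) -> float:
    """Assumed definition: total number of mentions over the window."""
    return float(sum(series))


def growth(series: List[int]) -> float:
    """Assumed definition: average yearly change from first to last year."""
    if len(series) < 2:
        return 0.0
    return (series[-1] - series[0]) / (len(series) - 1)


def novelty(series: List[int]) -> float:
    """Assumed definition: share of the window that passed before the
    term first appeared (later first appearance gives a higher score).
    Returns 0.0 by convention if the term never appears."""
    for i, count in enumerate(series):
        if count > 0:
            return i / len(series)
    return 0.0


def persistence(series: List[int]) -> float:
    """Assumed definition: fraction of years with at least one mention."""
    if not series:
        return 0.0
    return sum(1 for count in series if count > 0) / len(series)


def feature_vector(counts: Dict[str, List[int]]) -> List[float]:
    """Concatenate the four features for each source, in SOURCES order,
    giving a 12-element vector (3 sources x 4 features)."""
    vec: List[float] = []
    for source in SOURCES:
        series = counts.get(source, [])
        vec.extend([importance(series), growth(series),
                    novelty(series), persistence(series)])
    return vec


# Invented yearly counts for one term, oldest year first.
example = {
    "papers": [0, 2, 4, 6],
    "patents": [0, 0, 1, 3],
    "opinion": [0, 0, 0, 5],
}
vec = feature_vector(example)
```

A vector of this shape, paired with a manual keep-or-discard label for each term, could then be passed to any standard classifier. Which classifier the paper uses is not stated in the abstract.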

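Keywords: technology filtering; technology mining; multi-source data fusion; text mining; machine learning; technology identification and forecasting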

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
