Smap: Visualization of Scientific Knowledge Landscape Based on Document Semantics

Authors: Zhang Shuang, Liu Feifan [1,2], Luo Shuangling [3], Xia Haoxiang

Affiliations: [1] Institute of Systems Engineering, Dalian University of Technology, Dalian 116024; [2] Research Center for Big Data and Intelligent Decision-Making, Dalian University of Technology, Dalian 116024; [3] School of Maritime Economics and Management, Dalian Maritime University, Dalian 116026

Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2023, No. 1, pp. 74-89 (16 pages)

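Funding: National Natural Science Foundation of China General Program "Research on the Dynamic Mechanisms of Collective Intelligence in Collaborative Innovation Networks" (71871042); Ministry of Education Humanities and Social Sciences Planning Project "Research on the Evolution Patterns and Mechanisms of Scientific Collaboration Systems Integrating Data Analytics and Dynamical Modeling" (18YJA630118); Liaoning Provincial Social Science Planning Fund Project "Research on Supporting Methods for Library Science and Technology Information Knowledge Services in the Big Data Context" (L16BTQ003).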

Abstract: Given the explosive growth of academic literature, the continuous cross-fusion of disciplines, and the expanding scale and increasing complexity of scientific research, how to clearly visualize the knowledge landscape hidden in massive amounts of literature, and thereby grasp its knowledge structure and research trends, has attracted wide attention from science and technology information professionals. Based on document representation learning and manifold learning algorithms, this study proposes a method for constructing a semantic map (Smap) of a scientific field. First, Doc2Vec captures high-dimensional semantic features of documents; then, UMAP (uniform manifold approximation and projection) performs non-linear dimensionality reduction on the semantic proximity between documents; finally, kernel density estimation characterizes the field's knowledge structure according to the heterogeneity of the document distribution. In the empirical analysis, we cover four scientific fields whose literature ranges from thousands to millions of documents, and for each we visualize the field, identify its hierarchical knowledge structure, and analyze its dynamic evolution. Furthermore, using citation relations, keywords, and the classification system of the dataset (Microsoft Academic Graph, MAG), we validate the method by quantifying the local purity of the document distribution on the Smap and the correlation between global map distance and research differences. Comparison with randomized experiments further quantifies how significant this effectiveness is. This study complements current visualization methods for scientific fields and provides an analytical tool for science and technology information services driven by large-scale scientific literature data.
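The abstract names the pipeline stages but not their parameters or the exact purity metric. As a minimal sketch, assuming the Doc2Vec and UMAP steps have already produced 2-D map coordinates for each document, the standard-library Python below illustrates the two later stages: a Gaussian kernel density estimate whose grid peaks are read as dense knowledge regions, and a k-nearest-neighbour "local purity" score compared against a label-shuffling random baseline. The Gaussian kernel, the bandwidth, the 8-neighbour peak rule, the k-NN definition of purity, the toy data, and all function names are hypothetical illustrations, not the authors' implementation.

```python
import math
import random

# Hypothetical type: a map position is an (x, y) pair, e.g. one row of a 2-D UMAP output.
Point = tuple[float, float]


def gaussian_kde(points: list[Point], bandwidth: float):
    """Return a density function built from an isotropic 2-D Gaussian kernel."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x: float, y: float) -> float:
        total = 0.0
        for px, py in points:
            d2 = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def density_peaks(density, lo: float, hi: float, step: float, rel_threshold: float = 0.1):
    """Grid points whose density beats all 8 neighbours and rel_threshold * grid maximum."""
    m = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(m)]
    grid = [[density(x, y) for y in axis] for x in axis]
    top = max(max(row) for row in grid)
    peaks = []
    for i in range(m):
        for j in range(m):
            v = grid[i][j]
            if v < rel_threshold * top:
                continue  # ignore sparse regions of the map
            neighbours = [
                grid[a][b]
                for a in range(max(0, i - 1), min(m, i + 2))
                for b in range(max(0, j - 1), min(m, j + 2))
                if (a, b) != (i, j)
            ]
            if all(v > w for w in neighbours):
                peaks.append((axis[i], axis[j]))
    return peaks


def local_purity(points: list[Point], labels: list[str], k: int) -> float:
    """Mean share of each document's k nearest map neighbours that carry the same label."""
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Rank all other documents by squared Euclidean distance on the map.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2,
        )
        total += sum(labels[j] == labels[i] for j in others[:k]) / k
    return total / n


def random_baseline(points: list[Point], labels: list[str], k: int, trials: int, seed: int = 0) -> float:
    """Average local purity after randomly permuting labels over the same map positions."""
    rng = random.Random(seed)
    shuffled = list(labels)
    scores = []
    for _ in range(trials):
        rng.shuffle(shuffled)
        scores.append(local_purity(points, shuffled, k))
    return sum(scores) / trials


if __name__ == "__main__":
    # Toy map: two tight groups of documents with hypothetical field labels.
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    labs = ["A"] * 4 + ["B"] * 4
    dens = gaussian_kde(pts, bandwidth=0.5)
    print(density_peaks(dens, lo=-1.0, hi=6.0, step=0.5))  # [(0.0, 0.0), (5.0, 5.0)]
    print(local_purity(pts, labs, k=3), random_baseline(pts, labs, k=3, trials=200))
```

On the toy map, each tight group yields one density peak and the true labels give a local purity of 1.0, while the shuffled-label baseline averages far lower. On a real map, the gap between observed purity and the random baseline is the kind of evidence that indicates whether proximity on the map reflects labelled research similarity.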

Keywords: semantic map; knowledge structure visualization; deep learning; manifold learning

CLC number: G353.1 [Culture Science: Information Science]
