Authors: TANG Xiao-bo (唐晓波) [1,2]; WANG Qiong-fu (王琼赋); MOU Hao (牟昊)
Affiliations: [1] Center for Studies of Information System, Wuhan University, Wuhan 430072, Hubei, China; [2] School of Information Management, Wuhan University, Wuhan 430072, Hubei, China; [3] State Grid Sichuan Electric Power Company, Chengdu 610041, Sichuan, China
Source: Information Science (《情报科学》), 2022, No. 10, pp. 3-11, 32 (10 pages)
Funding: Major Project of the National Social Science Fund of China, "Construction of a Big-Data-Based Cloud Platform for Science and Education Evaluation Information and Research on Intelligent Services" (19ZDA349)
Abstract: [Purpose/significance] Automatic extraction of concept hierarchies makes it possible to partition concepts into fine-grained semantic levels quickly on large data sets, which provides a reference for the subsequent refined construction of a domain ontology. [Method/process] First, a term set composed of compound terms and keywords is filtered by term frequency, document frequency and semantic similarity to obtain the concept set of the academic paper evaluation domain. Second, both concept co-occurrence and contextual semantic information are considered: the former is expressed by a document-concept matrix and a concept co-occurrence matrix, the latter by word2vec word vectors, and the two are integrated via cosine similarity into a concept similarity matrix. Finally, taking the concept with the highest degree of association as the cluster center, spectral clustering is applied to the similarity matrix to obtain the concept hierarchy of the academic paper evaluation domain. [Result/conclusion] Experiments show that the proposed model achieves relatively high accuracy and that the constructed domain concept hierarchy is reasonable. [Innovation/limitation] The paper proposes a model for automatic extraction of concept hierarchies based on word co-occurrence and word vectors. The model extracts hierarchical relations automatically, but its method for determining class labels is relatively simple and could be explored further.
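Keywords: automatic extraction; ontology construction; hierarchical relation; semantic similarity; word co-occurrence
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]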
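The method summarized in the abstract (co-occurrence similarity fused with word-vector similarity, followed by spectral clustering) can be illustrated with a minimal sketch. Everything concrete below is an assumption for illustration and not taken from the paper: the four concept names, the toy document-concept matrix, the two-dimensional vectors standing in for word2vec embeddings, the equal-weight fusion controlled by `alpha`, and the reduction of spectral clustering to a single two-way split on the Fiedler vector. The abstract does not specify how the two similarities are combined or how many clusters are used.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

def fuse_similarity(doc_concept, embeddings, alpha=0.5):
    """Combine co-occurrence and embedding similarity.

    The weighted average is an assumed fusion rule, not the paper's.
    """
    cooc = (doc_concept.T @ doc_concept).astype(float)  # concept co-occurrence
    sim_cooc = cosine_similarity_matrix(cooc)
    sim_vec = cosine_similarity_matrix(embeddings)
    return alpha * sim_cooc + (1.0 - alpha) * sim_vec

def spectral_bipartition(S):
    """Two-way spectral split using the normalized graph Laplacian.

    Assumes every concept has some positive affinity to another concept.
    """
    W = np.clip(S, 0.0, None)      # affinities must be non-negative
    np.fill_diagonal(W, 0.0)       # drop self-similarity
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int), deg

# Hypothetical concepts and toy data (rows = documents, columns = concepts).
concepts = ["citation count", "h-index", "peer review", "expert scoring"]
doc_concept = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
])
# Stand-in for word2vec vectors; real vectors would come from a trained model.
embeddings = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])

S = fuse_similarity(doc_concept, embeddings)
labels, deg = spectral_bipartition(S)

# Label each cluster with its most connected concept (largest total affinity).
centers = []
for k in (0, 1):
    members = np.where(labels == k)[0]
    centers.append(concepts[members[np.argmax(deg[members])]])

for name, label in zip(concepts, labels):
    print(f"{name}: cluster {label}")
print("cluster centers:", centers)
```

On this toy input the two citation-based concepts fall into one cluster and the two review-based concepts into the other. Which cluster receives label 0 or 1 depends on the sign of the eigenvector and is arbitrary. Building a multi-level hierarchy would require applying the split recursively or using a k-way spectral clustering routine, which this sketch does not attempt.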