英汉双语富媒体知识图谱构建工程研究——以CNS英文期刊为例

Research on the Construction of English-Chinese Bilingual Rich Media Knowledge Graph: A Case Study of CNS English Journal

Authors: WEI Xiangfeng (韦向峰) [1,2]; MIAO Jianming (缪建明); ZHANG Quan (张全); YUAN Yi (袁毅) [1]

Affiliations: [1] Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China; [2] Key Laboratory of Rich-Media Knowledge Organization and Service of Digital Publishing Content, Beijing 100038, China; [3] Information Center of China North Industries Group Corporation Limited, Beijing 100089, China

Source: Technology Intelligence Engineering (《情报工程》), 2023, No. 5, pp. 84-96 (13 pages)

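Funding: 2022 Open Fund of the Key Laboratory of Rich-Media Knowledge Organization and Service of Digital Publishing Content, "Research on Cross-Language Rich-Media Knowledge Engineering Based on English Scientific and Technical Publications" (ZD2022-10/01).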

Abstract: [Objective/Significance] This paper studies methods and processes for automatically constructing an English-Chinese bilingual rich-media knowledge graph. The work offers a reference for the automatic construction of cross-language, multimodal knowledge graphs, and it is significant for obtaining the latest English-language research results in a timely manner and for monitoring scientific and technological intelligence. [Methods/Processes] A combination of top-down and bottom-up methods is employed. The main entities, attributes, and relations to be extracted are first designed at the top level; fine-grained entities and attributes are then analyzed and extracted from unstructured text data at the bottom level. Entity alignment is applied to ambiguous and cross-lingual entities, entity linking is applied to cross-media entities, and a graph database is used to store and apply the knowledge graph. [Limitations] Future work includes further improving the accuracy of fine-grained entity extraction and performing feature extraction and automatic content recognition for audio and video media. [Results/Conclusions] Taking the websites of CNS (Cell, Nature, Science) and other English-language scientific and technical journals as examples, the paper constructs, stores, and visualizes an English-Chinese bilingual rich-media knowledge graph through data crawling, entity extraction, attribute extraction, knowledge fusion, and cross-media linking.

Keywords: rich media; knowledge graph; entity extraction; entity alignment; move recognition

Classification: G35 [Culture and Science: Information Science]; TP391 [Automation and Computer Technology: Computer Application Technology]
