Research on the Construction of an English-Chinese Bilingual Rich Media Knowledge Graph: A Case Study of CNS English Journals


Authors: WEI Xiangfeng[1,2]; MIAO Jianming; ZHANG Quan; YUAN Yi[1]

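Affiliations: [1] Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China; [2] The Key Laboratory of Rich-Media Knowledge Organization and Service of Digital Publishing Content, Beijing 100038, China; [3] Information Center of China North Industries Group Corporation Limited, Beijing 100089, China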

Source: Technology Intelligence Engineering (《情报工程》), 2023, No. 5, pp. 84-96 (13 pages)

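Funding: 2022 Open Fund of the Key Laboratory of Rich-Media Knowledge Organization and Service of Digital Publishing Content, "Research on Cross-Language Rich Media Knowledge Engineering Based on English Scientific and Technical Publications" (ZD2022-10/01).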

Abstract: [Objective/Significance] This paper studies the method and process of automatically constructing an English-Chinese bilingual rich media knowledge graph. It offers a reference for the automatic construction of cross-language, multimodal knowledge graphs and is of significance for obtaining the latest English-language research results in a timely manner and for monitoring scientific and technological intelligence. [Methods/Processes] Top-down and bottom-up methods are combined. The main entities, attributes, and relationships to be extracted are first designed at the top level, and fine-grained entities and attributes are then extracted bottom-up by analyzing unstructured text data. Entity alignment is applied to ambiguous and cross-lingual entities, entity linking is applied to cross-media entities, and a graph database is used to store the knowledge graph and support its applications. [Limitations] Future work includes further improving the accuracy of fine-grained entity extraction, and extracting features from and automatically recognizing the content of audio and video media. [Results/Conclusions] Taking the websites of CNS (Cell, Nature, Science) and other English scientific and technical journals as examples, the construction, storage, and visualization of an English-Chinese bilingual rich media knowledge graph are achieved through data crawling, entity extraction, attribute extraction, knowledge fusion, and cross-media linking.

Keywords: rich media; knowledge graph; entity extraction; entity alignment; move recognition

Classification: G35 [Culture and Science: Information Science]; TP391 [Automation and Computer Technology: Computer Application Technology]
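
Illustrative sketch (not from the paper): the abstract names cross-lingual entity alignment and cross-media entity linking as steps but gives no implementation details. The Python below shows one very simple way such steps could look. The `Entity` class, the `LEXICON_EN_TO_ZH` dictionary, and the exact-match and caption-substring rules are all assumptions made for illustration. They are not the authors' method, and an in-memory list stands in for the graph database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """A graph node with labels per language and a list of linked media items."""
    entity_id: str
    entity_type: str
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> surface form
    media: List[str] = field(default_factory=list)        # linked media identifiers


# Hypothetical bilingual lexicon (assumption). A real system would need a far
# richer resource, such as a terminology base or a translation model.
LEXICON_EN_TO_ZH = {"knowledge graph": "知识图谱", "protein": "蛋白质"}


def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def align(english: List[Entity], chinese: List[Entity]) -> List[Entity]:
    """Merge English and Chinese entities of the same type that the lexicon maps together."""
    zh_index = {(e.entity_type, e.labels["zh"]): e for e in chinese}
    merged: List[Entity] = []
    used = set()
    for en in english:
        zh_label = LEXICON_EN_TO_ZH.get(normalize(en.labels["en"]))
        match = zh_index.get((en.entity_type, zh_label))
        if match is not None:
            used.add(match.entity_id)
            merged.append(Entity(en.entity_id, en.entity_type,
                                 {**en.labels, **match.labels},
                                 en.media + match.media))
        else:
            merged.append(en)
    # Keep Chinese entities that found no English counterpart.
    merged.extend(e for e in chinese if e.entity_id not in used)
    return merged


def link_media(entities: List[Entity], captions: Dict[str, str]) -> None:
    """Attach a media item to every entity whose label occurs in the item's caption."""
    for media_id, caption in captions.items():
        text = normalize(caption)
        for e in entities:
            if any(normalize(lbl) in text for lbl in e.labels.values()):
                e.media.append(media_id)
```

Under these assumptions, calling `align` on separately extracted English and Chinese entity lists yields nodes that carry both labels, and `link_media` then attaches figure or video identifiers by caption match. Exact dictionary lookup was chosen only to keep the sketch short; it does not handle the ambiguous entities mentioned in the abstract.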

 
