A literature information extraction method for catalytic materials (一种面向催化材料领域的文献信息抽取方法)

Authors: GAO Qiang [1]; ZHANG Yangsen [1]; SUN Yuanming; JIA Qilong

Affiliation: [1] Institute of Intelligent Information Processing, Beijing Information Science & Technology University, Beijing 100192, China

Source: Journal of Beijing Information Science and Technology University (Natural Science Edition), 2024, Issue 2, pp. 50-56 (7 pages)

Funding: Project of the Beijing Advanced Innovation Center for Materials Genome Engineering.

Abstract: To make effective use of the unstructured text data in PDF literature, a key information extraction pipeline was designed for literature on Fischer-Tropsch synthesis catalytic materials. The pipeline extracts key information, such as tables and their corresponding annotations, from PDF documents. Taking the differentiable binarization network (DBNet) as the baseline model and introducing an adaptive spatial attention (ASA) module, the DB-ASA text detection model was proposed, which improves detection accuracy. The scene text recognition with a single visual model (SVTR) approach was adopted for text recognition. After fine-tuning on a self-built dataset in combination with domain dictionary files, the model reached a text recognition accuracy of up to 93.87%.
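The abstract names DBNet as the baseline detector but does not restate how it works. As background only, the following is a minimal sketch of the approximate step function that gives differentiable binarization its name, B = 1 / (1 + exp(-k (P - T))), where P is a probability map and T a learned threshold map. The function names, the toy 2x2 maps, and the choice k = 50 (the amplification factor commonly used for DBNet) are assumptions for illustration and are not taken from this paper. The ASA module and the SVTR recognizer are not shown.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate step function: 1 / (1 + exp(-k * (prob - thresh))).

    prob and thresh are expected to lie in [0, 1]; k controls how
    sharply the output switches from 0 to 1 around the threshold.
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply the approximate step function element-wise to two 2D lists."""
    return [
        [differentiable_binarization(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Hypothetical toy maps (illustrative values, not data from the paper).
prob_map = [[0.9, 0.2], [0.55, 0.5]]
thresh_map = [[0.3, 0.3], [0.5, 0.5]]

binary_map = binarize_map(prob_map, thresh_map)
for row in binary_map:
    print([round(v, 4) for v in row])
```

Pixels well above their local threshold map to values near 1, pixels well below map to values near 0, and a pixel exactly at the threshold maps to 0.5. Because the function is smooth, the threshold map can be learned jointly with the probability map during training.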

Keywords: catalytic materials; Fischer-Tropsch synthesis; information extraction; text recognition

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]
