Authors: HUANG Sheng [1,2], WANG Bo-bo, ZHU Jing
Affiliations: [1] School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; [2] Key Laboratory of Optical Communications and Networking, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; [3] Data Center, Shenzhen Securities Information Limited Company, Shenzhen 518000, Guangdong, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2020, No. 1, pp. 115-121 (7 pages)
Funding: National Natural Science Foundation of China (61371096)
Abstract: To address the difficulty of extracting structured data from financial announcements quickly and efficiently, an information extraction method based on document structure and a Bi-LSTM-CRF network model is proposed. A custom document structure tree generation algorithm is defined, and rules are used to extract the required node information from the tree. Local sentence rules based on information-sentence trigger words are constructed to extract the information sentences that contain structured field information. Structured field extraction is treated as a sequence labeling problem: a domain knowledge dictionary is added during word segmentation, and a Bi-LSTM-CRF neural network model is built to recognize field information. Experimental results show that the method supports structured information extraction across multiple announcement types, with average F1 scores above 91% for both information sentence extraction and field information extraction, which verifies the feasibility and practicality of the method in product business.
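The two rule-based steps around the neural model can be illustrated with a minimal Python sketch. It is not the authors' implementation: the trigger words, the `DATE`/`AMOUNT` labels, and the function names are hypothetical, and the tag sequence is supplied by hand where the paper would use the output of its Bi-LSTM-CRF model. The sketch only shows how trigger words might select information sentences and how BIO-style tags from a sequence labeler might be grouped into fields.

```python
import re

# Hypothetical trigger words; the abstract does not list the real ones.
TRIGGER_WORDS = ("record date", "dividend")


def split_sentences(text):
    """Split text into sentences on Western or Chinese end punctuation."""
    pieces = re.findall(r"[^.!?。!?]+[.!?。!?]?", text)
    return [p.strip() for p in pieces if p.strip()]


def extract_information_sentences(text, triggers=TRIGGER_WORDS):
    """Keep only sentences that contain at least one trigger word."""
    return [
        s for s in split_sentences(text)
        if any(t in s.lower() for t in triggers)
    ]


def decode_bio(tokens, tags):
    """Group BIO tags (B-X, I-X, O) into (label, text) field spans."""
    spans = []
    label, buffer = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new field starts; flush any field still open.
            if label is not None:
                spans.append((label, " ".join(buffer)))
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the currently open field.
            buffer.append(token)
        else:
            # "O" or an inconsistent I- tag closes the open field.
            if label is not None:
                spans.append((label, " ".join(buffer)))
            label, buffer = None, []
    if label is not None:
        spans.append((label, " ".join(buffer)))
    return spans


if __name__ == "__main__":
    doc = ("The board approved the plan. "
           "The record date is 2020-01-15. "
           "The weather was fine.")
    print(extract_information_sentences(doc))

    # Tags written by hand here; a trained sequence labeler would predict them.
    tokens = ["cash", "dividend", "of", "1", "yuan", "per", "share"]
    tags = ["O", "O", "O", "B-AMOUNT", "I-AMOUNT", "O", "O"]
    print(decode_bio(tokens, tags))
```

Keeping sentence selection separate from tag decoding mirrors the pipeline in the abstract: rules narrow the text to candidate sentences first, and only those sentences are passed to the sequence labeler.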
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]