Research on Relation Extraction Method for Government Documents

Cited by: 3


Authors: CUI Cong-min; SHI Yun-mei [1,2]; YUAN Bo [1,2]; LI Yun-han; LI Yuan-hua; ZHOU Chu-wei

Affiliations: [1] Beijing Key Laboratory of Internet Culture Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101, China; [2] Beijing Information Science and Technology University, Beijing 100101, China

Source: Computer Technology and Development, 2021, No. 12, pp. 26-32 (7 pages)

Funding: National Key R&D Program of China (2018YFB1004100).

Abstract: Government documents are voluminous and cover a wide range of topics. Mining valuable information from them, for example using entity relation extraction to mine personnel information, can reduce the workload of government staff. Distantly supervised relation extraction lowers the cost of manual annotation and improves extraction efficiency, which helps ensure the quality and timeliness of the important information obtained. This paper proposes a distantly supervised entity relation extraction method that combines the ALBERT pre-trained language model with a capsule network to extract person-name/position relations from official documents. ALBERT extracts deep semantic information from the text through character embeddings and position embeddings, and the capsule network improves relation classification by passing features from lower-level to higher-level capsules. Experimental results show that the precision, recall, and F1 score of the proposed model are all higher than those of the baseline methods, indicating that the model effectively improves relation extraction performance and alleviates the scarcity of labeled data in the official-document domain. The relation instances obtained by this method can expand existing knowledge bases for the official-document domain and help government staff quickly retrieve personnel information when drafting documents, avoiding errors in the information they convey.
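The abstract only states that the capsule network "passes features from lower-level to higher-level capsules". The usual mechanism for this is routing-by-agreement (dynamic routing) as introduced by Sabour et al. (2017). The record does not give the paper's actual capsule configuration, so the following is a minimal pure-Python sketch under stated assumptions: the prediction vectors `u_hat` (in the original formulation, a learned transform of each low-level capsule, here imagined as coming from ALBERT features) are taken as given, 3 routing iterations are used, and the two relation classes and toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule non-linearity: keeps the direction of s and maps its length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing-by-agreement (sketch, assumed setup).

    u_hat[i][j] is the prediction vector that low-level capsule i sends to
    high-level (relation-class) capsule j. Returns one output vector v_j per
    high-level capsule.
    """
    n_low = len(u_hat)
    n_high = len(u_hat[0])
    dim = len(u_hat[0][0])
    # Routing logits b_ij start at zero, i.e. uniform coupling.
    logits = [[0.0] * n_high for _ in range(n_low)]
    outputs: List[Vector] = []
    for it in range(iterations):
        # Coupling coefficients c_ij: each low-level capsule splits its output
        # across the high-level capsules.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_high):
            # s_j = sum_i c_ij * u_hat[i][j], then squash to get v_j.
            s_j = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_low))
                   for d in range(dim)]
            outputs.append(squash(s_j))
        if it < iterations - 1:
            # Raise b_ij where the prediction agrees (dot product) with the output.
            for i in range(n_low):
                for j in range(n_high):
                    agreement = sum(a * b for a, b in zip(u_hat[i][j], outputs[j]))
                    logits[i][j] += agreement
    return outputs


def capsule_lengths(outputs: List[Vector]) -> Vector:
    """Length of each output capsule, read as the score of that relation class."""
    return [math.sqrt(sum(x * x for x in v)) for v in outputs]


if __name__ == "__main__":
    # Toy predictions: 3 low-level capsules x 2 hypothetical classes x 2 dims.
    u_hat = [
        [[1.0, 0.0], [0.1, 0.2]],
        [[0.9, 0.1], [-0.2, 0.1]],
        [[1.1, -0.1], [0.0, -0.3]],
    ]
    print(capsule_lengths(dynamic_routing(u_hat)))
```

In the toy input all three low-level capsules predict roughly the same vector for class 0 and small, conflicting vectors for class 1, so routing shifts the coupling coefficients toward class 0 and its output capsule ends up much longer. Reading class scores from capsule lengths is what lets the squash function's bounded output serve directly as a classification signal.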

Keywords: entity relation extraction; distant supervision; ALBERT; pre-trained language model; capsule network

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
