BERT-based Text Error Correction Model for Normative Documents (cited by: 4)

Authors: WANG Suqi; WANG Mingwen [1]; ZENG Xueqiang [1] (College of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China)

Affiliation: [1] College of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China

Source: Journal of Shanxi University (Natural Science Edition), 2022, No. 2, pp. 257-263 (7 pages)

Funding: National Natural Science Foundation of China (61866017; 61866018; 61876074; 61966019); Natural Science Foundation of Jiangxi Province (20192BAB207027).

Abstract: This paper proposes a text error correction model based on BERT (Bidirectional Encoder Representations from Transformers) for administrative normative documents. The model handles four error types separately (redundant text, missing text, word-order errors, and typos) and works in two stages: error detection and error correction. The detection stage determines whether the text contains an error, where the error is located, and what type it is. The correction stage uses the BERT masked language model together with confusion-set matching to predict the missing content. Experiments on the audit document error correction dataset constructed in this paper show that the proposed model achieves an F1 score of 71.89% on the text error correction task for administrative normative documents, 9.48% higher than the classic Chinese text error correction tool Pycorrector.
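To make the correction stage more concrete, the following is a minimal sketch of confusion-set matching applied to masked-language-model candidates for a single typo. It is not the authors' implementation: the confusion set entries, the function names, and the `toy_predictor` that stands in for a BERT masked language model are all hypothetical, and the position of the error is assumed to come from a separate detection stage.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical confusion set: for each character, the characters it is
# commonly mistaken for. A real system would load a much larger resource.
CONFUSION_SET: Dict[str, Set[str]] = {
    "帐": {"账", "张"},
    "再": {"在"},
}

# Ranked candidates for a masked position: (character, score), best first.
Candidates = List[Tuple[str, float]]
# A predictor takes the left and right context of the masked position.
Predictor = Callable[[str, str], Candidates]


def correct_typo(text: str, pos: int, predict_masked: Predictor) -> str:
    """Replace text[pos] with the best-ranked candidate that is also in the
    confusion set of the original character; otherwise return text unchanged."""
    wrong_char = text[pos]
    allowed = CONFUSION_SET.get(wrong_char, set())
    for candidate, _score in predict_masked(text[:pos], text[pos + 1:]):
        if candidate in allowed:
            return text[:pos] + candidate + text[pos + 1:]
    return text


def toy_predictor(left: str, right: str) -> Candidates:
    """Stand-in for a BERT masked language model (assumption): returns a
    fixed ranked list regardless of context."""
    return [("的", 0.5), ("账", 0.3), ("张", 0.1)]


if __name__ == "__main__":
    # Assume the detection stage flagged position 2 as a typo.
    print(correct_typo("银行帐户", 2, toy_predictor))
```

In this sketch the top-ranked candidate is skipped because it is not in the confusion set of the flagged character, which illustrates why the two sources of evidence are combined: the language model proposes fluent characters, and the confusion set restricts them to plausible misspellings.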

Keywords: Chinese text error correction; administrative normative documents; BERT; BiLSTM; conditional random field

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
