Authors: ZHOU Yuhao; SUN Zhe[1]; WU Xiaofei[1]; YU Ke[1] (School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China)
Source: Journal of Beijing University of Posts and Telecommunications (《北京邮电大学学报》), 2023, No. 4, pp. 91-96, 122 (7 pages in total)
Funding: National Natural Science Foundation of China (61601046).
Abstract: In Chinese spelling correction, methods that fuse the semantic, phonetic, and glyph information of Chinese characters with equal weight can lose performance when the phonetic or glyph information is misleading. To address this problem, a Chinese spelling correction model based on gated feature fusion is proposed. It uses adaptive gates to selectively fuse semantic, phonetic, and glyph information, which improves model performance and makes the model more interpretable. In addition, an improved four-corner code is used to encode the glyph information of Chinese characters, which effectively extracts their glyph features; on this basis, the glyph-similarity confusion set used during pre-training is expanded. A pre-training masking strategy based on confusion-set replacement enables the model to effectively learn knowledge about textual errors. On the public SIGHAN13, SIGHAN14, and SIGHAN15 datasets, the proposed model achieves correction F1 scores of 78.7%, 67.8%, and 77.7%, respectively, which are 1.5%, 1.5%, and 1.0% higher than the best baseline model.
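The abstract does not give the gate equations, so the following is only a minimal sketch of one common way "adaptive gates" can selectively fuse features. It assumes a scalar sigmoid gate per side feature, computed from the concatenation of the semantic vector with the phonetic (or glyph) vector, and added residually to the semantic vector. The names `gated_fuse`, `w_pho`, `b_pho`, `w_gly`, and `b_gly` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gate(h_sem: Sequence[float], h_side: Sequence[float],
         w: Sequence[float], b: float) -> float:
    # Hypothetical scalar gate: sigmoid(w . [h_sem; h_side] + b).
    x = list(h_sem) + list(h_side)
    if len(w) != len(x):
        raise ValueError("weight length must equal len(h_sem) + len(h_side)")
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gated_fuse(h_sem: Sequence[float], h_pho: Sequence[float], h_gly: Sequence[float],
               w_pho: Sequence[float], b_pho: float,
               w_gly: Sequence[float], b_gly: float) -> Tuple[List[float], float, float]:
    """Fuse one character's semantic, phonetic and glyph vectors (assumed same size).

    Each side feature is scaled by its own gate in (0, 1), so a misleading
    phonetic or glyph signal can be suppressed instead of being added with
    equal weight.
    """
    g_pho = gate(h_sem, h_pho, w_pho, b_pho)
    g_gly = gate(h_sem, h_gly, w_gly, b_gly)
    fused = [s + g_pho * p + g_gly * g for s, p, g in zip(h_sem, h_pho, h_gly)]
    # Returning the gate values exposes how much each modality contributed,
    # which is one way such a model can be inspected for interpretability.
    return fused, g_pho, g_gly
```

With all-zero weights and biases both gates equal 0.5, so `gated_fuse([1.0, 2.0], [2.0, 4.0], [0.0, 2.0], [0.0] * 4, 0.0, [0.0] * 4, 0.0)` yields `[2.0, 5.0]`; a strongly negative `b_pho` drives the phonetic gate toward 0 and effectively drops that modality. A per-dimension vector gate is an equally plausible design; which one the authors use is not stated in the abstract.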
CLC number: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]