Authors: Sun Zhe (孙哲)[1], Yu Ke (禹可)[1], Wu Xiaofei (吴晓非)[1] ([1] School of Artificial Intelligence, Beijing University of Posts & Telecommunications, Beijing 100876, China)
Source: Application Research of Computers (《计算机应用研究》), 2023, No. 8, pp. 2292-2297 (6 pages)
Abstract: Chinese spelling correction is the task of detecting and correcting spelling errors in text. Most Chinese spelling errors arise from misusing characters that are semantically, phonetically, or visually similar, so a common approach is to extract and model features from these different modalities. However, fusing the features directly, or summing them with fixed weights, ignores the relative importance of the modalities and biases the model when it identifies errors, which keeps the model from learning effectively. To address this, the paper proposes a new model: a Chinese correction algorithm based on the fusion of text sequence error probability and Chinese spelling error probability. The method uses the text sequence error probability as a dynamic weight and the probability of common Chinese spelling errors as a fixed weight to fuse semantic, phonetic, and glyph (character-shape) information efficiently. The model can reasonably control how much information from each modality flows into the mixed-modality representation, and it focuses learning on the positions where errors occur. Experiments on the SIGHAN benchmark show that the proposed model improves every evaluation score across the different datasets, which validates the feasibility of the algorithm.
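The abstract gives no formulas, so the following is only a minimal sketch of what a weighted fusion of this kind might look like, not the authors' implementation. It assumes that each position in the sentence has a semantic, a phonetic, and a glyph feature vector of equal length, that a per-position error probability gates how much phonetic and glyph information is added (the dynamic weight), and that two constants split that contribution between the two modalities (the fixed weights). The function names, the additive form, and the default weight values are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_position(
    semantic: Sequence[float],
    phonetic: Sequence[float],
    glyph: Sequence[float],
    error_prob: float,
    w_phonetic: float,
    w_glyph: float,
) -> Vector:
    """Fuse the three feature vectors of one character position.

    Assumed form (not taken from the paper):
        fused = semantic + error_prob * (w_phonetic * phonetic + w_glyph * glyph)
    error_prob is the dynamic weight; w_phonetic and w_glyph are fixed weights.
    """
    if not 0.0 <= error_prob <= 1.0:
        raise ValueError("error_prob must lie in [0, 1]")
    if not (len(semantic) == len(phonetic) == len(glyph)):
        raise ValueError("all feature vectors must have the same length")
    return [
        s + error_prob * (w_phonetic * p + w_glyph * g)
        for s, p, g in zip(semantic, phonetic, glyph)
    ]


def fuse_sequence(
    semantic_seq: Sequence[Sequence[float]],
    phonetic_seq: Sequence[Sequence[float]],
    glyph_seq: Sequence[Sequence[float]],
    error_probs: Sequence[float],
    w_phonetic: float = 0.5,  # hypothetical placeholder value
    w_glyph: float = 0.5,     # hypothetical placeholder value
) -> List[Vector]:
    """Apply fuse_position to every position of a sentence."""
    lengths = {len(semantic_seq), len(phonetic_seq), len(glyph_seq), len(error_probs)}
    if len(lengths) != 1:
        raise ValueError("all sequences must cover the same number of positions")
    return [
        fuse_position(s, p, g, prob, w_phonetic, w_glyph)
        for s, p, g, prob in zip(semantic_seq, phonetic_seq, glyph_seq, error_probs)
    ]
```

Under these assumptions, a position judged certainly correct (error probability 0) keeps its semantic vector unchanged, while a position judged likely wrong receives more phonetic and glyph information. This is one way to read the abstract's claim that the model learns more specifically where errors occur.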
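Keywords: Chinese spelling correction; error probability; pre-training; information fusion; sequence-to-sequence model
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]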