Authors: SHANG Haiyi; HUANG Jifeng[1]; CHEN Haiguang[1] (College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China)
Affiliation: [1] College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China
Source: Journal of Computer Applications (《计算机应用》), 2022, No. S02, pp. 25-30 (6 pages)
Fund: Shanghai Local Capacity Building Project (19070502900).
Abstract: Aiming at the problem that different parts of speech of the same Chinese word represent different relationships in a sentence, a Chinese Grammatical Error Correction (CGEC) model based on a Transformer fused with part-of-speech features was proposed, which incorporates linguistic knowledge as auxiliary information into the CGEC task. First, without changing the length of the sentence sequence, part-of-speech vectors were spliced into the original word embedding layer in different ways, yielding three word embedding schemes: full-difference word embedding, word-difference word embedding and part-of-speech-difference word embedding. Then, the new word embedding schemes were combined with the Transformer model to correct grammatical errors in erroneous sentences. Experimental results show that all three word embedding schemes improve the F0.5 value to varying degrees, and the full-difference word embedding performs best: compared with the Transformer model, its F0.5 increases by 2.73 percentage points and its BLEU (Bilingual Evaluation Understudy) increases by 6.27 percentage points; compared with the CGEC model based on the Transformer enhanced architecture, its F0.5 increases by 1.88 percentage points. When extracting part-of-speech features, the proposed model can focus on the grammatical differences between the source and target sentences, and thus better capture the grammatical features of sentences.
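The record does not include the authors' code; the sketch below is a minimal PyTorch illustration of one possible reading of the embedding idea described in the abstract, in which a part-of-speech (POS) embedding is concatenated to every token's word embedding on both the encoder and decoder sides and projected back to the model dimension, so the sequence length is unchanged. All class names, dimensions (d_model, d_pos), vocabulary sizes and the projection layer are illustrative assumptions, not the paper's released implementation; positional encoding and dropout are omitted for brevity.

```python
# Minimal sketch (assumptions, not the authors' code) of fusing POS-tag embeddings
# with word embeddings before a standard Transformer encoder-decoder for CGEC.
import torch
import torch.nn as nn

class PosAwareEmbedding(nn.Module):
    """Word embedding concatenated with a POS-tag embedding, projected to d_model."""
    def __init__(self, vocab_size, pos_tag_size, d_model=512, d_pos=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(pos_tag_size, d_pos)
        # Project the concatenated vector back to d_model so the Transformer layers
        # see the usual hidden size and the sequence length stays the same.
        self.proj = nn.Linear(d_model + d_pos, d_model)

    def forward(self, token_ids, pos_ids):
        fused = torch.cat([self.word_emb(token_ids), self.pos_emb(pos_ids)], dim=-1)
        return self.proj(fused)

class PosAwareTransformerCGEC(nn.Module):
    """Transformer seq2seq whose source/target embeddings carry POS information."""
    def __init__(self, vocab_size=8000, pos_tag_size=30, d_model=512):
        super().__init__()
        self.src_embed = PosAwareEmbedding(vocab_size, pos_tag_size, d_model)
        self.tgt_embed = PosAwareEmbedding(vocab_size, pos_tag_size, d_model)
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.generator = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, src_pos, tgt_ids, tgt_pos):
        # Causal mask so each target position only attends to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(self.src_embed(src_ids, src_pos),
                                  self.tgt_embed(tgt_ids, tgt_pos),
                                  tgt_mask=tgt_mask)
        return self.generator(hidden)  # per-token logits over the vocabulary

# Toy usage: a batch of 2 sentences, 6 tokens each, with random token and POS ids.
if __name__ == "__main__":
    model = PosAwareTransformerCGEC()
    src = torch.randint(0, 8000, (2, 6)); src_pos = torch.randint(0, 30, (2, 6))
    tgt = torch.randint(0, 8000, (2, 6)); tgt_pos = torch.randint(0, 30, (2, 6))
    print(model(src, src_pos, tgt, tgt_pos).shape)  # torch.Size([2, 6, 8000])
```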
Keywords: Chinese grammatical error correction; linguistic knowledge; word embedding; Transformer model; decoder
CLC Number: TP391 [Automation and Computer Technology - Computer Application Technology]