SSHGCN: A Chinese Error Correction Method Based on Heterogeneous Graph Convolution with Phonological and Visual Features


Authors: REN Jun; HUANG Ruizhang [1,2,3]

Affiliations: [1] Text Computing and Cognitive Intelligence Engineering Research Center of the Ministry of Education, Guizhou University, Guiyang 550025, Guizhou, China; [2] State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, Guizhou, China; [3] College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China

Source: Journal of Shanxi University (Natural Science Edition), 2024, No. 3, pp. 518-527 (10 pages)

Funding: National Natural Science Foundation of China (62066007); Guizhou Provincial Science and Technology Support Program (2022277).

Abstract: Chinese spelling correction aims to detect and correct spelling errors in Chinese text. Existing methods have attempted to model character similarity as graph-structured information. However, the graphs used by current methods ignore the deeper phonetic proximity between Chinese characters, and these methods lack a multimodal fusion mechanism that fully exploits both the pronunciation and the shape of characters. This paper therefore derives phonetic-similarity relations from the initials and finals of Chinese characters together with the importance of each pinyin component, and combines them with visual (shape) similarity relations to build a heterogeneous graph linking characters with similar pinyin and similar shapes. Heterogeneous graph convolution on this graph lets the phonological and visual information of characters complement each other, fully fusing initial/final and shape information. On the SIGHAN15 benchmark (SIGHAN: Special Interest Group on Chinese Language Processing), the method's sentence-level correction F1 score exceeds that of all compared methods, and on the SIGHAN13 benchmark it is comparable to the best compared method, which verifies the method's effectiveness.

Keywords: Chinese spelling correction; multimodal information fusion method; character similarity; pinyin similarity relation

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
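The abstract names two ingredients: a pinyin-similarity relation computed from initials and finals weighted by their importance, and heterogeneous graph convolution over phonetic and visual edges. The toy sketch below illustrates both under stated assumptions. The character table, the fuzzy initial/final groups, the weights (0.4 for the initial, 0.6 for the final), the 0.5 partial-match score, the 0.7 edge threshold, and the visually similar pairs are all hypothetical. The layer is a generic R-GCN-style relation-wise mean aggregation, not necessarily the exact formulation used by SSHGCN.

```python
"""Toy sketch: pinyin-similarity edges plus one relation-aware graph-convolution layer.

Every table, weight, and threshold here is a hypothetical illustration,
not a value or formulation taken from SSHGCN.
"""

# Hypothetical (initial, final) split for a toy character set; tones ignored.
PINYIN = {
    "在": ("z", "ai"), "再": ("z", "ai"), "债": ("zh", "ai"), "菜": ("c", "ai"),
    "大": ("d", "a"), "太": ("t", "ai"), "犬": ("q", "uan"),
}
# Hypothetical fuzzy-pinyin groups: members count as a partial match.
FUZZY_INITIALS = [{"z", "zh"}, {"c", "ch"}, {"s", "sh"}, {"n", "l"}]
FUZZY_FINALS = [{"an", "ang"}, {"en", "eng"}, {"in", "ing"}]
# Hypothetical visually similar pairs (shape-proximity relation).
VISUAL_PAIRS = [("大", "太"), ("太", "犬"), ("大", "犬")]

W_INITIAL, W_FINAL = 0.4, 0.6  # assumed importance of initial vs. final
PARTIAL = 0.5                  # assumed score for a fuzzy (near) match
THRESHOLD = 0.7                # assumed cut-off for adding a phonetic edge


def part_score(a, b, groups):
    """1.0 for an exact match, PARTIAL if a and b share a fuzzy group, else 0.0."""
    if a == b:
        return 1.0
    return PARTIAL if any(a in g and b in g for g in groups) else 0.0


def pinyin_similarity(c1, c2):
    """Importance-weighted sum of initial similarity and final similarity."""
    (i1, f1), (i2, f2) = PINYIN[c1], PINYIN[c2]
    return (W_INITIAL * part_score(i1, i2, FUZZY_INITIALS)
            + W_FINAL * part_score(f1, f2, FUZZY_FINALS))


def build_graph(chars):
    """Adjacency lists for two edge types: 'phonetic' and 'visual'."""
    graph = {rel: {c: [] for c in chars} for rel in ("phonetic", "visual")}
    for i, a in enumerate(chars):
        for b in chars[i + 1:]:
            if pinyin_similarity(a, b) >= THRESHOLD:
                graph["phonetic"][a].append(b)
                graph["phonetic"][b].append(a)
    for a, b in VISUAL_PAIRS:
        if a in graph["visual"] and b in graph["visual"]:
            graph["visual"][a].append(b)
            graph["visual"][b].append(a)
    return graph


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def hetero_gcn_layer(graph, h, w_self, w_rel):
    """One R-GCN-style layer with a separate weight matrix per edge type:
    h_i' = ReLU(W_self h_i + sum_r mean_{j in N_r(i)} W_r h_j)
    """
    out = {}
    for c, hc in h.items():
        acc = matvec(w_self, hc)
        for rel, adj in graph.items():
            nbrs = adj[c]
            for j in nbrs:  # mean over this relation's neighbours
                msg = matvec(w_rel[rel], h[j])
                acc = [a + m / len(nbrs) for a, m in zip(acc, msg)]
        out[c] = [max(0.0, a) for a in acc]  # ReLU
    return out


if __name__ == "__main__":
    chars = list(PINYIN)
    g = build_graph(chars)
    print(g["phonetic"]["在"])  # -> ['再', '债']
    h = {c: [1.0, 1.0] for c in chars}
    w_self = [[1.0, 0.0], [0.0, 1.0]]
    w_rel = {"phonetic": [[1.0, 0.0], [0.0, 0.0]],  # phonetic messages -> dim 0
             "visual": [[0.0, 0.0], [0.0, 1.0]]}    # visual messages -> dim 1
    out = hetero_gcn_layer(g, h, w_self, w_rel)
    print(out["在"], out["大"])  # -> [2.0, 1.0] [1.0, 2.0]
```

Keeping one weight matrix per edge type (here `w_rel["phonetic"]` and `w_rel["visual"]`) is what keeps the two relations distinct during aggregation. In this toy run, phonetic neighbours write only to the first feature dimension and visual neighbours only to the second, so each output shows which kind of similarity contributed. A practical system would learn these weights and use a tensor library instead of hand-set lists.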
