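Chinese Word Sense Disambiguation Combining Word Form, Part of Speech, and Translation (original title: 结合词形词性和译文的汉语词义消歧)  Cited by: 2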

Chinese Word Sense Disambiguation Based on Word-translation and Part-of-speech


Authors: ZHANG Chun-xiang (张春祥); ZHAO Ling-yun (赵凌云); GAO Xue-yao (高雪瑶) [2]

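Affiliations: [1] School of Software and Microelectronics, Harbin University of Science and Technology, Harbin 150080, China; [2] School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China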

Source: Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》), 2020, No. 3, pp. 131-136 (6 pages)

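Funding: National Natural Science Foundation of China (61502124, 60903082); China Postdoctoral Science Foundation (2014M560249); Natural Science Foundation of Heilongjiang Province (F2015041, F201420); Fundamental Research Funds for Regular Universities of Heilongjiang Province (LGYC2018JC014).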

Abstract: To address lexical ambiguity in Chinese, a convolutional neural network (CNN) is used to determine the true sense of an ambiguous word from the word form, part of speech, and translation of its left and right adjacent words. A disambiguation window containing two adjacent lexical units is selected around the ambiguous word, and the word form, part of speech, and translation of each unit are extracted as disambiguation features. A word sense disambiguation (WSD) classifier is then built on these features with the CNN. The CNN parameters are optimized on the SemEval-2007: Task#5 training corpus and the semantically annotated corpus of Harbin Institute of Technology, and the classifier is evaluated on the SemEval-2007: Task#5 test corpus. Experimental results show that, compared with a Bayes model and a BP neural network, the proposed method improves average disambiguation accuracy by 14.94% and 6.9%, respectively.
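The feature-extraction step described in the abstract can be illustrated with a minimal Python sketch. It assumes the disambiguation window consists of one lexical unit to the left and one to the right of the ambiguous word, which is one reading of "two adjacent lexical units". The `Token` type, the `PAD` unit for sentence boundaries, the function name `window_features`, and the toy sentence with its tags and translations are all hypothetical and are not taken from the paper. The CNN classifier itself is not shown.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str         # surface word form
    pos: str          # part-of-speech tag
    translation: str  # translation of the word


# Hypothetical padding unit used when the target sits at a sentence edge.
PAD = Token("<pad>", "<pad>", "<pad>")


def window_features(sentence: List[Token], target: int) -> List[str]:
    """Return word form, POS, and translation of the left and right neighbours."""
    if not 0 <= target < len(sentence):
        raise IndexError("target index out of range")
    left = sentence[target - 1] if target > 0 else PAD
    right = sentence[target + 1] if target + 1 < len(sentence) else PAD
    return [left.word, left.pos, left.translation,
            right.word, right.pos, right.translation]


# Toy sentence (illustrative only); the ambiguous word is at index 1.
sentence = [
    Token("他", "r", "he"),
    Token("打", "v", "play"),
    Token("篮球", "n", "basketball"),
]
features = window_features(sentence, 1)
print(features)
```

In a full pipeline these six symbolic features would still need to be mapped to numeric vectors before being passed to a CNN; how the paper performs that encoding is not stated in the abstract.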

Keywords: lexical ambiguity; convolutional neural network; lexical unit; disambiguation features; word sense disambiguation

Classification number: TP391.2 [Automation and Computer Technology: Computer Application Technology]

 
