Improved Word Embedding Algorithm Based on the fastText Model (cited by: 10)

Base on fastText model to improve the word embedding of phrases and morphology


Authors: YIN Aiying, WU Yunbing[2], ZHENG Yijiang, YU Xiaoyan[2]

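Affiliations: [1] Department of Computer Engineering, Zhicheng College of Fuzhou University, Fuzhou, Fujian 350002, China; [2] College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, China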

Source: Journal of Fuzhou University (Natural Science Edition), 2019, No. 3, pp. 314-319 (6 pages)

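Funding: Natural Science Foundation of Fujian Province (2017J01755); Education and Scientific Research Project for Young and Middle-aged Teachers, Fujian Provincial Department of Education (JAT170102)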

Abstract: Traditional word embedding models often ignore the syntactic and morphological structure of words, which limits their prediction accuracy. This paper proposes an improved word embedding algorithm based on the fastText model. First, stopword removal is applied to the training datasets to eliminate meaningless words such as prepositions that interfere with the prediction model, reducing noisy data. Second, the n-gram decomposition format in fastText is restricted so that words are decomposed only in ways consistent with the structure of English words. Finally, the boundary markers that fastText attaches to the beginning and end of each word are removed, reducing the interference that useless decompositions cause in model prediction. Experimental results show that, compared with fastText, the improved model achieves better accuracy on word relationship scoring, semantic similarity, and syntactic similarity.
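The first and third steps in the abstract can be illustrated with a short sketch. It is not the authors' code and rests on stated assumptions: the stopword list is a small hypothetical example, the n-gram lengths `minn=3` and `maxn=6` mirror the fastText library's defaults, and `remove_stopwords` and `char_ngrams` are illustrative helper names. The second step (restricting n-gram decomposition to English word structure) is not reproduced, because the abstract does not state the exact rule.

```python
# Simplified sketch (not the authors' implementation) of two preprocessing
# steps described in the abstract: stopword removal and fastText-style
# character n-gram extraction with or without word-boundary markers.

# Hypothetical, illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "for", "with"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return tokens whose lowercase form is not in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


def char_ngrams(word, minn=3, maxn=6, boundary_markers=True):
    """Return all character n-grams of length minn..maxn.

    With boundary_markers=True the word is wrapped as "<word>", as fastText
    does; with False the markers are omitted (the abstract's third step).
    Simplification: the full word is not added as its own feature, whereas
    fastText also keeps a vector for the whole word.
    """
    w = f"<{word}>" if boundary_markers else word
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the (possibly wrapped) word.
        for i in range(len(w) - n + 1):
            grams.append(w[i:i + n])
    return grams


if __name__ == "__main__":
    print(remove_stopwords("The cat sat on the mat".split()))
    # ['cat', 'sat', 'mat']
    print(char_ngrams("where", 3, 3))
    # ['<wh', 'whe', 'her', 'ere', 're>']
    print(char_ngrams("where", 3, 3, boundary_markers=False))
    # ['whe', 'her', 'ere']
```

With markers, fastText can tell the whole word *her* (`<her>`) apart from the trigram `her` inside *where*; with `boundary_markers=False` the marker-bearing n-grams such as `<wh` and `re>` disappear, which is the reduction in decompositions the third step appears to rely on.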

Keywords: word embedding; skip-gram model; fastText model; natural language processing

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
