Vietnamese Keyphrase Generation Method Based on Multi-Feature Fusion


Authors: CHEN Rui-qing; GAO Sheng-xiang[1,2]; YU Zheng-tao[1,2]; ZHANG Ying-chen; ZHANG Lei[1,2]; YANG Jian

Affiliations: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China [2] Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Source: Journal of Yunnan University (Natural Sciences Edition), 2022, No. 1, pp. 23-33 (11 pages)

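Funding: National Natural Science Foundation of China (61972186); National Key Research and Development Program of China (2018YFC0830105); Yunnan Provincial Major Science and Technology Special Project (202002AD080001-5).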

Abstract: Vietnamese is a low-resource language, and high-quality news corpora annotated with keyphrases are scarce. To address the low accuracy of Vietnamese news keyphrase generation when training samples are insufficient, this paper proposes a multi-feature fusion model for Vietnamese keyphrase generation that aims to improve the relevance of the generated keyphrases to the source news documents. First, entity, part-of-speech, and word-position features of the Vietnamese news text are concatenated with the word vectors, so that the vectors fed to the model carry richer semantic information. Second, a bidirectional attention mechanism captures the dependencies between the context and the news title, strengthening the guiding role of the title in keyphrase generation. Finally, a copy mechanism is combined with the model to generate the Vietnamese keyphrases, improving their semantic relevance. Experiments on a purpose-built Vietnamese news keyphrase dataset show that the model generates high-quality keyphrases with limited Vietnamese training data. Compared with TG-Net, the F1@10 and R@50 scores improve by 13.2% and 17.1%, respectively.
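The first step of the method, concatenating entity, part-of-speech, and position features with each word vector, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the vector dimensions, the tag names, the sample tokens, the random lookup tables (standing in for learned embeddings), and the helper names `make_table` and `fuse_features`.

```python
import random

# Hypothetical sizes; the abstract does not state the real dimensions.
WORD_DIM, ENTITY_DIM, POS_DIM, POSITION_DIM = 8, 2, 3, 2


def make_table(keys, dim, seed):
    """Build a toy lookup table of random vectors (stands in for learned embeddings)."""
    rng = random.Random(seed)
    return {key: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for key in keys}


def fuse_features(tokens, entity_tags, pos_tags, tables, max_position):
    """Concatenate word, entity, POS, and position vectors for each token."""
    fused = []
    for index, (token, entity, pos) in enumerate(zip(tokens, entity_tags, pos_tags)):
        position = min(index, max_position - 1)  # clip positions beyond the table
        fused.append(
            tables["word"][token]
            + tables["entity"][entity]
            + tables["pos"][pos]
            + tables["position"][position]
        )
    return fused


# Illustrative word-segmented Vietnamese tokens with assumed tags.
tokens = ["Hà_Nội", "là", "thủ_đô"]
entity_tags = ["LOC", "O", "O"]
pos_tags = ["Np", "V", "N"]

tables = {
    "word": make_table(tokens, WORD_DIM, seed=0),
    "entity": make_table(["LOC", "O"], ENTITY_DIM, seed=1),
    "pos": make_table(["Np", "V", "N"], POS_DIM, seed=2),
    "position": make_table(range(4), POSITION_DIM, seed=3),
}

fused = fuse_features(tokens, entity_tags, pos_tags, tables, max_position=4)
print(len(fused), len(fused[0]))  # prints: 3 15
```

Each fused vector has 8 + 2 + 3 + 2 = 15 components, so the encoder input widens by the total size of the feature vectors. In a trained model the lookup tables would be learned jointly with the rest of the network rather than sampled at random.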

Keywords: multi-feature; Vietnamese; keyphrase generation; bidirectional attention mechanism; word vector

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
