A Study on the Multi-method Integrated Approach for Constructing Tibetan Emotional Dictionary


Authors: Tshering-Dondrub, Nimma-Zhaxi, Dawa-Zhuima [1,2,3,4], Daoji-Zhaxi

Affiliations: [1] College of Information Science and Technology, Tibet University, Lhasa 850000, China; [2] Key Laboratory of Tibetan Information Technology and Artificial Intelligence of Tibet Autonomous Region, Tibet University, Lhasa 850000, China; [3] Engineering Research Center of Tibetan Information Technology, Ministry of Education, Tibet University, Lhasa 850000, China; [4] Collaborative Innovation Center for Tibet Informatization by MOE and Tibet Autonomous Region, Tibet University, Lhasa 850000, China

Source: Plateau Science Research (《高原科学研究》), 2024, Issue 2, pp. 96-105 (10 pages)

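Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0116101); Tibet University Graduate High-Level Talent Training Program (2021-GSP-S129).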

Abstract: Deep learning has attracted considerable attention in Tibetan sentiment analysis, where it outperforms traditional machine learning methods and where sentiment word features play a crucial role. Constructing a Tibetan sentiment dictionary nevertheless remains challenging: vocabulary coverage is limited, existing approaches rely heavily on machine translation systems, dictionary matching draws on a single source, and spoken-language sentiment dictionaries are lacking. To address these problems, this article proposes a multi-method integrated approach to constructing a Tibetan sentiment dictionary. First, after a statistical analysis of existing sentiment word annotation rules, an annotation rule for Tibetan sentiment words is proposed as the main basis for sentiment word classification. Second, a multi-dictionary matching method is proposed and used to construct a Tibetan benchmark sentiment dictionary; to enlarge it, the benchmark dictionary is expanded using SO-PMI and a word2vec word-vector similarity method. In addition, a Tibetan spoken-language sentiment dictionary is built by manually screening spoken-language dictionaries of the three major Tibetan dialects. Third, the benchmark dictionary and the expanded dictionary are merged and deduplicated to obtain the Tibetan Written and Spoken Emotional Dictionary (《藏语书面语与口语情感词典》). Finally, a performance evaluation experiment is carried out to demonstrate the feasibility of the method and the usability of the constructed dictionary. The precision, recall, and F-value in the experiment are 60.80%, 90.31%, and 72.67%, respectively, which the authors regard as a good level for practical application and as verification of the feasibility of the multi-method integrated approach.
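The abstract names SO-PMI as one of the expansion steps but gives no formula or implementation. The sketch below shows the textbook form of the technique, SO-PMI(w) = Σ PMI(w, positive seed) − Σ PMI(w, negative seed), under assumptions that are not taken from the article: co-occurrence is counted at sentence level, a pair that never co-occurs contributes 0, and the corpus uses English placeholder tokens rather than Tibetan data. The function names `so_pmi` and `f_measure` are hypothetical. The second function only checks that the reported F-value is the harmonic mean of the two other reported figures.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(candidates, pos_seeds, neg_seeds, sentences):
    """Score candidates by semantic orientation (illustrative sketch only).

    Assumptions (not from the article): sentence-level co-occurrence,
    and a word pair that never co-occurs contributes a PMI of 0.
    """
    n = len(sentences)
    word_count = Counter()   # number of sentences containing each word
    pair_count = Counter()   # number of sentences containing both words
    for sent in sentences:
        vocab = set(sent)
        word_count.update(vocab)
        pair_count.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(a, b):
        joint = pair_count[frozenset((a, b))]
        if joint == 0 or word_count[a] == 0 or word_count[b] == 0:
            return 0.0  # no evidence either way (assumption)
        # log2( P(a, b) / (P(a) * P(b)) ) with probabilities as counts / n
        return math.log2((joint / n) / ((word_count[a] / n) * (word_count[b] / n)))

    return {
        w: sum(pmi(w, p) for p in pos_seeds) - sum(pmi(w, q) for q in neg_seeds)
        for w in candidates
    }


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-value)."""
    return 2 * precision * recall / (precision + recall)


# Toy corpus with placeholder tokens; a real run would use segmented Tibetan text.
toy_sentences = [
    ["good", "delightful", "meal"],
    ["good", "delightful", "trip"],
    ["bad", "dreadful", "meal"],
    ["bad", "dreadful", "trip"],
]
scores = so_pmi(["delightful", "dreadful", "meal"], ["good"], ["bad"], toy_sentences)

# Consistency check on the figures reported in the abstract.
reported_f = f_measure(0.6080, 0.9031)
```

In this toy run, a positive score marks a candidate as leaning positive, a negative score as leaning negative, and a score near zero as neutral; any threshold used to admit words into an expanded dictionary would be a separate design choice that the abstract does not specify.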

Keywords: Tibetan; SO-PMI; sentiment dictionary; multi-dictionary matching; expanded dictionary

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
