Authors: YANG Jin (杨进); ZHU Yunfei (朱云飞); CHEN Chen (陈晨); A Yongqiang (阿永强)
Affiliations: [1] College of Cyber Security, Sichuan University, Chengdu 610065, Sichuan, China; [2] School of Information Science and Technology, Tibet University, Lhasa 850000, Tibet, China
Source: Plateau Science Research (《高原科学研究》), 2023, No. 2, pp. 84-92 (9 pages)
Funding: National Natural Science Foundation of China (62162057, 61872254); Sichuan Science and Technology Program (2021JDRC0004); Key Laboratory of Information Network Security, Ministry of Public Security (C20606); National Undergraduate Innovation Training Program (202210694032).
Abstract: This paper proposes a Tibetan multi-granularity semantic matching model based on TMS-BERT (Tibetan Multi-granularity Semantic matching-BERT). To fit the characteristics of Tibetan text, it builds multi-granularity feature vectors from a mixture of syllabic characters, words, and phrases, which preserves the semantic features of Tibetan and alleviates the curse of dimensionality found in traditional Tibetan text matching models. Using the bidirectional encoding and self-attention mechanism of the Transformer, a model for detecting semantic similarity in Tibetan is trained on a large amount of Tibetan text, addressing the low detection accuracy of traditional text matching models. A total of 71,904 Tibetan sentence pairs were collected from social media platforms and news websites for training and evaluation. The model reaches a precision of 95.33% and an accuracy of 94.33%; its accuracy is 3.68% higher than that of the traditional BERT model, 12.39% higher than that of the word vector generation model fastText, and 27.35% higher than that of a traditional text similarity model.
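The idea of mixing syllable, word, and phrase granularities can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not describe how TMS-BERT segments words or phrases, so this sketch simply approximates "words" and "phrases" as n-grams of syllables, and the function names and the `word_n` / `phrase_n` parameters are hypothetical. The only fact relied on is that Tibetan syllables are delimited by the tsheg mark (U+0F0B) and that clauses commonly end with a shad (U+0F0D).

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)
SHAD = "\u0f0d"   # Tibetan clause-final mark (shad)


def syllables(text):
    """Split a Tibetan string into syllables on the tsheg mark."""
    cleaned = text.strip().rstrip(SHAD)
    return [s for s in cleaned.split(TSHEG) if s]


def ngrams(units, n):
    """Join every run of n consecutive syllables back together with tsheg."""
    return [TSHEG.join(units[i:i + n]) for i in range(len(units) - n + 1)]


def multi_granularity_tokens(text, word_n=2, phrase_n=3):
    """Return tokens at three granularities.

    ASSUMPTION: words and phrases are approximated by syllable n-grams.
    A real system would use a Tibetan word segmenter instead.
    """
    syl = syllables(text)
    return {
        "syllable": syl,
        "word": ngrams(syl, word_n),
        "phrase": ngrams(syl, phrase_n),
    }
```

The three token lists could then be mapped to embeddings and combined before being passed to a BERT-style encoder; how that combination is done in TMS-BERT is not stated in this record.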
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]