Authors: 旦正吉 (Danzheng-Ji); 华却才让 (Huaque-Cairang); 完么措 (Wanme-Cuo); 白颖 (BAI Ying)
Affiliations: [1] School of Computer Science and Technology, Qinghai Normal University, Xining 810008, Qinghai, China; [2] The State Key Laboratory of Tibetan Intelligent Information Processing and Application, Qinghai Normal University, Xining 810008, Qinghai, China; [3] Key Laboratory of Tibetan Information Processing of Ministry of Education, Qinghai Normal University, Xining 810008, Qinghai, China
Source: Plateau Science Research (《高原科学研究》), 2024, No. 2, pp. 118-125 (8 pages)
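Funding: National Natural Science Foundation of China (62166034); State Key Laboratory of Tibetan Intelligent Information Processing and Application project (2020-ZJ-Y05); Qinghai Province Basic Research Program project (2020-0301-ZJC-0042); Qinghai Province Applied Basic Research Program project (2021-ZJ-727).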
Abstract: Semantic analysis of Tibetan sentences is difficult because semantic types are numerous and ambiguity is widespread. To address this, the paper proposes a Tibetan semantic chunk recognition method that combines Tibetan syllable vectors with a BiLSTM-CRF hybrid model. First, an annotation standard covering 13 types of semantic chunks was defined. Next, a semantic chunk annotation corpus of 13,211 sentences was constructed. On this basis, a Tibetan semantic chunk recognition and classification model was trained using the TS-BiLSTM-CRF method. Overall test results show that the model achieves a precision of 75.03%, a recall of 76.52%, and an F1 score of 75.77%. Among the individual chunk types, the INS class scores far higher than the other classes, with a precision of 90.87%, while the ORG (organization) class scores lower than the other types, with a precision of 66.67%. The study confirms that the TS-BiLSTM-CRF model performs well on the Tibetan semantic chunk recognition task.
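The overall F1 score in the abstract can be cross-checked against the reported precision and recall. The following is a minimal sketch that assumes the paper uses the standard F1 definition (the harmonic mean of precision and recall); the record does not state the formula, and the function name `f1_score` is illustrative rather than taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall.

    Assumes the standard F1 definition; inputs and output share
    the same unit (here, percent).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall figures reported in the abstract, in percent.
precision, recall = 75.03, 76.52
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")
```

Under that assumption, the computed value rounds to 75.77%, which agrees with the F1 score reported in the abstract.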
Keywords: Tibetan; semantic chunk recognition; TS-BiLSTM-CRF model; annotation standard
Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]