Study on the Tibetan Syntactic Component Labeling Based on Integrating Case Sequence Knowledge and Other Semantic Features

Authors: Gyesang-Tashi; Dolha; Lumbum-Tashi (Department of Chinese Language and Literature, Northwest Minzu University, Lanzhou 730030, China; State Key Laboratory of Tibetan Intelligence, Qinghai Normal University, Xining 810016, China)

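Affiliations: [1] Department of Chinese Language and Literature, Northwest Minzu University, Lanzhou, Gansu 730030 [2] State Key Laboratory of Tibetan Intelligence, Qinghai Normal University, Xining, Qinghai 810008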

Source: Plateau Science Research, 2025, No. 1, pp. 119-128 (10 pages)

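Funding: National Natural Science Foundation of China (62266037, 62206146); Science and Technology Basic Condition Platform Project of the Qinghai Provincial Department of Science and Technology (2023-ZJ-T02); Qinghai Normal University Natural Science Research Fund for Young and Middle-aged Researchers (KJQH2022011).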

Abstract: Deep syntactic analysis is one of the key challenges in Tibetan natural language understanding. To address the poor performance of existing Tibetan syntactic analysis models, this study proposes a Tibetan syntactic component labeling method that integrates Tibetan case sequence knowledge with multidimensional semantic features. The method takes the constraints that Tibetan case sequences impose on syntactic components as its main semantic feature, fuses this with further features such as Tibetan characters, words, and part-of-speech (POS) tags, and then jointly predicts Tibetan syntactic component tags with Bi-LSTM+CRF. On real corpus data, the method achieves an accuracy of 90.67%, a precision of 87.00%, a recall of 87.33%, and an F1 value of 87.16%; its F1 value is higher than that of every baseline model. In addition, ablation experiments verify that the WPCc_BiLSTM+CRF model, which integrates Tibetan case sequence knowledge and the other features, significantly improves Tibetan syntactic component labeling performance.
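
The reported precision, recall, and F1 value can be cross-checked against each other. The sketch below assumes that F1 is the usual harmonic mean of precision and recall; the abstract does not state the formula, and the function name `f1_score` is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 87.00
reported_recall = 87.33

# Assumption: the paper's F1 is the standard harmonic mean.
f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 87.16%"
```

Under that assumption the computed value matches the 87.16% given in the abstract, so the three reported figures are mutually consistent.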

Keywords: Tibetan case sequence; semantic features; syntactic component labeling; syntactic analysis

Classification No.: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
