Chunk-based Tibetan Dependency Parsing and Automatic Annotation Method


Authors: Dawa-Zhuima, CAO Xi, Nima-Zhaxi, Qunnuo [1,2,3,4], Daoji-Zhaxi

Affiliations: [1] College of Information Science and Technology, Tibet University, Lhasa 850000, China; [2] Key Laboratory of Tibetan Information Technology and Artificial Intelligence of Tibet Autonomous Region, Tibet University, Lhasa 850000, China; [3] Engineering Research Center of Tibetan Information Technology, Ministry of Education, Tibet University, Lhasa 850000, China; [4] Collaborative Innovation Center for Tibet Informatization by MOE and Tibet Autonomous Region, Tibet University, Lhasa 850000, China

Source: Plateau Science Research (《高原科学研究》), 2024, No. 1, pp. 102-111 (10 pages)

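Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0116102); Tibet University Graduate High-Level Talent Training Program (2021-GSP-S128).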

Abstract: Dependency parsing is one of the core techniques in natural language processing. It aims to determine the syntactic structure of a sentence by analyzing the dependency relationships between its words. Research on Tibetan dependency parsing currently faces two problems: long sentences are difficult to parse, and the mapping used to convert coarse-grained dependencies is incomplete. To address these problems, this paper proposes a Tibetan dependency parsing and automatic annotation method based on chunks and fine-grained part-of-speech matching rules. The method first refines the Tibetan dependency syntax annotation scheme, then manually annotates a dataset under that scheme and extracts part-of-speech matching rules from it. It then uses Tibetan sentence chunk recognition to improve the accuracy of long-sentence parsing. Finally, a prototype system for automatic Tibetan dependency annotation, TDParser, is implemented, and a Tibetan dependency treebank containing 7335 dependency syntax entries is constructed. Experiments verify the performance of TDParser and the effectiveness of the automatically annotated data.

Keywords: Tibetan; dependency parsing; chunk; automatic annotation

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
