Multi-Strategy Semantic Role Labeling of Tibetan

Times cited: 3

Authors: Long Congjun [1,2], Kang Caijun, Li Lin [3], Jiang Di [1]

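Affiliations: [1] Institute of Ethnology and Anthropology, Chinese Academy of Social Sciences, Beijing 100081; [2] Institute of Software, Chinese Academy of Sciences, Beijing 100190; [3] School of Computer Science, Qinghai Normal University, Xining, Qinghai 810004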

Source: Journal of Chinese Information Processing, 2014, No. 5, pp. 176-181 (6 pages)

Funding: National Natural Science Foundation of China (61132009)

Abstract: Semantic role labeling (SRL) is of great significance for natural language processing, and substantial results have already been achieved for English and Chinese. For Tibetan, however, work on SRL has rarely been reported, whether on resource construction or on labeling techniques. Tibetan has rich syntactic markers that naturally divide a sentence into semantic chunks with different functions, and there is a degree of correspondence between these chunks and semantic roles. Building on this property, this paper proposes a chunk-based SRL strategy for Tibetan that combines rules and statistics. First, Tibetan semantic roles are classified to obtain a labeling scheme. Next, rule acquisition is discussed: an initial rule set is compiled by hand and then expanded with Transformation-Based Error-driven Learning (TBL). On the statistical side, a Conditional Random Fields (CRFs) model is used, augmented with effective linguistic features. The final labeling achieves a precision of 82.78%, a recall of 85.71%, and an F-measure of 83.91%.
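The key observation in the abstract is that Tibetan syntactic markers split a sentence into semantic chunks. A minimal sketch of that segmentation step follows, assuming a chunk simply closes right after each marker token. The marker inventory is a small hypothetical subset of Tibetan case particles in Wylie transliteration (ergative gis/kyis/gyis, dative-locative la, ablative nas/las), not the paper's actual list, and the words `w1`...`w4` are placeholders.

```python
from typing import List, Set

# Hypothetical marker subset (Wylie transliteration); the paper's real
# marker list and chunking rules are not given in the abstract.
CASE_MARKERS: Set[str] = {"gis", "kyis", "gyis", "la", "nas", "las"}


def chunk_by_markers(tokens: List[str], markers: Set[str] = CASE_MARKERS) -> List[List[str]]:
    """Close a chunk after each marker token; leftover tokens form the final chunk."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        current.append(tok)
        if tok in markers:
            # A marker ends the current semantic chunk.
            chunks.append(current)
            current = []
    if current:
        # Tokens after the last marker (typically the predicate) form one chunk.
        chunks.append(current)
    return chunks


print(chunk_by_markers(["w1", "gis", "w2", "w3", "la", "w4"]))
# [['w1', 'gis'], ['w2', 'w3', 'la'], ['w4']]
```

Each resulting chunk would then be a candidate unit for role assignment, with its closing marker serving as a strong feature for the rules and the CRF model.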
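The abstract states that a hand-written initial rule set is expanded with Transformation-Based Error-driven Learning (TBL). Below is a minimal sketch of the generic greedy TBL loop in the style of Brill's transformation-based learning, not the authors' implementation. It assumes a single hypothetical rule template: "if a chunk's closing marker is M and its current label is A, relabel it B". At each step the candidate with the largest net gain in correct labels on the training data is kept, and learning stops when no rule reaches `min_gain`. The marker classes (`ERG`, `LOC`, `ABL`, `NONE`) and role labels (`AGT`, `PAT`, `LOC`, `SRC`) are placeholders.

```python
from typing import List, Optional, Tuple

# Hypothetical template: (marker, from_label, to_label).
Rule = Tuple[str, str, str]


def apply_rule(rule: Rule, markers: List[str], labels: List[str]) -> List[str]:
    """Relabel every chunk whose marker and current label match the rule."""
    marker, src, dst = rule
    return [dst if m == marker and lab == src else lab for m, lab in zip(markers, labels)]


def n_correct(labels: List[str], gold: List[str]) -> int:
    return sum(lab == g for lab, g in zip(labels, gold))


def learn_tbl(markers: List[str], initial: List[str], gold: List[str],
              min_gain: int = 1, max_rules: int = 50) -> Tuple[List[Rule], List[str]]:
    labels = list(initial)
    learned: List[Rule] = []
    for _ in range(max_rules):
        # Propose candidate rules only from chunks that are currently mislabeled.
        candidates = sorted({(m, lab, g) for m, lab, g in zip(markers, labels, gold) if lab != g})
        base = n_correct(labels, gold)
        best: Optional[Rule] = None
        best_gain = 0
        for rule in candidates:
            # Net gain = errors fixed minus errors introduced.
            gain = n_correct(apply_rule(rule, markers, labels), gold) - base
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None or best_gain < min_gain:
            break
        labels = apply_rule(best, markers, labels)
        learned.append(best)
    return learned, labels


# Toy data: "initial" stands in for the output of the hand-written rules.
markers = ["ERG", "LOC", "NONE", "ABL", "LOC", "ERG"]
initial = ["AGT", "PAT", "PAT", "AGT", "PAT", "AGT"]
gold = ["AGT", "LOC", "PAT", "SRC", "LOC", "AGT"]
rules, final = learn_tbl(markers, initial, gold)
print(rules)          # [('LOC', 'PAT', 'LOC'), ('ABL', 'AGT', 'SRC')]
print(final == gold)  # True
```

The learned rules are ordered. At labeling time they would be applied in the same order to the output of the initial rule set.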
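As an arithmetic check on the reported scores: the balanced F-measure is the harmonic mean F = 2PR / (P + R). Plugging in the reported precision and recall gives about 84.22%, slightly above the reported F of 83.91%. The abstract does not say how the scores were aggregated. One possible explanation, offered only as an assumption, is that F was averaged over individual roles (macro-averaged), which in general does not equal the harmonic mean of averaged P and R. The sketch below computes both. The per-role numbers in its second part are hypothetical and not from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Harmonic mean of the precision and recall reported in the abstract (in %).
print(f"{f_measure(82.78, 85.71):.2f}")  # 84.22

# Hypothetical per-role (precision, recall) pairs, NOT taken from the paper,
# showing that macro-averaged F can differ from F of macro-averaged P and R.
per_role = [(1.0, 0.5), (0.5, 1.0)]
macro_p = sum(p for p, _ in per_role) / len(per_role)
macro_r = sum(r for _, r in per_role) / len(per_role)
macro_f = sum(f_measure(p, r) for p, r in per_role) / len(per_role)
print(f"{f_measure(macro_p, macro_r):.4f} vs {macro_f:.4f}")  # 0.7500 vs 0.6667
```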

Keywords: Tibetan; semantic role labeling; TBL; CRFs

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
