Chinese-Thai News Event Element Extraction Based on Combining Dependency Trees with Rules  Cited by: 8

A News Event Extraction Method in Chinese and Thai Languages Based on Dependency Tree Elements Combined with Rules

Authors: 程良 (CHENG Liang), 郜洪奎 (GAO Hong-kui), 王红斌 (WANG Hong-bin)[2] (City College, Kunming University of Science and Technology, Kunming 650051, China; Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, China)

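Affiliations: [1] City College, Kunming University of Science and Technology, Kunming, Yunnan 650051, China; [2] Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650504, China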

Source: Software Guide (《软件导刊》), 2018, No. 7, pp. 49-56, 63 (9 pages in total)

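Funding: National Natural Science Foundation of China (61462054); General Program of the Yunnan Provincial Department of Science and Technology (2015FB135); Scientific Research Fund of the Yunnan Provincial Department of Education (2018JS035)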

Abstract: This paper studies the extraction of event elements from Chinese and Thai news. It first analyzes the characteristics of the two languages and finds that Thai, in which attributives, adverbials, and complements are postposed, has a grammatical structure similar to that of Chinese; further analysis indicates that Chinese and Thai share the same dependency structure. On this basis, Chinese-Thai dependency trees are built from parallel sentence pairs, a number of rules are defined according to the features of the Thai language, and the dependency trees are combined with these rules to extract the subject, object, and adverbial of Thai sentences. In the experiments, the extraction accuracy for Thai subject noun phrases, object noun phrases, and adverbial noun phrases was 62.13%, 64.18%, and 70.21% respectively, which indicates that combining dependency trees with rules is a feasible way to extract event elements from Thai news.
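The combination of a dependency tree with rules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the relation labels (`nsubj`, `obj`, `obl`, `amod`, and so on), the two rule tables, the `Token` structure, and the example sentence (English glosses in a Thai-like word order, with the modifier after the noun) are all assumptions made for illustration, since the record does not list the paper's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int   # 1-based position in the sentence
    text: str  # surface form (English gloss in this toy example)
    head: int  # index of the governing token, 0 for the root
    rel: str   # dependency relation to the head (hypothetical label set)


# Hypothetical rule table: relation of a phrase head -> event element type.
ELEMENT_RULES = {"nsubj": "subject", "obj": "object", "obl": "adverbial"}

# Hypothetical rule: relations that keep a dependent inside its head's phrase.
NP_INTERNAL = {"amod", "nmod", "det", "compound", "case"}


def subtree(tokens, head_idx):
    """Return head_idx and its phrase-internal descendants in sentence order."""
    keep = {head_idx}
    changed = True
    while changed:  # repeat until no new dependent is attached
        changed = False
        for t in tokens:
            if t.head in keep and t.idx not in keep and t.rel in NP_INTERNAL:
                keep.add(t.idx)
                changed = True
    return [t for t in tokens if t.idx in keep]


def extract_elements(tokens):
    """Apply the rule table to the direct dependents of the root predicate."""
    root = next(t for t in tokens if t.head == 0)
    elements = {}
    for t in tokens:
        role = ELEMENT_RULES.get(t.rel)
        if role and t.head == root.idx:
            phrase = " ".join(tok.text for tok in subtree(tokens, t.idx))
            elements.setdefault(role, []).append(phrase)
    return elements


# Toy sentence: "minister foreign visited Kunming on Monday".
EXAMPLE = [
    Token(1, "minister", 3, "nsubj"),
    Token(2, "foreign", 1, "amod"),   # postposed modifier of "minister"
    Token(3, "visited", 0, "root"),
    Token(4, "Kunming", 3, "obj"),
    Token(5, "on", 6, "case"),
    Token(6, "Monday", 3, "obl"),
]

if __name__ == "__main__":
    print(extract_elements(EXAMPLE))
```

In this sketch the tree supplies the candidate phrase heads and the rules decide which relations count as event elements and which dependents belong to the phrase, so the postposed modifier stays attached to its noun without any word-order handling.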

Keywords: dependency tree; rules; Thai; element extraction; natural language processing

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]

 
