Implicit Discourse Relation Identification Based on Combined Features and Self-training Learning
(Chinese title: 基于组合特征的自训练隐式篇章关系的识别技术)

Cited by: 4

Authors: Liu Chu, Chen Jinxiu[1]

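Affiliation: [1] School of Information Science and Technology, Xiamen University; Cloud Computing and Big Data Research Center, Xiamen University; Xiamen, Fujian 361005, China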

Source: Journal of Xiamen University (Natural Science), 2014, No. 2, pp. 182-189 (8 pages)

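Funding: National Natural Science Foundation of China (60803078); Natural Science Foundation of Fujian Province (2010J01351); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education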

Abstract: Implicit discourse relation identification has long been a difficult problem in information extraction (IE). Existing supervised methods for discourse relation identification require large amounts of manually annotated data. To overcome this shortcoming, this paper proposes a semi-supervised model that identifies implicit discourse relations automatically using a self-training strategy. The model uses only a small number of labeled examples yet achieves identification accuracy comparable to that of supervised methods, which opens a new opportunity for future real-time discourse relation identification on big data. To further improve accuracy, we analyze combinations of nine kinds of discourse relation features, including word-pair, production-rule, and verb features, and use them to construct the knowledge representation of candidate discourse relation instances, thereby optimizing the model. Experimental results on the Penn Discourse Treebank (PDTB 2.0) show that, compared with a traditional supervised method, the model improves accuracy by 5.2% and F-score by 13.5%.
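The abstract names self-training only at a high level. The Python below is a minimal sketch of the generic self-training loop, not the authors' implementation: their base classifier, confidence criterion, and number of instances added per round are not stated in this record. A tiny Bernoulli naive Bayes over set-valued features stands in for the base classifier, and `word_pair_features` illustrates one common way to build word-pair features from the two arguments (Arg1, Arg2) of a relation. All names and parameters (`word_pair_features`, `BernoulliNB`, `self_train`, `per_round`, `threshold`, `max_rounds`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def word_pair_features(arg1, arg2):
    """Hypothetical word-pair extractor: one binary feature per (w1, w2)
    pair with w1 taken from Arg1 and w2 taken from Arg2."""
    return {f"{w1}|{w2}" for w1, w2 in product(arg1.lower().split(), arg2.lower().split())}


class BernoulliNB:
    """Tiny Bernoulli naive Bayes over set-valued binary features with
    add-one smoothing. A stand-in base classifier, not the paper's."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.vocab = set().union(*X)          # features seen during training
        self.n_c = Counter(y)                 # number of examples per class
        self.prior = {c: math.log(self.n_c[c] / len(y)) for c in self.classes}
        self.count = {c: Counter() for c in self.classes}
        for feats, c in zip(X, y):
            self.count[c].update(feats)       # how many class-c examples contain each feature
        return self

    def predict_proba(self, feats):
        scores = {}
        for c in self.classes:
            s = self.prior[c]
            for f in self.vocab:              # features unseen in training are ignored
                p = (self.count[c][f] + 1) / (self.n_c[c] + 2)
                s += math.log(p if f in feats else 1 - p)
            scores[c] = s
        top = max(scores.values())            # normalise in log space for stability
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def self_train(labeled, unlabeled, per_round=1, threshold=0.6, max_rounds=10):
    """Hypothetical self-training loop. labeled: list of (feature_set, label);
    unlabeled: list of feature_set. Each round, retrain on the current labeled
    set and move the per_round most confident unlabeled instances (if their
    top probability >= threshold) into it with their predicted labels."""
    X = [feats for feats, _ in labeled]
    y = [label for _, label in labeled]
    pool = list(unlabeled)
    added = []                                # (feature_set, pseudo_label) pairs
    for _ in range(max_rounds):
        if not pool:
            break
        model = BernoulliNB().fit(X, y)
        scored = []
        for i, feats in enumerate(pool):
            proba = model.predict_proba(feats)
            label = max(proba, key=proba.get)
            scored.append((proba[label], i, label))
        scored.sort(key=lambda t: t[0], reverse=True)
        chosen = [t for t in scored[:per_round] if t[0] >= threshold]
        if not chosen:
            break                             # nothing confident enough: stop early
        taken = {i for _, i, _ in chosen}
        for _, i, label in chosen:
            X.append(pool[i])
            y.append(label)
            added.append((pool[i], label))
        pool = [f for i, f in enumerate(pool) if i not in taken]
    return BernoulliNB().fit(X, y), added
```

For example, `self_train([(word_pair_features("it rained", "the game was cancelled"), "Contingency")] + more_labeled, unlabeled_feature_sets)` would return a retrained classifier plus the pseudo-labeled instances it absorbed (`more_labeled` and `unlabeled_feature_sets` are placeholders). The confidence threshold is the key design choice: pseudo-labels that are added wrongly are reinforced in later rounds, so only high-confidence predictions are moved into the training set.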

Keywords: implicit discourse relation identification; semi-supervised learning; self-training; combined features

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
