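Affiliation: [1] School of Information Science and Technology, Xiamen University; Center for Cloud Computing and Big Data, Xiamen University, Xiamen, Fujian 361005
Source: Journal of Xiamen University: Natural Science, 2014, No. 2, pp. 182-189 (8 pages)
Funding: National Natural Science Foundation of China (60803078); Natural Science Foundation of Fujian Province (2010J01351); Ministry of Education Scientific Research Foundation for Returned Overseas Scholars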
Abstract: Implicit discourse relation identification has long been a difficult problem in information extraction (IE). Existing supervised methods for discourse relation identification require large amounts of manually labeled data. To address this shortcoming, the paper proposes a semi-supervised model that uses a self-training strategy to identify implicit discourse relations automatically. Using only a small number of labeled examples, the model attempts to reach identification accuracy comparable to that of supervised methods, which opens a new opportunity for future real-time, big-data discourse relation identification. To further improve accuracy, nine kinds of discourse relation features, including word-pair, production-rule, and verb features, are analyzed in combination, and the selected features are used to construct the knowledge representation of candidate discourse relation instances and optimize the model. Experiments on the Penn Discourse Treebank (PDTB 2.0) corpus show that the model improves accuracy and F-score by 5.2% and 13.5%, respectively, over a traditional supervised identification method.
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
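
The self-training strategy named in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the nearest-centroid classifier, the margin-based confidence score, the `threshold` and `max_rounds` parameters, and all function names are assumptions chosen only to keep the example short and free of dependencies. The general loop is the standard one: train on the labeled set, label the unlabeled pool, move only the confident predictions into the training set, and repeat.

```python
from math import dist


def centroids(points, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def predict(cents, p):
    """Return (label, confidence) for point p.

    Confidence is the distance margin between the two nearest
    centroids (an assumed, illustrative score). Needs >= 2 classes.
    """
    ranked = sorted((dist(p, c), y) for y, c in cents.items())
    (d0, y0), (d1, _) = ranked[0], ranked[1]
    return y0, d1 - d0


def self_train(labeled, labels, unlabeled, threshold=1.0, max_rounds=10):
    """Grow the training set with confidently pseudo-labeled points.

    Returns the final centroids and the points left unlabeled.
    """
    points, ys, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, ys)
        confident, rest = [], []
        for p in pool:
            y, conf = predict(cents, p)
            (confident if conf >= threshold else rest).append((p, y))
        if not confident:
            break  # nothing new to learn from; stop early
        for p, y in confident:
            points.append(p)
            ys.append(y)
        pool = [p for p, _ in rest]
    return centroids(points, ys), pool
```

In a real discourse relation setting the points would be feature vectors built from argument pairs (for example the word-pair or production-rule features the abstract mentions), and the classifier would be whatever supervised learner is being bootstrapped; the threshold controls how aggressively pseudo-labels are trusted.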