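Affiliations: [1] School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013; [2] Key Laboratory of Data and Knowledge Engineering of Jiangxi Provincial Universities, Jiangxi University of Finance and Economics, Nanchang 330013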
Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 12, pp. 2795-2820 (26 pages)
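Funding: National Natural Science Foundation of China (61562032, 61662027, 61173146, 61363039, 61363010, 61462037); Natural Science Foundation of Jiangxi Province (20152ACB20003, 20161BAB202057); Science and Technology Landing Program of Jiangxi Higher Education Institutions (KJLD12022, KJLD14035); Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ150819, GJJ160783); Humanities and Social Sciences Research Project of Jiangxi Universities (JC161001)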
Abstract: Entity relation extraction aims to detect both explicit and implicit relations between entities. Most existing research concentrates on explicit entity relation extraction and neglects implicit entity relation extraction. Because texts in the tourism and news domains often contain many implicit entity relations triggered by company verbs, this paper studies Chinese implicit entity relation extraction based on company verbs. It combines a machine learning method with rules and uses explicit entity relations to infer implicit ones. First, using dependency parsing, it designs a classification algorithm for company candidate sentence patterns and a corresponding algorithm for recognizing company components. Second, based on the characteristics of company components and the scope of company verbs, it designs three intra-sentence rules for inferring implicit entity relations based on company verbs. Finally, it uses the antecedents of zero anaphora in company sentences to link the subject and object components of company verbs across different sentences, which enables inter-sentence extraction of implicit entity relations based on company verbs. In addition, the paper proposes a feature extraction algorithm for directional core verbs, which further improves the contribution of verb features to explicit entity relation extraction. Detailed experiments on real text datasets from the tourism and news domains demonstrate the effectiveness of the method.
The target of named entity relation extraction is to detect explicit and implicit relations between entities. Most existing research focuses on explicit entity relation extraction and ignores implicit entity relation extraction. Compared with explicit relations, implicit relations have no explicit supporting evidence in the text and require additional evidence from a reading of the document. Inferring them therefore usually requires integrating the semantic associations of sentence content with relevant linguistic information, context-specific semantic information, and related domain knowledge. However, because of the ambiguity of semantic relations, the complexity of sentence structures, the uncertainty of context information, and the imbalance of data, implicit relation extraction is more complicated and more difficult than explicit relation extraction, and it cannot be implemented with a general model. Inferring implicit relations has therefore remained a challenge. Several works on implicit relation extraction have been carried out for European languages, especially English. As far as we know, very few studies have addressed Chinese. In many text domains, such as tourism and news, there are many implicit entity relations triggered by company verbs. In this paper, we study the problem of Chinese implicit entity relation extraction based on company verbs. The paper proposes a two-stage scheme that takes into account both explicit and implicit relation extraction. We integrate a machine learning method with rules and use explicit entity relations to infer implicit entity relations. Firstly, a company verb vocabulary is constructed using a variety of methods and is used to select candidate sentences, that is, sentences containing company verbs. Secondly, a sentence pattern classification algorithm and a corresponding component recognition algorithm are designed for company candidate sentences. According to the different roles of company verbs in the sentence, we employ dependency parsing to decide
Keywords: relation extraction; implicit relation; company verb; explicit relation; verb feature
Classification code: TP311 [Automation and Computer Technology - Computer Software and Theory]
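
The core idea stated in the abstract, using explicit entity relations to infer implicit ones through a company verb, can be illustrated with a minimal sketch. This is not the paper's algorithm: the `Relation` type, the `infer_company_relations` function, the single propagation rule, and the example entities are all hypothetical, and the sketch assumes the host and companion linked by the company verb have already been identified (for example by dependency parsing).

```python
from typing import List, NamedTuple, Tuple


class Relation(NamedTuple):
    """A (subject, predicate, object) entity relation triple (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


def infer_company_relations(
    explicit: List[Relation],
    company_pairs: List[Tuple[str, str]],
) -> List[Relation]:
    """Propagate explicit relations across company-verb links.

    Assumed rule (illustrative only): if `host` holds an explicit relation
    and a company verb links `host` with `companion`, then `companion` is
    taken to hold the same relation implicitly.
    """
    inferred: List[Relation] = []
    for host, companion in company_pairs:
        for rel in explicit:
            if rel.subject != host:
                continue
            candidate = Relation(companion, rel.predicate, rel.obj)
            # Skip relations that are already explicit or already inferred.
            if candidate not in explicit and candidate not in inferred:
                inferred.append(candidate)
    return inferred


# Hypothetical sentence: "Li Si accompanied Zhang San on a visit to Mount Lu."
explicit = [Relation("Zhang San", "visit", "Mount Lu")]
company_pairs = [("Zhang San", "Li Si")]  # (host, companion)
print(infer_company_relations(explicit, company_pairs))
```

A real system would also need to resolve the scope of the company verb and, for inter-sentence cases, the antecedents of zero anaphora, neither of which this sketch attempts.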