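Affiliations: [1] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215000 [2] School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009 [3] Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou, Jiangsu 215009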
Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 12, pp. 2769-2794 (26 pages)
Funding: National Natural Science Foundation of China (61331011, 61472264); Suzhou Science and Technology Development Program (Key Laboratory project SZS201609)
Abstract: Discourse topic structure analysis targets the intentionality of a discourse and is the foundation of discourse-level semantic analysis. Its main task is to analyze, at the level of the whole text, the discourse structure and the semantic relations among its constituent units, and to use context to understand the discourse. Discourse analysis must study the basic constituent units of a discourse and, more importantly, the discourse relations among those units. However, current research in natural language processing concentrates largely on lexical and syntactic analysis and neglects the internal regularities of discourse. The lack of a systematic theory and method for analyzing discourse topic structure effectively has severely hindered applications built on discourse semantic analysis. Starting from the two basic properties of discourse, cohesion and coherence, this paper reviews research on discourse topic structure analysis in China and abroad. It surveys three aspects in detail (theoretical frameworks, corpus construction, and computational models) and compares the characteristics of the theories, resources, and models. Representative theories include Cohesion and Register Theory (CRT), the Hobbs Model, Rhetorical Structure Theory (RST), the Penn Discourse TreeBank (PDTB) framework, Intentional Structure Theory (IST), and Macro-Structure Theory (MST). The main resources are the Rhetorical Structure Theory Discourse Treebank (RST-DT), the Penn Discourse Treebank (PDTB), the Message Understanding Conference (MUC) corpus, the Automatic Content Extraction (ACE) evaluation corpus, the ARRAU corpus, the OntoNotes corpus, and the Discourse GraphBank. Work on computational models is organized around these theories and resources. The paper then discusses recent progress on Chinese discourse topic structure. Building on this discussion, it explores a representation scheme for discourse micro topic structure based on theme-rheme theory, and describes the construction of the corresponding corpus and its consistency evaluation. The micro topic structure is formalized as a triple whose main feature is a chain: the chain nodes are elementary discourse topic units (clauses), the theme or rheme inside each node serves as a connection end, and connection ends are joined by micro-topic links. Each link is in essence a semantic association and reflects the cohesion relations within the discourse. Finally, the paper summarizes and looks ahead to future directions for research on discourse topic structure.
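Keywords: discourse topic structure; discourse theory; corpus annotation; computational model; discourse intentionality; discourse semantic analysis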
Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]
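Illustrative sketch: The abstract describes the micro topic structure as a chain whose nodes are clauses, each split into a theme and a rheme, with links joining a theme or rheme of one clause to a theme or rheme of another. The Python sketch below models only that description. The abstract does not list the components of the authors' triple, so the class names (`Clause`, `Link`, `MicroTopicStructure`), the field layout, the `chains` grouping, and the example sentences are all assumptions made for illustration, not the paper's formalization.

```python
from dataclasses import dataclass, field
from typing import Dict, List

VALID_ENDS = ("theme", "rheme")


@dataclass(frozen=True)
class Clause:
    """One chain node: a clause split into its theme and rheme (hypothetical layout)."""
    index: int
    theme: str
    rheme: str


@dataclass(frozen=True)
class Link:
    """A micro-topic link between a connection end of one clause and one of another."""
    source: int
    source_end: str  # "theme" or "rheme"
    target: int
    target_end: str  # "theme" or "rheme"


@dataclass
class MicroTopicStructure:
    clauses: List[Clause] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)

    def add_link(self, source: int, source_end: str, target: int, target_end: str) -> None:
        """Record a link after checking that both clauses and both ends exist."""
        known = {c.index for c in self.clauses}
        if source not in known or target not in known:
            raise ValueError("link refers to an unknown clause index")
        if source_end not in VALID_ENDS or target_end not in VALID_ENDS:
            raise ValueError("connection end must be 'theme' or 'rheme'")
        self.links.append(Link(source, source_end, target, target_end))

    def chains(self) -> List[List[int]]:
        """Group clause indices into chains: clauses joined, directly or not, by links."""
        parent: Dict[int, int] = {c.index: c.index for c in self.clauses}

        def find(i: int) -> int:
            # Walk up to the representative, shortening the path on the way.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for link in self.links:
            parent[find(link.source)] = find(link.target)

        groups: Dict[int, List[int]] = {}
        for c in self.clauses:
            groups.setdefault(find(c.index), []).append(c.index)
        return sorted(sorted(g) for g in groups.values())


def build_example() -> MicroTopicStructure:
    """Four invented clauses; the rheme of each of the first two feeds the next theme."""
    mts = MicroTopicStructure(clauses=[
        Clause(0, "The annotator", "reads the document"),
        Clause(1, "The document", "contains several clauses"),
        Clause(2, "Each clause", "has a theme and a rheme"),
        Clause(3, "Evaluation", "is deferred"),
    ])
    mts.add_link(0, "rheme", 1, "theme")
    mts.add_link(1, "rheme", 2, "theme")
    return mts
```

Calling `build_example().chains()` groups the first three clauses into one chain and leaves the unlinked fourth clause on its own. Storing links separately from clauses keeps the theme or rheme choice explicit on both sides of every connection, which is the feature the abstract emphasizes.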