Authors: Zhang Yi-xuan (张艺璇); Li Bin (李斌)[1,2]; Xu Zhi-xing (许智星)[1,2]
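Affiliations: [1] School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China; [2] Center for Language Big Data and Computational Humanities, Nanjing Normal University, Nanjing 210097, China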
Source: Foreign Language Research (《外语学刊》), 2025, No. 1, pp. 19-28 (10 pages)
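Funding: An interim result of the Major Project of the National Social Science Fund of China, "Construction of a Knowledge Base of Pre-Qin Philosophers' Classics and Dictionary Compilation" (22&ZD262), and the Ministry of Education General Project in Humanities and Social Sciences, "Construction of an Ancient Chinese Word-Sense Knowledge Base Based on Large Language Models" (24A10319028).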
Abstract: Discourse-level coreference is a challenging research area in both linguistics and computational linguistics. This paper surveys coreference theories and their development trends, then reviews the construction of coreference corpora and methods for automatic resolution. It identifies two main problems in existing coreference corpora: the annotation of coreference relations tends to be coarse-grained, and the relationship between coreference and the semantic structure of the sentence is largely neglected. To address these gaps, the study designs a discourse-level coreference annotation scheme on top of a sentence-level semantic annotation framework, Chinese Abstract Meaning Representation. Guided by the principle of "conceptual identity", the scheme distinguishes nine types of discourse-level coreference relations according to differences in word form and in how the concept is expressed. Coreference information was annotated for 500 texts. Combined with the 52 intra-sentence semantic relations that had already been fully annotated, this yields a bank of discourse abstract meaning graphs enriched with discourse-level coreference information. The corpus is drawn from the Chinese Treebank (CTB) news corpus, covers economics, sports, and daily life, and comprises 6,237 sentences and 163,227 word tokens. It offers a new framework and data resource for discourse-level semantic analysis.