Authors: LI Yanqun (李雁群), HE Yunqi (何云琪), QIAN Longhua (钱龙华), ZHOU Guodong (周国栋) [1,2]
Affiliations: [1] Natural Language Processing Laboratory, Soochow University, Suzhou, Jiangsu 215006, China; [2] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Source: Journal of Chinese Information Processing (《中文信息学报》), 2018, No. 8, pp. 19-26 (8 pages)
Funding: National Natural Science Foundation of China (61373096; 61331011; 61673290)
Abstract: Nested named entities carry rich information about entities and the semantic relations between them, which helps improve the effectiveness of information extraction. Because no unified, standard Chinese nested named entity corpus exists, research results on Chinese nested named entities are currently hard to compare. Building on existing named entity corpora, this paper uses a semi-automatic method to construct two Chinese nested named entity corpora. First, the annotation information in the existing Chinese named entity corpora is used to automatically construct as many nested named entities as possible; these are then manually adjusted to meet the annotation requirements for Chinese nested entities, yielding high-quality corpora for Chinese nested named entity recognition. Preliminary experiments on nested named entity recognition, both within and across the corpora, show that Chinese nested named entity recognition remains a fairly difficult problem that requires further research.
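Keywords: Chinese nested named entity recognition; conditional random fields; information extraction; corpus
Classification number: TP391 [Automation and Computer Technology - Computer Application Technology]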
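
The abstract describes the automatic step only at a high level: reuse existing flat annotations to propose nested entities, then correct them by hand. One way this could look is sketched below. It is an illustration under stated assumptions, not the authors' procedure: the `Entity` tuple, the functions `build_lexicon` and `nested_candidates`, the substring-matching heuristic, and the toy sentence are all hypothetical.

```python
from collections import namedtuple

# Hypothetical flat annotation: character offsets, end exclusive.
Entity = namedtuple("Entity", ["start", "end", "type"])


def build_lexicon(corpus):
    """Map each annotated entity string to the first type it was labeled with."""
    lexicon = {}
    for text, entities in corpus:
        for e in entities:
            lexicon.setdefault(text[e.start:e.end], e.type)
    return lexicon


def nested_candidates(text, entities, lexicon):
    """Propose inner entities: lexicon strings found strictly inside an annotated span.

    The output is a list of (outer, inner) pairs intended for manual review,
    mirroring the "automatic construction, then manual adjustment" idea.
    """
    candidates = []
    for outer in entities:
        for i in range(outer.start, outer.end):
            for j in range(i + 1, outer.end + 1):
                if (i, j) == (outer.start, outer.end):
                    continue  # skip the outer entity itself
                etype = lexicon.get(text[i:j])
                if etype is not None:
                    candidates.append((outer, Entity(i, j, etype)))
    return candidates


# Toy corpus (assumed data): "苏州大学" is an ORG, "苏州" is a LOC.
corpus = [
    ("苏州大学位于苏州", [Entity(0, 4, "ORG"), Entity(6, 8, "LOC")]),
]
lexicon = build_lexicon(corpus)
text, entities = corpus[0]
result = nested_candidates(text, entities, lexicon)
print(result)
```

Here the location "苏州", annotated elsewhere in the corpus, is proposed as an inner entity of the organization "苏州大学". A heuristic like this over-generates, which is why a manual adjustment pass would still be needed.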