Chinese Nested Named Entity Recognition Corpus Construction (中文嵌套命名实体识别语料库的构建)  Cited by: 14


Authors: LI Yanqun (李雁群); HE Yunqi (何云琪); QIAN Longhua (钱龙华); ZHOU Guodong (周国栋) [1,2]

Affiliations: [1] Natural Language Processing Laboratory, Soochow University, Suzhou, Jiangsu 215006, China; [2] School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Source: Journal of Chinese Information Processing (《中文信息学报》), 2018, No. 8, pp. 19-26 (8 pages)

Funding: National Natural Science Foundation of China (61373096; 61331011; 61673290)

Abstract: Nested named entities contain rich entities and the semantic relations between them, which can help improve the effectiveness of information extraction. Because no unified, standard Chinese nested named entity corpus exists, current research on Chinese nested named entities is difficult to compare. Building on existing named entity corpora, this paper uses a semi-automatic method to construct two Chinese nested named entity corpora. First, the annotation information in existing Chinese named entity corpora is used to automatically construct as many nested named entities as possible; the results are then manually adjusted to meet the annotation requirements for Chinese nested entities, yielding high-quality Chinese nested named entity recognition corpora. Preliminary experiments on nested named entity recognition, both within and across the corpora, show that Chinese nested named entity recognition remains a difficult problem that requires further research.
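The automatic step described in the abstract can be illustrated with a minimal sketch. It assumes (the abstract does not say this) that candidate inner entities are found by searching each annotated entity's text for the surface strings of other, shorter annotated entities. The function name `propose_nested`, the tuple layout, and the example annotations are hypothetical, and the paper's actual procedure may differ.

```python
from typing import Dict, List, Tuple

# (start, end, type, text) with end exclusive; this layout is an assumption.
Entity = Tuple[int, int, str, str]


def propose_nested(entities: List[Entity]) -> List[Entity]:
    """Propose inner entities by matching known entity strings inside longer ones."""
    # Lexicon of surface forms taken from the existing flat annotations.
    lexicon: Dict[str, str] = {}
    for _, _, etype, text in entities:
        lexicon.setdefault(text, etype)

    proposed = set()
    for start, _end, _etype, text in entities:
        for surface, inner_type in lexicon.items():
            if len(surface) >= len(text):
                continue  # an inner entity must be strictly shorter than its host
            pos = text.find(surface)
            while pos != -1:
                proposed.add((start + pos, start + pos + len(surface), inner_type, surface))
                pos = text.find(surface, pos + 1)
    # These are only candidates; the abstract notes that manual adjustment follows.
    return sorted(proposed)


# Hypothetical flat annotations over the sentence "苏州大学位于苏州".
flat = [(0, 4, "ORG", "苏州大学"), (6, 8, "LOC", "苏州")]
candidates = propose_nested(flat)
print(candidates)
```

In this toy case the location "苏州" is proposed as an inner entity of the organization "苏州大学". String matching of this kind over-generates, which is one reason a manual adjustment pass, as the abstract describes, would still be needed.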

Keywords: Chinese nested named entity recognition; conditional random field; information extraction; corpus

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
