Bi-LSTM-WCRF Incorporating Dictionary Feature for Chinese Person Name Recognition  Cited by: 7


Authors: CHENG Yusi (成于思)[1]; SHI Yuntao (施云涛)

Affiliations: [1] School of Civil Engineering, Southeast University, Nanjing, Jiangsu 210096, China; [2] Cloud Computing Research Center, Suning Technology Corporation, Nanjing, Jiangsu 210042, China

Source: Journal of Chinese Information Processing (《中文信息学报》), 2020, Issue 4, pp. 69-76 (8 pages)

Funding: National Natural Science Foundation of China (71601047); China Postdoctoral Science Foundation (2015M581706)

Abstract: Chinese person name recognition performs poorly because annotated corpora are limited in domain and size and their classes are imbalanced. Person name dictionaries are easier to obtain than annotated training corpora, but how to use them to improve person name recognition has not been fully studied. This paper extracts features from person name dictionaries and incorporates them into a bi-directional long short-term memory (Bi-LSTM) network, and designs a weighted conditional random field (WCRF) that raises the weight of person-name labels in the loss function. The name dictionaries supply feature information about family names and given names, and domain dictionaries provide additional information on person names. The Bi-LSTM network captures context within the sentence, and the WCRF improves the recall of person name recognition. Experiments were run on the People's Daily corpus and on a corpus from the construction law domain. On the domain test corpus, the F1 score for person name recognition is 18.34% higher than that of a method based on the hidden Markov model; compared with a traditional Bi-LSTM-CRF model, recall is 15.53% higher and F1 is 8.83% higher. WCRF can also be applied to other sequence labeling or classification problems with class imbalance.
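The abstract names two ideas: character-level features drawn from a person name dictionary, and a loss that weights person-name labels more heavily. The sketch below illustrates both under loud assumptions. The toy dictionary `NAME_DICT`, the relative-frequency scores, the single-character surname rule, and the token-level weighted loss are all hypothetical stand-ins; the record does not give the paper's actual feature definitions or its WCRF formulation, and a real WCRF would apply the weighting at the sequence level inside the CRF.

```python
import math
from collections import Counter

# Hypothetical toy name dictionary; a real one would hold many thousands of names.
NAME_DICT = ["王伟", "王芳", "李娜", "张伟", "欧阳修"]


def build_counts(names):
    """Count how often each character occurs as a surname or inside a given name.

    Assumption: the first character is the surname, so compound surnames
    such as 欧阳 are treated as a one-character surname plus a given name.
    """
    surname, given = Counter(), Counter()
    for name in names:
        surname[name[0]] += 1
        given.update(name[1:])
    return surname, given


def dictionary_features(sentence, surname, given):
    """Return one [surname_score, given_score] pair per character.

    Each score is the character's relative frequency in the dictionary;
    characters never seen in the dictionary score 0.0.
    """
    s_total = sum(surname.values()) or 1
    g_total = sum(given.values()) or 1
    return [[surname[ch] / s_total, given[ch] / g_total] for ch in sentence]


def weighted_nll(probs, gold, weights):
    """Token-level negative log-likelihood with per-label weights.

    probs:   one dict per token, mapping label -> predicted probability
    gold:    the gold label of each token
    weights: label -> weight; labels not listed default to 1.0
    This is a simplified analogue of label weighting, not a CRF loss.
    """
    return -sum(weights.get(y, 1.0) * math.log(p[y]) for p, y in zip(probs, gold))


if __name__ == "__main__":
    surname, given = build_counts(NAME_DICT)
    print(dictionary_features("王伟说", surname, given))
    probs = [{"B-PER": 0.5, "O": 0.5}, {"B-PER": 0.1, "O": 0.9}]
    print(weighted_nll(probs, ["B-PER", "O"], {"B-PER": 2.0}))
```

In a model of the kind the abstract describes, feature vectors like these would typically be concatenated with the character embeddings before the Bi-LSTM layer. Raising the weight on person-name labels makes missed names cost more, which is consistent with the recall gain the abstract reports.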

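Keywords: person name recognition; bi-directional long short-term memory network; weighted conditional random field; dictionary feature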

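Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]; TP183 [Automation and Computer Technology: Computer Science and Technology]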

 
