Authors: CHENG Yusi[1], SHI Yuntao
Affiliations: [1] School of Civil Engineering, Southeast University, Nanjing, Jiangsu 210096, China; [2] Cloud Computing Research Center, Suning Technology Corporation, Nanjing, Jiangsu 210042, China
Source: Journal of Chinese Information Processing (《中文信息学报》), 2020, No. 4, pp. 69-76 (8 pages)
Funding: National Natural Science Foundation of China (71601047); China Postdoctoral Science Foundation (2015M581706)
Abstract: Constrained by the domain and scale of existing annotated corpora and by class imbalance, Chinese person name recognition performs relatively poorly. Person name dictionaries and domain dictionaries are easier to obtain than manually annotated training corpora, yet how to use dictionaries to improve person name recognition still needs further study. This paper extracts person name dictionary features, incorporates them into a bi-directional long short-term memory (Bi-LSTM) network, and designs a weighted conditional random field (WCRF) layer that increases the weight of person name labels in the loss function. The person name dictionary supplies feature information related to family names and given names, the Bi-LSTM network captures the context of each sentence, and the WCRF improves the recall of person name recognition. Experiments on the People's Daily corpus and a construction law corpus show that, on the domain test corpus, the F1 value of person name recognition is 18.34% higher than that of a method based on the hidden Markov model, and, compared with the traditional Bi-LSTM-CRF model, recall increases by 15.53% and F1 by 8.83%. The WCRF can also be applied to other sequence labeling or classification problems with class imbalance.
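The abstract combines dictionary features, a Bi-LSTM encoder, and a CRF whose loss gives more weight to person name labels. The Python sketch below is not the authors' code; it only illustrates two of these ideas under stated assumptions. `dictionary_features` (a hypothetical helper) estimates, for each character, how often it appears as a family name versus a given-name character in a name list, assuming one-character surnames; such per-character features would presumably be concatenated with character embeddings before the Bi-LSTM, which is omitted here. `weighted_crf_nll` computes a standard linear-chain CRF negative log-likelihood with the forward algorithm and multiplies it by a hypothetical `name_weight` for sentences whose gold tags contain person name labels. The paper's exact weighting formula is not given in this record, so this sequence-level weighting is a simple stand-in, and the tag set `O`/`B-PER`/`I-PER` is likewise assumed.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical tag set; the paper's actual tagging scheme may differ.
TAGS = ["O", "B-PER", "I-PER"]
NAME_TAGS = {1, 2}  # indices of person-name tags in TAGS


def dictionary_features(sentence: str, names: List[str]) -> List[Tuple[float, float]]:
    """For each character, the share of its dictionary occurrences as a surname
    vs. as a given-name character (assumes one-character surnames)."""
    surname_counts = Counter(name[0] for name in names if name)
    given_counts = Counter(ch for name in names for ch in name[1:])
    features = []
    for ch in sentence:
        total = surname_counts[ch] + given_counts[ch]
        if total == 0:
            features.append((0.0, 0.0))  # character never seen in the dictionary
        else:
            features.append((surname_counts[ch] / total, given_counts[ch] / total))
    return features


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions: List[List[float]], transitions: List[List[float]]) -> float:
    """Forward algorithm: log of the summed exp-scores of all tag paths.
    emissions[t][j] scores tag j at position t; transitions[i][j] scores i -> j."""
    num_tags = len(transitions)
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)])
            for j in range(num_tags)
        ]
    return logsumexp(alpha)


def path_score(emissions: List[List[float]], transitions: List[List[float]], tags: List[int]) -> float:
    """Unnormalized score of one tag sequence."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def weighted_crf_nll(emissions: List[List[float]], transitions: List[List[float]],
                     tags: List[int], name_weight: float = 3.0) -> float:
    """CRF negative log-likelihood, scaled up when the gold tags contain a name.
    name_weight is a hypothetical hyperparameter, not a value from the paper."""
    nll = log_partition(emissions, transitions) - path_score(emissions, transitions, tags)
    weight = name_weight if any(tag in NAME_TAGS for tag in tags) else 1.0
    return weight * nll
```

For example, with all-zero scores over two positions and three tags, each of the 3^2 = 9 paths scores 0, so `log_partition` returns log 9; `weighted_crf_nll` then returns log 9 for gold tags `[0, 0]` and 3·log 9 for gold tags `[1, 2]`. Scaling the loss of name-bearing sentences makes errors on the rare name class cost more during training, which is one simple way to push a model toward higher recall on an imbalanced class.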