Research on a Chinese Toponym Recognition Method Combining Two-Layer CRF and Rules  Cited by: 9


Authors: Sun Hong (孙虹)[1], Chen Junjie (陈俊杰)[1]

Affiliation: [1] College of Science and Technology, Taiyuan University of Technology, Taiyuan 030024, Shanxi, China

Source: Computer Applications and Software (《计算机应用与软件》), 2014, No. 11, pp. 175-177, 182 (4 pages)

Funding: State Key Open Laboratory project (SKLSE2012-09-30)

Abstract: This paper combines a two-layer CRF model with rules to improve the performance of Chinese toponym recognition. The first-layer CRF model uses single-character features to recognize place names and adds its results to a dictionary. The second-layer CRF model recognizes place names using four features: part-of-speech tags, left boundary-indicator words, right boundary-indicator words, and the processed dictionary. Finally, rules are applied to filter and trim the recognition results and to recover missed place names. The two-layer CRF model captures long-distance features of the text, which resolves the problem of the same word being labeled inconsistently at different positions, and rules derived from the linguistic characteristics of place names raise the recall. Experiments show that combining the two-layer CRF with rules is effective for Chinese toponym recognition: in an open test on the MSRA corpus of Bakeoff 2007, the method achieves a precision of 95.32%, a recall of 90.34%, and an F-measure of 94.12%.
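The feature flow described in the abstract can be illustrated with a minimal Python sketch. It only shows how features for the two layers might be assembled. The function names, the one-character context window, the cue-word lists, the POS tags, and the sample sentence are all assumptions made for illustration and are not taken from the paper. No CRF training is performed, and the rule stage is omitted because the abstract does not state the rules.

```python
from typing import Dict, List, Set, Union

def char_features(chars: List[str], i: int) -> Dict[str, str]:
    """First-layer features: the current character and its neighbours.

    The +/-1 window is an assumption; the abstract only says
    'single-character features'.
    """
    prev_c = chars[i - 1] if i > 0 else "<BOS>"
    next_c = chars[i + 1] if i < len(chars) - 1 else "<EOS>"
    return {"c-1": prev_c, "c0": chars[i], "c+1": next_c}

def word_features(
    words: List[str],
    pos_tags: List[str],
    i: int,
    gazetteer: Set[str],
    left_cues: Set[str],
    right_cues: Set[str],
) -> Dict[str, Union[str, bool]]:
    """Second-layer features: POS tag, boundary-indicator words, dictionary hit."""
    prev_w = words[i - 1] if i > 0 else "<BOS>"
    next_w = words[i + 1] if i < len(words) - 1 else "<EOS>"
    return {
        "pos": pos_tags[i],
        "left_cue": prev_w in left_cues,    # word before is a left indicator
        "right_cue": next_w in right_cues,  # word after is a right indicator
        "in_dict": words[i] in gazetteer,   # known place name
    }

# Hypothetical data: cue words and the first-layer output are made up.
LEFT_CUES = {"在", "位于"}
RIGHT_CUES = {"举行", "召开"}
gazetteer = {"北京"}
first_layer_hits = ["太原"]         # stand-in for first-layer CRF output
gazetteer.update(first_layer_hits)  # results are added to the dictionary

words = ["会议", "在", "太原", "举行"]
pos_tags = ["n", "p", "n", "v"]
feats = word_features(words, pos_tags, 2, gazetteer, LEFT_CUES, RIGHT_CUES)
print(feats)
```

Because the first-layer results are merged into the dictionary before the second layer runs, every later occurrence of the same string receives the same dictionary feature regardless of its position, which is one plausible reading of how the method obtains consistent labels across a text.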

Keywords: natural language processing; Chinese toponym recognition; two-layer CRF model; rules

Classification code: TP391 [Automation and Computer Technology - Computer Application Technology]

 
