A Probabilistic-Statistical Method for Place Name Recognition and Matching (地名识别与匹配的概率统计方法)  Cited by: 8

Method of Recognition and Match of Place Name Based on Statistic


Author: XIAO Jihua (肖计划) [1]

Affiliation: [1] Information Engineering University, Zhengzhou, Henan 450001, China

Source: Journal of Geomatics Science and Technology (《测绘科学技术学报》), 2014, No. 4, pp. 408-412 (5 pages)

Funding: National Natural Science Foundation of China (41201391); Henan Province Science and Technology Innovation Talent Program (13400510001)

Abstract: An experimental Chinese place-name database and a geographic corpus were built, and on this basis a statistical model of the credibility of the characters used in place names was constructed. By analyzing the habits and patterns with which place names are used in Chinese documents, the auxiliary characters and words that frequently accompany place names and indicate their presence were summarized, and from these an auxiliary-word lexicon and a rule base for place-name recognition were established. Character usage in the place-name database and in the geographic corpus was analyzed statistically. Potential place names in a text were first screened by applying a probability threshold on place-name character credibility together with the indicative role of the auxiliary words, which yields a set of candidate place names and ensures a higher recall rate. The candidates produced by this coarse screening were then confirmed against the place-name recognition rules in order to improve recognition accuracy.
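The two-stage process described in the abstract (coarse screening by character credibility, then confirmation by auxiliary words) can be sketched roughly as follows. The abstract does not give the credibility formula, the threshold, or the contents of the auxiliary-word lexicon, so everything concrete here is an assumption made for illustration: the function names `char_credibility`, `candidate_place_names`, and `confirm` are hypothetical, credibility is assumed to be the share of a character's occurrences that come from the place-name database, and the suffix and cue words are toy examples rather than the paper's lexicon.

```python
from collections import Counter


def char_credibility(place_names, corpus_text):
    """Assumed definition (not from the paper): for each character seen in
    the place-name database, the fraction of its total occurrences
    (database + general corpus) that come from the database."""
    in_names = Counter(ch for name in place_names for ch in name)
    in_corpus = Counter(ch for ch in corpus_text if not ch.isspace())
    return {ch: n / (n + in_corpus[ch]) for ch, n in in_names.items()}


def candidate_place_names(text, cred, threshold=0.5, min_len=2):
    """Coarse screening: collect maximal runs of characters whose
    credibility reaches the threshold. Returns (start_index, string) pairs."""
    candidates, start = [], None
    # The trailing sentinel has credibility 0, so it flushes a final run.
    for i, ch in enumerate(text + "\0"):
        if cred.get(ch, 0.0) >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                candidates.append((start, text[start:i]))
            start = None
    return candidates


def confirm(text, candidates, suffixes=("市", "县", "省"), cues=("位于", "在")):
    """Rule-based confirmation with toy auxiliary words (illustrative only):
    keep a candidate if it ends with a place-type suffix or is directly
    preceded by a locative cue word."""
    confirmed = []
    for start, cand in candidates:
        has_suffix = cand.endswith(suffixes)
        has_cue = any(text[:start].endswith(cue) for cue in cues)
        if has_suffix or has_cue:
            confirmed.append(cand)
    return confirmed
```

A typical call would build `cred` once from the place-name database and corpus, then run `candidate_place_names` followed by `confirm` on each input sentence. Lowering `threshold` admits more candidates in the first stage (favoring recall), while the second stage filters them (favoring accuracy), which mirrors the division of labor the abstract describes.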

Keywords: place name recognition; text mining; information extraction; place name statistical model; geographic corpus

Classification code: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]

 
