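Author: Xiao Jihua (肖计划) [1]
Affiliation: [1] Information Engineering University, Zhengzhou, Henan 450001, China
Source: Journal of Geomatics Science and Technology (《测绘科学技术学报》), 2014, No. 4, pp. 408-412 (5 pages)
Funding: National Natural Science Foundation of China (41201391); Henan Province Science and Technology Innovation Talents Program (13400510001)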
Abstract: An experimental place-name database and a geographical corpus were built, and on this basis a statistical model of the credibility of the characters used in place names was constructed. By analyzing the habits and patterns of place-name usage in Chinese documents, the auxiliary characters and words that frequently accompany place names and indicate their presence were summarized, and from these an auxiliary-word lexicon and a rule base for place-name recognition were established. Character usage in the place-name database and the geographical corpus was analyzed statistically. Potential place names in a text are first screened into candidates by setting a probability threshold on character credibility and by using the indicative role of the auxiliary words, a coarse step intended to keep recall high. The candidates from this coarse screening are then confirmed against the recognition rules to improve recognition accuracy.
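Keywords: place-name recognition; text mining; information extraction; place-name statistical model; geographical corpus
Classification: P208 [Astronomy and Earth Sciences - Cartography and Geographic Information Engineering]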
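
The two-stage procedure summarized in the abstract (coarse screening by character credibility and auxiliary words, then rule-based confirmation) can be illustrated with a minimal sketch. Nothing below is taken from the paper itself: the credibility formula, the function names `char_credibility`, `candidate_names` and `confirm`, the toy data, the threshold value, and the single "keep the longest non-overlapping span" rule are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter


def char_credibility(place_names, corpus_text):
    """Assumed credibility of a character: its count in the place-name list
    divided by that count plus its count in the general corpus."""
    name_counts = Counter("".join(place_names))
    corpus_counts = Counter(corpus_text)
    return {ch: n / (n + corpus_counts[ch]) for ch, n in name_counts.items()}


def candidate_names(text, cred, threshold, cue_before, cue_after, max_len=4):
    """Coarse screening: keep spans of 2..max_len characters whose mean
    credibility reaches the threshold and that touch an auxiliary cue word."""
    candidates = []
    for start in range(len(text)):
        for end in range(start + 2, min(start + max_len, len(text)) + 1):
            span = text[start:end]
            score = sum(cred.get(ch, 0.0) for ch in span) / len(span)
            has_cue = (text[:start].endswith(tuple(cue_before))
                       or text[end:].startswith(tuple(cue_after)))
            if score >= threshold and has_cue:
                candidates.append((span, start, score))
    return candidates


def confirm(candidates):
    """Hypothetical confirmation rule: prefer longer spans and drop any
    candidate that overlaps a span already accepted."""
    accepted = []
    for span, start, _ in sorted(candidates, key=lambda c: -len(c[0])):
        end = start + len(span)
        if all(end <= s or start >= s + len(a) for a, s in accepted):
            accepted.append((span, start))
    # Return the accepted names in the order they occur in the text.
    return [span for span, _ in sorted(accepted, key=lambda a: a[1])]


# Toy data (hypothetical), standing in for the place-name database and corpus.
cred = char_credibility(["郑州", "洛阳"], "他在郑州工作后来去洛阳旅游")
found = confirm(candidate_names("我住在郑州市", cred, threshold=0.4,
                                cue_before={"在"}, cue_after={"市"}))
print(found)
```

In this sketch the threshold and the cue words jointly control how permissive the coarse stage is, while `confirm` stands in for the rule base; a real rule base would hold many more rules than the single overlap rule shown here.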