The Method of Semantic Resolution of Chinese Addresses Based on Statistics (基于统计的中文地址位置语义解析方法研究)

Cited by: 8


Authors: XIE Ting-ting, YAN Ke (Hubei Key Laboratory of Intelligent Robot; School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China)

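Affiliations: [1] Hubei Key Laboratory of Intelligent Robot; [2] School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, Hubei, China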

Source: Software Guide (《软件导刊》), 2017, No. 10, pp. 19-21 (3 pages)

Abstract: To extract location information from natural-language descriptions of Chinese addresses, a dictionary-free Chinese address segmentation method is proposed. First, word frequencies are computed from the co-occurrence statistics of character strings in an address corpus. Each place-name address string is then preprocessed with regular expressions and fully segmented, that is, all candidate segmentations are enumerated. Mutual information and information entropy are used to select the optimal coarse segmentation, and the coarse result is filtered by confidence to obtain the final segmentation. Experiments show that the method can effectively split place-name address strings without relying on a dictionary, with an accuracy rate of 80.03% and a recall rate of 89.28%.
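
The statistical building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-line corpus, the function names (`ngram_counts`, `pmi`, `right_entropy`, `full_segmentations`), the `max_len` limit, and the use of the total character count as the probability denominator are all assumptions made for illustration. The regular-expression preprocessing and the confidence filter are omitted because the abstract does not specify them.

```python
import math
from collections import Counter

def ngram_counts(corpus, max_len=4):
    """Count every substring of length 1..max_len in the corpus."""
    counts = Counter()
    for line in corpus:
        for i in range(len(line)):
            for n in range(1, max_len + 1):
                if i + n <= len(line):
                    counts[line[i:i + n]] += 1
    return counts

def pmi(counts, total_chars, left, right):
    """Pointwise mutual information of two adjacent strings.

    Probabilities are approximated as count / total_chars (an assumption).
    """
    joint = counts[left + right]
    if joint == 0:
        return float("-inf")
    p_joint = joint / total_chars
    p_left = counts[left] / total_chars
    p_right = counts[right] / total_chars
    return math.log2(p_joint / (p_left * p_right))

def right_entropy(corpus, word):
    """Entropy of the character that follows `word` in the corpus."""
    followers = Counter()
    for line in corpus:
        start = line.find(word)
        while start != -1:
            end = start + len(word)
            followers[line[end] if end < len(line) else "<END>"] += 1
            start = line.find(word, start + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in followers.values())

def full_segmentations(text, max_len=4):
    """Enumerate every split of `text` into pieces of length <= max_len."""
    if not text:
        return [[]]
    results = []
    for n in range(1, min(max_len, len(text)) + 1):
        for rest in full_segmentations(text[n:], max_len):
            results.append([text[:n]] + rest)
    return results

# Hypothetical toy corpus, not data from the paper.
corpus = ["湖北省武汉市洪山区", "湖北省武汉市江夏区", "湖北省宜昌市西陵区"]
counts = ngram_counts(corpus)
total = sum(len(line) for line in corpus)

# Characters inside a word cohere more strongly than characters across a boundary.
print(pmi(counts, total, "武", "汉"), pmi(counts, total, "省", "武"))
# The follower of a complete word is less predictable than that of a word fragment.
print(right_entropy(corpus, "湖北省"), right_entropy(corpus, "湖北"))
# Candidate segmentations that a scoring step would rank.
for candidate in full_segmentations("武汉市"):
    print("/".join(candidate))
```

In this sketch, high mutual information suggests that two strings belong to the same word, while high boundary entropy suggests that a word ends at that position. A scoring step combining the two over the enumerated candidates would play the role of the coarse segmentation selection described in the abstract.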

Keywords: Chinese word segmentation; place-name and address segmentation; mutual information; information entropy

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]

 
