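Authors: 谢婷婷 (XIE Ting-ting), 严柯 (YAN Ke) (Hubei Key Laboratory of Intelligent Robot; School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, China)
Affiliations: [1] Hubei Key Laboratory of Intelligent Robot [2] School of Computer Science and Engineering, Wuhan Institute of Technology, Wuhan 430205, Hubei, China
Source: 《软件导刊》 (Software Guide), 2017, No. 10, pp. 19-21 (3 pages)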
Abstract: To extract location information from natural-language Chinese address descriptions, a dictionary-free Chinese address segmentation method is proposed. First, word frequencies are computed from the co-occurrence statistics of character strings in an address corpus. Each place-name address string is then preprocessed with regular expressions and fully segmented, that is, all candidate segmentations are enumerated. Mutual information and information entropy are used to select the optimal coarse segmentation, and the coarse result is filtered by confidence to obtain the final segmentation. Experiments show that the method can effectively split place-name address strings without relying on a dictionary, reaching an accuracy (precision) of 80.03% and a recall of 89.28%.
Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]
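The abstract names mutual information and information entropy as the criteria for choosing a coarse segmentation but does not give the formulas, thresholds, or the confidence filter. The following is therefore only a minimal sketch of the two statistics in their common textbook form (pointwise mutual information between adjacent characters, and the entropy of the characters following a candidate word), not the authors' implementation. The function names `char_stats`, `pmi`, and `right_entropy` and the three-line toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter

def char_stats(corpus):
    """Count single characters and adjacent character pairs in address strings."""
    unigrams, bigrams = Counter(), Counter()
    for s in corpus:
        unigrams.update(s)
        bigrams.update(s[i:i + 2] for i in range(len(s) - 1))
    return unigrams, bigrams

def pmi(a, b, unigrams, bigrams):
    """Pointwise mutual information log2(p(ab) / (p(a) * p(b))).

    Returns -inf when the pair never occurs in the corpus.
    """
    if bigrams[a + b] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_ab = bigrams[a + b] / n_bi
    return math.log2(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))

def right_entropy(word, corpus):
    """Shannon entropy (in bits) of the characters that follow `word`."""
    followers = Counter()
    for s in corpus:
        start = s.find(word)
        while start != -1:
            end = start + len(word)
            if end < len(s):
                followers[s[end]] += 1
            start = s.find(word, start + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in followers.values())

# Toy corpus, invented for illustration only (not the paper's data).
corpus = ["武汉市洪山区", "武汉市江夏区", "武汉市武昌区"]
unigrams, bigrams = char_stats(corpus)
print(pmi("武", "汉", unigrams, bigrams))   # strongly associated pair
print(right_entropy("武汉", corpus))        # always followed by 市
print(right_entropy("武汉市", corpus))      # followed by three different characters
```

In this reading, a high mutual information value suggests that two characters belong inside the same word, while a high entropy after a string suggests a word boundary there. How the paper combines the two scores and applies its confidence filter is not stated in the abstract.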