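Design and Implementation of a Non-Equal-Weight Dynamic Programming Address Matching Algorithm

Published English title: Design and Implementation of Non-equal Power-dynamic Programming Address Matching Algorithm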


Authors: XU Jia-kang (徐嘉康); ZHANG Chen (张晨); WANG Liu-jing (王柳静) [1]; ZHANG Gui-jun (张贵军) [1]

Affiliation: [1] College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2022, No. 3, pp. 530-535 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (61573317).

Abstract: Conventional address matching methods often cope poorly with Chinese addresses. First, each Chinese character is an independent unit, which makes error correction harder than in English; second, the Chinese address system is structurally complex and lacks a unified standard. Drawing on sequence alignment from bioinformatics, this paper proposes a Chinese address matching method based on dynamic programming. The method treats each Chinese character as a sequence unit, serializes Chinese addresses, and adapts the Smith-Waterman algorithm to match the resulting sequences. To reflect the characteristics of Chinese characters, differences in the importance of individual characters are measured statistically and used to build a non-equal-weight scoring strategy; a gap penalty strategy is introduced to address mismatches and overfitting; and a uniformization strategy for ranking improves sorting efficiency and increases the diversity of the result set. Finally, the algorithm is applied to the real road network of Hangzhou (1:300,000 scale), and the experimental results show that it effectively improves the accuracy of Chinese address matching.

Keywords: dynamic programming; Chinese address matching; address tree; gap penalty; substitution matrix; sequence alignment

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]
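
The abstract describes treating each Chinese character as a sequence unit and scoring alignments with character-dependent weights and a gap penalty. The following is a minimal sketch of that idea, not the authors' implementation: it uses the standard Smith-Waterman recurrence with a linear gap penalty, and it assumes an inverse-document-frequency weight as a stand-in for the paper's statistical weighting, which the abstract does not specify. The function names (`char_weights`, `local_align_score`, `rank`), the penalty values, and the sample addresses are all hypothetical. The ranking uniformization step and the address tree are not covered.

```python
import math
from collections import Counter


def char_weights(corpus):
    """Assumed weighting: rarer characters get a higher match score.

    Uses 1 + ln(N / df), where df is the number of addresses containing
    the character. This is an illustrative choice, not the paper's formula.
    """
    n = len(corpus)
    doc_freq = Counter(ch for addr in corpus for ch in set(addr))
    return {ch: 1.0 + math.log(n / df) for ch, df in doc_freq.items()}


def local_align_score(query, candidate, weights, mismatch=-1.0, gap=-0.5):
    """Smith-Waterman local alignment score over single characters.

    A match scores the character's weight (default 1.0 if unknown);
    a mismatch or a gap subtracts a fixed, illustrative penalty.
    """
    cols = len(candidate) + 1
    prev = [0.0] * cols  # previous row of the DP matrix
    best = 0.0
    for i in range(1, len(query) + 1):
        curr = [0.0] * cols
        for j in range(1, cols):
            if query[i - 1] == candidate[j - 1]:
                sub = weights.get(query[i - 1], 1.0)
            else:
                sub = mismatch
            curr[j] = max(
                0.0,                 # restart the local alignment
                prev[j - 1] + sub,   # match or mismatch
                prev[j] + gap,       # gap in the candidate
                curr[j - 1] + gap,   # gap in the query
            )
            best = max(best, curr[j])
        prev = curr
    return best


def rank(query, corpus, weights):
    """Return (score, address) pairs sorted from best to worst match."""
    scored = [(local_align_score(query, addr, weights), addr) for addr in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample addresses, for illustration only.
corpus = [
    "杭州市西湖区留和路288号",
    "杭州市西湖区文一路100号",
    "杭州市拱墅区潮王路18号",
]
weights = char_weights(corpus)

# The query omits the city and contains one wrong character (合 for 和).
for score, addr in rank("西湖区留合路288号", corpus, weights):
    print(f"{score:.3f}  {addr}")
```

Because characters such as 市, 区, 路 and 号 occur in every sample address, they receive the lowest weight under this assumed scheme, so the distinguishing characters of the road name and house number dominate the score.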

 
