Authors: ZHANG Chen (张琛); CHEN Zhangjian (陈张建); LIU Jiangtao (刘江涛); REN Fu (任福) [2]; ZHANG Hongwei (张红伟) [2]
Affiliations: [1] Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources, Shenzhen, Guangdong 518034, China; [2] School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China; [3] Zhejiang Academy of Surveying and Mapping, Hangzhou 311000, China; [4] Information Center of Planning Land Real-Estate of Shenzhen, Shenzhen, Guangdong 518040, China
Source: Science of Surveying and Mapping (《测绘科学》), 2021, No. 10, pp. 185-193 (9 pages)
Funding: Open Fund of the Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources (KF201602028).
Abstract: To improve the adaptability of word segmentation and the matching accuracy for input addresses in geocoding systems, this paper proposes an improved address matching method, based on the Lucene index and query mechanism, that can handle non-standard Chinese addresses. First, a hierarchical index library of address elements is created according to Chinese address patterns. Second, a pinyin ternary search trie, synonym configuration, and out-of-vocabulary (unregistered) word configuration are integrated into the IK tokenizer. Finally, after the initial set of matching results is obtained, the edit (Levenshtein) distance is computed for each result, and the results are sorted to select the returned value. The matching system builds its word segmentation library and index library from public security address data and administrative legal-entity address data for Taizhou City, Zhejiang Province. The results indicate that the method achieves adaptive segmentation of input addresses and matches non-standard Chinese addresses well, and that it can serve related applications in surveying, mapping, and geographic information.
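Keywords: address matching; geocoding; address tree model; Lucene full-text retrieval; address word segmentation; non-standard Chinese addresses; address standardization
Classification number: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]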
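The final step in the abstract, computing the edit distance for each initial match and sorting the results, can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is built on Lucene and the IK tokenizer. The function names `levenshtein` and `rank_candidates` are hypothetical, and unit costs for insertion, deletion, and substitution are an assumption, since the record does not state the weighting used.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b.

    Assumption: insertion, deletion, and substitution each cost 1.
    """
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def rank_candidates(query: str, candidates: List[str]) -> List[str]:
    """Sort candidate addresses by edit distance to the query, closest first.

    sorted() is stable, so ties keep the order of the initial match set.
    """
    return sorted(candidates, key=lambda c: levenshtein(query, c))
```

In such a pipeline, `candidates` would be the address strings returned by the initial index query, and the first element of `rank_candidates(query, candidates)` would be the value returned to the caller.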