Authors: YU Ting (郁汀); WANG Duo (王铎); CHEN Qin (陈钦)
Affiliations: [1] The Third Research Institute of Ministry of Public Security, Shanghai 200031, China; [2] Fudan University, Shanghai 200433, China
Source: Bulletin of Surveying and Mapping (《测绘通报》), 2022, No. 3, pp. 101-106 (6 pages)
Abstract: In address matching, traditional similarity models depend heavily on the number of overlapping characters, so they often produce wrong matches when address elements are written in abbreviated or shortened form, a difficulty that is especially pronounced for Chinese addresses. Deep learning methods need large numbers of training samples, but the sheer volume and variety of address data make generating those samples too costly. To address both problems, this paper first applies a model that combines a bi-directional long short-term memory network with a conditional random field (BiLSTM-CRF) to segment Chinese addresses. It then constructs a pseudo-semantic similarity and uses it to match address elements level by level. In tests on address data from public security operations, the model handles non-standard descriptions such as abbreviations and shortened forms well: the pseudo-semantic similarity performs better than other similarity models in the matching process, and its reference metrics, including recall and precision, all reach 0.9 on the test set.
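The abstract's point about character overlap can be illustrated with a minimal sketch. It uses a plain character-set Jaccard similarity as a stand-in for a "traditional" overlap-based model; the function name `char_jaccard` and the example strings are hypothetical, chosen only for illustration, and this is not the paper's pseudo-semantic similarity.

```python
def char_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over the sets of characters in two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical address elements (not taken from the paper's data).
full_name = "交通大学"        # full form
abbreviation = "交大"         # common short form of the same element
different_place = "交通中学"  # a different element that shares characters

# The abbreviation shares only 2 of 4 distinct characters with the full form,
# while the unrelated element shares 3 of 5, so the wrong candidate ranks higher.
print(char_jaccard(full_name, abbreviation))     # 0.5
print(char_jaccard(full_name, different_place))  # 0.6
```

Under these assumptions the overlap score ranks the wrong candidate above the correct abbreviation, which is the kind of mismatch the abstract attributes to overlap-based similarity.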
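Keywords: bi-directional long short-term memory conditional random field; address element parsing; pseudo-semantic similarity; address matching; address standardization
Classification: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]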