A Chinese Address Matching Method Based on a Pseudo-Semantic Similarity Model  (Cited by: 3)

A Chinese addresses matching method based on the pseudo-semantic model


Authors: YU Ting (郁汀); WANG Duo (王铎); CHEN Qin (陈钦) (The Third Research Institute of Ministry of Public Security, Shanghai 200031, China; Fudan University, Shanghai 200433, China)

Affiliations: [1] The Third Research Institute of Ministry of Public Security, Shanghai 200031; [2] Fudan University, Shanghai 200433

Source: Bulletin of Surveying and Mapping (《测绘通报》), 2022, No. 3, pp. 101-106 (6 pages)

Abstract: In address matching, traditional similarity models depend heavily on the number of overlapping characters, so they often produce false matches when address elements are written in shortened or abbreviated form, a problem that is especially pronounced for Chinese addresses. Deep learning methods need large numbers of training samples, but the sheer volume and variety of address data make generating those samples too costly. To address both problems, this paper first segments addresses with a model that combines a bidirectional long short-term memory network with a conditional random field (BiLSTM-CRF). It then constructs a new similarity measure, called pseudo-semantic similarity, and uses it to match address elements level by level. In tests on address data from public security operations, the model handles nonstandard address descriptions such as abbreviations and shortened forms well: it outperforms other similarity models in the matching process, and its evaluation metrics, including recall and precision, are all above 0.9 on the test set. The test samples show that the pseudo-semantic similarity can recognize abbreviated and shortened address elements.
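The limitation of character-overlap similarity described in the abstract can be illustrated with a minimal sketch. It does not implement the paper's pseudo-semantic similarity, which the abstract does not define. It only shows, under stated assumptions, how a plain character-set Jaccard score can prefer the wrong candidate when the query is a shortened form. The function names `char_jaccard` and `best_match` and the two-entry candidate list are hypothetical and chosen purely for illustration.

```python
def char_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over the sets of characters in two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def best_match(query: str, candidates: list[str]) -> str:
    """Return the candidate with the highest character-overlap score."""
    return max(candidates, key=lambda c: char_jaccard(query, c))


# Illustrative data (an assumption, not taken from the paper):
# the query is a shortened form of the first candidate.
QUERY = "华山医院"
CANDIDATES = ["复旦大学附属华山医院", "中山医院"]

scores = {c: char_jaccard(QUERY, c) for c in CANDIDATES}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("best match by character overlap:", best_match(QUERY, CANDIDATES))
```

In this toy case the intended full name shares all four query characters but scores only 0.40 because of its six extra characters, while the unrelated element shares three of four characters and scores 0.60. This is the kind of false match that motivates a similarity measure that is less sensitive to raw character overlap.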

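Keywords: bidirectional long short-term memory network with conditional random field (BiLSTM-CRF); address element parsing; pseudo-semantic similarity; address matching; address standardization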

Classification number: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]

 
