An Address Matching Algorithm Based on a Hybrid Neural Network Model and an Attention Mechanism

Cited by: 4


Authors: CHEN Jian-peng (陈健鹏); CHEN Jian (陈剑); SHE Xiang-rong (佘祥荣); SHUI Xin-ying (水新莹); CHEN Gang (陈刚)

Affiliation: [1] Yangtze River Delta Information Intelligence Innovation Research Institute, Wuhu, Anhui 241000, China

Source: Computer Engineering & Science (《计算机工程与科学》), 2022, No. 5, pp. 901-909 (9 pages)

Funding: Anhui Provincial Key Research and Development Program (202104a05020071).

Abstract: The standardization of Chinese geographic addresses plays a crucial role in the construction of smart cities. Traditional address standardization techniques usually rely on character-level text similarity or on rule-base matching, and they handle complex, special, or redundant addresses poorly. By converting the address standardization task into the task of computing a matching degree between similar addresses, this paper proposes an address matching algorithm that combines an attention mechanism with multi-level semantic representation. First, based on the particular grammatical structure of address text, a standard address tree is built with a Trie grammar tree. Second, based on the attention mechanism, a Bi-LSTM network and a CNN are used to generate multi-level semantic representations of address pairs. Finally, the similarity is computed from the Manhattan distance. On a self-built dataset, the proposed SGAM (Symmetrical Geographic Address Matching) model reaches a matching accuracy of 91.22%, an improvement of 4% to 10% over TextRCNN, FastText, the attention-based convolutional neural network (ABCNN), and other models, indicating that SGAM performs better on the address matching task.
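The abstract names two building blocks that are easy to illustrate in isolation: a Trie that stores standard addresses, and a similarity score derived from the Manhattan distance between two address representations. The following is a minimal sketch under stated assumptions, not the paper's implementation. The class and function names, the choice of keying the Trie by address element (province, city, district, road), the sample addresses, and the `exp(-distance)` mapping from distance to a score in (0, 1] are all assumptions made for illustration; the abstract only states that a Trie is used and that similarity is computed from the Manhattan distance. The Bi-LSTM, CNN, and attention layers that would produce the vectors are not shown.

```python
import math
from typing import Dict, List, Sequence


class AddressTrieNode:
    """One node of the trie; children are keyed by address element."""

    def __init__(self) -> None:
        self.children: Dict[str, "AddressTrieNode"] = {}
        self.is_address_end = False


class StandardAddressTrie:
    """Hypothetical standard address tree keyed by address elements."""

    def __init__(self) -> None:
        self.root = AddressTrieNode()

    def insert(self, elements: Sequence[str]) -> None:
        """Store one standard address given as an ordered list of elements."""
        node = self.root
        for element in elements:
            node = node.children.setdefault(element, AddressTrieNode())
        node.is_address_end = True

    def longest_prefix(self, elements: Sequence[str]) -> List[str]:
        """Return the longest leading run of elements present in the trie."""
        node = self.root
        matched: List[str] = []
        for element in elements:
            child = node.children.get(element)
            if child is None:
                break
            matched.append(element)
            node = child
        return matched

    def candidates(self, prefix: Sequence[str]) -> List[List[str]]:
        """Return every stored standard address that starts with prefix."""
        node = self.root
        for element in prefix:
            child = node.children.get(element)
            if child is None:
                return []
            node = child
        results: List[List[str]] = []
        stack = [(node, list(prefix))]
        while stack:
            current, path = stack.pop()
            if current.is_address_end:
                results.append(path)
            for element, child in current.children.items():
                stack.append((child, path + [element]))
        return results


def manhattan_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Map the Manhattan (L1) distance between u and v to a score in (0, 1].

    The exp(-distance) mapping is an assumption, not taken from the abstract.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    distance = sum(abs(a - b) for a, b in zip(u, v))
    return math.exp(-distance)


if __name__ == "__main__":
    trie = StandardAddressTrie()
    # Road names below are made up for illustration.
    trie.insert(["Anhui", "Wuhu", "Jiujiang District", "Example Road 1"])
    trie.insert(["Anhui", "Wuhu", "Jinghu District", "Example Road 2"])
    query = ["Anhui", "Wuhu", "Unknown District"]
    prefix = trie.longest_prefix(query)
    print(prefix, trie.candidates(prefix))
    print(manhattan_similarity([0.1, 0.4], [0.2, 0.1]))
```

In a pipeline of this shape, the Trie would narrow a noisy input address to a small set of standard candidates, and the similarity score would then rank those candidates using the learned representations of each address pair.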

Keywords: geographic address; text similarity computation; attention mechanism; hybrid neural network; smart city

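Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP183 [Automation and Computer Technology: Control Science and Engineering]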

 
