Extraction algorithm of place name and address with text format in web pages based on the place name and address gene

Cited by: 5


Authors: DU Zhongbo (杜中波)[1], LIU Xin (刘新)[1,2], SONG Tingting (宋婷婷), LIANG Bing (梁冰), ZHOU Xinyu (周新宇)

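Affiliations: [1] College of Geomatics, Shandong University of Science and Technology, Qingdao, Shandong 266590, China; [2] Key Laboratory of Fundamental Geographic Information and Digital Technology of Shandong Province, Shandong University of Science and Technology, Qingdao, Shandong 266590, China; [3] Chinese Academy of Surveying and Mapping, Beijing 100036, China; [4] Urban Planning Management Information Center of Beijing Xicheng District, Beijing 100035, China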

Source: Science of Surveying and Mapping (《测绘科学》), 2019, No. 4, pp. 196-202 (7 pages)

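Funding: Special Scientific Research Project for the Surveying, Mapping and Geoinformation Public Welfare Industry (201512020); Basic Research Fund of the Chinese Academy of Surveying and Mapping (7771607); Xicheng District Science and Technology Project (SD2015-25)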

Abstract: Web page text contains a wealth of spatial information in the form of place names and addresses, but because these are described randomly and in diverse ways, they are difficult to identify quickly and accurately. Based on an analysis of how place names and addresses are composed in web text, and taking their event attributes into account, this paper presents an information extraction method based on a "place name and address gene" library. An extraction rule tree is built from extraction factors such as event relevance, the character length of the place name or address, and word frequency, and is used to obtain the target place names and addresses. Tests on real data show that the method is more targeted for place name and address extraction and improves both efficiency and accuracy.
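The abstract only outlines the method, so the following is a minimal, illustrative Python sketch of the general idea it names: find candidate place names that end in a characteristic "gene" word, compute extraction factors (event relevance, character length, word frequency) for each candidate, and pass them through a rule tree that accepts or rejects the candidate. Everything concrete here is an assumption and not taken from the paper: the gene list `GENES`, the event keyword set `EVENT_WORDS`, the candidate regex, the thresholds, and the tree shape `RULES` are all hypothetical, and English sample text stands in for Chinese web text.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical "genes": suffix words that typically close a place name.
GENES = ("Road", "Street", "Avenue", "District")
# Hypothetical keywords indicating that a sentence reports an event.
EVENT_WORDS = {"fire", "accident", "flood", "opened", "closed"}

# A candidate is one or more capitalised words followed by a gene word.
CANDIDATE_RE = re.compile(r"\b((?:[A-Z][a-z]+ )+(?:" + "|".join(GENES) + r"))\b")


@dataclass
class Leaf:
    accept: bool


@dataclass
class Node:
    feature: str      # which extraction factor this node tests
    threshold: int    # branch to `yes` when the factor >= threshold
    yes: "Tree"
    no: "Tree"


Tree = Union[Node, Leaf]

# Hypothetical rule tree: in event sentences keep candidates of at least
# 8 characters; elsewhere keep only candidates that recur in the text.
RULES: Tree = Node(
    "event_relevance", 1,
    yes=Node("char_length", 8, yes=Leaf(True), no=Leaf(False)),
    no=Node("frequency", 2, yes=Leaf(True), no=Leaf(False)),
)


def decide(tree: Tree, factors: Dict[str, int]) -> bool:
    """Walk the rule tree from the root down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.yes if factors[tree.feature] >= tree.threshold else tree.no
    return tree.accept


def extract(text: str, tree: Tree = RULES) -> List[str]:
    """Return accepted candidates in order of first appearance."""
    result: List[str] = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        relevance = len(words & EVENT_WORDS)  # event keywords in this sentence
        for m in CANDIDATE_RE.finditer(sentence):
            name = m.group(1)
            factors = {
                "event_relevance": relevance,
                "char_length": len(name),
                "frequency": text.count(name),  # occurrences in the whole text
            }
            if name not in result and decide(tree, factors):
                result.append(name)
    return result


text = ("A fire broke out on Zhongshan Road. "
        "Residents of Heping District were evacuated after the fire. "
        "The shop on Xi Street sells tea.")
print(extract(text))  # ['Zhongshan Road', 'Heping District']
```

In this sketch "Xi Street" is rejected because its sentence contains no event keyword and the name occurs only once. Representing the rules as an explicit tree, rather than a chain of `if` statements, keeps the factors and thresholds as data that could be tuned or learned separately from the traversal code.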

Keywords: place name and address gene; web page information; event attribute; rule tree

CLC number: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]
