An Approach for Geographical Information Retrieval with LDA Topic Analysis  Cited by: 4


Authors: 盖森[1], 刘建忠[1], 熊伟[1], 孙晨[1], 张心悦[1]

Affiliation: [1] Information Engineering University, Zhengzhou 450001, Henan, China

Source: Journal of Geomatics Science and Technology (《测绘科学技术学报》), 2015, No. 3, pp. 315-320 (6 pages)

Funding: National Natural Science Foundation of China (41471337)

Abstract: Geographical information retrieval (GIR) retrieves information from a document collection that is spatially relevant to a user's query, and it is an important research direction in information retrieval. Conventional GIR models treat geographical information and topic information separately and ignore the relationship between them. To address this problem, an improved GIR method that incorporates LDA topic analysis is proposed. First, LDA topic analysis is used to remove noise from the document collection. The relationship between geographical information and topic information in the query and in the retrieved documents is then mined; its similarity is computed with two measures, the cosine of the included angle and the KL distance, and is added to the similarity computed between the query and the documents. To verify the method, LDA topic analysis was performed on the lite version of the Sogou text categorization corpus and on the Fudan text categorization test corpus, and retrieval tests were carried out. The experiments show that the improved model measures the relationship between geographical information and topic information well and improves retrieval precision.
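
The abstract names two measures, cosine similarity and KL distance, without giving their formulas in this record. The following is a minimal sketch of how two topic distributions (for example, LDA topic weights for a query and for a document) might be compared and folded into an overall score. The symmetrized KL form, the mapping from distance to similarity, the `combined_score` helper, and the `weight` parameter are all assumptions made for illustration, not the authors' formulation.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic-weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(
        a * math.log((a + eps) / (b + eps))
        for a, b in zip(p, q)
        if a > 0.0
    )


def symmetric_kl(p, q):
    """Symmetrized KL distance (assumed form; KL itself is asymmetric)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def kl_similarity(p, q):
    """Map the KL distance into (0, 1]; this mapping is an assumption."""
    return 1.0 / (1.0 + symmetric_kl(p, q))


def combined_score(base_score, topic_sim, weight=0.3):
    """Hypothetical linear blend of a base retrieval score and a topic similarity."""
    return (1.0 - weight) * base_score + weight * topic_sim


# Illustrative topic distributions, not data from the paper.
query_topics = [0.7, 0.2, 0.1]
doc_topics = [0.6, 0.3, 0.1]
cos_sim = cosine_similarity(query_topics, doc_topics)
kl_sim = kl_similarity(query_topics, doc_topics)
score = combined_score(0.5, cos_sim)
```

Identical distributions give a cosine similarity of 1 and a KL distance of 0, so both similarities reach their maximum together; they differ in how strongly they penalize mass placed on topics that the other distribution considers unlikely.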

Keywords: geographical information retrieval; topic model; latent Dirichlet allocation; similarity; precision

Classification code: P208 [Astronomy and Earth Sciences: Cartography and Geographic Information Engineering]

 
