Place name database quick searching system based on Lucene
(基于Lucene的地名数据库快速检索系统)

Cited by: 20

Authors: Zhang Wenyuan (张文元)[1], Zhou Shiyu (周世宇)[1], Tan Guoxin (谈国新)[1]

Affiliation: [1] National Research Center of Cultural Industries, Central China Normal University (华中师范大学国家文化产业研究中心), Wuhan 430079

Source: Application Research of Computers (《计算机应用研究》), 2017, No. 6, pp. 1756-1761 (6 pages)

Funding: National Key Technology R&D Program of China (国家科技支撑计划), grant 2012BAH83F00

Abstract: To address the low efficiency of searching massive place name data in traditional relational databases, this paper proposes a fast place name database retrieval method that combines the PanGu Chinese word segmenter with Lucene full-text search. First, a place name table structure is designed, and the Chinese segmentation performance of several common open-source analyzers is compared. The best-performing one, PanGu, is selected, and its dictionary is extended so that Chinese place names are segmented effectively. Second, in-memory indexing and multi-threaded parallel processing are used to speed up Lucene's creation of the inverted index, and the relevance ranking of query results is optimized according to the category and display priority attributes of place names. Finally, a Web-based place name search system with fast search and map-based location display is developed. In tests on 5 million real place name records, the average query time was under 1 s, nearly 15 times faster than fuzzy queries against a MySQL database, and the matches were also more accurate. The system can therefore provide an efficient and flexible public search service for massive place name data.
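The three steps named in the abstract (dictionary-based segmentation, an inverted index, and ranking that takes display priority into account) can be illustrated with a toy sketch. This is not the authors' implementation, which relies on PanGu and Lucene. Here forward maximum matching stands in for the PanGu segmenter and a plain Python dict stands in for the Lucene index. The dictionary, the sample records, the function names, and the convention that a lower `priority` value is shown first are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical place-name dictionary; a real system would load a large gazetteer.
PLACE_DICT = {"武汉市", "洪山区", "华中师范大学"}
MAX_WORD_LEN = max(len(w) for w in PLACE_DICT)

# Hypothetical records; "priority" is an assumed display-priority attribute.
RECORDS = {
    1: {"name": "武汉市", "category": "city", "priority": 1},
    2: {"name": "武汉市洪山区", "category": "district", "priority": 2},
    3: {"name": "华中师范大学", "category": "university", "priority": 3},
}


def segment(text):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in PLACE_DICT:
                tokens.append(piece)
                i += size
                break
    return tokens


def build_index(records):
    """Inverted index: map each token to the ids of records whose name contains it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for token in segment(rec["name"]):
            index[token].add(rec_id)
    return index


def search(query, index, records):
    """Rank by number of matched query tokens, then by display priority."""
    hits = defaultdict(int)
    for token in set(segment(query)):
        for rec_id in index.get(token, ()):
            hits[rec_id] += 1
    ranked = sorted(hits, key=lambda r: (-hits[r], records[r]["priority"], r))
    return [records[r]["name"] for r in ranked]


if __name__ == "__main__":
    idx = build_index(RECORDS)
    print(segment("武汉市洪山区"))
    print(search("武汉市", idx, RECORDS))
```

A lookup in this structure touches only the postings of the query tokens, whereas a fuzzy `LIKE '%...%'` query in a relational database generally has to scan every row. That difference is the usual motivation for a full-text index of this kind.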

Keywords: Lucene; place names; full-text search; database; Chinese word segmentation; relevance ranking

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
