Efficient String Similarity Search on Disks  

Authors: Jinbao Wang, Donghua Yang

Affiliation: [1] The Academy of Fundamental and Interdisciplinary Sciences, Harbin Institute of Technology, Harbin 150080, China

Source: Proceedings of the International Conference of Pioneering Computer Scientists, Engineers and Educators (ICPCSEE), 2015, No. 1, pp. 15-16 (2 pages)

Funding: This work is funded by the National Natural Science Foundation of China (Project No. 61272046); the Natural Science Foundation of Heilongjiang Province, China (Grant No. F201317); the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.2015065); and the China Postdoctoral Science Foundation (Grant Nos. 2013T60372, 2014M561351).

Abstract: String similarity search is a basic operation in many applications, such as data cleaning, spell checking, bioinformatics and information integration. Memory-based q-gram inverted indexes cannot support string similarity search over large-scale string datasets, because they stop working once the data size grows beyond the memory size. In the era of big data, large string datasets are quite common. The existing external-memory method, Behm-Index, supports only the length filter and the prefix filter. This paper proposes LPA-Index, a disk-resident index that places no limit on data size relative to memory size and reduces I/O cost to improve query response time. LPA-Index supports multiple filters to reduce the number of query candidates effectively, and it adaptively reads inverted lists during query processing for better I/O performance. Experimental results demonstrate the efficiency of LPA-Index and its advantages over the existing state-of-the-art disk index, Behm-Index, with regard to I/O cost and query response time.
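
To illustrate the kind of filtering the abstract refers to, the following is a minimal in-memory sketch of a q-gram inverted index with a length filter and a count filter for edit-distance search. It is not LPA-Index or Behm-Index: the abstract does not describe their on-disk layout, and the function names (`qgrams`, `build_index`, `search`), the choice of q = 2, and the use of the count filter in place of the prefix filter are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def qgrams(s, q=2):
    """Return the overlapping q-grams of s (no padding characters)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def build_index(strings, q=2):
    """Map each q-gram to a list of (string id, occurrence count) pairs."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for gram, count in Counter(qgrams(s, q)).items():
            index[gram].append((sid, count))
    return index

def search(strings, index, query, tau, q=2):
    """Return all strings within edit distance tau of query."""
    # Merge the inverted lists of the query's q-grams (bag intersection size).
    overlap = Counter()
    for gram, cq in Counter(qgrams(query, q)).items():
        for sid, cs in index.get(gram, ()):
            overlap[sid] += min(cq, cs)
    results = []
    for sid, s in enumerate(strings):
        # Length filter: lengths may differ by at most tau.
        if abs(len(s) - len(query)) > tau:
            continue
        # Count filter: one edit destroys at most q q-grams, so a match
        # must share at least this many; it prunes nothing when need <= 0.
        need = max(len(s), len(query)) - q + 1 - q * tau
        if overlap[sid] < need:
            continue
        # Verification: compute the exact distance for surviving candidates.
        if edit_distance(s, query) <= tau:
            results.append(s)
    return results
```

In this sketch every inverted list touched by the query is read in full. A disk-resident index would instead have to decide which lists are worth reading, which is the cost that the adaptive list reading described in the abstract targets.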

Classification: C5 [Sociology]

 
