Distributed top-k query processing algorithm based on sequential read

Authors: Bi Fangming[1], Chen Wei[1], Yang Kui[1], Che Ben

Affiliation: [1] School of Computer Science and Technology, China University of Mining and Technology (Xuzhou), Xuzhou, Jiangsu 221116

Source: Journal of Computer Applications, 2015, Issue A01, pp. 69-73 (5 pages)

Funding: National Natural Science Foundation of China (60970032); Natural Science Foundation of Jiangsu Province (BK2007035)

Abstract: Top-k query is a widely used operation. Taking existing top-k algorithms as the basis for analysis and research, this paper proposes solutions to their shortcomings. It first presents SRTA (Sequential-Read Threshold Algorithm). Compared with the NRA (No Random Access) algorithm, SRTA replans the data storage: it creates a new table that converts in-memory overhead into cheaper external-storage overhead, so that efficient top-k queries can be answered with sequential reads only. The table is also partitioned, which further improves efficiency under parallel processing and lets the algorithm run well in memory-constrained environments. Building on SRTA, the paper then proposes DSRTA (Distributed Sequential-Read Threshold Algorithm) for distributed environments. DSRTA first partitions the original data set into multiple subspaces by ID and then replans the data storage, exploiting the performance advantages of a distributed system to further improve the query efficiency of SRTA.
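For orientation, the abstract's baseline, NRA, can be sketched as follows. NRA answers top-k queries with sorted (sequential) access only, but it must keep partial scores for every object it has seen in memory. This is a minimal Python sketch of classic NRA, not of SRTA or DSRTA, whose table layouts the abstract does not describe. It assumes sum aggregation, nonnegative scores, and that every object appears in every list; the function name `nra_top_k` and the toy data are hypothetical.

```python
from collections import defaultdict
import heapq


def nra_top_k(lists, k):
    """Classic NRA (No Random Access) top-k, sketched with sum aggregation.

    lists: m lists of (object_id, score) pairs, each sorted by score descending.
    Assumes nonnegative scores and that every object appears in every list.
    Returns (top_k, depth): top_k is a list of (object_id, lower_bound) pairs,
    depth is how many entries were read sequentially from each list.
    """
    if k <= 0:
        return [], 0
    m = len(lists)
    seen = defaultdict(dict)  # object_id -> {list_index: score}; NRA's in-memory state
    last = [lst[0][1] if lst else 0.0 for lst in lists]  # last score read per list
    max_depth = max((len(lst) for lst in lists), default=0)

    for depth in range(max_depth):
        # One round of sorted (sequential) access: read the next entry of every list.
        for i, lst in enumerate(lists):
            if depth < len(lst):
                obj, score = lst[depth]
                seen[obj][i] = score
                last[i] = score
            else:
                last[i] = 0.0  # an exhausted list contributes nothing further

        if len(seen) < k:
            continue

        # Worst-case (lower) bound: scores not yet read count as 0.
        lower = {o: sum(sc.values()) for o, sc in seen.items()}
        # Best-case (upper) bound: an unread score is at most the last value read in that list.
        upper = {
            o: lower[o] + sum(last[i] for i in range(m) if i not in sc)
            for o, sc in seen.items()
        }

        top = heapq.nlargest(k, lower, key=lower.get)
        kth_lower = lower[top[-1]]
        top_set = set(top)
        unseen_upper = sum(last)  # best possible score of an object not yet seen anywhere
        other_upper = max((upper[o] for o in seen if o not in top_set), default=0.0)

        # Stop once no other object, seen or unseen, can beat the current k-th lower bound.
        if kth_lower >= max(unseen_upper, other_upper):
            return [(o, lower[o]) for o in top], depth + 1

    # All lists exhausted: every seen score is now exact.
    lower = {o: sum(sc.values()) for o, sc in seen.items()}
    top = heapq.nlargest(k, lower, key=lower.get)
    return [(o, lower[o]) for o in top], max_depth


if __name__ == "__main__":
    lists = [
        [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.2), ("e", 0.1)],
        [("b", 0.9), ("a", 0.85), ("d", 0.6), ("c", 0.3), ("e", 0.05)],
    ]
    top, depth = nra_top_k(lists, 2)
    print([o for o, _ in top], depth)  # ['a', 'b'] 2
```

In the toy run, the query stops after reading only two of five entries per list. The `seen` dictionary, however, grows with every distinct object read, and that in-memory bookkeeping is the overhead the abstract says SRTA shifts onto a table in cheaper external storage.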

Keywords: distributed; data storage; data partitioning; sequential read; limited memory

Classification number: TP311 [Automation and Computer Technology: Computer Software and Theory]
