Design and Implementation of a High-Performance Sorting Algorithm for Big Data  Cited by: 6

Design and Implementation of High Performance Ranking Algorithm for Big Data


Authors: Chen Hongyan (陈洪雁)[1], Wan Junwei (万俊伟)[1], Wang Qi (汪琦)[1]

Affiliation: [1] Beijing Institute of Tracking and Telecommunications Technology, Beijing 100094

Source: Journal of Spacecraft TT&C Technology (《飞行器测控学报》), 2015, No. 2, pp. 120-127 (8 pages)

Abstract: To meet the demand for sorting big data, a task-driven parallel sorting algorithm is proposed. The algorithm combines task-driven scheduling, AIO (Asynchronous Input/Output), and a double-buffer mechanism to make full use of system resources. It optimizes quicksort by constructing equivalent sort keys. Its implementation processes tasks with multiple threads and controls the degree of parallelism through the number of threads. With these techniques combined, the sorting performance on big data approaches the theoretical limit. When CPU (Central Processing Unit) resources are plentiful, asynchronous compression can push performance beyond that limit: the final system completes a full sort of more than 500 Gbyte of on-disk data in 2,000 s. Applying the same idea to database design would decouple connections from threads, allowing the database to support more connections and therefore a higher degree of concurrency.
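The abstract does not spell out how the equivalent sort keys are built or how tasks are scheduled, so the following is only a minimal sketch of one common reading of those two ideas, not the authors' implementation. It assumes a hypothetical record layout of `(int32 id, str name)`, encodes each record into a byte string whose plain bytewise order matches the logical order, and sorts chunks on a thread pool whose size sets the degree of parallelism. The names `equivalent_key`, `sort_chunk`, and `parallel_sort` are illustrative assumptions, and Python's standard `sorted` stands in for the optimized quicksort. AIO and double buffering are not shown.

```python
import heapq
import struct
from concurrent.futures import ThreadPoolExecutor


def equivalent_key(record):
    """Encode a hypothetical (int32 id, str name) record as bytes.

    Comparing the resulting byte strings gives the same order as comparing
    (id, name) field by field, so the sort needs only one cheap comparison.
    """
    rec_id, name = record
    # Adding 2**31 flips the sign bit, so big-endian unsigned byte order
    # matches signed integer order.
    id_bytes = struct.pack(">I", (rec_id + 0x80000000) & 0xFFFFFFFF)
    # UTF-8 byte order matches code point order; name is the last field,
    # so no terminator is needed.
    return id_bytes + name.encode("utf-8")


def sort_chunk(chunk):
    """One task: compute each key once, then sort the chunk on the key."""
    return sorted(((equivalent_key(r), r) for r in chunk), key=lambda kr: kr[0])


def parallel_sort(records, workers=4, chunk_size=2):
    """Sort chunks as independent tasks, then merge the sorted runs.

    `workers` is the number of threads and therefore the degree of parallelism.
    """
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sort_chunk, chunks))
    merged = heapq.merge(*runs, key=lambda kr: kr[0])
    return [record for _, record in merged]
```

In CPython the global interpreter lock limits the speedup threads give to CPU-bound sorting, so this sketch illustrates the structure (tasks, a bounded thread count, key encoding) rather than the reported performance.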

Keywords: domestic database; massive data; big data; sorting algorithm; independent and controllable; equivalent sort key

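Classification: V556 [Aerospace Science and Technology: Man-Machine and Environmental Engineering]; TP311.12 [Automation and Computer Technology: Computer Software and Theory]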

 
