一种基于异步I/O的磁盘向量检索算法  

An Asynchronous I/O-Based Disk Vector Retrieval Algorithm

在线阅读下载全文

作  者:吴松林 田春岐[1] 毕枫林 

机构地区:[1]同济大学电子与信息工程学院,上海 [2]华东师范大学数据科学与工程学院,上海

出  处:《计算机科学与应用》2024年第1期68-77,共10页Computer Science and Application

摘  要:在机器学习与深度学习等领域中,近似最近邻搜索(ANNS)扮演着至关重要的角色,在近十多年来受到越来越多研究者的关注。传统的ANNS算法需要将向量原始数据和索引数据全部存储在内存中,这限制了其处理大数据的能力。本文提出了一种创新的基于异步I/O的磁盘向量近似最近邻搜索算法(AIO-ANN),该算法通过生成非阻塞I/O请求并即刻处理完成的请求,有效提升了搜索效率并降低了高延迟I/O请求的负面影响。在搜索过程中,AIO-ANN生成一批非阻塞I/O请求,并立即处理每个完成的I/O请求,而不是等待整批请求完成。同时为了最大化I/O等待时间的利用,算法将大部分计算任务转移到了I/O等待期间。此外,算法还整合了其他优化措施,如数据缓存与结果初始化。在大规模数据集上的实验,证明了AIO-ANN在搜索速度上超越了主流的ANNS算法DiskANN。In the fields of machine learning and deep learning, Approximate Nearest Neighbor Search (ANNS) plays a crucial role and has garnered increasing attention from researchers over the past decade. Traditional ANNS algorithms require storing all vector raw data and index data in memory, limiting their ability to process large datasets. This paper introduces an innovative Asynchronous I/O-based Disk Vector Approximate Nearest Neighbor Search algorithm (AIO-ANN), which enhances search ef-ficiency and reduces the negative impact of high-latency I/O requests by generating non-blocking I/O requests and immediately processing completed requests. During the search process, AIO-ANN generates a batch of non-blocking I/O requests and processes each completed request immediately, rather than waiting for the entire batch to complete. To maximize the utilization of I/O waiting time, the algorithm shifts the majority of computational tasks to these waiting periods. Additionally, the algorithm incorporates other optimization measures, such as data caching and result initialization. Experiments on large-scale datasets have proven that AIO-ANN surpasses mainstream ANNS algo-rithms like DiskANN in search speed.

关 键 词:向量检索 异步I/O 

分 类 号:TP3[自动化与计算机技术—计算机科学与技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象