检索规则说明:AND代表“并且”;OR代表“或者”;NOT代表“不包含”;(注意必须大写,运算符两边需空一格)
检 索 范 例 :范例一: (K=图书馆学 OR K=情报学) AND A=范并思 范例二:J=计算机应用与软件 AND (U=C++ OR U=Basic) NOT M=Visual
机构地区:[1]同济大学电子与信息工程学院,上海 [2]华东师范大学数据科学与工程学院,上海
出 处:《计算机科学与应用》2024年第1期68-77,共10页Computer Science and Application
摘 要:在机器学习与深度学习等领域中,近似最近邻搜索(ANNS)扮演着至关重要的角色,在近十多年来受到越来越多研究者的关注。传统的ANNS算法需要将向量原始数据和索引数据全部存储在内存中,这限制了其处理大数据的能力。本文提出了一种创新的基于异步I/O的磁盘向量近似最近邻搜索算法(AIO-ANN),该算法通过生成非阻塞I/O请求并即刻处理完成的请求,有效提升了搜索效率并降低了高延迟I/O请求的负面影响。在搜索过程中,AIO-ANN生成一批非阻塞I/O请求,并立即处理每个完成的I/O请求,而不是等待整批请求完成。同时为了最大化I/O等待时间的利用,算法将大部分计算任务转移到了I/O等待期间。此外,算法还整合了其他优化措施,如数据缓存与结果初始化。在大规模数据集上的实验,证明了AIO-ANN在搜索速度上超越了主流的ANNS算法DiskANN。In the fields of machine learning and deep learning, Approximate Nearest Neighbor Search (ANNS) plays a crucial role and has garnered increasing attention from researchers over the past decade. Traditional ANNS algorithms require storing all vector raw data and index data in memory, limiting their ability to process large datasets. This paper introduces an innovative Asynchronous I/O-based Disk Vector Approximate Nearest Neighbor Search algorithm (AIO-ANN), which enhances search ef-ficiency and reduces the negative impact of high-latency I/O requests by generating non-blocking I/O requests and immediately processing completed requests. During the search process, AIO-ANN generates a batch of non-blocking I/O requests and processes each completed request immediately, rather than waiting for the entire batch to complete. To maximize the utilization of I/O waiting time, the algorithm shifts the majority of computational tasks to these waiting periods. Additionally, the algorithm incorporates other optimization measures, such as data caching and result initialization. Experiments on large-scale datasets have proven that AIO-ANN surpasses mainstream ANNS algo-rithms like DiskANN in search speed.
分 类 号:TP3[自动化与计算机技术—计算机科学与技术]
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在载入数据...
正在链接到云南高校图书馆文献保障联盟下载...
云南高校图书馆联盟文献共享服务平台 版权所有©
您的IP:216.73.216.7