Non-recursive Parallel Computation for Matrix Multiplication on Multi-core Computers

Cited by: 5

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2011, No. 5, pp. 860-866 (7 pages)

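Funding: Supported by the National Natural Science Foundation of China (60963001); the Guangxi Graduate Education Innovation Program; the Innovation Team Program for Building Talent Highlands in Guangxi Universities (桂教人[2007]71号); and the Guangxi University Top-notch Innovation Project.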

Abstract: A "latency hiding" data prefetching model is proposed that overlaps computation with memory access, with the aim of reaching zero misses in the shared L2 cache. The concept of a "basic block" is introduced, and the matrix is divided into sub-matrices of basic-block size, which simplifies the algorithm's data structures and reduces storage overhead. Matrix elements are stored contiguously, basic block by basic block; this optimizes the algorithm at the level of the storage hierarchy and significantly reduces Translation Lookaside Buffer (TLB) misses. A non-recursive strategy for scheduling basic blocks makes full use of the shared L2 cache of multi-core computers to reduce the number of main-memory accesses. The strategy is not tied to any particular storage structure, so the algorithm is cache oblivious. Experiments on a multi-core computer show that the proposed non-recursive, thread-level parallel algorithm for matrix multiplication is efficient and scalable.
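The abstract gives no code, so the following is only a minimal sketch of two of the ideas it names: storing each basic block contiguously, and visiting blocks with plain nested loops instead of recursion. Everything here is an assumption made for illustration, not the authors' implementation: the function names (`pack_blocks`, `unpack_blocks`, `blocked_matmul`), the square `n x n` shape, the requirement that `n` be divisible by the block size `bs`, and the choice of one task per block row of the result. The prefetching model is not shown, and in standard CPython the threads illustrate the work partitioning only, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def pack_blocks(m, n, bs):
    """Copy an n x n list-of-lists matrix into block-major order:
    each bs x bs block occupies bs*bs consecutive slots of one flat list."""
    nb = n // bs
    flat = [0.0] * (n * n)
    for bi in range(nb):
        for bj in range(nb):
            base = (bi * nb + bj) * bs * bs
            for i in range(bs):
                for j in range(bs):
                    flat[base + i * bs + j] = m[bi * bs + i][bj * bs + j]
    return flat


def unpack_blocks(flat, n, bs):
    """Inverse of pack_blocks: rebuild the row-major list-of-lists matrix."""
    nb = n // bs
    m = [[0.0] * n for _ in range(n)]
    for bi in range(nb):
        for bj in range(nb):
            base = (bi * nb + bj) * bs * bs
            for i in range(bs):
                for j in range(bs):
                    m[bi * bs + i][bj * bs + j] = flat[base + i * bs + j]
    return m


def _block_multiply_add(a, b, c, a0, b0, c0, bs):
    """c_block += a_block * b_block; a0, b0, c0 are the blocks' start offsets."""
    for i in range(bs):
        for k in range(bs):
            aik = a[a0 + i * bs + k]
            for j in range(bs):
                c[c0 + i * bs + j] += aik * b[b0 + k * bs + j]


def blocked_matmul(a, b, n, bs, workers=2):
    """Multiply two block-major matrices; the result is block-major too."""
    nb = n // bs
    size = bs * bs
    c = [0.0] * (n * n)

    def block_row(bi):
        # Non-recursive schedule: plain loops over block indices.
        # Each task writes only block row bi of c, so tasks do not conflict.
        for bj in range(nb):
            c0 = (bi * nb + bj) * size
            for bk in range(nb):
                a0 = (bi * nb + bk) * size
                b0 = (bk * nb + bj) * size
                _block_multiply_add(a, b, c, a0, b0, c0, bs)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(block_row, range(nb)))
    return c
```

A caller would pack both operands once, call `blocked_matmul`, and unpack the result. Because every block is one contiguous run of `bs*bs` values, the inner kernel touches three compact memory regions at a time, which is the property the abstract credits with reducing TLB misses.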

Keywords: multi-core computer; matrix multiplication; parallel algorithm; latency hiding; cache oblivious

Classification: TP338 [Automation and Computer Technology: Computer System Architecture]
