Affiliation: [1] College of Computer and Electronic Information, Guangxi University, Nanning, Guangxi 530004
Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2011, No. 5, pp. 860-866 (7 pages)
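Funding: Supported by the National Natural Science Foundation of China (60963001), the Guangxi Graduate Education Innovation Program, the Guangxi Universities Talent Highland Innovation Team Program (桂教人[2007]71号), and the Guangxi University Top-Notch Innovation Project.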
Abstract: A latency-hiding data prefetching model is proposed that overlaps computation with memory access, with the aim of achieving zero misses in the shared L2 cache. The concept of a basic block of a matrix is defined, and each matrix is partitioned into sub-matrices of basic-block size. This simplifies the algorithm's data structures and reduces its storage overhead. Matrix elements are stored contiguously, block by block, which optimizes the algorithm at the level of the memory hierarchy and significantly reduces translation lookaside buffer (TLB) misses. A non-recursive strategy for scheduling basic blocks is adopted that makes full use of the shared L2 cache of multi-core computers to reduce the number of main-memory accesses. The algorithm is not tied to any particular memory structure, so it is cache-oblivious. Experimental results on a multi-core computer show that the proposed non-recursive, thread-level parallel algorithm for matrix multiplication is efficient and scalable.
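The abstract does not give the algorithm itself. The Python below is a minimal sketch of two of the ideas it names: storing matrix elements contiguously by basic block, and scheduling output blocks non-recursively across threads. It assumes square n×n matrices whose order is a multiple of a hypothetical block size `b`. The function names (`to_block_layout`, `from_block_layout`, `block_matmul`), the row-major block order, and the one-task-per-output-block schedule are illustrative assumptions, not the authors' method. The sketch omits the latency-hiding prefetch model. Because `b` is a fixed parameter, it is cache-aware rather than cache-oblivious. Under CPython's GIL the threads will also not give a real speedup, so the point here is only the data layout and the task structure.

```python
from concurrent.futures import ThreadPoolExecutor


def to_block_layout(a, n, b):
    """Copy a row-major n*n matrix (flat list) into block-contiguous order.

    Blocks are visited in row-major block order, and the b*b elements of
    each block sit next to each other (row-major inside the block).
    Assumption for this sketch: b divides n.
    """
    nb = n // b
    out = []
    for bi in range(nb):
        for bj in range(nb):
            for r in range(b):
                start = (bi * b + r) * n + bj * b  # first element of this block row
                out.extend(a[start:start + b])
    return out


def from_block_layout(blk, n, b):
    """Inverse of to_block_layout: rebuild the flat row-major matrix."""
    nb = n // b
    a = [0] * (n * n)
    for bi in range(nb):
        for bj in range(nb):
            base = (bi * nb + bj) * b * b  # offset of block (bi, bj)
            for r in range(b):
                start = (bi * b + r) * n + bj * b
                a[start:start + b] = blk[base + r * b:base + (r + 1) * b]
    return a


def block_matmul(a_blk, b_blk, n, b, num_threads=4):
    """C = A * B, with A, B and the result all in block-contiguous layout.

    Every output block C(bi, bj) is an independent task. The tasks come from
    a flat list (no recursive splitting) and are handed to a thread pool.
    """
    nb = n // b
    bb = b * b
    c_blk = [0] * (n * n)

    def compute_block(bi, bj):
        acc = [0] * bb
        for bk in range(nb):
            a_base = (bi * nb + bk) * bb  # block A(bi, bk), contiguous
            b_base = (bk * nb + bj) * bb  # block B(bk, bj), contiguous
            for r in range(b):
                for k in range(b):
                    aik = a_blk[a_base + r * b + k]
                    b_row = b_base + k * b
                    for col in range(b):
                        acc[r * b + col] += aik * b_blk[b_row + col]
        # Each task writes a disjoint, equal-length slice of the result.
        c_base = (bi * nb + bj) * bb
        c_blk[c_base:c_base + bb] = acc

    tasks = [(bi, bj) for bi in range(nb) for bj in range(nb)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for every task and re-raises any exception it raised.
        list(pool.map(lambda t: compute_block(*t), tasks))
    return c_blk
```

For example, with `n, b = 4, 2` and `a = list(range(16))`, converting `a` and the 4×4 identity with `to_block_layout`, multiplying with `block_matmul`, and converting back with `from_block_layout` returns `a` unchanged. Giving each thread a whole output block keeps its writes confined to one contiguous region, which is the motivation the abstract gives for block-contiguous storage.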
Keywords: multi-core computer; matrix multiplication; parallel algorithm; latency hiding; cache-oblivious
CLC number: TP338 [Automation and Computer Technology / Computer System Architecture]