Affiliations: [1] School of Computer Science, Fudan University; [2] Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University
Source: Science China (Information Sciences) (English edition), 2017, No. 12, pp. 203-219 (17 pages)
Abstract: The increasing demand for performance has stimulated the wide adoption of many-core accelerators such as the Intel® Xeon Phi™ coprocessor, which is based on Intel's Many Integrated Core (MIC) architecture. While many HPC applications running in native mode have been tuned to run efficiently on Xeon Phi, it remains unclear how a managed runtime such as the JVM performs on this architecture. In this paper, we present the first measurement study of a set of Java HPC applications running on Xeon Phi under the JVM. A key obstacle to such a study is that Java currently has little support on Xeon Phi. The results presented here are based on the first port of the OpenJDK platform to Xeon Phi, in which the HotSpot virtual machine acts as the core execution engine. The main difficulties include the incompatibility between the Xeon Phi ISA and the assembly library of the HotSpot VM. By evaluating the multithreaded Java Grande benchmark suite and our ported Java Phoenix benchmarks, we quantitatively study the performance and scalability issues of the JVM on Xeon Phi and draw several conclusions. To fully utilize the coprocessor's vector computing capability and hide its significant memory access latency, we present a semi-automatic vectorization scheme and a software prefetching model in HotSpot. Using 60 physical cores together with tuning, our optimized JVM achieves average speedups of 2.7x and 3.5x over a Xeon CPU processor with vectorization and prefetching, respectively. Our study also indicates that it is viable, and potentially beneficial for performance, to run applications written for a managed runtime such as the JVM on Xeon Phi.
Keywords: many-core; Java; Xeon Phi; HPC; prefetching
CLC number: TP38 [Automation and Computer Technology: Computer System Architecture]