The paper describes approaches to implementing large-graph processing on two modern high-performance computing platforms: NVIDIA GPUs and Intel KNL. The approach is based on a deep a priori analysis of algorithm properties, which helps in choosing the right implementation method. To demonstrate it, the shortest-paths and strongly connected components problems are solved for sparse graphs. The results include a detailed description of the whole algorithm development cycle: from studying the algorithm's information structure and selecting efficient implementation methods suited to the particular platforms, to specific optimizations for each architecture. Based on a joint analysis of algorithm properties and architecture features, performance tuning is carried out, including graph storage format optimizations, efficient use of the memory hierarchy, and vectorization. The developed implementations demonstrate high performance and good scalability. In addition, considerable attention was paid to profiling the implemented algorithms with the NVIDIA Visual Profiler and Intel® VTune™ Amplifier utilities. This allows the paper to present a fair comparison that demonstrates the advantages and disadvantages of each platform for large-scale graph processing.
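To make the phrase "graph storage format" concrete, the following is a minimal sequential sketch, not the paper's implementation. It assumes a compressed sparse row (CSR) layout and a Bellman-Ford-style relaxation for single-source shortest paths. The abstract names neither the format nor the algorithm variant, so both are illustrative choices, and the function names `build_csr` and `sssp_bellman_ford` are hypothetical.

```python
import math


def build_csr(num_vertices, edges):
    """Convert (src, dst, weight) triples into CSR arrays (assumed format)."""
    # Count outgoing edges per vertex, then prefix-sum into row offsets.
    row_ptr = [0] * (num_vertices + 1)
    for src, _, _ in edges:
        row_ptr[src + 1] += 1
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]

    col_idx = [0] * len(edges)
    weights = [0.0] * len(edges)
    next_slot = row_ptr[:-1].copy()  # next free position in each row
    for src, dst, w in edges:
        pos = next_slot[src]
        col_idx[pos] = dst
        weights[pos] = w
        next_slot[src] += 1
    return row_ptr, col_idx, weights


def sssp_bellman_ford(num_vertices, row_ptr, col_idx, weights, source):
    """Bellman-Ford-style relaxation over CSR; assumes no negative cycles."""
    dist = [math.inf] * num_vertices
    dist[source] = 0.0
    for _ in range(num_vertices - 1):
        changed = False
        for u in range(num_vertices):
            if dist[u] == math.inf:
                continue  # vertex not reached yet, nothing to relax
            # Edges of u occupy one contiguous slice of col_idx/weights.
            for e in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[e]
                candidate = dist[u] + weights[e]
                if candidate < dist[v]:
                    dist[v] = candidate
                    changed = True
        if not changed:
            break  # distances converged early
    return dist
```

Calling `build_csr` on an edge list and passing the resulting arrays to `sssp_bellman_ford` returns one distance per vertex. In this layout each vertex's outgoing edges sit contiguously in memory, which is the kind of property that matters when tuning for a memory hierarchy or for vector units.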