The Joint Institute for Nuclear Research (JINR) is a world-leading multinational research organization in high-energy physics and the site of a novel experiment, NICA (Nuclotron-based Ion Collider fAcility). The goal of the NICA experiment is to explore previously unknown properties of the quark-gluon plasma that should form in collisions of heavy ion nuclei. Data collection for NICA is due to start in 2022, but even the preparation phase requires multi-petabyte storage capable of rapidly processing hundred-terabyte datasets for collision event simulation and reconstruction. During the experiment, the data flow is expected to range from tens to hundreds of GB/s and above, with several PB produced in a single experimental run.

The Meshcheryakov Laboratory of Information Technologies (MLIT) at JINR provides the IT support for NICA, i.e. data collection, real-time (online) data processing, and offline data processing. The data processing workflows at JINR differ in their demands on storage access bandwidth (sequential read and/or write speed) and on the rate of small operations on data or metadata (IOPS). While some workflows are mostly bandwidth-limited, a large and important fraction is mostly IOPS-limited. In addition, MLIT must support simulations that require high-performance compute resources and vast amounts of main memory (tens of gigabytes per CPU core). This combination of workloads makes the fusion of compute and storage elements the most appealing option, with storage-class memory (SCM) devices serving a dual purpose: metadata storage and RAM extension.

To address these requirements, MLIT JINR deployed the Govorun system, built from dual-socket nodes with 2nd-generation Intel Xeon Scalable processors and a mixture of storage devices. With the help of NVMe-over-Fabrics technology, the system administrator can create on-demand storage volumes out of SSDs physically installed on compute nodes (up to two 2 TB M.2 devices per node), on specialized storage nodes (up to twelve 2 TB M.2 or 375 GB Intel Optane devices), or out of SCM devices on persistent-memory nodes. Filesystem options include Lustre, ZFS, NFS and others. Recently, experiments with Intel's Distributed Asynchronous Object Storage (DAOS) demonstrated a substantial performance increase: IO500 metadata performance grew considerably from a 50-client Lustre run to a 10-client Intel DAOS run (see table below), with a more pronounced advantage on the more irregular ("hard") operations. In both cases, NVMe-over-Fabrics was used for device pooling and client-server communication; the fabric was an Intel Omni-Path interconnect in a non-blocking fat-tree topology at 100 Gbit/s. The operating system was CentOS 7.7 (build 1908).
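
The split between bandwidth-limited and IOPS-limited workflows described above can be made concrete with a small measurement. The sketch below is a minimal illustration, not part of the Govorun tooling; the file sizes, counts and paths are arbitrary assumptions. It times one large sequential write against a burst of tiny synced file creations, the two access patterns that stress sequential bandwidth and metadata IOPS respectively.

    import os
    import tempfile
    import time

    def sequential_write(path, total_mib=1024, block_mib=4):
        """Bandwidth-type load: one large file written in big blocks."""
        block = os.urandom(block_mib * 1024 * 1024)
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(total_mib // block_mib):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        return total_mib / (time.perf_counter() - start)    # MiB/s

    def small_file_creates(directory, count=10000):
        """IOPS/metadata-type load: many tiny files, each created and synced."""
        start = time.perf_counter()
        for i in range(count):
            with open(os.path.join(directory, f"f{i:06d}"), "wb") as f:
                f.write(b"x")
                os.fsync(f.fileno())
        return count / (time.perf_counter() - start)         # creates per second

    if __name__ == "__main__":
        with tempfile.TemporaryDirectory(dir=".") as d:
            print(f"sequential write: {sequential_write(os.path.join(d, 'big.dat')):10.1f} MiB/s")
            print(f"small file creates: {small_file_creates(d):10.1f} ops/s")

A storage system tuned only for streaming bandwidth keeps the first number high while the second one collapses, which is why the IOPS-limited fraction of the JINR workflows is treated as a separate design concern.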
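
On the target side, the on-demand volume creation mentioned above amounts to exporting a local block device through the Linux NVMe-over-Fabrics target (nvmet), which is configured via configfs. The sketch below is a minimal illustration of that mechanism under stated assumptions, not the actual Govorun provisioning procedure: the NQN, device path, address and the choice of the RDMA transport are placeholders, and the nvmet and nvmet-rdma kernel modules are assumed to be loaded.

    import os

    # Illustrative values -- not taken from the Govorun configuration.
    NQN     = "nqn.2019-07.example:demo-volume"   # hypothetical subsystem name
    DEVICE  = "/dev/nvme0n1"                      # local SSD to be exported
    ADDR    = "10.0.0.1"                          # fabric-facing IP of this node
    PORT_ID = "1"

    CFG = "/sys/kernel/config/nvmet"

    def write(path, value):
        with open(path, "w") as f:
            f.write(value)

    # 1. Create the NVMe-oF subsystem; for the demo, allow any host to connect.
    subsys = os.path.join(CFG, "subsystems", NQN)
    os.makedirs(subsys, exist_ok=True)
    write(os.path.join(subsys, "attr_allow_any_host"), "1")

    # 2. Attach the local block device as namespace 1 of the subsystem.
    ns = os.path.join(subsys, "namespaces", "1")
    os.makedirs(ns, exist_ok=True)
    write(os.path.join(ns, "device_path"), DEVICE)
    write(os.path.join(ns, "enable"), "1")

    # 3. Expose the subsystem on an RDMA-capable fabric port.
    port = os.path.join(CFG, "ports", PORT_ID)
    os.makedirs(port, exist_ok=True)
    write(os.path.join(port, "addr_trtype"), "rdma")
    write(os.path.join(port, "addr_adrfam"), "ipv4")
    write(os.path.join(port, "addr_traddr"), ADDR)
    write(os.path.join(port, "addr_trsvcid"), "4420")
    os.symlink(subsys, os.path.join(port, "subsystems", NQN))

    # A client node can then attach the exported volume with nvme-cli:
    #   nvme connect -t rdma -n <NQN> -a <ADDR> -s 4420

In production such exports are usually driven by nvmetcli or site-specific orchestration rather than by hand; on the client the attached namespaces appear as ordinary /dev/nvme block devices that can serve, for example, as backing storage for Lustre or ZFS volumes.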