The goal of the Octotron project is to design an approach that guarantees the reliable autonomous operation of supercomputers. The approach is based on a formal model of a supercomputer that describes the proper functioning of its components and their interconnections. The supercomputer continually compares its current state with the information in the model. If reality (the current state of the supercomputer) deviates from theory (the supercomputer model), Octotron performs one of the predefined actions: notifying administrators via email and/or SMS, disabling malfunctioning services, restarting software, etc. This approach not only guarantees reliable operation of the existing fleet of systems at a supercomputing center, but also ensures high-quality maintenance when moving to a new generation of machines. Indeed, once an emergency situation has occurred, it is recorded in the model together with its root causes and symptoms, and an appropriate reaction is programmed into the model.
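The compare-and-react cycle described above can be illustrated with a minimal sketch. This is not Octotron's actual API or model format: the `Rule` class, the `check` function, the component and attribute names, the thresholds, and the reaction labels are all hypothetical and chosen only for illustration. The sketch assumes the model is a list of rules, each stating what "proper functioning" means for one attribute of one component and which predefined reaction applies when the observed state deviates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical illustration only: none of these names come from Octotron.


@dataclass
class Rule:
    component: str                  # e.g. a compute node or a service
    attribute: str                  # the monitored quantity
    is_ok: Callable[[float], bool]  # what the model considers proper functioning
    reaction: str                   # label of the predefined reaction


def check(model: List[Rule], state: Dict[str, Dict[str, float]]) -> List[str]:
    """Compare the observed state with the model and collect triggered reactions."""
    triggered = []
    for rule in model:
        value = state.get(rule.component, {}).get(rule.attribute)
        # A missing reading is treated as a deviation as well (an assumption).
        if value is None or not rule.is_ok(value):
            triggered.append(
                f"{rule.reaction}: {rule.component}.{rule.attribute}={value}"
            )
    return triggered


# The "theory": how the components are expected to behave.
model = [
    Rule("node01", "cpu_temp_c", lambda t: t < 80.0, "notify_admin"),
    Rule("node01", "fan_rpm", lambda r: r > 1000.0, "notify_admin"),
    Rule("scheduler", "heartbeat_age_s", lambda a: a < 60.0, "restart_service"),
]

# The "reality": a snapshot of the current state.
state = {
    "node01": {"cpu_temp_c": 91.5, "fan_rpm": 4200.0},
    "scheduler": {"heartbeat_age_s": 12.0},
}

reactions = check(model, state)
for line in reactions:
    print(line)
```

In this sketch, recording a newly encountered emergency amounts to appending another `Rule` to `model`, which mirrors the idea that each incident, once understood, becomes part of the model along with its reaction. A real system would additionally need to represent interconnections between components, which this flat list of rules does not attempt.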