Practical machine learning methods for QSPR and QSAR predictions - conference talk | ISTINA (Intelligent System for Thematic Investigation of Scientometric Data)

Authors: Alexander Korotcov, Rick Zakharov, Valery Tkachenko, Artem Mitrofanov, Boris Sattarov
International conference: 255th ACS National Meeting & Exposition
Conference dates: 18-22 March 2018
Presentation date: 18 March 2018
Presentation type: Oral
Speaker: Artem Mitrofanov
Venue: New Orleans, United States
Abstract:
In the past several years, machine learning techniques have played an important role in the modern drug discovery process and have become an absolute necessity for it. Multiple methods for predicting physicochemical and chemo-biological endpoints have proven their robustness and have significantly improved our understanding of the molecular features and properties associated with specific pharmacological features. Despite the number of drug discovery toolkits and methods available to the public, academia, and the pharmaceutical industry, there is demand for a tool that combines the mining and curation of heterogeneous chemical data with multiple sophisticated molecular machine learning algorithms. Such a toolkit has to be able to train models using a variety of machine learning algorithms with minimal user intervention and/or provide access to ready-to-use pre-trained models. In this study we evaluated our toolbox for data curation and machine learning modelling for drug discovery, the Open Science Data Repository (currently under development). Heterogeneous, publicly available datasets related to tuberculosis, malaria, bubonic plague, Chagas disease, and other diseases were used to tune and train multiple machine learning methods, including traditional methods such as Naïve Bayes, k-Nearest Neighbors, Random Forest, Boosted Decision Trees, Regularized Logistic Regression, and Support Vector Machines, as well as novel deep learning methods using neural network models of varying complexity. A wide range of evaluation metrics, including receiver operating characteristic (ROC) curves, the area under the curve (AUC), F1-score, Cohen’s kappa, and the Matthews correlation coefficient, were used to evaluate and compare the performance of the machine learning models. A variety of molecular descriptors commonly used in cheminformatics for compound representation are built into our methods, so an additional layer of tuning, a search for the best molecular descriptor for a particular model, can be applied.
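The evaluation metrics named above are standard, and the abstract does not describe how the toolbox computes them. As a hedged, independent illustration (not the Open Science Data Repository code), they can all be derived from a binary confusion matrix using only the Python standard library. The sketch assumes labels coded as 1 = active and 0 = inactive, and the labels and scores in the usage lines are made up. ROC AUC is computed through its rank interpretation: the probability that a randomly chosen active receives a higher score than a randomly chosen inactive, with ties counted as one half.

```python
import math
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) for binary labels coded 1 = active, 0 = inactive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Observed agreement corrected for the agreement expected by chance."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    n = tp + fp + fn + tn
    p_obs = (tp + tn) / n
    # Chance agreement: products of predicted and actual class frequencies.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (active, inactive) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one active and one inactive")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and model scores, not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.35, 0.1, 0.6, 0.2, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold
    print(f1(y_true, y_pred))           # 0.75
    print(cohen_kappa(y_true, y_pred))  # 0.5
    print(mcc(y_true, y_pred))          # 0.5
    print(roc_auc(y_true, scores))      # 0.9375
```

Threshold-based metrics (F1, kappa, MCC) change with the chosen cut-off, whereas ROC AUC uses only the ranking of scores, which is one reason studies like this one report both kinds.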
Most of the models performed well, and the developed workflows are ready to be used for QSPR and QSAR. Moreover, all tuned and trained models from this study are publicly available at https://figshare.com/s/0286924045d50441bf98. We strongly believe that modern in silico approaches, combined with advances in data mining, curation, and machine learning methods, will only accelerate drug discovery.
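k-Nearest Neighbors, one of the traditional methods listed in the abstract, is the simplest to show end to end for a QSAR classification task. The sketch below is an assumption-laden illustration, not the workflow used in the study: each compound is represented as a binary fingerprint (modelled here as the set of its "on" bit indices), Tanimoto similarity serves as the similarity measure, and activity is predicted by majority vote of the k most similar training compounds. The function names and the fingerprints and labels are hypothetical toy data; in practice fingerprints would come from a cheminformatics toolkit such as RDKit.

```python
from collections import Counter
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical representation: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity |A & B| / |A | B|; 1.0 for two empty sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def knn_predict(train: Sequence[Tuple[Fingerprint, int]], query: Fingerprint, k: int = 3) -> int:
    """Predict a 0/1 activity label by majority vote of the k most similar compounds."""
    ranked = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    # With binary labels and an odd number of neighbours a strict majority exists.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical training set of (fingerprint, label), 1 = active, 0 = inactive.
    train: List[Tuple[Fingerprint, int]] = [
        (frozenset({1, 2, 3, 4}), 1),
        (frozenset({1, 2, 3, 5}), 1),
        (frozenset({1, 2, 4, 6}), 1),
        (frozenset({7, 8, 9}), 0),
        (frozenset({7, 8, 10}), 0),
        (frozenset({8, 9, 10, 11}), 0),
    ]
    print(knn_predict(train, frozenset({1, 2, 3})))      # 1
    print(knn_predict(train, frozenset({7, 8, 9, 10})))  # 0
```

Repeating the same evaluation with different descriptor types (for example, several fingerprint variants) and keeping the one that scores best on held-out data is one simple way to realize the descriptor search described in the abstract.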
Added to the system by: Artem Aleksandrovich Mitrofanov