Feature selection for OPLS discriminant analysis of cancer tissue lipidomics data
Article
Citation information for this article was obtained from Web of Science and Scopus.
The article was published in a journal indexed in Web of Science and/or Scopus.
Date of the last search for the article in external sources: January 26, 2022.
Abstract: Mass spectrometry-based molecular profiling can be used for better differentiation between normal and cancer tissues and for the detection of neoplastic transformation, which is of great importance for the diagnostics of a pathology, the prognosis of its evolution trend, and the development of a treatment strategy. The aim of the present study is to evaluate tissue classification approaches based on various data sets derived from the molecular profile of the organic solvent extracts of a tissue. Several feature sets are considered for orthogonal projections to latent structures discriminant analysis (OPLS-DA): all mass spectrometric peaks above a 300-count threshold, a subset of peaks selected by ranking with a support vector machine algorithm, peaks selected by a random forest algorithm, peaks with a statistically significant difference in intensity as determined by the Mann-Whitney U test, peaks identified as lipids, and peaks that are both identified and significantly different. The best predictive potential is obtained for the OPLS-DA model built on nonpolar glycerolipids (Q² = 0.64, area under curve [AUC] = 0.95); the second best is the OPLS-DA model with lipid peaks selected by the random forest algorithm (Q² = 0.58, AUC = 0.87). Moreover, models based on particular molecular classes are preferable from a biological point of view, because they can suggest new explanatory mechanisms of pathophysiology and support pathway analysis. Phosphatidylethanolamines are another promising feature set for OPLS-DA modeling (Q² = 0.48, AUC = 0.86).
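The abstract names two of the simpler selection steps, an intensity threshold and the Mann-Whitney U test, without giving implementation details. The following is a minimal sketch of how such a filter could look, not the authors' pipeline. All names (`select_peaks`, `min_counts`, `alpha`, the peak labels) and the intensity values are hypothetical, and the way the 300-count threshold is applied here (a peak is kept if its maximum intensity across all samples reaches the threshold) is an assumption. The p-value uses the normal approximation without tie or continuity correction, so it is only indicative for small groups.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p) using the normal approximation.

    Simplification: no tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u1, p


def select_peaks(normal, cancer, min_counts=300, alpha=0.05):
    """Keep peaks that pass the intensity threshold and the U test.

    normal, cancer: dicts mapping a peak label to a list of intensities.
    Assumption: a peak passes the threshold if its maximum intensity
    over all samples is at least min_counts.
    """
    selected = {}
    for peak in normal:
        a, b = normal[peak], cancer[peak]
        if max(a + b) < min_counts:
            continue
        u, p = mann_whitney_u(b, a)
        if p < alpha:
            # U / (n1 * n2) equals the single-feature ROC AUC.
            selected[peak] = {"U": u, "p": p, "auc": u / (len(a) * len(b))}
    return selected


# Hypothetical intensities (counts) for three peaks, six samples per group.
normal = {
    "mz_760.58": [1200, 1350, 1100, 1280, 1220, 1190],
    "mz_788.62": [900, 950, 1010, 870, 990, 930],
    "mz_301.10": [120, 90, 150, 80, 110, 95],
}
cancer = {
    "mz_760.58": [2100, 1980, 2250, 2300, 1900, 2050],
    "mz_788.62": [940, 880, 1000, 960, 910, 975],
    "mz_301.10": [200, 210, 190, 220, 230, 205],
}

selected = select_peaks(normal, cancer)
for peak, stats in selected.items():
    print(peak, stats)
```

In this toy data only the first peak survives: the second shows no significant difference between groups, and the third never reaches the threshold even though its groups differ. The surviving columns would then be passed to the discriminant model. The sketch also makes explicit the link between the U statistic and the AUC of a single feature, which is a different quantity from the model-level AUC values reported in the abstract.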