Abstract. The development of new methods for the analysis of data of huge dimension is of great importance. A challenging example is identifying the genetic and non-genetic (environmental) factors that can increase the risk of complex diseases such as diabetes and myocardial infarction. In this regard, recall that the human genome contains more than a billion nucleotide bases. The vast research domain called genome-wide association studies (GWAS) requires new techniques for handling large arrays of biostatistical data.
The plan of the talk is as follows. After a brief introduction, we concentrate on modern methods such as multifactor dimensionality reduction (MDR) and its modifications, logic regression, and machine learning. We deal with optimization problems for random functions defined on various graphs. Model selection is discussed as well. We also apply K-fold cross-validation and permutation tests. Along with the survey, we present our quite recent results. We propose a basis for applying the MDR method when an arbitrary penalty function is used to describe the prediction error of the binary response variable by means of a function of the factors. We also establish the asymptotic normality of appropriately normalized statistics used to justify the optimal choice of a subcollection of the explanatory variables. Moreover, we consider self-normalization in this variant of the CLT.
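To make the setting concrete, the following minimal sketch estimates the prediction error of a binary response via K-fold cross-validation with a user-supplied penalty function, in the spirit of the MDR framework described above. It is an illustrative assumption, not the authors' estimator: the function names (`kfold_prediction_error`, `majority_fit`, `zero_one`) and the toy majority-vote predictor over a single factor are invented here for demonstration.

```python
import random
from collections import defaultdict

def kfold_prediction_error(X, y, fit, penalty, k=5, seed=0):
    """Estimate the prediction error of a binary response y by K-fold
    cross-validation, averaging an arbitrary penalty over held-out folds.
    `fit(X_train, y_train)` must return a predictor f: x -> {0, 1};
    `penalty(y_true, y_pred)` is the user-chosen loss (e.g. 0-1 loss)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)          # random fold assignment
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    total, n = 0.0, 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        f = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:                        # accumulate penalty on held-out data
            total += penalty(y[i], f(X[i]))
            n += 1
    return total / n

def majority_fit(X, y):
    """Toy predictor (hypothetical): for each value of the first factor,
    predict the majority class observed in the training sample."""
    counts = defaultdict(lambda: [0, 0])
    for x, label in zip(X, y):
        counts[x[0]][label] += 1
    table = {v: (0 if c[0] >= c[1] else 1) for v, c in counts.items()}
    return lambda x: table.get(x[0], 0)

zero_one = lambda y_true, y_pred: float(y_true != y_pred)  # 0-1 penalty
```

Because the penalty is passed in as a function, the same cross-validation loop accommodates any loss describing the prediction error, which is the flexibility the arbitrary-penalty formulation provides.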