CONVERGENCE OF THE ALGORITHM OF ADDITIVE REGULARIZATION OF TOPIC MODELS (article)
The article is published in a journal from the RSCI Web of Science list
The article is published in a journal from the VAK list
The article is published in a journal from the Web of Science and/or Scopus list
Date of the last search for the article in external sources: October 14, 2021
Abstract: The problem of probabilistic topic modeling is as follows: given a collection of text documents, find the conditional distribution over topics for each document and the conditional distribution over words (terms) for each topic. The problem is solved by log-likelihood maximization. In general, it has an infinite set of solutions and is therefore ill-posed in the sense of Hadamard. In the framework of Additive Regularization of Topic Models (ARTM), a weighted sum of regularization criteria is added to the main log-likelihood criterion. The numerical method for solving this optimization problem is a kind of iterative EM-algorithm; in ARTM it is derived in a quite general form, for an arbitrary smooth regularizer as well as for a linear combination of smooth regularizers. This paper studies the convergence of this EM iterative process. Sufficient conditions are obtained for convergence to a stationary point of the regularized log-likelihood, and the constraints imposed on the regularizer are not too restrictive; we interpret them from the point of view of the practical implementation of the algorithm. A modification of the algorithm is proposed that improves convergence without additional time or memory costs. Experiments on a news text collection show that the modification both accelerates convergence and improves the value of the criterion to which the process converges.
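The regularized EM iteration the abstract refers to can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the standard ARTM update scheme (E-step computes expected topic counts; M-step adds the regularizer derivative terms, truncates negatives, and renormalizes) and, for concreteness, a simple smoothing regularizer whose derivative terms reduce to the constants `beta` and `alpha`.

```python
import numpy as np

def artm_em_step(ndw, phi, theta, beta=0.01, alpha=0.01):
    """One iteration of a regularized EM algorithm in the ARTM style (sketch).

    ndw   : (D, W) document-word count matrix
    phi   : (W, T) word-in-topic distributions, columns sum to 1
    theta : (T, D) topic-in-document distributions, columns sum to 1
    beta, alpha : weights of an assumed smoothing regularizer whose
        contribution reduces to adding constants to the expected counts
        (Dirichlet-style smoothing); a general smooth regularizer would
        contribute phi * dR/dphi and theta * dR/dtheta instead.
    """
    # E-step: p(w | d) = sum_t phi_wt * theta_td, then expected counts
    # n_wt = sum_d ndw * phi_wt * theta_td / p(w | d), and similarly n_td.
    p_dw = (phi @ theta).T                                        # (D, W)
    ratio = np.divide(ndw, p_dw, out=np.zeros_like(p_dw), where=p_dw > 0)
    n_wt = phi * (ratio.T @ theta.T)                              # (W, T)
    n_td = theta * (phi.T @ ratio.T)                              # (T, D)

    # M-step: add regularizer terms, truncate negatives, renormalize columns.
    phi_new = np.maximum(n_wt + beta, 0.0)
    phi_new /= phi_new.sum(axis=0, keepdims=True)
    theta_new = np.maximum(n_td + alpha, 0.0)
    theta_new /= theta_new.sum(axis=0, keepdims=True)
    return phi_new, theta_new
```

Iterating `artm_em_step` until `phi` and `theta` stop changing is the process whose convergence to a stationary point of the regularized log-likelihood the paper analyzes.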