The Method of Automatic Construction of Training Collections for the Task of Abstractive Summarization of News Articles
Article
The article was published in a journal included in the RSCI Web of Science list.
Citation information for the article was obtained from Scopus.
The article was published in a journal included in the VAK (Higher Attestation Commission) list.
The article was published in a journal indexed in Web of Science and/or Scopus.
Date of the last search for the article in external sources: April 10, 2024.
Abstract: Creating a collection of examples for training abstractive summarization systems is costly, because writing high-quality summaries takes a great deal of time and requires highly qualified experts. A new method for creating collections for training neural summarization methods, ClusterVote, is proposed; it is designed to simulate the features of the task by taking into account information in related documents. The method can be used to form abstractive summaries at various levels of detail, as well as extractive summaries. Using ClusterVote, a new collection for training news article summarization systems, Telegram NewsCV, was formed in English and Russian. Experimental results show that, under certain parameters, the collections formed by ClusterVote have extractive characteristics similar to those of well-known datasets such as CNN/Daily Mail, while scoring higher on “factuality”, that is, the reproduction in summaries of the named entities of the source texts and of their relationships.
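The abstract does not state which extractive characteristics or factuality measure the paper uses. As a hedged illustration only, the sketch below computes two common proxies: extractive fragment coverage and density as defined by Grusky et al. (2018) for the Newsroom dataset, and an entity precision score (the share of named entities in a summary that also occur in the source text). Whether the ClusterVote evaluation uses these exact definitions is an assumption; the function names are hypothetical, tokenization is assumed to be whitespace splitting, and named entities are assumed to have been extracted beforehand by some NER tool.

```python
from typing import List, Sequence, Set, Tuple


def extractive_fragments(article: Sequence[str], summary: Sequence[str]) -> List[List[str]]:
    """Greedy fragment matching (Grusky et al., 2018): for each summary
    position, find the longest token span shared with the article."""
    fragments: List[List[str]] = []
    i = 0
    while i < len(summary):
        best: List[str] = []
        j = 0
        while j < len(article):
            if summary[i] == article[j]:
                # Extend the match as far as both sequences agree.
                i2, j2 = i, j
                while i2 < len(summary) and j2 < len(article) and summary[i2] == article[j2]:
                    i2 += 1
                    j2 += 1
                if i2 - i > len(best):
                    best = list(summary[i:i2])
                j = j2
            else:
                j += 1
        if best:
            fragments.append(best)
        # Skip past the matched span, or a single novel token.
        i += max(len(best), 1)
    return fragments


def coverage_density(article: Sequence[str], summary: Sequence[str]) -> Tuple[float, float]:
    """Coverage: share of summary tokens inside copied fragments.
    Density: average squared fragment length per summary token."""
    n = len(summary)
    if n == 0:
        return 0.0, 0.0
    frags = extractive_fragments(article, summary)
    coverage = sum(len(f) for f in frags) / n
    density = sum(len(f) ** 2 for f in frags) / n
    return coverage, density


def entity_precision(source_entities: Set[str], summary_entities: Set[str]) -> float:
    """Hypothetical factuality proxy: fraction of summary entities that
    also appear among the source entities (case-insensitive)."""
    summ = {e.casefold() for e in summary_entities}
    src = {e.casefold() for e in source_entities}
    if not summ:
        return 1.0  # no entities, so none can be unsupported
    return len(summ & src) / len(summ)
```

For example, with the article `"the cabinet approved the new budget on monday"`, the fully copied summary `"cabinet approved the new budget"` gets coverage 1.0 and density 5.0, whereas `"budget approved"` also has coverage 1.0 but density only 1.0, because it reuses single words rather than long spans. High coverage with low density is the pattern usually associated with more abstractive summaries.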