Text Normalization in Russian Text-to-Speech Synthesis: Taxonomy and Processing of Non-Standard Words - conference talk | ISTINA – Intelligent System for Thematic Investigation of Scientometric Data

Author: Cherepanova O.D.
International conference: Dialogue 2017
Conference dates: May 31 - June 3, 2017
Talk date: June 3, 2017
Talk type: Poster
Speaker: not specified
Venue: Moscow, Russia
Abstract:
Alongside ordinary words, natural-language text also contains non-standard words (NSWs), such as abbreviations, acronyms, dates, phone numbers, and currency amounts. Before these text elements can be phonetized in Text-to-Speech synthesis, they must be normalized, that is, replaced with an appropriate ordinary word or word sequence. NSWs are increasingly diverse, and most of them require specific normalization rules. In this paper, we present a taxonomy of NSWs for the Russian language developed on the basis of news texts, software and car reviews, and instruction manuals. We grouped NSWs that have similar normalization rules or patterns, taking into account their graphic form and their context dependence. We propose five main groups of NSWs: abbreviations (including acronyms and initialisms), text elements containing numbers, special characters, foreign words written in the Latin alphabet, and mixed-type non-standard words. In this work, we describe these NSW types and address the issue of their normalization in Russian Text-to-Speech synthesis.
Added to the system by: Cherepanova Olga Dmitrievna
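
The five NSW groups named in the abstract can be illustrated with a small token classifier. This is only a minimal sketch under stated assumptions: the function name `classify_nsw`, the group labels, and every regular expression below are hypothetical surface heuristics chosen for illustration. They are not the rules from the paper, and they ignore the context dependence that the abstract says the taxonomy takes into account.

```python
import re

# Hypothetical character classes (an assumption, not taken from the paper).
CYR = "А-Яа-яЁё"

CYR_WORD = re.compile(rf"[{CYR}]+")              # only Cyrillic letters
CYR_UPPER = re.compile(r"[А-ЯЁ]{2,}")            # e.g. "РАН", "США"
CYR_DOTTED = re.compile(rf"(?:[{CYR}]+\.)+")     # e.g. "т.д.", "г."
NUMERIC = re.compile(r"\d+(?:[.,:/-]\d+)*")      # e.g. "2017", "31.05.2017"
LATIN_WORD = re.compile(r"[A-Za-z]+")            # e.g. "Windows"


def classify_nsw(token: str) -> str:
    """Assign one token to a coarse group based on its graphic form only."""
    if not token:
        raise ValueError("token must be non-empty")

    if CYR_WORD.fullmatch(token):
        # All-uppercase Cyrillic of length >= 2 is treated as an abbreviation;
        # any other purely Cyrillic token is treated as an ordinary word.
        return "abbreviation" if CYR_UPPER.fullmatch(token) else "ordinary"
    if CYR_DOTTED.fullmatch(token):
        return "abbreviation"
    if NUMERIC.fullmatch(token):
        return "number"
    if LATIN_WORD.fullmatch(token):
        return "foreign_word"
    if not any(ch.isalnum() for ch in token):
        return "special_character"
    # Anything that combines several of the kinds above.
    return "mixed"


if __name__ == "__main__":
    for tok in ["РАН", "т.д.", "31.05.2017", "Windows", "%", "MP3", "слово"]:
        print(tok, classify_nsw(tok))
```

A real normalizer would follow this classification step with group-specific expansion rules (for example, reading a date or a currency amount as Russian words in the correct grammatical case), which this sketch does not attempt.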
