Internet companies use crowdsourcing to collect the large amounts of labeled data needed to build products based on machine learning. A significant source of labels for OCR data sets is (re)CAPTCHA, which distinguishes humans from automated bots by asking users to recognize text and, in doing so, collects new labeled data. An important component of this approach to data collection is reducing the noisy labels produced by bots and unqualified users.