This talk presents a set of advances that significantly improve the GIN index. Their primary goal is to make full-text search (FTS) in PostgreSQL as fast as it is in stand-alone solutions such as Sphinx and Solr, although they have many other applications. The advances are:

- Compression of item pointers in the index
- Storing additional information in posting trees and posting lists
- Fast scan: skipping parts of posting trees during a scan
- Sorting results in the index

These advances bring the following benefits to GIN indexes:

- Indexes become about 2 times smaller without any work in the opclass.
- Using additional information for filtering enables new features for GIN opclasses: better phrase search, better array similarity search, inverse FTS search (search for tsqueries matching a tsvector), inverse regex search (search for regexes matching a string), and better string similarity using positioned n-grams.
- Fast scan dramatically accelerates GIN search in the "frequent_term & rare_term" case.
- Using additional information for sorting in the index accelerates ranking in FTS and dramatically reduces its I/O.

We present benchmark results for FTS on several datasets (6 M and 15 M documents) and a real-life load, for the PostgreSQL and Sphinx full-text search engines, and demonstrate that the improved PostgreSQL FTS (with all its ACID overhead) outperforms the standalone Sphinx search engine.
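
To illustrate why skipping helps when a frequent term is combined with a rare one, here is a minimal sketch in Python. It is not the GIN implementation: real posting trees are B-trees of item pointers, whereas this sketch assumes plain sorted lists of document ids, and the function name `intersect_with_skipping` and the sample data are hypothetical. The rare list drives the scan, and a binary search jumps forward in the frequent list instead of reading every entry.

```python
from bisect import bisect_left


def intersect_with_skipping(rare, frequent):
    """Intersect two sorted lists of document ids.

    For each id in the short (rare) list, binary-search forward in the
    long (frequent) list rather than walking it entry by entry.
    Returns the matching ids and the number of frequent-list entries
    that were actually compared.
    """
    result = []
    probes = 0  # frequent-list entries compared against a rare id
    lo = 0      # everything before this index is already ruled out
    for doc_id in rare:
        # Skip ahead to the first frequent entry >= doc_id.
        lo = bisect_left(frequent, doc_id, lo)
        if lo == len(frequent):
            break  # frequent list exhausted, no further matches possible
        probes += 1
        if frequent[lo] == doc_id:
            result.append(doc_id)
    return result, probes


# Hypothetical data: a term present in every even-numbered document,
# and a term present in only four documents.
frequent_term = list(range(0, 1_000_000, 2))
rare_term = [10, 501, 40_000, 999_998]

matches, probes = intersect_with_skipping(rare_term, frequent_term)
print(matches, probes)
```

In this toy case the frequent list holds 500,000 entries, yet only one entry per rare id is compared; a scan without skipping would have to walk the frequent list in full.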