
Understanding full-text search

If you are looking up names or searching for simple strings, you are usually querying the entire content of a field. Full-text search is different: its purpose is to find words or groups of words that occur somewhere in a text. Full-text search is therefore more of a contains operation, because you are almost never looking for an exact string.
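To make the difference between exact matching and word-level matching concrete, here is a small sketch that compares plain string operators with a text search match. It uses the @@ match operator and the to_tsquery function, which are standard PostgreSQL but are not covered in this section; the sample strings are made up for illustration.

```sql
-- Plain string comparison: the whole value must match exactly.
SELECT 'A car, I want a car' = 'car';                 -- false

-- Substring matching also finds "car" inside an unrelated word.
SELECT 'a scary story' LIKE '%car%';                  -- true

-- Word-level matching: the text contains the word "car" ...
SELECT to_tsvector('english', 'A car, I want a car')
       @@ to_tsquery('english', 'car');               -- true

-- ... but "scary" is not the word "car".
SELECT to_tsvector('english', 'a scary story')
       @@ to_tsquery('english', 'car');               -- false
```

psql displays the results as t and f.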

In PostgreSQL, full-text search can be done using GIN indexes. The idea is to dissect a text, extract valuable lexemes (preprocessed word tokens), and index those lexemes rather than the underlying text. The preprocessing is what makes the search more successful, because different forms of the same word end up as the same lexeme.
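A minimal sketch of what indexing the lexemes can look like. The table t_doc, its column body, and the index name are hypothetical and used only for illustration. The configuration is passed explicitly because only the two-argument form of to_tsvector is immutable and can therefore appear in an index expression; the one-argument form depends on a server setting.

```sql
-- Hypothetical table used only for illustration
CREATE TABLE t_doc (
    id   serial PRIMARY KEY,
    body text
);

-- Index the lexemes produced by to_tsvector rather than the raw text.
-- The configuration ('english') must be spelled out for the expression
-- to be usable in an index.
CREATE INDEX idx_doc_body_fts
    ON t_doc USING gin (to_tsvector('english', body));

-- A query can only use this index if it repeats the same expression,
-- including the same configuration.
SELECT id
FROM   t_doc
WHERE  to_tsvector('english', body) @@ to_tsquery('english', 'cars');
```

Whether the planner actually picks the index depends on table size and statistics; on a tiny table, a sequential scan is normal.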

Here is an example:

test=# SELECT to_tsvector('english', 
'A car, I want a car. I would not even mind
having many cars');
to_tsvector
---------------------------------------------------------------
'car':2,6,14 'even':10 'mani':13 'mind':11 'want':4 'would':8
(1 row)

This example shows a simple sentence. Based on the configuration (english), the to_tsvector function parses the string, throws away stop words such as a and I, and stems the individual words. For example, car and cars are both transformed to car. The numbers after each lexeme are the positions of the word in the original string, so car occurs at positions 2, 6, and 14. Note that stemming is not about finding the linguistically correct word stem. In the case of many, PostgreSQL simply transforms the string into mani by applying standard rules that work well for the English language.
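The individual steps can also be inspected one word at a time. This sketch assumes that the english configuration uses the english_stem Snowball dictionary, as it does in a standard installation. ts_lexize returns the resulting lexemes, or an empty array when the word is a stop word; ts_debug shows, per token, which dictionary handled it.

```sql
-- Stemming: endings are rewritten by rule, not looked up
SELECT ts_lexize('english_stem', 'cars');   -- {car}
SELECT ts_lexize('english_stem', 'many');   -- {mani}

-- Stop word: recognized by the dictionary, but produces no lexeme
SELECT ts_lexize('english_stem', 'a');      -- {}

-- One row per token: the dictionary that handled it and its output
SELECT token, dictionary, lexemes
FROM   ts_debug('english', 'I would not even mind having many cars');
```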

Note that the output of the to_tsvector function is highly language-dependent. If you tell PostgreSQL to treat the string as Dutch, the result is totally different:

test=# SELECT to_tsvector('dutch', 'A car, I want a car. I would not even mind having many cars'); 
to_tsvector
-----------------------------------------------------------------
'a':1,5 'car':2,6,14 'even':10 'having':12 'i':3,7 'many':13
'mind':11 'not':9 'would':8
(1 row)
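A quick way to see how strongly the configuration shapes the result is to run the same text through several configurations in a single query. The list of configurations here is only an example. simple is a built-in configuration that lowercases words but neither removes stop words nor stems them.

```sql
-- Same sentence, three configurations; cfg is cast to regconfig
-- so that it can be passed to to_tsvector.
SELECT cfg, to_tsvector(cfg::regconfig, 'I want many cars')
FROM   unnest(ARRAY['english', 'dutch', 'simple']) AS cfg;
```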

To find out which configurations are available, run the following query:

SELECT cfgname FROM pg_ts_config; 
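When to_tsvector is called without a configuration argument, PostgreSQL falls back to the setting default_text_search_config. A short sketch showing how to inspect it and change it for the current session; dutch is just an example value chosen to match the earlier output.

```sql
-- Configuration used when no config argument is given
SHOW default_text_search_config;

-- Change it for the current session only
SET default_text_search_config = 'pg_catalog.dutch';

-- Now equivalent to to_tsvector('dutch', ...)
SELECT to_tsvector('A car, I want a car. I would not even mind having many cars');

-- Back to the server default
RESET default_text_search_config;
```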

Let's now compare the strings.
