官术网_书友最值得收藏!

Using indexes in an intelligent way

So far, applying an index feels like the Holy Grail, which always improves performance magically. However, this is not the case. Indexes can also be pretty pointless in some cases.

Before digging into things more deeply, here is the data structure we are using for this example. Remember that there are only two distinct names and unique IDs:

test=# \d t_test 
Table "public.t_test"
Column | Type | Modifiers
--------+---------+------------------------------------
id | integer | not null default nextval('t_test_id_seq'::regclass)
name | text |
Indexes:
"idx_id" btree (id)

At this point, one index has been defined, which covers the id column. In the next step, the name column will be queried. Before doing that, an index on name will be created:

test=# CREATE INDEX idx_name ON t_test (name); 
CREATE INDEX

Now it is time to see if the index is used correctly:

test=# EXPLAIN SELECT * FROM  t_test WHERE name = 'hans2'; 
QUERY PLAN
-------------------------------------------------------------
Index Scan using idx_name on t_test (cost=0.43..4.45 rows=1 width=9)
Index Cond: (name = 'hans2'::text)
(2 rows)

As expected, PostgreSQL will decide on using the index. Most users would expect that. But note that my query says hans2. Remember: hans2 does not exist in the table and the query plan perfectly reflects this. rows=1 indicates that the planner only expects a very small subset of data being returned by the query.

There is not a single row in the table, but PostgreSQL will never estimate zero rows because it would make subsequent estimations a lot harder.

Let us see what happens if we look for more data:

test=# EXPLAIN SELECT * FROM  t_test WHERE name = 'hans' OR name = 'paul';
QUERY PLAN
--------------------------------------------------------------
Seq Scan on t_test (cost=0.00..81622.00 rows=3000011 width=9)
Filter: ((name = 'hans'::text) OR (name = 'paul'::text))
(2 rows)

In this case, PostgreSQL will go for a straight sequential scan. Why is that? Why is the system ignoring all indexes? The reason is simple: hans and paul make up the entire dataset because there are no other values. Therefore, PostgreSQL figures that the entire table has to be read anyway. There is no reason to read all of the index and the full table if reading just the table is sufficient.

In other words, PostgreSQL will not use an index just because there is one. PostgreSQL will use indexes when they make sense. If the number of rows is smaller, PostgreSQL will again consider bitmap scans and normal index scans:

test=# EXPLAIN SELECT * FROM  t_test WHERE name = 'hans2' OR name = 'paul2'; 
QUERY PLAN
----------------------------------------------------------
Bitmap Heap Scan on t_test (cost=8.88..12.89 rows=1 width=9)
Recheck Cond: ((name = 'hans2'::text) OR (name = 'paul2'::text))
-> BitmapOr (cost=8.88..8.88 rows=1 width=0)
-> Bitmap Index Scan on idx_name (cost=0.00..4.44 rows=1 width=0)
Index Cond: (name = 'hans2'::text)
-> Bitmap Index Scan on idx_name (cost=0.00..4.44 rows=1 width=0)
Index Cond: (name = 'paul2'::text)

The most important point to learn here is that execution plans depend on input values. They are not static and not independent of the data inside the table. This is a very important observation, which has to be kept in mind all the time. In real-world examples, the fact that plans change can often be the reason for unpredictable runtimes.

主站蜘蛛池模板: 贺州市| 驻马店市| 台北县| 马关县| 容城县| 凤庆县| 乌海市| 盱眙县| 青州市| 大丰市| 峨眉山市| 聊城市| 兴和县| 隆化县| 思茅市| 佳木斯市| 张家口市| 巫溪县| 通江县| 舞阳县| 鄂伦春自治旗| 蒙阴县| 齐河县| 赤城县| 莱州市| 台东县| 股票| 康平县| 凤城市| 仪征市| 大化| 衡山县| 于都县| 栾川县| 平泉县| 井陉县| 二连浩特市| 永新县| 偃师市| 潮安县| 吴川市|