官术网_书友最值得收藏!

Using indexes in an intelligent way

So far, applying an index feels like the Holy Grail, always improving performance magically. However, this is not the case. Indexes can also be pretty pointless in some cases.

Before digging into things more deeply, here is the data structure we have used for this example. Remember that there are only two distinct names and unique IDs:

test=# \d t_test 
Table "public.t_test"
Column | Type | Modifiers
--------+---------+------------------------------------
id | integer | not null default nextval('t_test_id_seq'::regclass)
name | text |
Indexes:
"idx_id" btree (id)

At this point, one index has been defined, which covers the id column. In the next step, the name column will be queried. Before doing this, an index on the name will be created:

test=# CREATE INDEX idx_name ON t_test (name);  
CREATE INDEX 

Now, it is time to see if the index is used correctly; consider the following code block:

test=# EXPLAIN SELECT * FROM t_test WHERE name = 'hans2'; 
QUERY PLAN
-----------------------------------------------------
Index Scan using idx_name on t_test
(cost=0.43..4.45 rows=1 width=9)
Index Cond: (name = 'hans2'::text)
(2 rows)

As expected, PostgreSQL will decide on using the index. Most users would expect this. But note that my query says hans2. Remember, hans2 does not exist in the table and the query plan perfectly reflects this. rows=1 indicates that the planner only expects a very small subset of data being returned by the query.

There is not a single row in the table, but PostgreSQL will never estimate zero rows because it would make subsequent estimations a lot harder, since useful cost calculations of other nodes in the plan would be close to impossible.

Let's see what happens if we look for more data:

test=# EXPLAIN SELECT * 
FROM t_test
WHERE name = 'hans'
OR name = 'paul';
QUERY PLAN
---------------------------------------------------------------
Seq Scan on t_test (cost=0.00..81622.00 rows=3000011 width=9)
Filter: ((name = 'hans'::text) OR (name = 'paul'::text))
(2 rows)

In this case, PostgreSQL will go for a straight sequential scan. Why is that? Why is the system ignoring all indexes? The reason is simple: hans and paul make up the entire dataset because there are no other values (PostgreSQL knows that by checking the system statistics). Therefore, PostgreSQL figures that the entire table has to be read anyway. There is no reason to read all of the index and the full table if reading the table is sufficient.

In other words, PostgreSQL will not use an index just because there is one. PostgreSQL will use indexes when they make sense. If the number of rows is smaller, PostgreSQL will again consider bitmap scans and normal index scans:

test=# EXPLAIN SELECT * 
FROM t_test
WHERE name = 'hans2'
OR name = 'paul2';

QUERY PLAN
---------------------------------------------------------------------
Bitmap Heap Scan on t_test (cost=8.88..12.89 rows=1 width=9)
Recheck Cond: ((name = 'hans2'::text) OR (name = 'paul2'::text))
-> BitmapOr (cost=8.88..8.88 rows=1 width=0)
-> Bitmap Index Scan on idx_name
(cost=0.00..4.44 rows=1 width=0)

Index Cond: (name = 'hans2'::text)
-> Bitmap Index Scan on idx_name
(cost=0.00..4.44 rows=1 width=0)

Index Cond: (name = 'paul2'::text)

The most important point to learn here is that execution plans depend on input values.

They are not static and not independent of the data inside the table. This is a very important observation, which has to be kept in mind at all times. In real-world examples, the fact that plans change can often be the reason for unpredictable runtimes.

主站蜘蛛池模板: 水城县| 顺义区| 深州市| 桃园市| 崇信县| 景东| 南充市| 吉安县| 宁武县| 金山区| 绥江县| 兴国县| 阿合奇县| 美姑县| 余姚市| 苍梧县| 社旗县| 永登县| 阜新| 徐汇区| 华池县| 通道| 武山县| 玛纳斯县| 竹北市| 石台县| 泰顺县| 泰兴市| 漳平市| 长海县| 万全县| 车险| 达日县| 垦利县| 平乡县| 寿阳县| 延安市| 钦州市| 宣城市| 南江县| 宽城|