書名： Mastering PostgreSQL 12
作者名： Hans Jürgen Sch?nig
本章字數： 605字
更新時間： 2021-08-20 10:00:25

Digging into the PostgreSQL cost model

If only one CPU is used, the execution plan will look like this:

test=# EXPLAIN SELECT * FROM t_test WHERE id = 432332; 
                        QUERY PLAN 
---------------------------------------------------------- 
 Seq Scan on t_test (cost=0.00..71622.00 rows=1 width=9) 
   Filter: (id = 432332) 
(2 rows)

PostgreSQL will sequentially read (sequential scan) the entire table and apply the filter. It expects the operation to cost 71622 penalty points. Now, what does this mean? Penalty points (or costs) are mostly an abstract concept. They are needed to compare different ways to execute a query. If a query can be executed by the executor in many different ways, PostgreSQL will decide on the execution plan by promising the lowest cost possible. The question now is, how did PostgreSQL end up with 71622 points?

Here is how it works:

test=# SELECT pg_relation_size('t_test') / 8192.0; 
      ?column?
--------------------
 21622.000000
(1 row)

The pg_relation_size function will return the size of the table in bytes. Given this example, you can see that the relation consists of 21622 blocks (8,000 each). According to the cost model, PostgreSQL will add a cost of one for each block it has to read sequentially.

The configuration parameter to influence that is as follows:

test=# SHOW seq_page_cost; 
 seq_page_cost
---------------
 1
(1 row)

However, reading a couple of blocks from a disk is not everything we have to do. It is also necessary to apply the filter and to send these rows through a CPU. Two parameters shown in the block shown here account for these costs:

test=# SHOW cpu_tuple_cost; 
 cpu_tuple_cost
----------------
 0.01
(1 row)
test=# SHOW cpu_operator_cost; 
 cpu_operator_cost
-------------------
 0.0025
(1 row)

This leads to the following calculation:

test=# SELECT 21622*1 + 4000000*0.01 + 4000000*0.0025; 
   ?column?
------------
 71622.0000
(1 row)

As you can see, this is exactly the number that's shown in the plan. Costs will consist of a CPU part and an I/O part, which will all be turned into a single number. The important thing here is that costs have nothing to do with real execution, so it is impossible to translate costs into milliseconds. The number the planner comes up with is really just an estimate.

Of course, there are some more parameters outlined in this brief example. PostgreSQL also has special parameters for index-related operations, as follows:

random_page_cost = 4: If PostgreSQL uses an index, there is usually a lot of random I/O involved. On traditional spinning disks, random reads are much more important than sequential reads, so PostgreSQL will account for them accordingly. Note that, on SSDs, the difference between random and sequential reads does not exist anymore, so it can make sense to set random_page_cost = 1 in the postgresql.conf file.
cpu_index_tuple_cost = 0.005: If indexes are used, PostgreSQL will also consider that there is some CPU cost invoiced.

If you are utilizing parallel queries, there are even more cost parameters:

parallel_tuple_cost = 0.1: This defines the cost of transferring one tuple from a parallel worker process to another process. It basically accounts for the overhead of moving rows around inside the infrastructure.
parallel_setup_cost = 1000.0: This adjusts the costs of firing up a worker process. Of course, starting processes to run queries in parallel is not free, and so this parameter tries to model those costs associated with process management.
min_parallel_tables_scan_size = 8 MB: This defines the minimum size of a table that's considered for parallel queries. The larger a table grows, the more CPUs PostgreSQL will use. The size of the table has to triple to allow for one more worker process.
min_parallel_index_scan_size = 512kB: This defines the size of an index, which is necessary to consider a parallel scan.

官术网_书友最值得收藏!

Mastering PostgreSQL 12

Digging into the PostgreSQL cost model