- Mastering PostgreSQL 12
- Hans Jürgen Sch?nig
- 605字
- 2021-08-20 10:00:25
Digging into the PostgreSQL cost model
If only one CPU is used, the execution plan will look like this:
test=# EXPLAIN SELECT * FROM t_test WHERE id = 432332;
QUERY PLAN
----------------------------------------------------------
Seq Scan on t_test (cost=0.00..71622.00 rows=1 width=9)
Filter: (id = 432332)
(2 rows)
PostgreSQL will sequentially read (sequential scan) the entire table and apply the filter. It expects the operation to cost 71622 penalty points. Now, what does this mean? Penalty points (or costs) are mostly an abstract concept. They are needed to compare different ways to execute a query. If a query can be executed by the executor in many different ways, PostgreSQL will decide on the execution plan by promising the lowest cost possible. The question now is, how did PostgreSQL end up with 71622 points?
Here is how it works:
test=# SELECT pg_relation_size('t_test') / 8192.0;
?column?
--------------------
21622.000000
(1 row)
The pg_relation_size function will return the size of the table in bytes. Given this example, you can see that the relation consists of 21622 blocks (8,000 each). According to the cost model, PostgreSQL will add a cost of one for each block it has to read sequentially.
The configuration parameter to influence that is as follows:
test=# SHOW seq_page_cost; seq_page_cost
---------------
1
(1 row)
However, reading a couple of blocks from a disk is not everything we have to do. It is also necessary to apply the filter and to send these rows through a CPU. Two parameters shown in the block shown here account for these costs:
test=# SHOW cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01
(1 row)
test=# SHOW cpu_operator_cost;
cpu_operator_cost
-------------------
0.0025
(1 row)
This leads to the following calculation:
test=# SELECT 21622*1 + 4000000*0.01 + 4000000*0.0025;
?column?
------------
71622.0000
(1 row)
As you can see, this is exactly the number that's shown in the plan. Costs will consist of a CPU part and an I/O part, which will all be turned into a single number. The important thing here is that costs have nothing to do with real execution, so it is impossible to translate costs into milliseconds. The number the planner comes up with is really just an estimate.
Of course, there are some more parameters outlined in this brief example. PostgreSQL also has special parameters for index-related operations, as follows:
- random_page_cost = 4: If PostgreSQL uses an index, there is usually a lot of random I/O involved. On traditional spinning disks, random reads are much more important than sequential reads, so PostgreSQL will account for them accordingly. Note that, on SSDs, the difference between random and sequential reads does not exist anymore, so it can make sense to set random_page_cost = 1 in the postgresql.conf file.
- cpu_index_tuple_cost = 0.005: If indexes are used, PostgreSQL will also consider that there is some CPU cost invoiced.
If you are utilizing parallel queries, there are even more cost parameters:
- parallel_tuple_cost = 0.1: This defines the cost of transferring one tuple from a parallel worker process to another process. It basically accounts for the overhead of moving rows around inside the infrastructure.
- parallel_setup_cost = 1000.0: This adjusts the costs of firing up a worker process. Of course, starting processes to run queries in parallel is not free, and so this parameter tries to model those costs associated with process management.
- min_parallel_tables_scan_size = 8 MB: This defines the minimum size of a table that's considered for parallel queries. The larger a table grows, the more CPUs PostgreSQL will use. The size of the table has to triple to allow for one more worker process.
- min_parallel_index_scan_size = 512kB: This defines the size of an index, which is necessary to consider a parallel scan.
- Seven NoSQL Databases in a Week
- Mastercam 2017數控加工自動編程經典實例(第4版)
- Managing Mission:Critical Domains and DNS
- Verilog HDL數字系統設計入門與應用實例
- 7天精通Dreamweaver CS5網頁設計與制作
- Mastering D3.js
- Docker Quick Start Guide
- SharePoint 2010開發最佳實踐
- 數據掘金
- 精通LabVIEW程序設計
- Artificial Intelligence By Example
- Excel 2007終極技巧金典
- 中國戰略性新興產業研究與發展·數控系統
- Data Analysis with R(Second Edition)
- JSP通用范例開發金典