- Mastering PostgreSQL 9.6
- Hans Jurgen Schonig
- 563字
- 2021-07-09 19:57:25
Making use of ordered sets
Ordered sets are a powerful feature, which is not widely regarded and not widely known in the developer community. The idea is actually quite simple: data is grouped normally and then the data inside each group is ordered given a certain condition. The calculation is then performed on this sorted data.
A classical example would be the calculation of the median.
One way to get the median is to take sorted data and move 50% into the dataset. This is an example of what the WITHIN GROUP clause will ask PostgreSQL to do:
test=# SELECT region, percentile_disc(0.5) WITHIN GROUP (ORDER BY production) FROM t_oil GROUP BY 1;
region | percentile_disc
---------------+-----------------
Middle East | 1082
North America | 3054
(2 rows)
percentile_disc will skip 50% of the group and return the desired value. Note that the median can significantly deviate from the average. In economics, the deviation between median and average income can even be used as an indicator for social equality or inequality. The higher the median compared to the average, the more the income inequality. To provide more flexibility, the ANSI standard does not just propose a median function. Instead, percentile_disc allows you to use any value between 0 and 1.
The beauty is that you can even use ordered sets along with grouping sets:
test=# SELECT region,
percentile_disc(0.5) WITHIN GROUP (ORDER BY production)
FROM t_oil
GROUP BY ROLLUP (1);
region | percentile_disc
---------------+-----------------
Middle East | 1082
North America | 3054
| 1696
(3 rows)
In this case PostgreSQL will again inject additional lines into the result set.
As proposed by the ANSI SQL standard, PostgreSQL provides you with two percentile_ functions. percentile_disc will return a value, which is really contained by the dataset. percentile_cont will interpolate a value if no exact match is found. The following example shows, how this works:
test=# SELECT
percentile_disc(0.62) WITHIN GROUP (ORDER BY id),
percentile_cont(0.62) WITHIN GROUP (ORDER BY id)
FROM generate_series(1, 5) AS id;
percentile_disc | percentile_cont
-----------------+-----------------
4 | 3.48
(1 row)
4 is a value, which really exists—3.48 has been interpolated.
The percentile_ functions are not the only ones provided by PostgreSQL. To find the most frequent value within a group the mode function is available. Before showing an example of how to use mode function, I have compiled a query telling us a bit more about the content of the table:
test=# SELECT production, count(*)
FROM t_oil
WHERE country = 'Other Middle East'
GROUP BY production
ORDER BY 2 DESC
LIMIT 4;
production | count
------------+-------
50 | 5
48 | 5
52 | 5
53 | 4
(4 rows)
Three different values occur exactly five times. Of course, the mode function can only give us one of them:
test=# SELECT country, mode() WITHIN GROUP (ORDER BY production)
FROM t_oil
WHERE country = 'Other Middle East'
GROUP BY 1;
country | mode
-------------------+------
Other Middle East | 48
(1 row)
The most frequent value is returned but SQL won't tell us how often the number actually shows up. It can even happen that the number only shows up once.
- 最簡數據挖掘
- 機器人創新實訓教程
- C語言開發技術詳解
- RPA(機器人流程自動化)快速入門:基于Blue Prism
- Implementing Oracle API Platform Cloud Service
- Nginx高性能Web服務器詳解
- 單片機C語言程序設計完全自學手冊
- 分析力!專業Excel的制作與分析實用法則
- INSTANT Puppet 3 Starter
- Unreal Development Kit Game Design Cookbook
- FANUC工業機器人虛擬仿真教程
- 信息系統安全保障評估
- 基于Quartus Ⅱ的數字系統Verilog HDL設計實例詳解
- Learning Couchbase
- ARM嵌入式系統開發完全入門與主流實踐