官术网_书友最值得收藏!

  • Mastering PostgreSQL 12
  • Hans Jürgen Sch?nig
  • 370字
  • 2021-08-20 10:00:35

Partitioning data

So far, the same result can also easily be achieved using sub-select. However, if you want more than just the overall average, sub-select will turn your queries into nightmares due to complexity. Suppose you don't just want the overall average but the average of the country you are dealing with. A PARTITION BY clause is what you need:

test=# SELECT country, year, production, consumption, 
avg(production) OVER (PARTITION BY country)
FROM t_oil;
country | year | production | consumption | avg ----------+-------+------------+-------------+----------- Canada | 1965 | 920 | 1108 | 2123.2173 Canada | 2010 | 3332 | 2316 | 2123.2173 Canada | 2009 | 3202 | 2190 | 2123.2173 ... Iran | 1966 | 2132 | 148 | 3631.6956 Iran | 2010 | 4352 | 1874 | 3631.6956 Iran | 2009 | 4249 | 2012 | 3631.6956 ...

The point here is that each country will be assigned to the average of the country. The OVER clause defines the window we are looking at. In this case, the window is the country the row belongs to. In other words, the query returns the rows compared to all rows in this country.

The year column is not sorted. The query does not contain an explicit sort order, so it might be that data is returned in random order. Remember, SQL does not promise sorted output unless you explicitly state what you want.

Basically, a PARTITION BY clause takes any expression. Usually, most people will use a column to partition the data. Here is an example:

test=# SELECT year, production, 
              avg(production) OVER (PARTITION BY year < 1990)  
FROM  t_oil 
WHERE  country = 'Canada'  
ORDER  BY year; 
 year  | production |       avg 
-------+------------+----------------------- 
 1965  |        920 | 1631.6000000000000000 
 1966  |       1012 | 1631.6000000000000000 
 ... 
 1990  |       1967 | 2708.4761904761904762 
 1991  |        1983| 2708.4761904761904762 
 1992  |        2065| 2708.4761904761904762 
 ... 

The point is that data is split using an expression. year < 1990 can return two values: true or false. Depending on the group a year is in, it will be assigned to the pre-1990 average or to the post-1990 average. PostgreSQL is really flexible here. Using functions to determine group membership is not uncommon in real-world applications.

主站蜘蛛池模板: 通海县| 宁阳县| 酒泉市| 通许县| 惠州市| 长治市| 林周县| 遂昌县| 上栗县| 通海县| 色达县| 旌德县| 蒙山县| 汶上县| 翼城县| 泰安市| 京山县| 福安市| 东方市| 华亭县| 台前县| 灌云县| 大姚县| 乐山市| 手游| 青河县| 泗洪县| 岗巴县| 保康县| 抚远县| 马尔康县| 莱芜市| 屯昌县| 托克逊县| 万荣县| 宜昌市| 韶山市| 增城市| 兴仁县| 灵山县| 资溪县|