官术网_书友最值得收藏!

Removing duplicates using EXCLUDE TIES and EXCLUDE GROUP

Sometimes, you want to make sure that duplicates don't make it into the result of your windowing function. The EXCLUDE TIES clause helps you to achieve exactly that. If a value shows up in a window twice, it will be removed. This is a neat way to avoid complicated workarounds, which can be costly and slow. The following listing contains a simple example:

SELECT *, 
x / 3 AS y,
array_agg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_1,
array_agg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE TIES) AS rows_2 FROM generate_series(1, 10) AS x;
x | y | rows_1 | rows_2
----+---+---------+--------
1 | 0 | {0,0} | {0}
2 | 0 | {0,0,1} | {0,1}
3 | 1 | {0,1,1} | {0,1}
4 | 1 | {1,1,1} | {1}
5 | 1 | {1,1,2} | {1,2}
6 | 2 | {1,2,2} | {1,2}
7 | 2 | {2,2,2} | {2}
8 | 2 | {2,2,3} | {2,3}
9 | 3 | {2,3,3} | {2,3}
10 | 3 | {3,3} | {3}
(10 rows)

I have again used the generate_series function to create data. Using a simple time series is a lot easier than digging through some more complicated real-world data. array_agg will turn all values added to the window into an array. As you can see in the last column, however, the array is a lot shorter. Duplicates have been removed automatically. 

In addition to the EXCLUDE TIES clause, PostgreSQL also supports EXCLUDE GROUP. The idea here is that you want to remove an entire set of rows from the dataset before it makes it to the aggregation function. Let's take a look at the following example. We have 4 windowing functions here. The first one is the classical ROWS BETWEEN example you have already seen. I have included this column so that it is easier to spot the differences between the standard and the EXCLUDE GROUP version. What is also important to note here is that the array_agg function is not the only one you can use here—avg or any other window or aggregation function works just fine. I simply used array_agg to make it easier to see what PostgreSQL does. In the following example, you can see that EXCLUDE GROUP removes the entire set of rows:

SELECT *, 
x / 3 AS y,
array_agg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_1,
avg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS avg_1,
array_agg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE GROUP) AS rows_2,
avg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE GROUP) AS avg_2
FROM generate_series(1, 10) AS x;
x | y | rows_1 | avg_1 | rows_2 | avg_2
----+---+---------+----------+--------+----------
1 | 0 | {0,0} | 0.000000 | |
2 | 0 | {0,0,1} | 0.333333 | {1} | 1.000000
3 | 1 | {0,1,1} | 0.666666 | {0} | 0.000000
4 | 1 | {1,1,1} | 1.000000 | |
5 | 1 | {1,1,2} | 1.333333 | {2} | 2.000000
6 | 2 | {1,2,2} | 1.666666 | {1} | 1.000000
7 | 2 | {2,2,2} | 2.000000 | |
8 | 2 | {2,2,3} | 2.333333 | {3} | 3.000000
9 | 3 | {2,3,3} | 2.666666 | {2} | 2.000000
10 | 3 | {3,3} | 3.0000000 | |
(10 rows)

The entire group containing the same value is removed. That, of course also impacts the average calculated on top of this result.

主站蜘蛛池模板: 桃源县| 太仆寺旗| 夹江县| 正宁县| 文化| 琼海市| 津市市| 阜康市| 密云县| 河曲县| 安庆市| 磐石市| 景谷| 定陶县| 宁夏| 页游| 林口县| 滦平县| 昆明市| 莱芜市| 开江县| 和静县| 响水县| 望谟县| 梨树县| 肃宁县| 宁武县| 北辰区| 嵩明县| 天等县| 来凤县| 眉山市| 新田县| 南部县| 华容县| 罗源县| 靖宇县| 集贤县| 康平县| 保定市| 襄城县|