- Mastering PostgreSQL 12
- Hans Jürgen Sch?nig
- 619字
- 2021-08-20 10:00:35
Removing duplicates using EXCLUDE TIES and EXCLUDE GROUP
Sometimes, you want to make sure that duplicates don't make it into the result of your windowing function. The EXCLUDE TIES clause helps you to achieve exactly that. If a value shows up in a window twice, it will be removed. This is a neat way to avoid complicated workarounds, which can be costly and slow. The following listing contains a simple example:
SELECT *,
x / 3 AS y,
array_agg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_1,
array_agg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE TIES) AS rows_2 FROM generate_series(1, 10) AS x;
x | y | rows_1 | rows_2
----+---+---------+--------
1 | 0 | {0,0} | {0}
2 | 0 | {0,0,1} | {0,1}
3 | 1 | {0,1,1} | {0,1}
4 | 1 | {1,1,1} | {1}
5 | 1 | {1,1,2} | {1,2}
6 | 2 | {1,2,2} | {1,2}
7 | 2 | {2,2,2} | {2}
8 | 2 | {2,2,3} | {2,3}
9 | 3 | {2,3,3} | {2,3}
10 | 3 | {3,3} | {3}
(10 rows)
I have again used the generate_series function to create data. Using a simple time series is a lot easier than digging through some more complicated real-world data. array_agg will turn all values added to the window into an array. As you can see in the last column, however, the array is a lot shorter. Duplicates have been removed automatically.
In addition to the EXCLUDE TIES clause, PostgreSQL also supports EXCLUDE GROUP. The idea here is that you want to remove an entire set of rows from the dataset before it makes it to the aggregation function. Let's take a look at the following example. We have 4 windowing functions here. The first one is the classical ROWS BETWEEN example you have already seen. I have included this column so that it is easier to spot the differences between the standard and the EXCLUDE GROUP version. What is also important to note here is that the array_agg function is not the only one you can use here—avg or any other window or aggregation function works just fine. I simply used array_agg to make it easier to see what PostgreSQL does. In the following example, you can see that EXCLUDE GROUP removes the entire set of rows:
SELECT *,
x / 3 AS y,
array_agg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_1,
avg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS avg_1,
array_agg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE GROUP) AS rows_2,
avg(x/3) OVER (ORDER BY x/3
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE GROUP) AS avg_2
FROM generate_series(1, 10) AS x;
x | y | rows_1 | avg_1 | rows_2 | avg_2
----+---+---------+----------+--------+----------
1 | 0 | {0,0} | 0.000000 | |
2 | 0 | {0,0,1} | 0.333333 | {1} | 1.000000
3 | 1 | {0,1,1} | 0.666666 | {0} | 0.000000
4 | 1 | {1,1,1} | 1.000000 | |
5 | 1 | {1,1,2} | 1.333333 | {2} | 2.000000
6 | 2 | {1,2,2} | 1.666666 | {1} | 1.000000
7 | 2 | {2,2,2} | 2.000000 | |
8 | 2 | {2,2,3} | 2.333333 | {3} | 3.000000
9 | 3 | {2,3,3} | 2.666666 | {2} | 2.000000
10 | 3 | {3,3} | 3.0000000 | |
(10 rows)
The entire group containing the same value is removed. That, of course also impacts the average calculated on top of this result.