- Mastering PostgreSQL 12
- Hans Jürgen Sch?nig
- 358字
- 2021-08-20 10:00:30
BRIN indexes
Block range indexes (BRINs) are of great practical use. All of the indexes we've discussed up until now need quite a lot of disk space. Although a lot of work has gone into shrinking GIN indexes and the like, they still need quite a lot of disk space because an index pointer is needed for each entry. So, if there are 10 million entries, there will be 10 million index pointers. Space is the main concern that's addressed by BRIN indexes. A BRIN does not keep an index entry for each tuple, but will store the minimum and the maximum value of 128 (default) blocks of data (1 MB). The index is therefore very small but lossy. Scanning the index will return more data than we asked for. PostgreSQL has to filter out these additional rows in a later step.
The following example demonstrates how small a BRIN index really is:
test=# CREATE INDEX idx_brin ON t_test USING brin(id);
CREATE INDEX
test=# \di+ idx_brin List of relations
Schema | Name | Type | Owner | Table | Size
--------+----------+-------+-------+--------+-------
public | idx_brin | index | hs | t_test | 48 KB
(1 row)
In my example, the BRIN index is 2,000 times smaller than a standard B-tree. The question naturally arising now is, why don't we always use BRIN indexes? To answer this kind of question, it is important to reflect on the layout of BRIN; the minimum and maximum values for 1 MB are stored. If the data is sorted (high correlation), BRIN is pretty efficient because we can fetch 1 MB of data and scan it, and we are done. However, what if the data is shuffled? In this case, BRIN won't be able to exclude chunks of data anymore because it is very likely that something close to the overall high and the overall low is within 1 MB of data. Therefore, BRIN is mostly made for highly correlated data. In reality, correlated data is quite likely in data warehousing applications. Often, data is loaded every day and therefore dates can be highly correlated.
- AWS:Security Best Practices on AWS
- 網絡布線與小型局域網搭建
- 大數據技術基礎:基于Hadoop與Spark
- 手機游戲程序開發
- Hands-On Data Warehousing with Azure Data Factory
- 統計挖掘與機器學習:大數據預測建模和分析技術(原書第3版)
- Mastering OpenStack(Second Edition)
- Advanced Deep Learning with Keras
- 分布式Java應用
- 網頁設計與制作
- EDA技術及其創新實踐(Verilog HDL版)
- ARM嵌入式開發實例
- Office 2010輕松入門
- 創客機器人實戰:基于Arduino和樹莓派
- Geospatial Data Science Quick Start Guide