
Making streaming-only replication more robust

The first thing a slave has to do when connecting to a master is to catch up. But can this always work? We have already seen that we can use a mixed setup consisting of a streaming-based and a file-based component, which gives us some extra security if streaming does not work.

In many real-world scenarios, maintaining two ways of transporting the XLOG might be too complicated, and it is often enough to have just streaming. The point is that in a normal setup, as described already, the master can throw the XLOG away as soon as it is no longer needed to repair the master itself. Depending on your checkpoint configuration, the XLOG might be around for quite a while, or only for a short time. The trouble is that when your slave connects to the master, the desired XLOG might not be around anymore, and the slave cannot resync itself in this scenario. You might find this a little annoying, because it implicitly ties the maximum tolerable downtime of your slave to your master's checkpoint behavior.

Two choices are available to solve the problem:

  • wal_keep_segments: Keep some XLOG files on the master
  • Physical replication slots: Teach the master to recycle the XLOG only when it has been consumed

Using wal_keep_segments

To make your setup much more robust, we recommend making heavy use of wal_keep_segments. The idea of this postgresql.conf setting (on the master) is to teach the master to keep more XLOG files around than theoretically necessary. If you set this variable to 1000, it essentially means that the master will keep 16 GB more XLOG than needed (1000 segments of 16 MB each). In other words, your slave can be gone for 16 GB (in terms of changes to the master) longer than usual. This greatly increases the odds that a slave can join the cluster without having to completely resync itself from scratch. For a 500 MB database, this is not worth mentioning, but if your setup has to hold hundreds of gigabytes or terabytes, it becomes an enormous advantage. Producing a base backup of a 20 TB instance is a lengthy process. You might not want to do this too often, and you definitely won't want to do this over and over again.
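As a concrete sketch, the entry in postgresql.conf on the master might look like this (the segment count of 1000 is purely illustrative; an XLOG segment is 16 MB by default):

# postgresql.conf (on the master)
# 1000 segments x 16 MB per segment = roughly 16 GB of extra XLOG
wal_keep_segments = 1000

The setting can be changed with a configuration reload; a full restart of the master is not required.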

Tip

If you want to update a large base backup, it might be beneficial to incrementally update it using rsync and the traditional method of taking base backups.
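A minimal sketch of such an incremental refresh is shown here. The hostname and data directory paths are hypothetical, and the copy must be bracketed by pg_start_backup() and pg_stop_backup() on the master so that the result is a valid base backup:

# On the master: SELECT pg_start_backup('rsync_refresh');
rsync -av --delete \
      --exclude=pg_xlog/ \
      --exclude=postmaster.pid \
      master.postgresql-support.de:/var/lib/pgsql/data/ \
      /var/lib/pgsql/data/
# On the master: SELECT pg_stop_backup();

Because rsync only transfers changed blocks of changed files, refreshing a mostly static multi-terabyte backup is far faster than taking a fresh base backup from scratch.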

What are reasonable values for wal_keep_segments? As always, this largely depends on your workloads. From experience, we can tell that a multi-GB implicit archive on the master is definitely an investment worth considering. Very low values for wal_keep_segments might be risky and not worth the effort. Nowadays, space is usually cheap. Small systems might not need this setting, and large ones should have sufficient spare capacity to absorb the extra requirements. Personally, I am always in favor of using at least some extra XLOG segments.

Utilizing replication slots

With the introduction of PostgreSQL 9.4, a more sophisticated solution to the problem of deleted XLOG became available: physical replication slots. As already outlined earlier in this book, replication slots make sure that the master deletes XLOG only when it has been safely consumed by the replica, which is exactly what is needed for the cleanup problem outlined in this section.

The question now is: how can a replication slot be used? Basically, it is very simple. All that has to be done is to create the replication slot on the master and tell the slave which slot to use through recovery.conf.

Here is how it works on the master:

postgres=# SELECT * FROM pg_create_physical_replication_slot('repl_slot');
 slot_name | xlog_position
-----------+---------------
 repl_slot |
(1 row)

postgres=# SELECT * FROM pg_replication_slots;
 slot_name | slot_type | datoid | database | active | xmin | restart_lsn
-----------+-----------+--------+----------+--------+------+-------------
 repl_slot | physical  |        |          | f      |      |
(1 row)

Once the base backup has happened, the slave can be configured easily:

standby_mode = 'on'
primary_conninfo = 'host=master.postgresql-support.de port=5432 user=hans password=hanspass'
primary_slot_name = 'repl_slot'

The configuration is just as if there were no replication slots; the only change is that the primary_slot_name variable has been added. The slave will pass the name of the replication slot to the master, and the master then knows when it is safe to recycle the transaction log. As mentioned already, if a slave is not in use anymore, make sure that its replication slot is properly deleted, as an abandoned slot prevents the master from recycling XLOG and can eventually fill up the disk. This problem is insidious: slaves, being optional, are not always monitored as carefully as they should be. It is therefore a good idea to regularly compare pg_stat_replication with pg_replication_slots and investigate any mismatches.
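Cleanup and the suggested cross-check can both be done with standard catalog queries. The following sketch shows how to drop an obsolete slot and how to list slots that currently have no connected consumer (and are therefore retaining XLOG):

postgres=# SELECT pg_drop_replication_slot('repl_slot');

postgres=# SELECT slot_name, active, restart_lsn
           FROM pg_replication_slots
           WHERE NOT active;

Any slot reported here that does not correspond to a slave you expect to be temporarily offline is a candidate for pg_drop_replication_slot.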
