Postgres主备切换及pg_rewind的使用|电子爱好者

admin管理员组
文章数量:1667032

前言：
经过测试，postgres的主备切换后，同步会出现问题，新的备库不会从主库同步数据过来。需要使用pg_rewind重新修复新的备库

参考文档：
http://postgres/docs/10/app-pgrewind.html

-- 主备信息
PG版本：10.15
主库 192.168.2.80 主机名test
备库 192.168.2.81 主机名test1

-- 查看主备库的信息

[root@test /opt/pgpool/bin]$pg_controldata  | grep "Database cluster state"
Database cluster state:               in production
[root@test /opt/pgpool/bin]$
[root@test1 /root]$pg_controldata  | grep "Database cluster state"
Database cluster state:               in archive recovery
[root@test1 /root]$

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 13456
usesysid         | 16406
usename          | repl
application_name | walreceiver
client_addr      | 192.168.2.81
client_hostname  | 
client_port      | 47797
backend_start    | 2021-08-27 08:13:11.217507+08
backend_xmin     | 
state            | streaming
sent_lsn         | 0/10000350
write_lsn        | 0/10000350
flush_lsn        | 0/10000350
replay_lsn       | 0/10000350
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async

postgres=#

-- 停掉主库，同时将原来备库上的recovery.con文件copy一份到主库，后面变更主库为备库的时候会用到。（或者不copy也行，在后面使用pg_rewind的时候，会自动copy过来，包括pg_hba.conf，postgres.conf，recovery.conf文件等，所以做好conf文件的备份）

pg_ctl stop -m fast 
pg_controldata  | grep "Database cluster state"

[postgres@test /home/postgres]$pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@test /home/postgres]$

[postgres@test /home/postgres]$pg_controldata  | grep "Database cluster state"
Database cluster state:               shut down
[postgres@test /home/postgres]$

-- 提升从库为新的主库，可以看到原来的从库上面，recovery.conf变成了recovery.done

pg_ctl promote 
pg_controldata  | grep "Database cluster state"

[postgres@test1 /home/postgres]$pg_ctl promote 
waiting for server to promote.... done
server promoted
[postgres@test1 /home/postgres]$pg_controldata  | grep "Database cluster state"
Database cluster state:               in production
[postgres@test1 /home/postgres]$

[postgres@test1 /opt/PostgreSQL/10/data]$ls *recov*
recovery.done
[postgres@test1 /opt/PostgreSQL/10/data]$more recovery.done 
standby_mode = 'on'
primary_conninfo = 'user=repl password=oracle host=192.168.2.80 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_s
ession_attrs=any'
[postgres@test1 /opt/PostgreSQL/10/data]$

-- 将原来的主库，部署为从库，编辑recovery.conf文件后重启即可（这样做会有问题，问题是主库和从库不同步。后面使用pg_rewind来解决）

[postgres@test /opt/PostgreSQL/10/data]$more recovery.conf 
standby_mode=on
primary_conninfo = 'user=repl password=oracle host=192.168.2.81 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_s
ession_attrs=any'
archive_cleanup_command = 'pg_archivecleanup /opt/PostgreSQL/10/data/pg_wal %r'
primary_slot_name = 'node_a_slot'

[postgres@test /opt/PostgreSQL/10/data]$

[postgres@test /opt/PostgreSQL/10/data]$pg_ctl start
waiting for server to start....2021-08-27 08:44:41.640 CST [15175] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-08-27 08:44:41.640 CST [15175] LOG:  listening on IPv6 address "::", port 5432
2021-08-27 08:44:41.643 CST [15175] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-08-27 08:44:41.660 CST [15175] LOG:  redirecting log output to logging collector process
2021-08-27 08:44:41.660 CST [15175] HINT:  Future log output will appear in directory "log".
 done
server started
[postgres@test /opt/PostgreSQL/10/data]$

-- 验证，主备状态是正常的，但是实际测试是无法同步数据的。当然，如果把备库上的recover.conf改名为recovery.done或者其他名字，启动备库后，备库状态为 in production （这样就相当于failover了，单独的一个读写库了，但是数据就和原来的主库不一样了）。

[postgres@test1 /opt/PostgreSQL/10/data]$pg_controldata  | grep "Database cluster state"
Database cluster state:               in production
[postgres@test1 /opt/PostgreSQL/10/data]$

[postgres@test /opt/PostgreSQL/10/data]$pg_controldata  | grep "Database cluster state"
Database cluster state:               in archive recovery
[postgres@test /opt/PostgreSQL/10/data]$

-- 新的从库中，提示没有复制槽，

[postgres@test /opt/PostgreSQL/10/data/log]$tail -f postgresql-2021-08-27_084441.log
2021-08-27 08:46:51.895 CST [15321] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot" does not exist
2021-08-27 08:46:56.904 CST [15326] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot" does not exist
2021-08-27 08:47:01.906 CST [15331] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot" does not exist
2021-08-27 08:47:06.912 CST [15336] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot" does not exist
2021-08-27 08:47:11.918 CST [15341] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot" does not exist
2021-08-27 08:47:16.925 CST [15346] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot" does not exist
2021-08-27 08:47:21.932 CST [15351] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot" does not exist
2021-08-27 08:47:26.948 CST [15377] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot" does not exist
2021-08-27 08:47:31.942 CST [15383] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot" does not exist
2021-08-27 08:47:36.950 CST [15388] FATAL:  could not start WAL streaming: ERROR:  replication slot "node_a_slot"

-- 在新的备库上，将recovery.conf中的复制槽删除掉，主备正常（其实这个时候，数据是不能同步的，在后面的日志中，可以看到replication terminated by primary server、End of WAL reached on timeline 1 at 0/11000098.表明这个日志时间线之后的日志不能同步）

mydb=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 78518
usesysid         | 16406
usename          | repl
application_name | walreceiver
client_addr      | 192.168.2.80
client_hostname  | 
client_port      | 29673
backend_start    | 2021-08-27 08:48:44.874848+08
backend_xmin     | 
state            | startup
sent_lsn         | 0/11000098
write_lsn        | 0/11000098
flush_lsn        | 0/11000098
replay_lsn       | 0/11000098
write_lag        | 
flush_lag        | 
replay_lag       | 
sync_priority    | 0
sync_state       | async

mydb=#

-- 新的主库上插入数据，备库上显示数据没有同步过来，同时，新的备库上显示"replication terminated by primary server"

mydb=# insert into pub_t1 values(2,'bbb',now());
INSERT 0 1
mydb=# select * from pub_t1;
-[ RECORD 1 ]------------------------
id       | 1
name     | aaa
crt_time | 2021-08-27 08:17:23.672054
-[ RECORD 2 ]------------------------
id       | 2
name     | bbb
crt_time | 2021-08-27 08:56:33.630264

mydb=# 

mydb=# select * from pub_t1;
 id | name |          crt_time          
----+------+----------------------------
  1 | aaa  | 2021-08-27 08:17:23.672054
(1 row)

mydb=# 

2021-08-27 09:12:05.575 CST [16448] LOG:  replication terminated by primary server
2021-08-27 09:12:05.575 CST [16448] DETAIL:  End of WAL reached on timeline 1 at 0/11000098.
2021-08-27 09:12:10.581 CST [16448] LOG:  restarted WAL streaming at 0/11000000 on timeline 1
2021-08-27 09:12:10.582 CST [16448] LOG:  replication terminated by primary server
2021-08-27 09:12:10.582 CST [16448] DETAIL:  End of WAL reached on timeline 1 at 0/11000098.

-- 经过多次的测试，主库和备库切换后，虽然状态为in production和in archive recovery，但是后续是无法继续同步的。需要使用pg_rewind修复新的备库

[postgres@test1 /opt/PostgreSQL/10/data/log]$tail -f postgresql-2021-08-27_104842.log
2021-08-27 10:49:37.163 CST [85899] DETAIL:  End of WAL reached on timeline 2 at 0/14000098.
2021-08-27 10:49:42.169 CST [85899] LOG:  restarted WAL streaming at 0/14000000 on timeline 2
2021-08-27 10:49:42.169 CST [85899] LOG:  replication terminated by primary server
2021-08-27 10:49:42.169 CST [85899] DETAIL:  End of WAL reached on timeline 2 at 0/14000098.
2021-08-27 10:49:47.181 CST [85899] LOG:  restarted WAL streaming at 0/14000000 on timeline 2
2021-08-27 10:49:47.181 CST [85899] LOG:  replication terminated by primary server
2021-08-27 10:49:47.181 CST [85899] DETAIL:  End of WAL reached on timeline 2 at 0/14000098.
2021-08-27 10:49:52.182 CST [85899] LOG:  restarted WAL streaming at 0/14000000 on timeline 2
2021-08-27 10:49:52.182 CST [85899] LOG:  replication terminated by primary server
2021-08-27 10:49:52.182 CST [85899] DETAIL:  End of WAL reached on timeline 2 at 0/14000098.
2021-08-27 10:49:57.188 CST [85899] LOG:  restarted WAL streaming at 0/14000000 on timeline 2

-- 使用pg_rewind ，将新的备库干净关闭。新的主库不用关闭。注意设置参数wal_log_hints。然后在新的备库上进行pg_rewind，在pg_rewind
的过程中，可以看到从这个0/11000028位置开始重新复制

[postgres@test /opt/PostgreSQL/10/data]$pg_rewind --target-pgdata=$PGDATA --source-server='host=192.168.2.81 port=5432 user=postgres password=oracle' -P
connected to server

target server needs to use either data checksums or "wal_log_hints = on"
Failure, exiting
[postgres@test /opt/PostgreSQL/10/data]$


[postgres@test /opt/PostgreSQL/10/data]$pg_ctl stop
waiting for server to shut down.... done
server stopped
[postgres@test /opt/PostgreSQL/10/data]$pg_rewind --target-pgdata=$PGDATA --source-server='host=192.168.2.81 port=5432 user=postgres password=oracle' -P
connected to server
servers diverged at WAL location 0/11000098 on timeline 1
rewinding from last common checkpoint at 0/11000028 on timeline 1
reading source file list
reading target file list
reading WAL in target
need to copy 103 MB (total source directory size is 135 MB)
105492/105492 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
[postgres@test /opt/PostgreSQL/10/data]$

-- backup_label文件

[postgres@test /opt/PostgreSQL/10/data]$more backup_label.old 
START WAL LOCATION: 0/11000028 (file 000000010000000000000011)
CHECKPOINT LOCATION: 0/11000028
BACKUP METHOD: pg_rewind
BACKUP FROM: standby
START TIME: 2021-08-27 09:38:25 CST
[postgres@test /opt/PostgreSQL/10/data]$

-- 编辑recovery.conf并启动新的备库（可以看到新的备库上有个recovery.done文件）

[postgres@test /opt/PostgreSQL/10/data]$ll recover*
-rw------- 1 postgres postgres 172 Aug 27 09:38 recovery.done
[postgres@test /opt/PostgreSQL/10/data]$

[postgres@test /opt/PostgreSQL/10/data]$more recovery.done 
standby_mode = 'on'
primary_conninfo = 'user=repl password=oracle host=192.168.2.80 port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres target_s
ession_attrs=any'
[postgres@test /opt/PostgreSQL/10/data]$cp recovery.done recovery.conf   -- 注意修改里面的IP地址 
[postgres@test /opt/PostgreSQL/10/data]$pg_ctl start
waiting for server to start....2021-08-27 09:41:54.533 CST [18217] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2021-08-27 09:41:54.534 CST [18217] LOG:  listening on IPv6 address "::", port 5432
2021-08-27 09:41:54.536 CST [18217] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-08-27 09:41:54.554 CST [18217] LOG:  redirecting log output to logging collector process
2021-08-27 09:41:54.554 CST [18217] HINT:  Future log output will appear in directory "log".
 done
server started
[postgres@test /opt/PostgreSQL/10/data]$

-- 验证新的备库的数据，可以看到id=2的记录已经同步过来。

postgres=# \c mydb
You are now connected to database "mydb" as user "postgres".
mydb=# select * from pub_t1;
 id | name |          crt_time          
----+------+----------------------------
  1 | aaa  | 2021-08-27 08:17:23.672054
  2 | bbb  | 2021-08-27 08:56:33.630264
(2 rows)

mydb=#

--再次验证数据，数据同步正常了

mydb=# insert into pub_t1 values(3,'cc',now());
INSERT 0 1
mydb=# select * from pub_t1;
 id | name |          crt_time          
----+------+----------------------------
  1 | aaa  | 2021-08-27 08:17:23.672054
  2 | bbb  | 2021-08-27 08:56:33.630264
  3 | cc   | 2021-08-27 09:43:39.673485
(3 rows)

mydb=# 

[postgres@test /opt/PostgreSQL/10/data]$psql
Password: 
psql.bin (10.15)
Type "help" for help.

postgres=# \c mydb
You are now connected to database "mydb" as user "postgres".
mydb=# select * from pub_t1;
 id | name |          crt_time          
----+------+----------------------------
  1 | aaa  | 2021-08-27 08:17:23.672054
  2 | bbb  | 2021-08-27 08:56:33.630264
  3 | cc   | 2021-08-27 09:43:39.673485
(3 rows)

mydb=#

END

本文标签： Postgres pgrewind

版权声明：本文标题：Postgres主备切换及pg_rewind的使用内容由热心网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：https://m.elefans.com/xitong/1730075533a1221727.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

电子爱好者 - 最新技术资讯及电子产品介绍！

Postgres主备切换及pg_rewind的使用

更多相关文章

【PostgresToDB2(Postgres数据库转db2工具)】PostgresToDB2(Postgres数据库转db2工具) V2.4官方版官方免费下载

【postgres】1、超详细介绍

postgres 退出_Postgresl选择正确的关闭模式

windows下修改postgresSQL10.2默认用户postgres的密码

[翻译]Looking Back at Postgres,Postgres 过往

Postgres-XL 简析

【PostgreSQL】Postgres数据库安装、配置、使用DBLink详解

Postgres Schema information_schema pg_catalog

DDL语句造成Postgres PreparStatement报错

postgres主备库切换测试

Postgres主备切换及pg_rewind的使用

Postgres主从（2）Repmgr基本命令

发表评论

推荐文章

读书笔记——【决胜B端：产品经理升级之路】

中国十大术区都道些

Windows查看连接WiFi密码

Windows环境下edge浏览器点击下载没有反应

win7假死的原因分析

热门文章

笔记本分类大全

麦克风什么牌子的音质效果好？十大音质最好的麦克风推荐

2024 4th International Conference on Electronic Information Engineering and Computer Technology

无敌给力Excel 实现。大佬分享 新手必看

c++小游戏（源码可复制）

2015出现的25项最佳发明

Windows中启用Linux命令功能WSL，并安装Linux子系统

固态硬盘win7+机械硬盘Ubuntu18.04安装经验

腾讯云CentOS 8.4搭建coturn服务

全网最简约的sklearn环境配置教程（百分百成功）

最新文章

android手机配什么蓝牙耳机,Airpods Pro搭配安卓手机+Windows电脑服用指南（避坑指南）...

windows系统镜像下载@ISO下载@安装引导跳过联网登录微软账号

如何在windows下同时多开微信

Windows7 x64 旗舰版 IIS7配置ASP+Access运行500错误解决方法

系统运维岗位面试： 北京某技术中心运维工程师笔试题

Windows7系统基础操作

手机能不能模拟windows系统

vs2015安装msdn_vs2015中文旗舰版下载

Windows 7旗舰版搭建andriod 4.0开发环境记录

【NAS】绿联NAS+alist+lsky+natfrp 实现图床服务

DuiLib和UiLib库源码下载地址

什么手机可以用windows系统

华为告别Windows，鸿蒙实现从手机到PC全面覆盖

HCL使用Openwrt测试组播的简单操作

【微信小程序控制硬件⑦ 进阶篇】巧借阿里云物联网平台的免费连接，从微信小程序颜色采集控制 esp8266 输出七彩灯效果，中秋节来个直播如何？！

小米手机肿么还原时钟

15000流明是多少瓦

一般普通投影机功率多大?

苹果绿联转换器有些投影机不能用

坚果V9投影机具体参数?

有关九年级作文850字精选

80后90后_高一作文

中级卫生专业资格中医全科学主治医师中级模拟题2021年(9)案与解析

(精品)师范大学招考硕士研究生课程八六0试卷

ZXMVC8900(V3

【模拟人生4（The Sims 4）性感露背黑色亮片礼服MOD V20190313】模拟人生4（The Sims 4）性感露背黑色亮片礼服MOD V20190313 官方免费下载

【生化危机2：重制版（Resident Evil 2 Remake）克莱尔红头发深色服装MOD】生化危机2：重制版（Resident Evil 2 Remake）克莱尔红头发深色服装MOD 官方免费下载

【模拟人生4（The Sims 4）性感露背深V领吊带裙MOD V20190311】模拟人生4（The Sims 4）性感露背深V领吊带裙MOD V20190311 官方免费下载

【模拟人生4（The Sims 4）科幻风宇宙飞船家庭住宅MOD V20190311】模拟人生4（The Sims 4）科幻风宇宙飞船家庭住宅MOD V20190311 官方免费下载

【鬼泣5（Devil May Cry V）v1.0十四项修改】鬼泣5（Devil May Cry V）v1.0十四项修改 官方免费下载

如何实现高效的treenode搜索算法

treenode与链表有何本质区别

在哪些场景下应优先考虑使用treenode

treenode在树形结构中的角色是什么

如何通过treenode实现二叉树

无敌给力Excel 实现。大佬分享新手必看

系统运维岗位面试：北京某技术中心运维工程师笔试题

【鬼泣5（Devil May Cry V）v1.0十四项修改】鬼泣5（Devil May Cry V）v1.0十四项修改官方免费下载