PostgreSQL-pgpool高可用方案

olei 3,825 views 5

准备

两台虚拟机或者实体机,IP,数据库以及数据库软件版本:

172.16.247.11   master      CentOS Linux release 7.2.1511 (Core)
172.16.247.12   slave       CentOS Linux release 7.2.1511 (Core)
172.16.247.13    vip

PostgreSQL 9.6.14
pgpool-II version 3.7.10

安装postgresql,各个节点均执行

联网安装

  • 设置postgresql
curl -O  https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
rpm -ivh pgdg-redhat-repo-latest.noarch.rpm
  • 系统设置
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i "s@SELINUX=enforcing@SELINUX=disabled@" /etc/sysconfig/selinux
  • 安装
yum -y install postgresql96 postgresql96-server

离线安装

  • 下载rpm安装包
curl -O https://yum.postgresql.org/9.6/redhat/rhel-6-x86_64/postgresql96-9.6.14-1PGDG.rhel6.x86_64.rpm
curl -O https://yum.postgresql.org/9.6/redhat/rhel-6-x86_64/postgresql96-contrib-9.6.14-1PGDG.rhel6.x86_64.rpm
curl -O https://yum.postgresql.org/9.6/redhat/rhel-6-x86_64/postgresql96-libs-9.6.14-1PGDG.rhel6.x86_64.rpm
curl -O https://yum.postgresql.org/9.6/redhat/rhel-6-x86_64/postgresql96-server-9.6.14-1PGDG.rhel6.x86_64.rpm
  • 安装
rpm -ivh postgresql96-*.rpm

初始化

/usr/pgsql-9.6/bin/postgresql96-setup initdb

设置开机自启

systemctl start postgresql-9.6
systemctl enable postgresql-9.6
# 或者
su – postgres; /usr/pgsql-9.6/bin/pg_ctl start –D /var/lig/pgsql/data

测试访问

su - postgres
-bash-4.2$  psql
postgres=#
# postgres用户默认密码是postgres,所以不用输入就可以进入了

创建流复制

主库(172.16.247.11)操作

  • 创建账号repuser,用于远程连接复制
su - postgres
-bash-4.2$  psql
postgres=# create role repuser login replication encrypted password 'repuser';
postgres=# \q
  • 配置数据目录

数据目录是在/var/lib/pgsql/9.6/data/下,可以通过systemctl status postgresql-9.6查看到service目录是在/usr/lib/systemd/system/postgresql-9.6.service,里面的Environment=PGDATA=/var/lib/pgsql/9.6/data,其中设置的数据目录是/var/lib/pgsql/9.6/data/

  • 修改配置文件pg_hba.conf
cd /var/lib/pgsql/9.6/data
vi pg_hba.conf
# 添加如下内容
host     replication     all             0.0.0.0/0               md5
host     all             all             0.0.0.0/0               md5
  • 修改配置文件postgresql.conf
vi postgresql.conf
# 内容修改如下
listen_addresses = '*'
wal_level = replica
max_wal_senders = 5
wal_keep_segments = 128
hot_standby = on
hot_standby_feedback = on
  • 创建配置文件recovery.done
vi /var/lib/pgsql/9.6/data/recovery.done
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=slave port=5432 user=repuser password=repuser'
trigger_file = '/var/lib/pgsql/data/trigger_file'
  • 设置权限
chown postgres.postgres recovery.done
  • 重启数据库
systemctl restart postgresql-9.6 
# 或者使用命令启动 
su – postgres; /usr/pgsql-9.6/bin/pg_ctl start –D /var/lig/pgsql/data

备库(172.16.247.12)操作

  • 关闭备库服务
systemctl stop postgresql-9.6
  • 备份数据
# 进入数据目录
cd /var/lib/pgsql/9.6/
# 备份数据,并创建空的数据存储目录
mv data/ data_back
mkdir data
chmod 0700 data
chown postgres.postgres data
  • 把主库整个备份到备库(切换到postgres用户,否则备份的文件属主为root
pg_basebackup -D /var/lib/pgsql/9.6/data -Fp -Xs -v -P -h master -p 5432 -U repuser

使用root账号同步的数据默认属主是root,需要设置权限执行:chown postgres.postgres -R *

  • 修改配置文件:postgresql.conf

该配置文件是从主库备份是自动复制过来的,需要修改如下内容

max_connections = 200
  • 修改配置文件
mv recovery.done recovery.conf
vi recovery.conf
# 如下内容
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=master port=5432 user=repuser password=repuser'
trigger_file = '/var/lib/pgsql/data/trigger_file'

修改host为master

  • 重启备库
systemctl restart postgresql-9.6
# 或者
su – postgres; /usr/pgsql-9.6/bin/pg_ctl start –D /var/lig/pgsql/data

验证流复制是否成功

  • 在主库查看是否有备库连接,并创建一个库test_ry,一个表test_table并添加数据

PostgreSQL-pgpool高可用方案

主库进程

PostgreSQL-pgpool高可用方案

  • 备库登录是否能看到创建的库和表

PostgreSQL-pgpool高可用方案

备份进程

PostgreSQL-pgpool高可用方案

PGPool-II + PG流复制实现HA主备切换

pgpool-II 3.7.10

免密设置

  • 修改hosts文件,添加如下内容(两台均添加)
172.16.247.11   master    
172.16.247.12   slave 
172.16.247.13    vip
  • 设置免密

    • 主库操作
    # 设置密码
    passwd postgrs 
    su – postgres
    -bash-4.2$ ssh-keygen
    ssh-copy-id -i .ssh/id_rsa.pub master
    ssh-copy-id -i .ssh/id_rsa.pub slave
    • 备库操作
    # 设置密码
    passwd postgrs 
    su – postgres
    ssh-keygen
    ssh-copy-id -i .ssh/id_rsa.pub master
    ssh-copy-id -i .ssh/id_rsa.pub slave

安装pgpool-II 3.7.10(两台均操作)

  • 联网安装
curl -O http://www.pgpool.net/yum/rpms/3.7/redhat/rhel-7-x86_64/pgpool-II-release-3.7-1.noarch.rpm
rpm -ivh pgpool-II-release-3.7-1.noarch.rpm
yum -y install pgpool-II-pg96
yum -y install pgpool-II-pg96-debuginfo
yum -y install pgpool-II-pg96-devel
yum -y install pgpool-II-pg96-extensions
  • 离线安装(准备好pgpool-II的包)
curl -O http://www.pgpool.net/yum/rpms/3.7/redhat/rhel-7-x86_64/pgpool-II-pg96-3.7.10-1pgdg.rhel7.x86_64.rpm
curl -O http://www.pgpool.net/yum/rpms/3.7/redhat/rhel-7-x86_64/pgpool-II-pg96-debuginfo-3.7.10-1pgdg.rhel7.x86_64.rpm
curl -O http://www.pgpool.net/yum/rpms/3.7/redhat/rhel-7-x86_64/pgpool-II-pg96-devel-3.7.10-1pgdg.rhel7.x86_64.rpm
curl -O http://www.pgpool.net/yum/rpms/3.7/redhat/rhel-7-x86_64/pgpool-II-pg96-extensions-3.7.10-1pgdg.rhel7.x86_64.rpm
rpm -ivh *.rpm
  • 配置系统命令权限

让普通用户也有权限可以执行命令,postgres用户执行切换脚本需要

chmod u+x /usr/sbin/ip
chmod u+s /usr/sbin/arping
chmod u+s /sbin/ip
  • 设置pid目录权限和创建日志目录
chown postgres.postgres /var/run/pgpool
mkdir –p /var/log/pgpool/
touch /var/log/pgpool/pgpool_status
chown –R postgres.postgres /var/log/pgpool/

主库(172.16.247.11)操作

  • 配置文件pool_hba.conf,添加如下内容

目录为:/etc/pgpool-II/

host    all         all          0.0.0.0/0             md5
#host    all         all         127.0.0.1/32          trust
host    all         all         ::1/128               trust
  • 配置文件pcp.conf

    目录为:/etc/pgpool-II/

    • 生成配置的用户名密码
    pg_md5 postgres
    # 对密码为postgres,进行pg_md5操作,会产生一个md5字符串,例如:
    fjaifiweqifgwoifjjsafjdhfaklhknbv
    • 主库节点将上面生成的内容复制进pcp.conf,内容如下

    用于管理集群(用于pcp_attach_node命令告知集群节点已经恢复,重新加入集群)

    # USERID:MD5PASSWD
    postgres:fjaifiweqifgwoifjjsafjdhfaklhknbv
  • pgpool中添加pg数据库的用户名密码

密码用postgres,和上面一致

pg_md5 -p -m -u postgres pool_passwd

会自动生成文件pool_passwd,这里面存着数据库的账号密码

当连接pgpool监听的端口时,pgpool会先检测pool_passwd文件是否存在输入的账号密码,然后再使用该账号密码连接后端服务器

如果数据库中有repuser这个账号但是pgpool没有,或者pgpooladmin账号但是数据库没有都会报错

  • 配置pgpool.conf,内容如下
listen_addresses = '*'
port = 9999
socket_dir = '/var/run/postgresql'
listen_backlog_multiplier = 2
serialize_accept = off
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/var/run/postgresql'

backend_hostname0 = 'master'
backend_port0 = 5432
backend_weight0 = 5
backend_data_directory0 = '/var/lib/pgsql/9.6/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'

backend_hostname1 = 'slave'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/9.6/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'

enable_pool_hba = on
pool_passwd = 'pool_passwd'
authentication_timeout = 60

ssl = off

replication_mode = off
replicate_select = off
insert_lock = on
lobj_lock_table = ''
replication_stop_on_mismatch = off
failover_if_affected_tuples_mismatch = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'

sr_check_period = 5
sr_check_user = 'repuser'
sr_check_password = 'repuser'
sr_check_database = 'postgres'
delay_threshold = 0

follow_master_command = ''

health_check_period = 10
health_check_timeout = 20
health_check_user = 'postgres'
health_check_password = 'postgres'
health_check_database = 'postgres'
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 10000

failover_command = ' /var/lib/pgsql/9.6/failover_stream.sh %H'
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 300

recovery_user = 'nobody'
recovery_password = ''
recovery_1st_stage_command = ''
recovery_2nd_stage_command = ''
recovery_timeout = 90
client_idle_limit_in_recovery = 0

use_watchdog = on
trusted_servers = ''
ping_path = '/bin'
wd_hostname = 'master'
wd_port = 9000
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/var/run/postgresql'

delegate_IP = '172.16.247.13'                   # 虚拟IP地址
if_cmd_path = '/sbin'
if_up_cmd = 'ip addr add $_IP_$/24 dev eno16780032 label eno16780032:0'    # eno16780032是接口名
if_down_cmd = 'ip addr del $_IP_$/24 dev eno16780032:0'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1'

wd_monitoring_interfaces_list = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'slave'
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'ens33'

wd_life_point = 3
wd_lifecheck_query = 'SELECT 1'
wd_lifecheck_dbname = 'template1'
wd_lifecheck_user = 'nobody'
wd_lifecheck_password = ''

other_pgpool_hostname0 = 'slave'
other_pgpool_port0 = 9999
other_wd_port0 = 9000

创建切换脚本/opt/failover_stream.sh,设置权限
#! /bin/sh
# Failover command for streaming replication.
# Arguments: $1: new master hostname.

new_master=$1
trigger_command="/usr/pgsql-9.6/bin/pg_ctl promote -D /var/lib/pgsql/9.6/data"

# Prompte standby database.
/usr/bin/ssh -T $new_master $trigger_command

exit 0;
  • 设置脚本权限
chmod 777 /var/lib/pgsql/9.6/failover_stream.sh
chown postgres.postgres /var/lib/pgsql/9.6/failover_stream.sh

备库(172.16.247.12)操作

  • 配置文件

从主库复制配置文件pgpool.conf, pcp.conf, pool_passwd, pool_hba.conf 覆盖掉备库的配置文件

scp -r /etc/pgpool-II/* slave:/etc/pgpool-II/
  • 修改配置文件pgpool.conf部分内容
wd_hostname = 'slave'
heartbeat_destination0 = 'master'
other_pgpool_hostname0 = 'master'
  • 脚本

从主库复制切换脚本/opt/failover_stream.sh到本地对于目录

scp -r /var/lib/pgsql/9.6/failover_stream.sh slave:/var/lib/pgsql/9.6/
chmod 777 /var/lib/pgsql/9.6/failover_stream.sh
chown postgres.postgres /var/lib/pgsql/9.6/failover_stream.sh

启动验证

启动pgpool服务(两台均启动)

su - postgres
pgpool -n -C –D
# -D 表示会清除以前的状态
# -n 前台模式启动  如果后台启动,不加-n

这里使用postgres账号启动,因为当故障切换时,需要执行脚本(脚本中有pg_ctl命令),root用户是无法执行的

验证

登录

使用虚IP登录pgpool监听的tcp 9999端口

psql -h vip -p 9999 -U postgres -d postgres
show pool_nodes; # 查看状态
SELECT pg_is_in_recovery();   # 查看当前库的状态,f表示不是备库,t表示备库

PostgreSQL-pgpool高可用方案

测试切换
  • 主库故障

    # 关闭master(172.16.247.11)上的主库
    systemctl stop postgresql-9.6
    

IP(虚IP可能会在172.16.247.11,也可能会在172.16.247.12,所以ssh免密登录时,要设置可以免密登录自己)所在的节点pgpoolssh远程到slave172.16.247.12)执行切换脚本

slave172.16.247.12)主机上的备库会把recover.conf重命名为recover.done后重启,变成主库

  • 从日志可以看到节点是否有shutdown,是否执行了命令/var/lib/pgsql/9.6/failover_stream.sh 主机IP(备库切换成主库)

PostgreSQL-pgpool高可用方案

  • 再次查看状态,master节点(172.16.247.11)变成down状态,同时变成standbyslave节点(172.16.247.12)变成primary

PostgreSQL-pgpool高可用方案

  • 主库恢复

    • master172.16.247.11)主机上
    mv recover.done recover.conf
    # 同时修改postgresql.conf确保max_connections参数的值是大于主库的
    
    • 重启服务
    systemctl restart postgresql-9.6
    
    • 在集群中添加节点状态,将master节点添加至集群中,此时的master节点是备库状态
    pcp_attach_node -d -U postgres -h vip -p 9898 -n 0
    
    • 查看状态,节点master的状态为waiting

    PostgreSQL-pgpool高可用方案

    • 再次模拟故障(此时要恢复至最初始的状态)

    • slave172.16.247.12)主库故障

    systemctl stop postgresql-9.6
    

PostgreSQL-pgpool高可用方案

此时主库切回到master节点(172.16.247.11

  • 恢复slave节点至集群

    
    mv recovery.done  recovery.conf
    systemctl start postgresql-9.6
    pcp_attach_node -d -U postgres -h vip -p 9898 -n 1
  • 再次查看状态

    show pool_nodes
    

常见问题

  • psql登录的时候提示SSL关闭等,都是权限问题,查看pg_hba.conf配置的权限是否正确

PostgreSQL-pgpool高可用方案

  • 时间线不一致导致无法同步

    • 增量备份,同步时间线的方式同步数据库
    
    pg_rewind  --target-pgdata=/home/postgres/data --source-server='host=master port=5432 user=postgres dbname=postgres password=nariadmin'
  • 参数设置master节点

  wal_log_hints = on
  • 可以使用全备份

    • 查看日志提示时间线不匹配cat pg_log/postgresql-Wed.log

PostgreSQL-pgpool高可用方案

# 停止服务,并删除data下的所有文件
systemctl stop postgresql 
# 全部备份
pg_basebackup -D /var/lib/pgsql/data -Fp -Xs -v -P -h slave -p 5432 -U repuser

chown -R postgres.postgres *
mv recovery.done  recovery.conf

# 修改内容master为slave
vi recovery.conf
# 启动服务
systemctl start postgresql

发表评论 取消回复
表情 图片 链接 代码

  1. repostone
    repostone Lv 1

    非技术的路过。

  2. VPS234
    VPS234 Lv 1

    博主你虚拟机用的是VM,还是VirtualBox[小纠结]

  3. 小蒋
    小蒋 Lv 1

    博客长草了,快来除草[aru_163]

    • olei
      olei 站长

      @小蒋哈哈 确实好久没更新了 太忙了...抽空会更新的..

分享