MHA专门用作多个MYSQL服务器(主从环境)提供master故障后转移,并保证数据一致性,由于MHA本身使用的资源很少,甚至于管理100对以上的。
MHA分为manager和node,manager可以运行在node节点之上,manager负责监控和故障转移,其中包含一些脚本配置,如:发送邮件等。
- 可以配置为如下:
一主多从:一个主节点,多个slave
单主多从:一个主节点,多个从节点,从节点不接替主节点
单主多从,一个候选节点:一个主节点,多个slave,其中一个slave为候选主节点
以及多主,三层复制等
一般情况下,我们使用一个主,多个从节点,并且从节点中有一个节点是候选主节点,当主节点down掉后,候选节点便成为主节点对外提供服务。这个场景中会分配一个VIP
(虚拟IP),主崩溃后候选节点接管,在MHA中可以通过配置文件参数调用外部脚本设置,也可以手动更新,更或者使用Keepalived来执行。
当然,如果读并不大,使用一主一从更为稳妥,因为数据一致性得到了保障,但是同时牺牲了读性能和扩展空间,并且可用性大大降低
在MHA场景中使用半同步复制大大降低了这种数据丢失的风险。通常MHA会试图从故障的主设备保存二进制文件,但是有些时候不总是成功,如果出现SSH链接不通的问题,则二进制文件会出现问题,将有丢失最新数据的分享
MHA是可以使用半同步复制,它基于MySQL复制机制。值得注意的是,如果只有一个从站接收到最新的二进制日志事件,MHA可以将这些事件应用到所有其他从站,以便它们可以保持一致。
优点:
主节点故障切换和从节点升级可以快速完成 主站崩溃不会导致数据不一致() 不用修改mysql配置 性能有一定的提升 适用场景多 自定义扩展(脚本) 基于GTID的故障转移
- 要求:
1,SSH公钥认证
Manager通过SSH内部连接到MySQL服务器。最新从站上的MHA节点也通过SSH(scp)在内部将中继日志文件发送到其他从站(如果大于10个主机, 修改/etc/ssh/ sshd_config,默认为10)。为了使这些程序自动化,SSH公钥认证必不可少。而后使用MHA Manager中包含的masterha_check_ssh命令来检查SSH连接是否正常工作。
2,MHA支持5.0或者以上的版本
3,主节点必须启用log-bin
4,备用写库需要授权
5,禁用中继日志自动删除
6,仅支持mysql
为了使MHA真正起作用,大多数设置在启动masterha_manager或masterha_check_repl时自动检查。
结构如下:
前端使用LVS(DR)+keepalived代理端口到中间件Cetus,Cetus是网易的开源项目(后面会说cetus和maxscale),LVS会将请求发送到两台Cetus,由Cetus发送到后端数据库,后端使用MHA,本篇关注红色区域内容即可
篇幅太长,如果有误请指正。所有的内容来自https://github.com/yoshinorim/mha4mysql-manager/wiki
MHA
+-----------------+----------------+------------------+------------+
+ IP + 类型 + 角色 + OS +
+-----------------+----------------+------------------+------------+
+ 10.0.1.61 + MHA + MHA manager(0.56)+ Centos7.2 +
+-----------------+----------------+------------------+------------+
+ VIP 10.0.1.161 +
+-----------------+----------------+------------------+------------+
+ 10.10.240.117 + Mysql-Master + MHA NODE + Centos7.2 +
+-----------------+----------------+------------------+------------+
+ 10.10.0.98 + Mysql-Slave + MHA NODE + Centos7.2 +
+-----------------+----------------+------------------+------------+
+ 10.0.1.49 + Mysql-Slave + MHA NODE + Centos7.2 +
+-----------------+----------------+------------------+------------+
- 一主多从环境准备
1.1 免密码登陆
+----------------------------+
+ 10.10.240.117 master +
+----------------------------+
+ ssh-keygen -t rsa +
+ ssh-copy-id 10.10.0.98 +
+ ssh-copy-id 10.0.1.49 +
+ ssh-copy-id 10.0.1.61 +
+----------------------------+
+----------------------------+
+ 10.10.0.98 slave +
+----------------------------+
+ ssh-keygen -t rsa +
+ ssh-copy-id 10.10.240.117 +
+ ssh-copy-id 10.0.1.49 +
+ ssh-copy-id 10.0.1.61 +
+----------------------------+
+----------------------------+
+ 10.0.1.49 slave +
+----------------------------+
+ ssh-keygen -t rsa +
+ ssh-copy-id 10.10.240.117 +
+ ssh-copy-id 10.10.0.98 +
+ ssh-copy-id 10.0.1.61 +
+----------------------------+
+----------------------------+
+ 10.0.1.61 MHA +
+----------------------------+
+ ssh-keygen -t rsa +
+ ssh-copy-id 10.10.240.117 +
+ ssh-copy-id 10.10.0.98 +
+ ssh-copy-id 10.0.1.49 +
+----------------------------+
1.2 三台机器安装好Mysql
[root@Linuxea-VM-Node117 ~]# yum -y install make automake libtool pkgconfig libaio-devel mysql-devel
[root@Linuxea-VM-Node117 ~]# axel -n 50 http://ftp.ntu.edu.tw/MySQL/DownloaLinuxea/MySQL-5.6/mysql-5.6.40-linux-glibc2.12-x86_64.tar.gz
[root@Linuxea-VM-Node117 ~]# tar xf mysql-5.6.40-linux-glibc2.12-x86_64.tar.gz -C /usr/local
[root@Linuxea-VM-Node117 ~]# mkdir /data/mysql -p
[root@Linuxea-VM-Node117 ~]# groupadd -r -g 306 mysql
[root@Linuxea-VM-Node117 ~]# useradd -g 306 -r -u 307 mysql
[root@Linuxea-VM-Node117 ~]# chown -R mysql.mysql /data/mysql/
[root@Linuxea-VM-Node117 ~]# cd /usr/local && ln -s mysql-5.6.40-linux-glibc2.12-x86_64 mysql && cd mysql
[root@Linuxea-VM-Node117 /usr/local/mysql]# chown -R mysql.mysql /usr/local/mysql
[root@Linuxea-VM-Node117 /usr/local/mysql]# ./scripts/mysql_install_db --user=mysql --datadir=/data/mysql --basedir=/usr/local/mysql
[root@Linuxea-VM-Node117 /usr/local/mysql]# echo 'export PATH=/usr/local/mysql/bin:$PATH' > /etc/profile.d/mysql.sh && source /etc/profile.d/mysql.sh
[root@Linuxea-VM-Node117 /usr/local/mysql]# cp support-files/mysql.server /etc/init.d/mysqld
其中配置文件之一,10.10.240.117 master
[root@Linuxea-VM-Node117 /etc]# cat /etc/my.cnf
[client]
port = 3306
socket = /tmp/mysql.sock
#default-character-set = utf8
[mysqld]
relay_log = relay-bin
basedir = /usr/local/mysql
datadir = /data/mysql
pid-file = /data/mysql/mysql.pid
user = mysql
bind-address = 0.0.0.0
server-id = 2
init-connect = 'SET NAMES utf8'
character-set-server = utf8
default-time_zone = '-4:00'
skip-name-resolve
back_log = 300
max_connections = 999
max_connect_errors = 1024000
open_files_limit = 65535
table_open_cache = 256
max_allowed_packet = 50M
max_heap_table_size = 512M
tmp_table_size = 32M
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
key_buffer_size = 16M
thread_cache_size = 16
query_cache_type = 2
query_cache_size = 16M
query_cache_limit = 5M
thread_stack = 192k
ft_min_word_len = 4
log_bin = mysql-bin
binlog_format = ROW
expire_logs_days = 7
log_error = /data/mysql/mysql-error.log
slow_query_log = 1
long_query_time = 1
#log_slow_verbosity=query_plan
slow_query_log_file = /data/mysql/mysql-slow.log
performance_schema = 0
skip-external-locking #跳过外部锁定,避免external locking
###myisam###
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 16M
myisam_max_sort_file_size = 1G
myisam_repair_threaLinuxea = 1
###InnoDB###
default_storage_engine = InnoDB
innodb_file_per_table = 1
innodb_open_files = 500
innodb_buffer_pool_size = 2G
innodb_write_io_threaLinuxea = 8
innodb_read_io_threaLinuxea = 8
innodb_thread_concurrency = 16
innodb_purge_threaLinuxea = 1
innodb_flush_log_at_trx_commit = 1
innodb_log_buffer_size = 8M
innodb_log_file_size = 256M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_lock_wait_timeout = 10
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1
innodb_rollback_on_timeout = 1
innodb_additional_mem_pool_size = 16M
innodb_force_recovery=0
innodb_locks_unsafe_for_binlog = 1
interactive_timeout = 28800
wait_timeout = 120
[mysqldump]
quick
max_allowed_packet = 16M
[myisamchk]
key_buffer_size = 16M
sort_buffer_size = 8M
read_buffer = 4M
write_buffer = 4M
[mysqld_safe]
malloc_lib=/usr/lib64/libjemalloc.so.1
1.3 数据库初始化
DELETE FROM mysql.user WHERE User='';
DELETE FROM mysql.db WHERE Db LIKE 'test%';
DROP DATABASE test;
UPDATE mysql.user SET password = password('linuxea') WHERE user = 'root';
flush privileges;
1.4 在master上创建用户进行同步
10.10.240.117 master
修改添加
server-id = 1 # 主从中,这里的ID号不能一样
log_bin = master-bin
binlog_format = ROW
read-only = on
1.5 创建复制的用户
mysql> grant replication slave on *.* to slave@'%' identified by "password";
Query OK, 0 rows affected (0.00 sec)
主节点的logs位置
mysql> show master logs;
+------------------+-----------+
| Log_name | File_size |
+------------------+-----------+
| mysql-bin.000001 | 2510 |
+------------------+-----------+
1 row in set (0.00 sec)
查看中继日志打开状态
mysql> SHOW GLOBAL VARIABLES LIKE '%only';
+-------------------------------+-------+
| Variable_name | Value |
+-------------------------------+-------+
| innodb_optimize_fulltext_only | OFF |
| innodb_read_only | OFF |
| read_only | ON |
| tx_read_only | OFF |
+-------------------------------+-------+
4 rows in set (0.00 sec)
- 开始主从同步
2 在slave上进行同步
10.10.0.98 slave
可以使用SHOW GLOBAL VARIABLES LIKE '%only';查看状态,如果没有开启则需要开启
mysql> SHOW GLOBAL VARIABLES LIKE '%only';
+-------------------------------+-------+
| Variable_name | Value |
+-------------------------------+-------+
| innodb_optimize_fulltext_only | OFF |
| innodb_read_only | OFF |
| read_only | ON |
| tx_read_only | OFF |
+-------------------------------+-------+
4 rows in set (0.00 sec)
打开中继日志和修改id
server-id = 2 # 主从中,这里的ID号不能一样
relay-log = relay-bin
2.1 开始复制
开始复制的配置,指明开始的log和点
mysql> CHANGE MASTER TO MASTER_HOST='10.10.240.117',MASTER_USER='slave',MASTER_PASSWORD='password',MASTER_LOG_FILE='mysql-bin.000001',MASTER_LOG_POS=2788,MASTER_CONNECT_RETRY=5,MASTER_HEARTBEAT_PERIOD=2;
Query OK, 0 rows affected, 2 warnings (0.00 sec)
打开slave io进程和sql进程
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.10.240.117
Master_User: slave
Master_Port: 3306
Connect_Retry: 5
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 2788
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 283
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
mysql>
重点关注这两个状态先
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
2.2 在第二台从库授权(10.0.1.49)
10.0.1.49 slave 同样执行和10.10.0.98一样的操作
- 半复制状态
3.半复制
在上面已经配置好主从了,现在开启半复制
mysql> show global variables like 'have_dynamic_loading';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| have_dynamic_loading | YES |
+----------------------+-------+
1 row in set (0.00 sec)
在所有的主从节点安装插件(master/slave)
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
Query OK, 0 rows affected (0.10 sec)
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
Query OK, 0 rows affected (0.01 sec)
状态信息
mysql> SHOW GLOBAL VARIABLES LIKE '%semi%';
+------------------------------------+-------+
| Variable_name | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_enabled | OFF |
| rpl_semi_sync_master_timeout | 10000 | # 等待响应时间
| rpl_semi_sync_master_trace_level | 32 | # 跟踪级别
| rpl_semi_sync_master_wait_no_slave | ON |
| rpl_semi_sync_slave_enabled | OFF |
| rpl_semi_sync_slave_trace_level | 32 |
+------------------------------------+-------+
6 rows in set (0.00 sec)
3.1 主节点开启同步(master)配置
SET GLOBAL rpl_semi_sync_master_enabled=1; 表示在 master 上已经开启半同步复制模式。
mysql> SET GLOBAL rpl_semi_sync_master_enabled=1;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> SET GLOBAL rpl_semi_sync_master_timeout =4000;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW GLOBAL VARIABLES LIKE '%semi%';
+------------------------------------+-------+
| Variable_name | Value |
+------------------------------------+-------+
| rpl_semi_sync_master_enabled | ON |
| rpl_semi_sync_master_timeout | 4000 |
| rpl_semi_sync_master_trace_level | 32 |
| rpl_semi_sync_master_wait_no_slave | ON |
| rpl_semi_sync_slave_enabled | ON |
| rpl_semi_sync_slave_trace_level | 32 |
+------------------------------------+-------+
6 rows in set (0.00 sec)
mysql>
3.2 从库半同步(slave)配置
mysql> show variables like "rpl_semi_sync_slave_enabled";
+-----------------------------+-------+
| Variable_name | Value |
+-----------------------------+-------+
| rpl_semi_sync_slave_enabled | OFF |
+-----------------------------+-------+
1 row in set (0.00 sec)
3.3 需要一次停止和开启IO_THREAD
mysql> STOP SLAVE IO_THREAD;
Query OK, 0 rows affected (0.25 sec)
mysql> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> START SLAVE IO_THREAD;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like "rpl_semi_sync_slave_enabled";
+-----------------------------+-------+
| Variable_name | Value |
+-----------------------------+-------+
| rpl_semi_sync_slave_enabled | ON |
+-----------------------------+-------+
1 row in set (0.00 sec)
3.4 在查看主节点状态
mysql> show global status like '%semi%';
+--------------------------------------------+---------+
| Variable_name | Value |
+--------------------------------------------+---------+
| Rpl_semi_sync_master_clients | 2 |
| Rpl_semi_sync_master_net_avg_wait_time | 472 |
| Rpl_semi_sync_master_net_wait_time | 1972924 |
| Rpl_semi_sync_master_net_waits | 4174 |
| Rpl_semi_sync_master_no_times | 0 |
| Rpl_semi_sync_master_no_tx | 0 |
| Rpl_semi_sync_master_status | ON |
| Rpl_semi_sync_master_timefunc_failures | 0 |
| Rpl_semi_sync_master_tx_avg_wait_time | 573 |
| Rpl_semi_sync_master_tx_wait_time | 1578871 |
| Rpl_semi_sync_master_tx_waits | 2752 |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0 |
| Rpl_semi_sync_master_wait_sessions | 0 |
| Rpl_semi_sync_master_yes_tx | 2752 |
| Rpl_semi_sync_slave_status | OFF | # 主节点上这个是关闭的
+--------------------------------------------+---------+
15 rows in set (0.00 sec)
mysql>
3.5 在查看从节点状态
mysql> show global status like 'Rpl_semi_sync_%';
+--------------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients | 0 |
| Rpl_semi_sync_master_net_avg_wait_time | 0 |
| Rpl_semi_sync_master_net_wait_time | 0 |
| Rpl_semi_sync_master_net_waits | 0 |
| Rpl_semi_sync_master_no_times | 0 |
| Rpl_semi_sync_master_no_tx | 0 |
| Rpl_semi_sync_master_status | OFF | # 从节点上这个也是关闭的
| Rpl_semi_sync_master_timefunc_failures | 0 |
| Rpl_semi_sync_master_tx_avg_wait_time | 0 |
| Rpl_semi_sync_master_tx_wait_time | 0 |
| Rpl_semi_sync_master_tx_waits | 0 |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0 |
| Rpl_semi_sync_master_wait_sessions | 0 |
| Rpl_semi_sync_master_yes_tx | 0 |
| Rpl_semi_sync_slave_status | ON |
+--------------------------------------------+-------+
15 rows in set (0.00 sec)
mysql>
- 注意
Rpl_semi_sync_master_yes_tx Semi-sync模式下,成功的事务数
Rpl_semi_sync_master_no_tx 库未及时响应的事务数,如果这个值很大就有问题
- 写入配置文件
写可以写入到配置文件中永久生效
Master上:
[mysqld]
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=10000 # 10秒(默认)
所有Slave上:
[mysqld]
rpl_semi_sync_slave_enabled=1
- 注意:两台slave服务器设置read_only(由mha设置原master备机为read_only=0 ,从库对外提供读服务,只所以没有写进配置文件,是因为随时slave会提升为master)
mysql> select @@read_only;
+-------------+
| @@read_only |
+-------------+
| 0 |
+-------------+
1 row in set (0.00 sec)
mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)
mysql> select @@read_only;
+-------------+
| @@read_only |
+-------------+
| 1 |
+-------------+
1 row in set (0.00 sec)
- 4,MHA安装
在10.0.1.61 mha-manager安装
4.1 mha4mysql-manager安装
[root@Linuxea-VM-Node61 ~]# yum install perl-DBD-MySQL -y
[root@Linuxea-VM-Node61 ~]# rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
[root@Linuxea-VM-Node61 ~]# yum -y install perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes perl-devel \
perl-Module-Install.noarch perl cpan perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager \
perl-Time-HiRes net-tools
[root@Linuxea-VM-Node61 ~]# wget https://downloads.mariadb.com/MHA/mha4mysql-manager-0.56.tar.gz
[root@Linuxea-VM-Node61 ~]# tar xf mha4mysql-manager-0.56.tar.gz
[root@Linuxea-VM-Node61 ~]# cd mha4mysql-manager-0.56/
安装便是一路回车和yes即可
[root@Linuxea-VM-Node61 ~/mha4mysql-manager-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
- Time::HiRes ...loaded. (1.9725)
- Config::Tiny ...loaded. (2.14)
- Log::Dispatch ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.18)
- MHA::NodeConst ...missing.
==> Auto-install the 1 mandatory module(s) from CPAN? [y] y
*** Dependencies will be installed the next time you type 'make'.
*** Module::AutoInstall configuration finished.
Checking if your kit is complete...
Would you like to configure as much as possible automatically? [yes] yes
Would you like me to automatically choose some CPAN mirror
sites for you? (This means connecting to the Internet) [yes] yes
Looks good
Warning: prerequisite MHA::NodeConst 0 not found.
Writing Makefile for mha4mysql::manager
Writing MYMETA.yml and MYMETA.json
[root@Linuxea-VM-Node61 ~/mha4mysql-manager-0.56]# make
[root@Linuxea-VM-Node61 ~/mha4mysql-manager-0.56]# make install
4.2 mha4mysql-node安装(所有MHA NODE节点都需要安装,包括MHA管理节点)
rpm -Uvh http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum -y install perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes perl-devel \
perl-Module-Install.noarch perl cpan perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager \
perl-Time-HiRes net-tools
wget https://downloaLinuxea.mariadb.com/MHA/mha4mysql-node-0.56.tar.gz
tar xf mha4mysql-node-0.56.tar.gz
cd mha4mysql-node-0.56/
perl Makefile.PL
make
make install
4.3 所有机器授权mha账户
创建mha用户
grant all privileges on *.* to mha@'%' identified by 'password_123';
4.4 管理配置文件配置(MHA 主机10.0.1.61 )
创建工作目录
mkdir /data/masterha/logs -p
配置文件如下
vim /etc/masterha_default.cnf
[server default]
# 授权的账户和密码
user=mha
password=password_123
# 工作目录和日志文件为孩子
manager_workdir=/data/masterha
manager_log=/data/masterha/manager.log
# bing-log的保存位置
master_binlog_dir=/data/mysql
#设置远端mysql在发生切换时binlog的保存位置
remote_workdir=/data/masterha
# 免密钥登录的账户名
ssh_user=root
# 授权的数据库复制的账户和密码
repl_user=slave
repl_password=password
#设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行railover
ping_interval=1
#---------------------------------------------------------------------------------------------------
# 设置故障发生后关闭故障主机脚本(该脚本的主要作用是关闭主机放在发生脑裂,这里没有使用)
# shutdown_script=""
# 设置手动切换时候的切换脚本(可省略)
# master_ip_online_change_script=/data/masterha/master_ip_failover
# 设置发生切换后发送的报警的脚本(可省略)
# report_script=/data/masterha/send_report
# 设置自动failover时候的切换脚本(可省略)
# master_ip_failover_script=/data/masterha/master_ip_failover
#----------------------------------------------------------------------------------------------------
# secondary_check_script= /usr/local/bin/masterha_secondary_check -s server1 -s server2 --master_host=server1 --master_ip=10.10.240.117 --master6
[server1]
hostname=10.10.240.117
#设置master 保存binlog
master_binlog_dir=/data/mysql
#设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中时间最新的slave
candidate_master=1
#relay_log_purge=0
port=3306
#-------------------------check_repl_delay=0--------------------------------
check_repl_delay=0
#默认情况下如果一个slave落后master 100M的relay logs的话,MHA将不会选择该
#slave作为一个新的master,因为对于这个slave的恢复需要花费很长时间通过设置
# check_repl_delay=0,MHA触发切换在选择一个新的master的时候将会忽略复制延时
# 这个参数对于设置了candidate_master=1的主机非常有用
# 因为这个候选主在切换的过程中一定是新的master
#---------------------------------------------------------------------------
[server2]
hostname=10.10.0.98
port=3306
master_binlog_dir=/data/mysql
candidate_master=1
check_repl_delay=0
[server3]
hostname=10.0.1.49
port=3306
master_binlog_dir=/data/mysql
no_master=1
4.5验证互信
[root@Linuxea-VM-Node146 ~/mha4mysql-node-0.56]# masterha_check_ssh --conf=/etc/masterha_default.cnf
Sat May 26 13:29:31 2018 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Sat May 26 13:29:31 2018 - [info] Reading application default configurations from /etc/masterha_default.cnf..
Sat May 26 13:29:31 2018 - [info] Reading server configurations from /etc/masterha_default.cnf..
Sat May 26 13:29:31 2018 - [info] Starting SSH connection tests..
Sat May 26 13:29:31 2018 - [debug]
Sat May 26 13:29:31 2018 - [debug] Connecting via SSH from root@10.10.240.117(10.10.240.117:22) to root@10.10.0.98(10.10.0.98:22)..
Sat May 26 13:29:31 2018 - [debug] ok.
Sat May 26 13:29:31 2018 - [debug] Connecting via SSH from root@10.10.240.117(10.10.240.117:22) to root@10.0.1.49(10.0.1.49:22)..
Sat May 26 13:29:31 2018 - [debug] ok.
Sat May 26 13:29:32 2018 - [debug]
Sat May 26 13:29:31 2018 - [debug] Connecting via SSH from root@10.10.0.98(10.10.0.98:22) to root@10.10.240.117(10.10.240.117:22)..
Sat May 26 13:29:32 2018 - [debug] ok.
Sat May 26 13:29:32 2018 - [debug] Connecting via SSH from root@10.10.0.98(10.10.0.98:22) to root@10.0.1.49(10.0.1.49:22)..
Sat May 26 13:29:32 2018 - [debug] ok.
Sat May 26 13:29:33 2018 - [debug]
Sat May 26 13:29:32 2018 - [debug] Connecting via SSH from root@10.0.1.49(10.0.1.49:22) to root@10.10.240.117(10.10.240.117:22)..
Sat May 26 13:29:32 2018 - [debug] ok.
Sat May 26 13:29:32 2018 - [debug] Connecting via SSH from root@10.0.1.49(10.0.1.49:22) to root@10.10.0.98(10.10.0.98:22)..
Sat May 26 13:29:32 2018 - [debug] ok.
Sat May 26 13:29:33 2018 - [info] All SSH connection tests passed successfully.
4.6 检查配置
[root@Linuxea-VM-Node61 ~]# masterha_check_repl --conf=/etc/masterha_default.cnf
Sat May 26 14:31:39 2018 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Sat May 26 14:31:39 2018 - [info] Reading application default configurations from /etc/masterha_default.cnf..
Sat May 26 14:31:39 2018 - [info] Reading server configurations from /etc/masterha_default.cnf..
Sat May 26 14:31:39 2018 - [info] MHA::MasterMonitor version 0.56.
Sat May 26 14:31:40 2018 - [info] Dead Servers:
Sat May 26 14:31:40 2018 - [info] Alive Servers:
Sat May 26 14:31:40 2018 - [info] 10.10.240.117(10.10.240.117:3306)
Sat May 26 14:31:40 2018 - [info] 10.10.0.98(10.10.0.98:3306)
Sat May 26 14:31:40 2018 - [info] 10.0.1.49(10.0.1.49:3306)
Sat May 26 14:31:40 2018 - [info] Alive Slaves:
Sat May 26 14:31:40 2018 - [info] 10.10.0.98(10.10.0.98:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sat May 26 14:31:40 2018 - [info] Replicating from 10.10.240.117(10.10.240.117:3306)
Sat May 26 14:31:40 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sat May 26 14:31:40 2018 - [info] 10.0.1.49(10.0.1.49:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sat May 26 14:31:40 2018 - [info] Replicating from 10.10.240.117(10.10.240.117:3306)
Sat May 26 14:31:40 2018 - [info] Not candidate for the new Master (no_master is set)
Sat May 26 14:31:40 2018 - [info] Current Alive Master: 10.10.240.117(10.10.240.117:3306)
Sat May 26 14:31:40 2018 - [info] Checking slave configurations..
Sat May 26 14:31:40 2018 - [warning] relay_log_purge=0 is not set on slave 10.10.0.98(10.10.0.98:3306).
Sat May 26 14:31:40 2018 - [warning] relay_log_purge=0 is not set on slave 10.0.1.49(10.0.1.49:3306).
Sat May 26 14:31:40 2018 - [info] Checking replication filtering settings..
Sat May 26 14:31:40 2018 - [info] binlog_do_db= , binlog_ignore_db=
Sat May 26 14:31:40 2018 - [info] Replication filtering check ok.
Sat May 26 14:31:40 2018 - [info] Starting SSH connection tests..
Sat May 26 14:31:42 2018 - [info] All SSH connection tests passed successfully.
Sat May 26 14:31:42 2018 - [info] Checking MHA Node version..
Sat May 26 14:31:42 2018 - [info] Version check ok.
Sat May 26 14:31:42 2018 - [info] Checking SSH publickey authentication settings on the current master..
Sat May 26 14:31:42 2018 - [info] HealthCheck: SSH to 10.10.240.117 is reachable.
Sat May 26 14:31:42 2018 - [info] Master MHA Node version is 0.53.
Sat May 26 14:31:42 2018 - [info] Checking recovery script configurations on the current master..
Sat May 26 14:31:42 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/masterha/save_binary_logs_test --manager_version=0.56 --start_f
Sat May 26 14:31:42 2018 - [info] Connecting to root@10.10.240.117(10.10.240.117)..
Creating /data/masterha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql, up to mysql-bin.000001
Sat May 26 14:31:42 2018 - [info] Master setting check done.
Sat May 26 14:31:42 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sat May 26 14:31:42 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.10.0.98 --slave_ip=10.10.0.98 --slave_port=3306 --workdir=/data/masterha --targetx
Sat May 26 14:31:42 2018 - [info] Connecting to root@10.10.0.98(10.10.0.98:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql, up to relay-bin.000002
Temporary relay log file is /data/mysql/relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat May 26 14:31:42 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.0.1.49 --slave_ip=10.0.1.49 --slave_port=3306 --workdir=/data/masterha --target_vx
Sat May 26 14:31:42 2018 - [info] Connecting to root@10.0.1.49(10.0.1.49:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql, up to relay-bin.000002
Temporary relay log file is /data/mysql/relay-bin.000002
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sat May 26 14:31:43 2018 - [info] Slaves settings check done.
Sat May 26 14:31:43 2018 - [info]
10.10.240.117 (current master)
+--10.10.0.98
+--10.0.1.49
Sat May 26 14:31:43 2018 - [info] Checking replication health on 10.10.0.98..
Sat May 26 14:31:43 2018 - [info] ok.
Sat May 26 14:31:43 2018 - [info] Checking replication health on 10.0.1.49..
Sat May 26 14:31:43 2018 - [info] ok.
Sat May 26 14:31:43 2018 - [warning] master_ip_failover_script is not defined.
Sat May 26 14:31:43 2018 - [warning] shutdown_script is not defined.
Sat May 26 14:31:43 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
- 报错:
Sat May 26 13:35:35 2018 - [error][/usr/local/share/perl5/MHA/Server.pm, ln383] 10.10.0.98(10.10.0.98:3306): User slave does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
10.10.0.98是候选master,所以要授权有复制的权限
GRANT REPLICATION SLAVE ON . TO 'slave'@'%' IDENTIFIED by 'password';
使用type查看mysql和mysqlbinlog位置,链接到/usr/bin下,示例如下:
ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
- 高可用配置
5.配置VIP
到此位置,仅仅配置好了mha,我们测试他的高可用性
配置中打开这个配置
master_ip_failover_script=/data/masterha/master_ip_failover
master_ip_online_change_script=/data/masterha/master_ip_failover
脚本内容如下,要设置好VIP,不过一开始的VIP要在主机master上手动设置,脚本中设置VIP地址
[root@Linuxea-Node61 /data/masterha]# cat master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '10.0.1.161/24'; # Virtual IP #设置VIP地址
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
$ssh_user = "root";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $ssh_user\@cluster1 \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
脚本修改后在配置Master VIP(在主节点/W)
[root@Linuxea-Node117 ~/mha4mysql-node-0.56]# /sbin/ifconfig eth0:1 10.0.1.161/24
在mha-manager节点启动
[root@Linuxea-Node61 /data/masterha]# nohup masterha_manager --conf=/etc/masterha_default.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /data/masterha/manager.log 2>&1 &
观察日志(日志太长,删了一些不需要的)
[root@Linuxea-Node61 /data/masterha]# tail -f /data/masterha/manager.log
Sun Jun 3 18:12:30 2018 - [info] read_only=1 is not set on slave 10.10.240.117(10.10.240.117:3306).
Sun Jun 3 18:12:30 2018 - [warning] relay_log_purge=0 is not set on slave 10.10.240.117(10.10.240.117:3306).
Sun Jun 3 18:12:30 2018 - [info] read_only=1 is not set on slave 10.0.1.49(10.0.1.49:3306).
Sun Jun 3 18:12:30 2018 - [warning] relay_log_purge=0 is not set on slave 10.0.1.49(10.0.1.49:3306).
Sun Jun 3 18:12:30 2018 - [info] Checking replication filtering settings..
Sun Jun 3 18:12:30 2018 - [info] binlog_do_db= , binlog_ignore_db=
Sun Jun 3 18:12:30 2018 - [info] Replication filtering check ok.
Sun Jun 3 18:12:30 2018 - [info] Starting SSH connection tests..
Sun Jun 3 18:12:31 2018 - [info] All SSH connection tests passed successfully.
Sun Jun 3 18:12:31 2018 - [info] Checking MHA Node version..
Sun Jun 3 18:12:32 2018 - [info] Version check ok.
Sun Jun 3 18:12:32 2018 - [info] Checking SSH publickey authentication settings on the current master..
Sun Jun 3 18:12:32 2018 - [info] HealthCheck: SSH to 10.10.0.98 is reachable.
Sun Jun 3 18:12:32 2018 - [info] Master MHA Node version is 0.56.
Sun Jun 3 18:12:32 2018 - [info] Checking recovery script configurations on the current master..
Sun Jun 3 18:12:32 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/masterha/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000002
Sun Jun 3 18:12:32 2018 - [info] Connecting to root@10.10.0.98(10.10.0.98)..
Creating /data/masterha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql, up to mysql-bin.000002
Sun Jun 3 18:12:32 2018 - [info] Master setting check done.
Sun Jun 3 18:12:32 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Jun 3 18:12:32 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.10.240.117 --slave_ip=10.10.240.117 --slave_port=3306 --workdir=/data/masterha --target_version=5.6.40-log --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Sun Jun 3 18:12:32 2018 - [info] Connecting to root@10.10.240.117(10.10.240.117:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql, up to relay-bin.000004
Temporary relay log file is /data/mysql/relay-bin.000004
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Jun 3 18:12:32 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.0.1.49 --slave_ip=10.0.1.49 --slave_port=3306 --workdir=/data/masterha --target_version=5.6.40-log --manager_version=0.56 --relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Sun Jun 3 18:12:32 2018 - [info] Connecting to root@10.0.1.49(10.0.1.49:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql, up to relay-bin.000004
Temporary relay log file is /data/mysql/relay-bin.000004
Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Sun Jun 3 18:12:32 2018 - [info] Slaves settings check done.
Sun Jun 3 18:12:32 2018 - [info]
10.10.0.98 (current master)
+--10.10.240.117
+--10.0.1.49
Sun Jun 3 18:12:32 2018 - [info] Checking master_ip_failover_script status:
Sun Jun 3 18:12:32 2018 - [info] /data/masterha/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.10.0.98 --orig_master_ip=10.10.0.98 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===
Checking the Status of the script.. OK
ssh: Could not resolve hostname cluster1: Name or service not known
Sun Jun 3 18:12:32 2018 - [info] OK.
Sun Jun 3 18:12:32 2018 - [warning] shutdown_script is not defined.
Sun Jun 3 18:12:32 2018 - [info] Set master ping interval 1 seconds.
Sun Jun 3 18:12:32 2018 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sun Jun 3 18:12:32 2018 - [info] Starting ping health check on 10.10.0.98(10.10.0.98:3306)..
Sun Jun 3 18:12:32 2018 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
- 可以运行masterha_master_switch来测试
[root@linuea-VM-Node145 /data/masterha]# masterha_master_switch --conf=/etc/masterha_default.cnf --master_state=alive
Sun Jun 3 18:05:47 2018 - [info] MHA::MasterRotate version 0.56.
Sun Jun 3 18:05:47 2018 - [info] Starting online master switch..
Sun Jun 3 18:05:47 2018 - [info]
Sun Jun 3 18:05:47 2018 - [info] * Phase 1: Configuration Check Phase..
Sun Jun 3 18:05:47 2018 - [info]
Sun Jun 3 18:05:47 2018 - [info] Reading default configuratoins from /etc/masterha_default.cnf..
Sun Jun 3 18:05:47 2018 - [info] Reading application default configurations from /etc/masterha_default.cnf..
Sun Jun 3 18:05:47 2018 - [info] Reading server configurations from /etc/masterha_default.cnf..
Sun Jun 3 18:05:48 2018 - [info] Current Alive Master: 10.10.240.117(10.10.240.117:3306)
Sun Jun 3 18:05:48 2018 - [info] Alive Slaves:
Sun Jun 3 18:05:48 2018 - [info] 10.10.0.98(10.10.0.98:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun 3 18:05:48 2018 - [info] Replicating from 10.10.240.117(10.10.240.117:3306)
Sun Jun 3 18:05:48 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jun 3 18:05:48 2018 - [info] 10.0.1.49(10.0.1.49:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun 3 18:05:48 2018 - [info] Replicating from 10.10.240.117(10.10.240.117:3306)
Sun Jun 3 18:05:48 2018 - [info] Not candidate for the new Master (no_master is set)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.10.240.117(10.10.240.117:3306)? (YES/no): yes
Sun Jun 3 18:05:50 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sun Jun 3 18:05:50 2018 - [info] ok.
Sun Jun 3 18:05:50 2018 - [info] Checking MHA is not monitoring or doing failover..
Sun Jun 3 18:05:50 2018 - [info] Checking replication health on 10.10.0.98..
Sun Jun 3 18:05:50 2018 - [info] ok.
Sun Jun 3 18:05:50 2018 - [info] Checking replication health on 10.0.1.49..
Sun Jun 3 18:05:50 2018 - [info] ok.
Sun Jun 3 18:05:50 2018 - [info] Searching new master from slaves..
Sun Jun 3 18:05:50 2018 - [info] Candidate masters from the configuration file:
Sun Jun 3 18:05:50 2018 - [info] 10.10.240.117(10.10.240.117:3306) Version=5.6.40-log log-bin:enabled
Sun Jun 3 18:05:50 2018 - [info] 10.10.0.98(10.10.0.98:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun 3 18:05:50 2018 - [info] Replicating from 10.10.240.117(10.10.240.117:3306)
Sun Jun 3 18:05:50 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jun 3 18:05:50 2018 - [info] Non-candidate masters:
Sun Jun 3 18:05:50 2018 - [info] 10.0.1.49(10.0.1.49:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun 3 18:05:50 2018 - [info] Replicating from 10.10.240.117(10.10.240.117:3306)
Sun Jun 3 18:05:50 2018 - [info] Not candidate for the new Master (no_master is set)
Sun Jun 3 18:05:50 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Sun Jun 3 18:05:50 2018 - [info]
From:
10.10.240.117 (current master)
+--10.10.0.98
+--10.0.1.49
To:
10.10.0.98 (new master)
+--10.0.1.49
Starting master switch from 10.10.240.117(10.10.240.117:3306) to 10.10.0.98(10.10.0.98:3306)? (yes/NO): yes
Sun Jun 3 18:05:53 2018 - [info] Checking whether 10.10.0.98(10.10.0.98:3306) is ok for the new master..
Sun Jun 3 18:05:53 2018 - [info] ok.
Sun Jun 3 18:05:53 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jun 3 18:05:53 2018 - [info]
Sun Jun 3 18:05:53 2018 - [info] * Phase 2: Rejecting updates Phase..
Sun Jun 3 18:05:53 2018 - [info]
Sun Jun 3 18:05:53 2018 - [info] Executing master ip online change script to disable write on the current master:
Sun Jun 3 18:05:53 2018 - [info] /data/masterha/master_ip_online_change_script --command=stop --orig_master_host=10.10.240.117 --orig_master_ip=10.10.240.117 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='password_123' --new_master_host=10.10.0.98 --new_master_ip=10.10.0.98 --new_master_port=3306 --new_master_user='mha' --new_master_password='password_123' --orig_master_ssh_user=root --new_master_ssh_user=root
Unknown option: orig_master_user
Unknown option: orig_master_password
Unknown option: new_master_user
Unknown option: new_master_password
Unknown option: orig_master_ssh_user
Unknown option: new_master_ssh_user
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===
Disabling the VIP on old master: 10.10.240.117
Sun Jun 3 18:05:53 2018 - [info] ok.
Sun Jun 3 18:05:53 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sun Jun 3 18:05:53 2018 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sun Jun 3 18:05:53 2018 - [info] ok.
Sun Jun 3 18:05:53 2018 - [info] Orig master binlog:pos is mysql-bin.000002:120.
Sun Jun 3 18:05:53 2018 - [info] Waiting to execute all relay logs on 10.10.0.98(10.10.0.98:3306)..
Sun Jun 3 18:05:53 2018 - [info] master_pos_wait(mysql-bin.000002:120) completed on 10.10.0.98(10.10.0.98:3306). Executed 0 events.
Sun Jun 3 18:05:53 2018 - [info] done.
Sun Jun 3 18:05:53 2018 - [info] Getting new master's binlog name and position..
Sun Jun 3 18:05:53 2018 - [info] mysql-bin.000001:637
Sun Jun 3 18:05:53 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.10.0.98', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=637, MASTER_USER='slave', MASTER_PASSWORD='xxx';
Sun Jun 3 18:05:53 2018 - [info] Executing master ip online change script to allow write on the new master:
Sun Jun 3 18:05:53 2018 - [info] /data/masterha/master_ip_online_change_script --command=start --orig_master_host=10.10.240.117 --orig_master_ip=10.10.240.117 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='password_123' --new_master_host=10.10.0.98 --new_master_ip=10.10.0.98 --new_master_port=3306 --new_master_user='mha' --new_master_password='password_123' --orig_master_ssh_user=root --new_master_ssh_user=root
Unknown option: orig_master_user
Unknown option: orig_master_password
Unknown option: new_master_user
Unknown option: new_master_password
Unknown option: orig_master_ssh_user
Unknown option: new_master_ssh_user
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===
Enabling the VIP - 10.0.1.161/24 on the new master - 10.10.0.98
Sun Jun 3 18:05:53 2018 - [info] ok.
Sun Jun 3 18:05:53 2018 - [info]
Sun Jun 3 18:05:53 2018 - [info] * Switching slaves in parallel..
Sun Jun 3 18:05:53 2018 - [info]
Sun Jun 3 18:05:53 2018 - [info] -- Slave switch on host 10.0.1.49(10.0.1.49:3306) started, pid: 3835
Sun Jun 3 18:05:53 2018 - [info]
Sun Jun 3 18:05:54 2018 - [info] Log messages from 10.0.1.49 ...
Sun Jun 3 18:05:54 2018 - [info]
Sun Jun 3 18:05:53 2018 - [info] Waiting to execute all relay logs on 10.0.1.49(10.0.1.49:3306)..
Sun Jun 3 18:05:53 2018 - [info] master_pos_wait(mysql-bin.000002:120) completed on 10.0.1.49(10.0.1.49:3306). Executed 0 events.
Sun Jun 3 18:05:53 2018 - [info] done.
Sun Jun 3 18:05:53 2018 - [info] Resetting slave 10.0.1.49(10.0.1.49:3306) and starting replication from the new master 10.10.0.98(10.10.0.98:3306)..
Sun Jun 3 18:05:53 2018 - [info] Executed CHANGE MASTER.
Sun Jun 3 18:05:53 2018 - [info] Slave started.
Sun Jun 3 18:05:54 2018 - [info] End of log messages from 10.0.1.49 ...
Sun Jun 3 18:05:54 2018 - [info]
Sun Jun 3 18:05:54 2018 - [info] -- Slave switch on host 10.0.1.49(10.0.1.49:3306) succeeded.
Sun Jun 3 18:05:54 2018 - [info] Unlocking all tables on the orig master:
Sun Jun 3 18:05:54 2018 - [info] Executing UNLOCK TABLES..
Sun Jun 3 18:05:54 2018 - [info] ok.
Sun Jun 3 18:05:54 2018 - [info] All new slave servers switched successfully.
Sun Jun 3 18:05:54 2018 - [info]
Sun Jun 3 18:05:54 2018 - [info] * Phase 5: New master cleanup phase..
Sun Jun 3 18:05:54 2018 - [info]
Sun Jun 3 18:05:54 2018 - [info] 10.10.0.98: Resetting slave info succeeded.
Sun Jun 3 18:05:54 2018 - [info] Switching master to 10.10.0.98(10.10.0.98:3306) completed successfully.
如果没有报错,我们需要重新组成mha,因为运行masterha_master_switch已经使mha主节点(10.10.240.117)发生改变,将剔除的机器重新同步主从即可,也就是说原来的主节点(10.10.240.117)已经手动切换到候选节点10.10.0.98了,10.10.240.117需要重新同步加入主从
- 模拟故障
现在主节点是10.10.0.98,关闭master(10.10.0.98)的mysql,模拟自动切换,脚本在上面已经有
注意:现在已经将10.10.240.117组成一主两从了,配置中候选节点分别是10.10.240.117和10.10.0.98,现在关闭98,主节点将会切换到10.10.240.117,并且vip也会随之偏移
[root@Linuxea-Node98 ~/mha4mysql-node-0.56]# /etc/init.d/mysqld stop
Shutting down MySQL..... SUCCESS!
观察日志(日志贴的不完整):
Sun Jun 3 18:21:58 2018 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Sun Jun 3 18:21:58 2018 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/masterha/save_binary_logs_test --manager_version=0.56 --binlog_prefix=mysql-bin
Sun Jun 3 18:21:58 2018 - [info] HealthCheck: SSH to 10.10.0.98 is reachable.
Sun Jun 3 18:21:59 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.10.0.98' (111))
Sun Jun 3 18:21:59 2018 - [warning] Connection failed 1 time(s)..
Sun Jun 3 18:22:00 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.10.0.98' (111))
Sun Jun 3 18:22:00 2018 - [warning] Connection failed 2 time(s)..
Sun Jun 3 18:22:01 2018 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.10.0.98' (111))
Sun Jun 3 18:22:01 2018 - [warning] Connection failed 3 time(s)..
Sun Jun 3 18:22:01 2018 - [warning] Master is not reachable from health checker!
Sun Jun 3 18:22:01 2018 - [warning] Master 10.10.0.98(10.10.0.98:3306) is not reachable!
Sun Jun 3 18:22:01 2018 - [warning] SSH is reachable.
Sun Jun 3 18:22:01 2018 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha_default.cnf again, and trying to connect to all servers to check server status..
Sun Jun 3 18:22:02 2018 - [info] 10.0.1.49(10.0.1.49:3306)
Sun Jun 3 18:22:02 2018 - [info] Alive Slaves:
Sun Jun 3 18:22:02 2018 - [info] 10.10.240.117(10.10.240.117:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun 3 18:22:02 2018 - [info] Replicating from 10.10.0.98(10.10.0.98:3306)
Sun Jun 3 18:22:02 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jun 3 18:22:02 2018 - [info] 10.0.1.49(10.0.1.49:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun 3 18:22:02 2018 - [info] Replicating from 10.10.0.98(10.10.0.98:3306)
Sun Jun 3 18:22:02 2018 - [info] Not candidate for the new Master (no_master is set)
Sun Jun 3 18:22:02 2018 - [info] Checking slave configurations..
Sun Jun 3 18:22:02 2018 - [info] read_only=1 is not set on slave 10.10.240.117(10.10.240.117:3306)
Sun Jun 3 18:22:03 2018 - [info] Dead Servers:
Sun Jun 3 18:22:03 2018 - [info] 10.10.0.98(10.10.0.98:3306)
Sun Jun 3 18:22:03 2018 - [info] Checking master reachability via mysql(double check)..
Sun Jun 3 18:22:03 2018 - [info] ok.
Sun Jun 3 18:22:03 2018 - [info] Alive Servers:
Sun Jun 3 18:22:03 2018 - [info] 10.10.240.117(10.10.240.117:3306)
Sun Jun 3 18:22:03 2018 - [info] 10.0.1.49(10.0.1.49:3306)
Sun Jun 3 18:22:03 2018 - [info] Alive Slaves:
Sun Jun 3 18:22:03 2018 - [info] 10.10.240.117(10.10.240.117:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun 3 18:22:03 2018 - [info] Replicating from 10.10.0.98(10.10.0.98:3306)
Sun Jun 3 18:22:03 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Jun 3 18:22:03 2018 - [info] 10.0.1.49(10.0.1.49:3306) Version=5.6.40-log (oldest major version between slaves) log-bin:enabled
Sun Jun 3 18:22:03 2018 - [info] Replicating from 10.10.0.98(10.10.0.98:3306)
Sun Jun 3 18:22:03 2018 - [info] Not candidate for the new Master (no_master is set)
Sun Jun 3 18:22:03 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Jun 3 18:22:03 2018 - [info]
Sun Jun 3 18:22:03 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Jun 3 18:22:03 2018 - [info]
Sun Jun 3 18:22:03 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Jun 3 18:22:03 2018 - [info] Executing master IP deactivatation script:
Sun Jun 3 18:22:03 2018 - [info] /data/masterha/master_ip_failover --orig_master_host=10.10.0.98 --orig_master_ip=10.10.0.98 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===
Disabling the VIP on old master: 10.10.0.98
Sun Jun 3 18:22:03 2018 - [info] done
Sun Jun 3 18:22:04 2018 - [info] Not candidate for the new Master (no_master is set)
Sun Jun 3 18:22:04 2018 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Sun Jun 3 18:22:04 2018 - [info] New master is 10.10.240.117(10.10.240.117:3306)
Sun Jun 3 18:22:04 2018 - [info] Starting master failover..
Sun Jun 3 18:22:04 2018 - [info]
From:
10.10.0.98 (current master)
+--10.10.240.117
+--10.0.1.49
To:
10.10.240.117 (new master)
+--10.0.1.49
Sun Jun 3 18:22:04 2018 - [info]
Sun Jun 3 18:22:04 2018 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Sun Jun 3 18:22:04 2018 - [info] Read_Master_Log_Pos(mysql-bin.000002:120). No need to recover from Exec_Master_Log_Pos.
Sun Jun 3 18:22:04 2018 - [info] Connecting to the target slave host 10.10.240.117, running recover script..
Sun Jun 3 18:22:04 2018 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.10.240.117 --slave_ip=10.10.240.117 --slave_port=3306 --apply_files=/data/masterha/saved_master_binlog_from_10.10.0.98_3306_20180603182202.binlog --workdir=/data/masterha --target_version=5.6.40-log --timestamp=20180603182202 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.56 --slave_pass=xxx
Sun Jun 3 18:22:04 2018 - [info]
MySQL client version is 5.6.40. Using --binary-mode.
Applying differential binary/relay log files /data/masterha/saved_master_binlog_from_10.10.0.98_3306_20180603182202.binlog on 10.10.240.117:3306. This may take long time...
Applying log files succeeded.
Sun Jun 3 18:22:04 2018 - [info] All relay logs were successfully applied.
Sun Jun 3 18:22:04 2018 - [info] Getting new master's binlog name and position..
Sun Jun 3 18:22:04 2018 - [info] mysql-bin.000002:532
Sun Jun 3 18:22:04 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.10.240.117', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=532, MASTER_USER='slave', MASTER_PASSWORD='xxx';
Sun Jun 3 18:22:04 2018 - [info] Executing master IP activate script:
Sun Jun 3 18:22:04 2018 - [info] /data/masterha/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.10.0.98 --orig_master_ip=10.10.0.98 --orig_master_port=3306 --new_master_host=10.10.240.117 --new_master_ip=10.10.240.117 --new_master_port=3306 --new_master_user='mha' --new_master_password='password_123'
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.1.161/24===
Enabling the VIP - 10.0.1.161/24 on the new master - 10.10.240.117
Sun Jun 3 18:22:04 2018 - [info] OK.
Sun Jun 3 18:22:06 2018 - [info] Slave started.
Sun Jun 3 18:22:06 2018 - [info] End of log messages from 10.0.1.49.
Sun Jun 3 18:22:06 2018 - [info] Master failover to 10.10.240.117(10.10.240.117:3306) completed successfully.
Sun Jun 3 18:22:06 2018 - [info] Deleted server2 entry from /etc/masterha_default.cnf .
Sun Jun 3 18:22:06 2018 - [info]
----- Failover Report -----
masterha_default: MySQL Master failover 10.10.0.98 to 10.10.240.117 succeeded
Master 10.10.0.98 is down!
Check MHA Manager logs at DS-VM-Node145.cluster.com:/data/masterha/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 10.10.0.98.
The latest slave 10.10.240.117(10.10.240.117:3306) has all relay logs for recovery.
Selected 10.10.240.117 as a new master.
10.10.240.117: OK: Applying all logs succeeded.
10.10.240.117: OK: Activated master IP address.
10.0.1.49: This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.0.1.49: OK: Applying all logs succeeded. Slave started, replicating from 10.10.240.117.
10.10.240.117: Resetting slave info succeeded.
Master failover to 10.10.240.117(10.10.240.117:3306) completed successfully.
如果配置没有问题,vip会如愿飘到配置文件中的备用master(10.10.240.117)上,而主从就剩下10.10.240.117(主),10.0.1.49(从)
[root@Linuxea-Node117 ~]# ip a|grep eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
inet 10.10.240.117/8 brd 10.255.255.255 scope global dynamic eth0
inet 10.0.1.161/24 brd 10.0.1.255 scope global eth0:0
现在只有10.10.240.117(master)和10.0.1.49(slave),并且mha节点的配置文件已经发生变更,默认会将关闭的节点,或者说是宕机的mysql节点配置会被剔除,大概如下(如果恢复需要将配置加回来):
[server default]
manager_log=/data/masterha/manager.log
manager_workdir=/data/masterha
master_binlog_dir=/data/mysql
master_ip_failover_script=/data/masterha/master_ip_failover
master_ip_online_change_script=/data/masterha/master_ip_online_change_script
password=password_123
ping_interval=1
remote_workdir=/data/masterha
repl_password=password
repl_user=slave
ssh_user=root
user=mha
[server1]
candidate_master=1
check_repl_delay=0
hostname=10.10.240.117
master_binlog_dir=/data/mysql
port=3306
[server3]
hostname=10.0.1.49
master_binlog_dir=/data/mysql
no_master=1
port=3306
- 恢复节点
删除mha上的/data/masterha/masterha_default.failover.complete
6.恢复节点
现在候选节点已经接替原有的master,原有的(10.10.0.98)master以从节点的角色上线即可(总之需要重新加入到主从中),也就是说将10.10.0.98重新同步主节点
mysql> reset slave;
Query OK, 0 rows affected (0.00 sec)
mysql> CHANGE MASTER TO
-> MASTER_HOST='10.10.240.117',
-> MASTER_USER='slave',
-> MASTER_PASSWORD='password',
-> MASTER_PORT=3306,
-> MASTER_LOG_FILE='mysql-bin.000002',
-> MASTER_LOG_POS=1653,
-> MASTER_CONNECT_RETRY=20;
Query OK, 0 rows affected, 2 warnings (0.01 sec)
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.10.240.117
Master_User: slave
Master_Port: 3306
Connect_Retry: 20
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 1653
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 283
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
注意:故障后配置文件/etc/masterha_default.cnf已经变更,需要添加原主节点的信息(10.10.0.98),因为在切换的过程中配置将被删除
将配置添加
[server2]
candidate_master=1
check_repl_delay=0
hostname=10.10.0.98
master_binlog_dir=/data/mysql
port=3306
并且此IP地址是10.10.240.117,并不是VIP地址,VIP地址只会随着脚本飘逸供应用使用(也就是会飘逸到主节点)
评论 (0)