Redis Sentinel Mode


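Redis Sentinel provides high availability for a Redis deployment. So that Sentinel itself is not a single point of failure, it is usually run as a group of 3 to 5 nodes; the group keeps working even if a few individual Sentinel nodes go down.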

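Sentinel mode is very similar to master-replica mode, and the setup steps build on the master-replica article:

Redis binary installation: master-replica setup and principles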

Sentinel mode architecture diagram

[Figure: Redis Sentinel mode architecture]

Key mechanisms of Sentinel mode

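1) Subjective down (SDOWN)
Applies to both the master and the replicas. If a Sentinel receives no valid reply from a target server within the configured time (`down-after-milliseconds`), it marks that server as subjectively down. For example, if Sentinel1 sends PING to the master and receives no PONG within that time, Sentinel1 marks the master as subjectively down.
2) Objective down (ODOWN)
Applies only to the master. Once Sentinel1 considers the master down, it asks the other Sentinels for their view of the master (using `SENTINEL is-master-down-by-addr`). If at least `quorum` Sentinels, the number given as the last argument of `sentinel monitor`, agree that the master is down, Sentinel1 marks it as objectively down.
3) Leader election and failover
The Sentinels vote for a leader to carry out the failover. The Sentinel that detected the objective-down state asks the others for their votes and becomes leader once a majority of all Sentinels has voted for it; in this example that is Sentinel1. The leader picks the best replica according to a set of rules (such as `replica-priority` and replication offset), promotes it to master, and reconfigures the remaining replicas to follow it. The other Sentinels learn the new configuration through Pub/Sub. This completes the master switch.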
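
The two thresholds involved can be sketched in a few lines of Python. This is an illustrative model, not Sentinel's actual code, and the function names are made up for this example. It assumes the documented rules: the master is flagged objectively down once at least `quorum` Sentinels report it down, and a Sentinel may lead the failover only with votes from a majority of all Sentinels (and never fewer than `quorum`).

```python
def is_objectively_down(down_reports: int, quorum: int) -> bool:
    """True once at least `quorum` Sentinels consider the master down."""
    return down_reports >= quorum


def votes_needed_to_lead(total_sentinels: int, quorum: int) -> int:
    """Votes a Sentinel needs before it may run the failover."""
    majority = total_sentinels // 2 + 1
    return max(majority, quorum)


if __name__ == "__main__":
    # Three Sentinels with quorum 2, as in this article's setup.
    print(is_objectively_down(down_reports=1, quorum=2))
    print(is_objectively_down(down_reports=2, quorum=2))
    print(votes_needed_to_lead(total_sentinels=3, quorum=2))
```

With three Sentinels and a quorum of 2, losing one Sentinel still leaves enough votes for both steps; losing two does not, which is why at least three Sentinel nodes are recommended.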

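Summary: Sentinel mode is an upgraded master-replica architecture. In a master-replica setup every node holds the full data set and the copies are kept consistent, but there is no automatic switch when the master fails. Sentinel mode adds that automatic failover.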

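Setting up Sentinel mode

This setup turns off read-only mode on the replicas with the parameters below (`slave-read-only` is the legacy alias of `replica-read-only`, so one of the two lines is enough). Sentinel itself does not require this: a promoted replica becomes writable automatically, and writes made directly to a replica are not propagated to the master.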

replica-read-only no
slave-read-only no

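Because any replica may be promoted to master, and the old master rejoins as a replica, every node needs both `masterauth` and `requirepass` set to the same password: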

protected-mode no
masterauth "123123"
requirepass "123123"
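
A quick way to catch a mismatch before starting the nodes is to parse each `redis.conf` and compare the two directives. The sketch below is a hypothetical helper, not part of Redis. It assumes simple `directive value` lines and keeps only the last occurrence of a directive, which is a simplification for directives that may legitimately repeat.

```python
import shlex


def parse_redis_conf(text: str) -> dict:
    """Parse 'directive value...' lines into {directive: [values]}."""
    settings = {}
    for line in text.splitlines():
        # shlex handles quoted values and drops '#' comments.
        tokens = shlex.split(line, comments=True)
        if tokens:
            settings[tokens[0].lower()] = tokens[1:]
    return settings


def auth_is_consistent(text: str) -> bool:
    """True when masterauth and requirepass are both set and identical."""
    conf = parse_redis_conf(text)
    masterauth = conf.get("masterauth")
    requirepass = conf.get("requirepass")
    return bool(masterauth) and masterauth == requirepass


if __name__ == "__main__":
    sample = 'protected-mode no\nmasterauth "123123"\nrequirepass "123123"\n'
    print(auth_is_consistent(sample))
```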

The complete Redis master configuration file:

bind 0.0.0.0

protected-mode no
masterauth "123123"
requirepass "123123"

port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
pidfile "/var/run/redis_6379.pid"
loglevel notice
logfile "/opt/redis.log"
databases 16
always-show-logo no
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
rdb-del-sync-files no
replica-serve-stale-data yes
replica-read-only no
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-diskless-sync-max-replicas 0
repl-diskless-load disabled
repl-disable-tcp-nodelay yes
replica-priority 100
acllog-max-len 128
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
lazyfree-lazy-user-flush no
oom-score-adj no
oom-score-adj-values 0 200 800
disable-thp yes

# AOF persistence settings
appendonly yes
appendfilename "appendonly.aof"
appenddirname "appendonlydir"

# Data directory
dir "/opt"

# fsync policy
appendfsync always

no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
aof-timestamp-enabled no

slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-listpack-entries 512
hash-max-listpack-value 64
list-max-listpack-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-listpack-entries 128
zset-max-listpack-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4kb
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
jemalloc-bg-thread yes

min-replicas-to-write 0
min-replicas-max-lag 10

The complete Redis replica (slave) configuration file:

[root@web02 redis-7.0.8]# cat redis.conf
bind 0.0.0.0
replicaof 192.168.31.70 6379
protected-mode no
masterauth "123123"
requirepass "123123"

port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
pidfile "/var/run/redis_6379.pid"
loglevel notice
logfile "/opt/redis.log"
databases 16
always-show-logo no
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "dump.rdb"
rdb-del-sync-files no
replica-serve-stale-data yes
replica-read-only no
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-diskless-sync-max-replicas 0
repl-diskless-load disabled
repl-disable-tcp-nodelay yes
replica-priority 100
acllog-max-len 128
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
lazyfree-lazy-user-flush no
oom-score-adj no
oom-score-adj-values 0 200 800
disable-thp yes

# AOF persistence settings
appendonly yes
appendfilename "appendonly.aof"
appenddirname "appendonlydir"

# Data directory
dir "/opt"

# fsync policy
appendfsync always

no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
aof-timestamp-enabled no

slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-listpack-entries 512
hash-max-listpack-value 64
list-max-listpack-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-listpack-entries 128
zset-max-listpack-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4kb
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
jemalloc-bg-thread yes

min-replicas-to-write 0
min-replicas-max-lag 10

Create the Sentinel configuration file

cat > sentinel-26379.conf <<EOF
port 26379
protected-mode no
daemonize yes
dir "/opt"
logfile "sentinel-26379.log"
# Master to monitor: mymaster is a user-chosen name, 192.168.31.70 and 6379 are the master's address and port, and 2 is the quorum: at least two Sentinels must agree that the master is unreachable before a failover is started.
sentinel monitor mymaster 192.168.31.70 6379 2
# Time in milliseconds without a valid reply before the master is considered subjectively down
sentinel down-after-milliseconds mymaster 10000

sentinel failover-timeout mymaster 10000
# Redis password
sentinel auth-pass mymaster 123123
EOF
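
Since the same file is needed on every Sentinel node, it can also be generated rather than typed. The following is a minimal sketch with a hypothetical helper, `render_sentinel_conf`, whose defaults mirror the values used in this article; adjust them to your environment.

```python
def render_sentinel_conf(master_name, master_host, master_port, quorum,
                         password, sentinel_port=26379,
                         down_after_ms=10000, failover_timeout_ms=10000):
    """Return the text of a sentinel.conf equivalent to the one above."""
    lines = [
        f"port {sentinel_port}",
        "protected-mode no",
        "daemonize yes",
        'dir "/opt"',
        f'logfile "sentinel-{sentinel_port}.log"',
        f"sentinel monitor {master_name} {master_host} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        f"sentinel auth-pass {master_name} {password}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_sentinel_conf("mymaster", "192.168.31.70", 6379, 2, "123123"))
```

Sentinel rewrites its configuration file while running (the "Sentinel new configuration saved on disk" log lines), so it is safer to write a fresh file for a new deployment than to append to an existing one.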

Start Sentinel

  • web01 (192.168.31.70): manually configured as the Redis master
  • web02 (192.168.31.71): slave01
  • web03 (192.168.31.72): slave02
# Start the service
[root@web01 redis-7.0.8]# redis-sentinel sentinel-26379.conf 

# Check the log: Sentinel has found the master and is now monitoring its replicas
[root@web01 redis-7.0.8]# tail -f /opt/sentinel-26379.log 
24598:X 17 Mar 2023 02:07:42.993 * Increased maximum number of open files to 10032 (it was originally set to 1024).
24598:X 17 Mar 2023 02:07:42.993 * monotonic clock: POSIX clock_gettime
24598:X 17 Mar 2023 02:07:42.994 * Running mode=sentinel, port=26379.
24598:X 17 Mar 2023 02:07:42.997 * Sentinel new configuration saved on disk
24598:X 17 Mar 2023 02:07:42.997 # Sentinel ID is 4ad3c2761f81d04b2512a1b1b47a4d7b00c9195b
24598:X 17 Mar 2023 02:07:42.997 # +monitor master mymaster 192.168.31.70 6379 quorum 2
24598:X 17 Mar 2023 02:07:42.998 * +slave slave 192.168.31.71:6379 192.168.31.71 6379 @ mymaster 192.168.31.70 6379
24598:X 17 Mar 2023 02:07:42.999 * Sentinel new configuration saved on disk
24598:X 17 Mar 2023 02:07:42.999 * +slave slave 192.168.31.72:6379 192.168.31.72 6379 @ mymaster 192.168.31.70 6379
24598:X 17 Mar 2023 02:07:43.020 * Sentinel new configuration saved on disk

# After Sentinel is started on the other servers, the log shows the following
24598:X 17 Mar 2023 02:08:36.266 * +sentinel sentinel 69d0bf4b6b7d2f56ae31d97b312ab17645c53b2e 192.168.31.71 26379 @ mymaster 192.168.31.70 6379
24598:X 17 Mar 2023 02:08:36.268 * Sentinel new configuration saved on disk
24598:X 17 Mar 2023 02:08:37.783 * +sentinel sentinel a37ff42df311880c5e37c8a7253d6a0955e18362 192.168.31.72 26379 @ mymaster 192.168.31.70 6379
24598:X 17 Mar 2023 02:08:37.785 * Sentinel new configuration saved on disk

Check the Redis Sentinel status

# Connect to port 26379 on any node and run info sentinel to see the state of the Sentinel group
[root@web01 redis-7.0.8]# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.31.70:6379,slaves=2,sentinels=3

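# name = name of the monitored master group
# status = state of the master
# address = address of the current master
# slaves = number of replicas
# sentinels = number of Sentinels monitoring this master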
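
For scripted health checks, the `master0:` line can be turned into a dictionary. This is a sketch that assumes the line has exactly the field layout shown in the output above; `parse_sentinel_masters` is a name chosen for this example.

```python
import re

MASTER_LINE = re.compile(r"^master\d+:(.*)$")


def parse_sentinel_masters(info_text: str) -> list:
    """Extract one dict per 'masterN:' line of `INFO sentinel` output."""
    masters = []
    for line in info_text.splitlines():
        match = MASTER_LINE.match(line.strip())
        if not match:
            continue
        fields = dict(item.split("=", 1) for item in match.group(1).split(","))
        host, _, port = fields["address"].rpartition(":")
        masters.append({
            "name": fields["name"],
            "status": fields["status"],
            "host": host,
            "port": int(port),
            "replicas": int(fields["slaves"]),
            "sentinels": int(fields["sentinels"]),
        })
    return masters


if __name__ == "__main__":
    sample = (
        "# Sentinel\n"
        "sentinel_masters:1\n"
        "master0:name=mymaster,status=ok,address=192.168.31.70:6379,"
        "slaves=2,sentinels=3\n"
    )
    print(parse_sentinel_masters(sample))
```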

Next, simulate a failure of the master node

[root@web01 redis-7.0.8]# redis-cli 
127.0.0.1:6379> auth 123123
OK
127.0.0.1:6379> KEYS *
1) "stu"
2) "age"
3) "abc"
127.0.0.1:6379> set now 99
OK
127.0.0.1:6379> KEYS *
1) "stu"
2) "age"
3) "abc"
4) "now"
127.0.0.1:6379> get now
"99"
127.0.0.1:6379> 
127.0.0.1:6379> exit
[root@web01 redis-7.0.8]# pkill -9 -x redis-server
[root@web01 redis-7.0.8]# 
[root@web01 redis-7.0.8]# lsof -i:6379

Next, look at what happens on each node

Master node

Stopping the master process (the Sentinel process on the same host keeps running):

[root@web01 redis-7.0.8]# ps -ef|grep redis
root     24589     1  0 02:07 ?        00:00:01 redis-server 0.0.0.0:6379
root     24598     1  0 02:07 ?        00:00:01 redis-sentinel *:26379 [sentinel]
root     24616 24518  0 02:14 pts/0    00:00:00 grep --color=auto redis
[root@web01 redis-7.0.8]# kill -9 24589
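
When this step is scripted, it matters that only `redis-server` is killed and not the Sentinel on the same host. The sketch below picks the right PIDs out of `ps -ef` output. It assumes the standard eight-column `ps -ef` layout and that the command column starts with `redis-server`, as in the listing above; `redis_server_pids` is a hypothetical helper.

```python
def redis_server_pids(ps_output: str) -> list:
    """Return PIDs of redis-server processes found in `ps -ef` output."""
    pids = []
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        columns = line.split(None, 7)
        if len(columns) < 8 or not columns[1].isdigit():
            continue  # header line or malformed row
        program = columns[7].split()[0]
        if program.endswith("redis-server"):
            pids.append(int(columns[1]))
    return pids


if __name__ == "__main__":
    sample = (
        "root     24589     1  0 02:07 ?        00:00:01 redis-server 0.0.0.0:6379\n"
        "root     24598     1  0 02:07 ?        00:00:01 redis-sentinel *:26379 [sentinel]\n"
        "root     24616 24518  0 02:14 pts/0    00:00:00 grep --color=auto redis\n"
    )
    print(redis_server_pids(sample))
```

Matching on the program name skips both the `redis-sentinel` process and the `grep` process itself, which a plain `grep redis` filter would include.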

The Sentinels negotiate immediately and promote a new master. The Sentinel logs from slave01 and slave02 follow, in that order.

10585:X 17 Mar 2023 02:48:02.357 # +sdown master mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:02.429 # +odown master mymaster 192.168.31.70 6379 #quorum 2/2
10585:X 17 Mar 2023 02:48:02.429 # +new-epoch 7
10585:X 17 Mar 2023 02:48:02.429 # +try-failover master mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:02.431 * Sentinel new configuration saved on disk
10585:X 17 Mar 2023 02:48:02.431 # +vote-for-leader c43d0679b6c3843a426044757b5977b3417e92e3 7
10585:X 17 Mar 2023 02:48:02.436 # e3220710aa969eaa9539e2af2c894ec5aa2724c5 voted for c43d0679b6c3843a426044757b5977b3417e92e3 7
10585:X 17 Mar 2023 02:48:02.532 # +elected-leader master mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:02.532 # +failover-state-select-slave master mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:02.633 # +selected-slave slave 192.168.31.72:6379 192.168.31.72 6379 @ mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:02.633 * +failover-state-send-slaveof-noone slave 192.168.31.72:6379 192.168.31.72 6379 @ mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:02.699 * +failover-state-wait-promotion slave 192.168.31.72:6379 192.168.31.72 6379 @ mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:03.131 * Sentinel new configuration saved on disk
10585:X 17 Mar 2023 02:48:03.131 # +promoted-slave slave 192.168.31.72:6379 192.168.31.72 6379 @ mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:03.132 # +failover-state-reconf-slaves master mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:03.132 * +slave-reconf-sent slave 192.168.31.71:6379 192.168.31.71 6379 @ mymaster 192.168.31.70 6379

10585:X 17 Mar 2023 02:48:03.561 # -odown master mymaster 192.168.31.70 6379

10585:X 17 Mar 2023 02:48:04.025 * +slave-reconf-inprog slave 192.168.31.71:6379 192.168.31.71 6379 @ mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:04.025 * +slave-reconf-done slave 192.168.31.71:6379 192.168.31.71 6379 @ mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:04.082 # +failover-end master mymaster 192.168.31.70 6379
10585:X 17 Mar 2023 02:48:04.083 # +switch-master mymaster 192.168.31.70 6379 192.168.31.72 6379
10585:X 17 Mar 2023 02:48:04.083 * +slave slave 192.168.31.71:6379 192.168.31.71 6379 @ mymaster 192.168.31.72 6379
10585:X 17 Mar 2023 02:48:04.083 * +slave slave 192.168.31.70:6379 192.168.31.70 6379 @ mymaster 192.168.31.72 6379

10585:X 17 Mar 2023 02:48:05.273 * Sentinel new configuration saved on disk

10585:X 17 Mar 2023 02:48:14.137 # +sdown slave 192.168.31.70:6379 192.168.31.70 6379 @ mymaster 192.168.31.72 6379

15087:X 17 Mar 2023 02:48:03.205 * Sentinel new configuration saved on disk
15087:X 17 Mar 2023 02:48:03.205 # +new-epoch 7
15087:X 17 Mar 2023 02:48:03.205 # +config-update-from sentinel c43d0679b6c3843a426044757b5977b3417e92e3 192.168.31.71 26379 @ mymaster 192.168.31.71 6379
15087:X 17 Mar 2023 02:48:03.205 # +switch-master mymaster 192.168.31.71 6379 192.168.31.72 6379
15087:X 17 Mar 2023 02:48:03.206 * +slave slave 192.168.31.71:6379 192.168.31.71 6379 @ mymaster 192.168.31.72 6379
15087:X 17 Mar 2023 02:48:03.209 * Sentinel new configuration saved on disk
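
The interesting facts in such a log are which node took over and how long the switch took. The following sketch extracts both from Sentinel log text. It assumes the line format shown above (PID, `X` role marker, timestamp, level character, event name) and English month abbreviations; `failover_summary` is a name chosen for this example.

```python
import re
from datetime import datetime

LOG_LINE = re.compile(
    r"^\d+:X (\d+ \w+ \d+ \d+:\d+:\d+\.\d+) [.\-*#] ([+-][\w-]+) ?(.*)$"
)


def failover_summary(log_text: str):
    """Return old/new master and seconds from +sdown to +switch-master."""
    sdown_at = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # not an event line, e.g. "Sentinel new configuration..."
        stamp, event, details = match.groups()
        when = datetime.strptime(stamp, "%d %b %Y %H:%M:%S.%f")
        if event == "+sdown" and details.startswith("master ") and sdown_at is None:
            sdown_at = when
        elif event == "+switch-master":
            name, old_ip, old_port, new_ip, new_port = details.split()
            elapsed = (when - sdown_at).total_seconds() if sdown_at else None
            return {
                "master_name": name,
                "old_master": f"{old_ip}:{old_port}",
                "new_master": f"{new_ip}:{new_port}",
                "seconds_since_sdown": elapsed,
            }
    return None  # no failover found in this log


if __name__ == "__main__":
    sample = (
        "10585:X 17 Mar 2023 02:48:02.357 # +sdown master mymaster 192.168.31.70 6379\n"
        "10585:X 17 Mar 2023 02:48:04.083 # +switch-master mymaster "
        "192.168.31.70 6379 192.168.31.72 6379\n"
    )
    print(failover_summary(sample))
```

Note that the measured interval starts at `+sdown`, so it does not include the `down-after-milliseconds` wait that precedes it.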

Check the current cluster state

# First check the state of the new master and whether data still replicates correctly
[root@web-03 redis-7.0.8]#  redis-cli -p 6379
127.0.0.1:6379> auth 123123
OK
127.0.0.1:6379> KEYS *
1) "age"
2) "stu"
3) "web01"
4) "abcdocker"
5) "abc"
127.0.0.1:6379> info replication 
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.31.71,port=6379,state=online,offset=69649,lag=0  # one replica is still connected
master_failover_state:no-failover
master_replid:58de236ec22aaf91f13c9c71369afdb4cde0c520
master_replid2:72bef762f6d74a84931423a4f0d4d018a3d041d3
master_repl_offset:69649
second_repl_offset:37410
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:337
repl_backlog_histlen:69313
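
The same check can be automated by parsing the `info replication` text. This is a sketch that assumes the `slaveN:` field layout shown above; the function names and the `max_lag` threshold are choices made for this example, not Redis settings.

```python
def parse_replication(info_text: str) -> dict:
    """Extract the role, master offset and replica list from INFO replication."""
    info = {"role": None, "master_repl_offset": None, "replicas": []}
    for line in info_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop headers and annotations
        key, sep, value = line.partition(":")
        if not sep:
            continue
        if key == "role":
            info["role"] = value
        elif key == "master_repl_offset":
            info["master_repl_offset"] = int(value)
        elif key.startswith("slave") and key[5:].isdigit():
            fields = dict(item.split("=", 1) for item in value.split(","))
            info["replicas"].append({
                "addr": f"{fields['ip']}:{fields['port']}",
                "state": fields["state"],
                "offset": int(fields["offset"]),
                "lag": int(fields["lag"]),
            })
    return info


def replicas_healthy(info: dict, max_lag: int = 1) -> bool:
    """True if at least one replica exists and all are online with low lag."""
    replicas = info["replicas"]
    return bool(replicas) and all(
        r["state"] == "online" and r["lag"] <= max_lag for r in replicas
    )


if __name__ == "__main__":
    sample = (
        "# Replication\nrole:master\nconnected_slaves:1\n"
        "slave0:ip=192.168.31.71,port=6379,state=online,offset=69649,lag=0\n"
        "master_repl_offset:69649\n"
    )
    parsed = parse_replication(sample)
    print(parsed["role"], replicas_healthy(parsed))
```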

Add a key on the new master and check that it shows up on node 71

# Write a key on web03, the new master

127.0.0.1:6379> set web03 ok
OK
127.0.0.1:6379> get web03
"ok"
127.0.0.1:6379> get web01
"ok1"
127.0.0.1:6379> 

On web02 (71), check that the data has arrived

[root@web02 redis-7.0.8]# redis-cli -p 6379
127.0.0.1:6379> auth 123123
OK
127.0.0.1:6379> get web03
"ok"
127.0.0.1:6379> get web01
"ok1"
127.0.0.1:6379> KEYS *
1) "abc"
2) "age"
3) "abcdocker"
4) "web03"
5) "web01"
6) "stu"
127.0.0.1:6379> 

# Reads return the expected data

Connect to Sentinel and check that it has recorded the new topology

[root@web01 redis-7.0.8]# redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.31.72:6379,slaves=2,sentinels=3
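
Applications should not hard-code the master address after this point; they can ask any Sentinel for the current master with `SENTINEL get-master-addr-by-name`. The sketch below speaks the Redis wire protocol (RESP) directly over a socket so that it needs only the standard library. It assumes Sentinel itself has no password, as in this article's configuration, and reads the small reply in a single `recv`, which is a simplification. The helper names are made up for this example; a production client would normally use a Redis library with Sentinel support.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_master_addr(reply: bytes):
    """Parse a two-element RESP array into (host, port); None otherwise."""
    lines = reply.split(b"\r\n")
    if lines[0] != b"*2":
        return None  # e.g. a nil reply when the master name is unknown
    # Layout: *2, $<len>, host, $<len>, port
    return lines[2].decode(), int(lines[4])


def get_master_addr(sentinel_host: str, master_name: str = "mymaster",
                    sentinel_port: int = 26379, timeout: float = 1.0):
    """Ask one Sentinel for the current master of `master_name`."""
    request = encode_command("SENTINEL", "get-master-addr-by-name", master_name)
    with socket.create_connection((sentinel_host, sentinel_port),
                                  timeout=timeout) as conn:
        conn.sendall(request)
        return parse_master_addr(conn.recv(4096))


if __name__ == "__main__":
    # Requires a reachable Sentinel; the address is this article's web01.
    print(get_master_addr("192.168.31.70"))
```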

The Redis log on the replica (192.168.31.71) shows Sentinel pointing it at the new master:

10575:S 17 Mar 2023 02:47:58.730 # Error condition on socket for SYNC: Connection refused
10575:S 17 Mar 2023 02:47:59.735 * Connecting to MASTER 192.168.31.70:6379
10575:S 17 Mar 2023 02:47:59.735 * MASTER <-> REPLICA sync started
10575:S 17 Mar 2023 02:47:59.735 # Error condition on socket for SYNC: Connection refused
10575:S 17 Mar 2023 02:48:00.742 * Connecting to MASTER 192.168.31.70:6379
10575:S 17 Mar 2023 02:48:00.742 * MASTER <-> REPLICA sync started
10575:S 17 Mar 2023 02:48:00.742 # Error condition on socket for SYNC: Connection refused
10575:S 17 Mar 2023 02:48:01.747 * Connecting to MASTER 192.168.31.70:6379
10575:S 17 Mar 2023 02:48:01.747 * MASTER <-> REPLICA sync started
10575:S 17 Mar 2023 02:48:01.747 # Error condition on socket for SYNC: Connection refused
10575:S 17 Mar 2023 02:48:02.752 * Connecting to MASTER 192.168.31.70:6379
10575:S 17 Mar 2023 02:48:02.752 * MASTER <-> REPLICA sync started
10575:S 17 Mar 2023 02:48:02.752 # Error condition on socket for SYNC: Connection refused
10575:S 17 Mar 2023 02:48:03.132 * Connecting to MASTER 192.168.31.72:6379
10575:S 17 Mar 2023 02:48:03.132 * MASTER <-> REPLICA sync started
10575:S 17 Mar 2023 02:48:03.132 * REPLICAOF 192.168.31.72:6379 enabled (user request from 'id=7 addr=192.168.31.71:51772 laddr=192.168.31.71:6379 fd=11 name=sentinel-c43d0679-cmd age=345 idle=0 flags=x db=0 sub=0 psub=0 ssub=0 multi=4 qbuf=342 qbuf-free=20132 argv-mem=4 multi-mem=181 rbs=8192 rbp=5482 obl=45 oll=0 omem=0 tot-mem=29745 events=r cmd=exec user=default redir=-1 resp=2')
10575:S 17 Mar 2023 02:48:03.148 # CONFIG REWRITE executed with success.
10575:S 17 Mar 2023 02:48:03.148 * Non blocking connect for SYNC fired the event.
10575:S 17 Mar 2023 02:48:03.149 * Master replied to PING, replication can continue...
10575:S 17 Mar 2023 02:48:03.149 * Trying a partial resynchronization (request 72bef762f6d74a84931423a4f0d4d018a3d041d3:37410).
10575:S 17 Mar 2023 02:48:03.149 * Successful partial resynchronization with master.
10575:S 17 Mar 2023 02:48:03.149 # Master replication ID changed to 58de236ec22aaf91f13c9c71369afdb4cde0c520
10575:S 17 Mar 2023 02:48:03.149 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

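# The master was unreachable for 10 seconds (down-after-milliseconds 10000), so Sentinel promoted a new master

Recovery

When the failed master comes back, it does not take the master role back. It rejoins as a replica of the new master.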

[root@web01 redis-7.0.8]# redis-server redis.conf

Seen from web03, the master role is still on web03:

[root@web-03 ~]#  redis-cli -p 6379
127.0.0.1:6379> auth 123123
OK
127.0.0.1:6379> info replication 
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.31.71,port=6379,state=online,offset=1315422,lag=0
slave1:ip=192.168.31.70,port=6379,state=online,offset=1315422,lag=0  # the recovered node is now a replica
master_failover_state:no-failover
master_replid:58de236ec22aaf91f13c9c71369afdb4cde0c520
master_replid2:72bef762f6d74a84931423a4f0d4d018a3d041d3
master_repl_offset:1315436
second_repl_offset:37410
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:266057
repl_backlog_histlen:1049380
