How Hash Tags Work in Redis Cluster

Good tools make for good work, so before digging into Redis Cluster hash tags, let's first stand up a cluster with as little effort as possible.

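Setting up Redis Cluster with docker-compose

This setup uses the bitnami/redis-cluster image from Docker Hub. Compared with most tutorials online the process is very simple: two shell commands are enough.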

curl -sSL https://raw.githubusercontent.com/bitnami/containers/main/bitnami/redis-cluster/docker-compose.yml > docker-compose.yml
docker-compose up -d

The docker-compose.yml for bitnami redis-cluster (with minor modifications by the author):

version: '3'
services:
  redis-node-0:
    image: docker.io/bitnami/redis-cluster:7.0
    container_name: node-0
    volumes:
      - redis-cluster_data-0:/bitnami/redis/data
    environment:
      - 'REDIS_PASSWORD=123'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-1:
    image: docker.io/bitnami/redis-cluster:7.0
    container_name: node-1
    volumes:
      - redis-cluster_data-1:/bitnami/redis/data
    environment:
      - 'REDIS_PASSWORD=123'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-2:
    image: docker.io/bitnami/redis-cluster:7.0
    container_name: node-2
    volumes:
      - redis-cluster_data-2:/bitnami/redis/data
    environment:
      - 'REDIS_PASSWORD=123'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-3:
    image: docker.io/bitnami/redis-cluster:7.0
    container_name: node-3
    volumes:
      - redis-cluster_data-3:/bitnami/redis/data
    environment:
      - 'REDIS_PASSWORD=123'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-4:
    image: docker.io/bitnami/redis-cluster:7.0
    container_name: node-4
    volumes:
      - redis-cluster_data-4:/bitnami/redis/data
    environment:
      - 'REDIS_PASSWORD=123'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'

  redis-node-5:
    image: docker.io/bitnami/redis-cluster:7.0
    container_name: node-5
    volumes:
      - redis-cluster_data-5:/bitnami/redis/data
    depends_on:
      - redis-node-0
      - redis-node-1
      - redis-node-2
      - redis-node-3
      - redis-node-4
    environment:
      - 'REDIS_PASSWORD=123'
      - 'REDISCLI_AUTH=123'
      - 'REDIS_CLUSTER_REPLICAS=1'
      - 'REDIS_NODES=redis-node-0 redis-node-1 redis-node-2 redis-node-3 redis-node-4 redis-node-5'
      - 'REDIS_CLUSTER_CREATOR=yes'

volumes:
  redis-cluster_data-0:
    driver: local
  redis-cluster_data-1:
    driver: local
  redis-cluster_data-2:
    driver: local
  redis-cluster_data-3:
    driver: local
  redis-cluster_data-4:
    driver: local
  redis-cluster_data-5:
    driver: local

If you see output like the following, the Redis Cluster has started successfully.

➜  ivansli$ docker-compose up -d
[+] Running 6/6
⠿ Container node-2 Started 9.2s
⠿ Container node-3 Started 10.6s
⠿ Container node-1 Started 9.0s
⠿ Container node-4 Started 11.4s
⠿ Container node-0 Started 9.8s
⠿ Container node-5 Started 21.5s
➜ ivansli$ docker exec -it node-0 bash
ivansli@17affe689ed6:/$ redis-cli
127.0.0.1:6379> auth 123
OK
127.0.0.1:6379> cluster nodes
9ecc22a32815eb4f769069541279569c36b4b1d9 172.26.0.7:6379@16379 slave cc6e6b899e6f2b90f829aeff9ec36316433258a6 0 1685681651000 2 connected
f36960e21b778e1483237504fbffa2086ede7f4d 172.26.0.4:6379@16379 master - 0 1685681652000 3 connected 10923-16383
cc6e6b899e6f2b90f829aeff9ec36316433258a6 172.26.0.5:6379@16379 master - 0 1685681651976 2 connected 5461-10922
4523ca5ef7bfa6b7a22e39a418e2417204d5f34f 172.26.0.2:6379@16379 slave f36960e21b778e1483237504fbffa2086ede7f4d 0 1685681651000 3 connected
9cc6263d5ea15dcfb8795ce5e19f7844c5958781 172.26.0.3:6379@16379 slave 5bb46d5bb5df18a1977260da6183ec1cef9b8b00 0 1685681652984 1 connected
5bb46d5bb5df18a1977260da6183ec1cef9b8b00 172.26.0.6:6379@16379 myself,master - 0 1685681650000 1 connected 0-5460

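As the output shows, the cluster started successfully with a topology of 3 masters and 3 replicas (reported as slave in the output).

Master node    Slot range
master1        0-5460
master2        5461-10922
master3        10923-16383

The cluster has 16384 slots (hash slots) in total, numbered 0 to 16383 (2^14 of them). In this setup each master owns one contiguous range of slots.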

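How Redis Cluster stores keys

As is well known, a single server has finite resources as long as its hardware stays the same: memory, disk, network bandwidth, CPU cores, and so on. For an in-memory service like Redis, storing more data than one machine's memory can hold requires some form of clustering, and Redis Cluster is the most common choice.
Its storage rule can be simplified to the formula CRC16(key) % 16384 = slot: compute the CRC16 of the key, then take the result modulo 16384 to get the key's slot. Every slot is assigned to a master node when the cluster is created, so a key is stored on whichever master's slot range contains its slot.

Put simply: an algorithm breaks the whole data space into pieces and spreads them across multiple nodes (a common strategy in distributed system design).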
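
The formula can be tried out in a few lines of Python. This is a minimal sketch and not Redis code: it relies on Python's binascii.crc_hqx with an initial value of 0 producing the same CRC16 variant that Redis uses, it hard-codes the slot ranges of the cluster built above, and it ignores hash tags. The names SLOT_RANGES, key_slot and owner_of are made up for this illustration.

```python
import binascii

TOTAL_SLOTS = 16384

# Assumption: inclusive slot ranges copied from the 3-master cluster above.
SLOT_RANGES = {
    "master1": (0, 5460),
    "master2": (5461, 10922),
    "master3": (10923, 16383),
}


def key_slot(key: bytes) -> int:
    """CRC16(key) % 16384, with no hash tag handling."""
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


def owner_of(slot: int) -> str:
    """Return the name of the master whose range contains the slot."""
    for name, (first, last) in SLOT_RANGES.items():
        if first <= slot <= last:
            return name
    raise ValueError(f"slot {slot} is not covered by any master")


if __name__ == "__main__":
    for key in (b"name", b"name1", b"name2", b"name3"):
        slot = key_slot(key)
        print(key.decode(), slot, owner_of(slot))
    # name 5798 master2
    # name1 12933 master3
    # name2 742 master1
    # name3 4807 master1
```

A real client does not hard-code the ranges; it learns them from the cluster and refreshes them when slots move.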

What is a hash tag

The official Redis documentation defines hash tags as follows:

There is an exception for the computation of the hash slot that is used in order to implement hash tags. Hash tags are a way to ensure that multiple keys are allocated in the same hash slot. This is used in order to implement multi-key operations in Redis Cluster.
To implement hash tags, the hash slot for a key is computed in a slightly different way in certain conditions. If the key contains a “{…}” pattern only the substring between { and } is hashed in order to obtain the hash slot. However since it is possible that there are multiple occurrences of { or } the algorithm is well specified by the following rules:

  • IF the key contains a { character.
  • AND IF there is a } character to the right of {.
  • AND IF there are one or more characters between the first occurrence of { and the first occurrence of }.
    Then instead of hashing the key, only what is between the first occurrence of { and the following first occurrence of } is hashed.

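In other words, hash tags are an exception to the normal hash slot computation. They guarantee that multiple keys are allocated to the same hash slot, which is what makes multi-key operations possible in Redis Cluster.
When a key contains a {…} pattern, only the substring between { and } is hashed to obtain the slot. Because { and } may each occur more than once, the rule is pinned down precisely:

  • the key contains a { character,
  • and there is a } character to the right of that {,
  • and there are one or more characters between the first { and the first } that follows it.

For a key that satisfies all three conditions, only the content between the first { and the first following } is hashed, not the whole key. Otherwise the whole key is hashed.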
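
The three rules are easiest to appreciate on edge cases. The sketch below is an illustration in Python, not the Redis implementation; the function name hashed_part is hypothetical, and it only returns the bytes that would be fed to the hash function.

```python
def hashed_part(key: bytes) -> bytes:
    """Return the part of the key that is hashed under the hash tag rules."""
    start = key.find(b"{")
    if start == -1:
        return key  # rule 1 fails: no '{' at all
    end = key.find(b"}", start + 1)
    if end == -1:
        return key  # rule 2 fails: no '}' to the right of the first '{'
    if end == start + 1:
        return key  # rule 3 fails: nothing between '{' and '}'
    return key[start + 1:end]


if __name__ == "__main__":
    for key in (b"{user1000}.following", b"foo{}{bar}",
                b"foo{{bar}}zap", b"foo{bar}{zap}", b"plain"):
        print(key, b"->", hashed_part(key))
```

Note that only the first { and the first } after it matter: for foo{{bar}}zap the hashed part is {bar, and for foo{}{bar} the empty first pair makes the whole key be hashed.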

Why use hash tags

Fetching differently named keys in bulk without hash tags

127.0.0.1:6379> mget name name1 name2 name3
(error) CROSSSLOT Keys in request don't hash to the same slot

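The command fails with the error "CROSSSLOT Keys in request don't hash to the same slot". The error is raised by a check that runs before the command is executed; the logic looks like this: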

// Source file: src/server.c

// Called for a parsed client request; validates and executes the command
int processCommand(client *c) {
    // ..... omitted

    // In cluster mode, run some checks before executing
    if (server.cluster_enabled &&
        !mustObeyClient(c) &&
        !(!(c->cmd->flags&CMD_MOVABLE_KEYS) && c->cmd->key_specs_num == 0 &&
          c->cmd->proc != execCommand))
    {
        int error_code;

        // Find the node that serves the slot of every key in this command
        clusterNode *n = getNodeByQuery(c,c->cmd,c->argv,c->argc,
                                        &c->slot,&error_code);
        // No node can serve the request (n == NULL), or the keys' slot
        // belongs to a node other than the current one
        if (n == NULL || n != server.cluster->myself) {
            if (c->cmd->proc == execCommand) {
                discardTransaction(c);
            } else {
                flagTransaction(c);
            }

            // Do not execute the command; reply with an error instead.
            // One of the possible errors is:
            // CROSSSLOT Keys in request don't hash to the same slot
            clusterRedirectClient(c,n,c->slot,error_code);
            c->cmd->rejected_calls++;
            return C_OK;
        }
    }

    // ..... omitted
    /* Execute the command */
    call(c,CMD_CALL_FULL);
    // ..... omitted
}
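
To make the check more tangible, here is a rough Python model of just the "all keys must share one slot" part. It is a simplified sketch under stated assumptions, not a port of getNodeByQuery: node ownership, redirections and transactions are left out, and the names key_slot, common_slot and CrossSlotError are invented for the example.

```python
import binascii


class CrossSlotError(Exception):
    """Raised when the keys of one command map to different slots."""


def key_slot(key: bytes) -> int:
    """Slot of a key, honouring the hash tag rules."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) & 0x3FFF


def common_slot(keys: list[bytes]) -> int:
    """Return the single slot shared by all keys, or raise CrossSlotError."""
    if not keys:
        raise ValueError("at least one key is required")
    slots = {key_slot(k) for k in keys}
    if len(slots) != 1:
        raise CrossSlotError(
            "CROSSSLOT Keys in request don't hash to the same slot")
    return slots.pop()


if __name__ == "__main__":
    print(common_slot([b"{name}", b"{name}1", b"{name}2"]))  # 5798
    try:
        common_slot([b"name", b"name1", b"name2", b"name3"])
    except CrossSlotError as exc:
        print("error:", exc)
```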

Fetching differently named keys in bulk with hash tags

172.26.0.5:6379> mget name {name} {name}1 {name}2 {name}3
1) (nil)
2) (nil)
3) (nil)
4) (nil)
5) (nil)

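The command succeeds (all keys in the request hash to the same slot).

The slots computed for the keys used above are summarized below:

Key        Command to compute the slot    Slot     Node owning the slot    Uses hash tag
name       cluster keyslot name           5798     master2                 no
name1      cluster keyslot name1          12933    master3                 no
name2      cluster keyslot name2          742      master1                 no
name3      cluster keyslot name3          4807     master1                 no
{name}     cluster keyslot {name}         5798     master2                 yes
{name}1    cluster keyslot {name}1        5798     master2                 yes
{name}2    cluster keyslot {name}2        5798     master2                 yes
{name}3    cluster keyslot {name}3        5798     master2                 yes

In Redis, running cluster keyslot followed by a key name returns the slot of that key.

Combining the examples above with the official description of hash tags, you can probably already answer the question of why hash tags are used.

Suppose that during development you need Redis to hold a large amount of data while also keeping keys that share some characteristic (keys that carry the same hash tag) on the same node. In that situation, Redis Cluster combined with hash tags is a natural fit.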
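
In application code this usually comes down to a key naming convention. The helper below is a hypothetical sketch (tagged_key and the user:1000 naming scheme are assumptions for illustration, not something the setup above prescribes): it wraps the shared part of the key in braces so that all keys of one entity land in one slot.

```python
import binascii


def key_slot(key: bytes) -> int:
    """Slot of a key, honouring the hash tag rules."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) & 0x3FFF


def tagged_key(tag: str, suffix: str) -> str:
    """Build a key of the form {tag}suffix so that only the tag is hashed."""
    if not tag or "{" in tag or "}" in tag:
        raise ValueError("tag must be non-empty and must not contain braces")
    return "{" + tag + "}" + suffix


if __name__ == "__main__":
    keys = [tagged_key("user:1000", s) for s in (":profile", ":cart", ":orders")]
    print(keys)
    # All three keys hash only "user:1000", so they share a single slot.
    print({key_slot(k.encode()) for k in keys})
```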

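How the Redis source computes the slot for hash tags

Computing the slot of a key that may carry a hash tag involves two steps:

  1. Find the string enclosed in {}
  2. Run crc16() over that string

The implementation is shown below: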

// Source file: src/cluster.c

/* We have 16384 hash slots. The hash slot of a given key is obtained
 * as the least significant 14 bits of the crc16 of the key.
 *
 * However if the key contains the {...} pattern, only the part between
 * { and } is hashed. This may be useful in the future to force certain
 * keys to be in the same node (assuming no resharding is in progress). */
unsigned int keyHashSlot(char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */

    // Find the position of the first '{'
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' ? Hash the whole key. This is the base case. */
    // 0x3FFF = 16383 = 14 one bits in binary. Because 16384 is a power of
    // two, & 0x3FFF yields the same result as % 16384 without a division.
    if (s == keylen) return crc16(key,keylen) & 0x3FFF;

    /* '{' found? Check if we have the corresponding '}'. */
    // Find the position of the first '}' after that '{'
    for (e = s+1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or nothing between {} ? Hash the whole key. */
    if (e == keylen || e == s+1) return crc16(key,keylen) & 0x3FFF;

    /* If we are here there is both a { and a } on its right. Hash
     * what is in the middle between { and }. */
    return crc16(key+s+1,e-s-1) & 0x3FFF;
}


// Source file: src/crc16.c
// Implementation of crc16()

/* CRC16 implementation according to CCITT standards.
*
* Note by @antirez: this is actually the XMODEM CRC 16 algorithm, using the
* following parameters:
*
* Name : "XMODEM", also known as "ZMODEM", "CRC-16/ACORN"
* Width : 16 bit
* Poly : 1021 (That is actually x^16 + x^12 + x^5 + 1)
* Initialization : 0000
* Reflect Input byte : False
* Reflect Output CRC : False
* Xor constant to output CRC : 0000
* Output for "123456789" : 31C3
*/

static const uint16_t crc16tab[256]= {
0x0000,0x1021,0x2042,0x3063,0x4084,0x50a5,0x60c6,0x70e7,
0x8108,0x9129,0xa14a,0xb16b,0xc18c,0xd1ad,0xe1ce,0xf1ef,
0x1231,0x0210,0x3273,0x2252,0x52b5,0x4294,0x72f7,0x62d6,
0x9339,0x8318,0xb37b,0xa35a,0xd3bd,0xc39c,0xf3ff,0xe3de,
0x2462,0x3443,0x0420,0x1401,0x64e6,0x74c7,0x44a4,0x5485,
0xa56a,0xb54b,0x8528,0x9509,0xe5ee,0xf5cf,0xc5ac,0xd58d,
0x3653,0x2672,0x1611,0x0630,0x76d7,0x66f6,0x5695,0x46b4,
0xb75b,0xa77a,0x9719,0x8738,0xf7df,0xe7fe,0xd79d,0xc7bc,
0x48c4,0x58e5,0x6886,0x78a7,0x0840,0x1861,0x2802,0x3823,
0xc9cc,0xd9ed,0xe98e,0xf9af,0x8948,0x9969,0xa90a,0xb92b,
0x5af5,0x4ad4,0x7ab7,0x6a96,0x1a71,0x0a50,0x3a33,0x2a12,
0xdbfd,0xcbdc,0xfbbf,0xeb9e,0x9b79,0x8b58,0xbb3b,0xab1a,
0x6ca6,0x7c87,0x4ce4,0x5cc5,0x2c22,0x3c03,0x0c60,0x1c41,
0xedae,0xfd8f,0xcdec,0xddcd,0xad2a,0xbd0b,0x8d68,0x9d49,
0x7e97,0x6eb6,0x5ed5,0x4ef4,0x3e13,0x2e32,0x1e51,0x0e70,
0xff9f,0xefbe,0xdfdd,0xcffc,0xbf1b,0xaf3a,0x9f59,0x8f78,
0x9188,0x81a9,0xb1ca,0xa1eb,0xd10c,0xc12d,0xf14e,0xe16f,
0x1080,0x00a1,0x30c2,0x20e3,0x5004,0x4025,0x7046,0x6067,
0x83b9,0x9398,0xa3fb,0xb3da,0xc33d,0xd31c,0xe37f,0xf35e,
0x02b1,0x1290,0x22f3,0x32d2,0x4235,0x5214,0x6277,0x7256,
0xb5ea,0xa5cb,0x95a8,0x8589,0xf56e,0xe54f,0xd52c,0xc50d,
0x34e2,0x24c3,0x14a0,0x0481,0x7466,0x6447,0x5424,0x4405,
0xa7db,0xb7fa,0x8799,0x97b8,0xe75f,0xf77e,0xc71d,0xd73c,
0x26d3,0x36f2,0x0691,0x16b0,0x6657,0x7676,0x4615,0x5634,
0xd94c,0xc96d,0xf90e,0xe92f,0x99c8,0x89e9,0xb98a,0xa9ab,
0x5844,0x4865,0x7806,0x6827,0x18c0,0x08e1,0x3882,0x28a3,
0xcb7d,0xdb5c,0xeb3f,0xfb1e,0x8bf9,0x9bd8,0xabbb,0xbb9a,
0x4a75,0x5a54,0x6a37,0x7a16,0x0af1,0x1ad0,0x2ab3,0x3a92,
0xfd2e,0xed0f,0xdd6c,0xcd4d,0xbdaa,0xad8b,0x9de8,0x8dc9,
0x7c26,0x6c07,0x5c64,0x4c45,0x3ca2,0x2c83,0x1ce0,0x0cc1,
0xef1f,0xff3e,0xcf5d,0xdf7c,0xaf9b,0xbfba,0x8fd9,0x9ff8,
0x6e17,0x7e36,0x4e55,0x5e74,0x2e93,0x3eb2,0x0ed1,0x1ef0
};

uint16_t crc16(const char *buf, int len) {
    int counter;
    uint16_t crc = 0;
    for (counter = 0; counter < len; counter++)
        crc = (crc<<8) ^ crc16tab[((crc>>8) ^ *buf++)&0x00FF];
    return crc;
}
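
For readers who want to experiment outside of Redis, the two functions above can be mirrored in Python. The sketch below is an independent illustration, not code from the Redis repository: crc16_xmodem computes the same checksum bit by bit instead of using the lookup table, key_hash_slot follows the structure of keyHashSlot, and both names are chosen for this example. It also checks the claim from the comment that & 0x3FFF and % 16384 agree.

```python
import binascii


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Python counterpart of keyHashSlot(): hash the tag if there is one."""
    s = key.find(b"{")
    if s == -1:
        return crc16_xmodem(key) & 0x3FFF
    e = key.find(b"}", s + 1)
    if e == -1 or e == s + 1:
        return crc16_xmodem(key) & 0x3FFF
    return crc16_xmodem(key[s + 1:e]) & 0x3FFF


if __name__ == "__main__":
    # Check value listed in the crc16.c header comment.
    print(hex(crc16_xmodem(b"123456789")))  # 0x31c3
    # The standard library routine gives the same checksum.
    print(crc16_xmodem(b"name") == binascii.crc_hqx(b"name", 0))  # True
    # Masking with 0x3FFF equals taking the value modulo 16384.
    print(all((v & 0x3FFF) == v % 16384 for v in range(1 << 16)))  # True
    print(key_hash_slot(b"{name}3"))  # 5798
```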

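Problems hash tags can cause

In Redis Cluster, hash tags guarantee that multiple keys are allocated to the same hash slot. The same property can have negative side effects:

  1. On writes: when a large number of keys share a hash tag they all land in one slot, so the node owning that slot stores a disproportionate amount of data and may even exceed its memory limit (data skew)
  2. On reads and deletes: when a large number of keys share a hash tag they all land in one slot, so the node owning that slot handles a disproportionate share of requests, which can leave the server overloaded, failing to respond, or even down (hot key)

Hash tags are a double-edged sword. Consider the concrete business logic and scenario before using them, and avoid the problems above wherever possible. Where they cannot be avoided, refine the keys by business line or scenario and split them accordingly, so that they are spread more evenly over different slots.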
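
One possible way to split such keys is to append a bucket number to the tag. The sketch below is only an illustration under assumptions of its own: bucketed_key, the orders tag, the member ids and the choice of 8 buckets are all hypothetical. Keys of one member still share a slot, so multi-key commands keep working per member, while different members are spread over several slots.

```python
import binascii
import zlib


def key_slot(key: bytes) -> int:
    """Slot of a key, honouring the hash tag rules."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) & 0x3FFF


def bucketed_key(base_tag: str, member_id: str, field: str,
                 buckets: int = 8) -> str:
    """Build {base_tag:bucket}:member_id:field with a stable bucket per member."""
    bucket = zlib.crc32(member_id.encode()) % buckets
    return "{" + base_tag + ":" + str(bucket) + "}:" + member_id + ":" + field


if __name__ == "__main__":
    cart = bucketed_key("orders", "user42", "cart")
    history = bucketed_key("orders", "user42", "history")
    # Both keys of the same member carry the same tag, hence the same slot.
    print(cart, history, key_slot(cart.encode()) == key_slot(history.encode()))
    # Different buckets map to different slots, spreading the load.
    print(sorted(key_slot(("{orders:%d}" % b).encode()) for b in range(8)))
```

The trade-off is that multi-key operations across members in different buckets are no longer possible in a single command.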

Further reading

dockerhub bitnami/redis-cluster
Redis cluster specification
Redis hash tag 如何運作?
Redis Cluster mode and Key distribution