拾荒的小海螺

1、简述

在分布式系统中，负载均衡和数据分片是核心问题。传统的取模方式（hash(key) % N）虽然简单，但当节点数量变化时，会导致 大规模数据迁移。

为了解决这一问题，引入了 一致性哈希算法（Consistent Hashing）。它可以在节点变化时，仅影响部分数据映射，大大降低系统的抖动和迁移开销。

2、一致性哈希的核心思想

🔹 将哈希空间抽象成一个环（0 ~ 2³²-1）；

🔹 将节点（例如缓存服务器）映射到环上；

🔹 数据（key）通过哈希函数映射到环上，顺时针找到最近的节点作为归属；

🔹 当节点增减时，只会影响该节点相邻区间的少量数据。

直观对比

🔹 取模法：节点数变化 → 所有数据几乎都要重新分布。

🔹 一致性哈希：节点数变化 → 只有少部分数据需要迁移。

虚拟节点

为了避免数据分布不均，一致性哈希中常引入 虚拟节点（Virtual Node） 概念：

🔹 每个真实节点对应多个虚拟节点；

🔹 数据实际映射到虚拟节点，再归属于真实节点；

🔹 提升数据分布均匀性，避免“倾斜”。

3、实践样例

3.1 定义一致性哈希类

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.*;

public class ConsistentHash<T> {
    private final HashFunction hashFunction;
    private final int numberOfReplicas; // 每个节点的虚拟节点数
    private final SortedMap<Integer, T> circle = new TreeMap<>();

    public ConsistentHash(HashFunction hashFunction, int numberOfReplicas, Collection<T> nodes) {
        this.hashFunction = hashFunction;
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }

    // 添加节点（含虚拟节点）
    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.put(hashFunction.hash(node.toString() + i), node);
        }
    }

    // 移除节点
    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove(hashFunction.hash(node.toString() + i));
        }
    }

    // 获取 key 对应的节点
    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hash(key);
        if (!circle.containsKey(hash)) {
            SortedMap<Integer, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }

    // 简单的 MD5 哈希实现
    public interface HashFunction {
        int hash(Object key);
    }

    public static class MD5Hash implements HashFunction {
        @Override
        public int hash(Object key) {
            try {
                MessageDigest md = MessageDigest.getInstance("MD5");
                byte[] digest = md.digest(key.toString().getBytes(StandardCharsets.UTF_8));
                return ((digest[3] & 0xFF) << 24) | ((digest[2] & 0xFF) << 16) |
                        ((digest[1] & 0xFF) << 8) | (digest[0] & 0xFF);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }
}

3.2 使用示例

import java.util.Arrays;

public class ConsistentHashDemo {
    public static void main(String[] args) {
        ConsistentHash<String> consistentHash =
                new ConsistentHash<>(new ConsistentHash.MD5Hash(), 100,
                        Arrays.asList("NodeA", "NodeB", "NodeC"));

        // 模拟分配 10 个 key
        for (int i = 1; i <= 10; i++) {
            String key = "key" + i;
            String node = consistentHash.get(key);
            System.out.println(key + " is mapped to " + node);
        }

        // 模拟 NodeC 宕机
        System.out.println("\n---- Remove NodeC ----");
        consistentHash.remove("NodeC");

        for (int i = 1; i <= 10; i++) {
            String key = "key" + i;
            String node = consistentHash.get(key);
            System.out.println(key + " is mapped to " + node);
        }
    }
}

3.3 运行效果

添加三个节点时：

key1 is mapped to NodeB
key2 is mapped to NodeA
key3 is mapped to NodeC
...

移除 NodeC 后：

key3 is mapped to NodeB
key4 is mapped to NodeA
...

👉 可以看到，只有 部分 key 从 NodeC 迁移到了 NodeA/NodeB，而不是全部数据重新分配。

4、应用场景

🔹 分布式缓存（Redis、Memcached 集群）：缓存 key 的分布与扩缩容。

🔹 分布式存储（HDFS、Ceph）：文件块分布在不同存储节点。

🔹 消息队列（Kafka 分区）：消息的分区分配。

🔹 API 网关负载均衡：请求分发到不同微服务实例。

5、总结

🔹 一致性哈希通过构建哈希环，使得节点变化时，只需迁移少量数据；

🔹 引入虚拟节点机制，可进一步保证数据分布均衡；

🔹 在分布式缓存、存储和负载均衡中应用广泛。

JAVA：一致性哈希（Consistent Hashing）算法的技术指南