How to rebalance data across nodes?

I am implementing a message queue where messages are distributed across nodes in a cluster. The goal is to design a system to be able to auto-scale without needing to keep a global map of each message and its location. My question is: when I add a new node, I want to rebalance messages. How do I handle requests that come in after the node has been added, but before data has been copied? This problem seems to exist regardless of the hashing algorithm I use. Take this sequence of events: There is a single node, A, with two messages, M1 and M2 Add new node, B. M2 should be moved to B. The user attempts to delete M2. However, this fails because message is not yet found in B. Ideally, I would route the request to node A while B is still being populated. But then B would have a stale state of data was modified. How can I solve this problem of hot-swapping data between nodes without going as far as keeping a centralized database of all messages?

Apr 9, 2025 - 01:19

I am implementing a message queue where messages are distributed across nodes in a cluster. The goal is to design a system to be able to auto-scale without needing to keep a global map of each message and its location.

My question is: when I add a new node, I want to rebalance messages. How do I handle requests that come in after the node has been added, but before data has been copied? This problem seems to exist regardless of the hashing algorithm I use.

Take this sequence of events:

There is a single node, A, with two messages, M1 and M2
Add new node, B. M2 should be moved to B.
The user attempts to delete M2. However, this fails because message is not yet found in B.

Ideally, I would route the request to node A while B is still being populated. But then B would have a stale state of data was modified.

How can I solve this problem of hot-swapping data between nodes without going as far as keeping a centralized database of all messages?