3 min read

NATS Weekly #24

NATS Weekly #24

Week of April 25 - May 1, 2022

🗞 Announcements, writings, and projects

A short list of  announcements, blog posts, projects updates and other news.

️ 💁 News

Random announcements that don't fit into the other categories.

⚡Releases

Official releases from NATS repos and others in the ecosystem.

🎬 Media

Audio or video recordings about or referencing NATS.

  • We Love NATS - recording of Red Badger meetup from April 20, 2022

📖 Articles

Blog posts, tutorials, or any other text-based content about NATS.

💡 Recently asked questions

Questions sourced from Slack, Twitter, or individuals. Responses and examples are in my own words, unless otherwise noted.

How can I leverage the new "deterministic subject token partitioning" feature?

Thanks to Jean-Noel Moyne for introducing this new feature in Slack since there have been a handful of people over the months asking for it.

NATS has had subject mapping since v2.2 which provides a way to re-map subjects of published messages for subscribers. For example, given the following mapping, any message published to foo will be sent to subscribers interested in the bar subject (or some inclusive wildcard).

mappings: {
  "foo": "bar"
}

And to be explicit, a client subscribing to foo will not receive the messages since they have been mapped (it is not simply an alternative subject).

A mapping can also reference wildcards by their relative index. For example, a subject of foo.*.bar.* has two wildcards $1 and $2 (note they are relative to the set of wildcards, not all tokens). The following mapping would take a published subject of foo.5.bar.9 and remap it to foo.bar.9.5.

mappings: {
  "foo.*.bar.*": "foo.bar.$2.$1"
}

With that information in mind, let's introduce the partition function that can be used in a mapping.

mappings: {
  "orders.*": "orders.$1.{{partition(3,1)}}"
}

The {{ and }} are used to differentiate it as a function rather than a token. The partition function takes a first argument of the number of partitions (3 in this example) and then one or more wildcard positions, e.g. 1 for the first wildcard. This follows the same pattern as with $1 but without the $ prefix.

If we were to publish N messages, such as order.1, order.2 up to order.N, the resulting subjects will look like this:

source        mapped
orders.1      orders.1.1
orders.2      orders.2.2
orders.3      orders.3.3
orders.4      orders.4.1 <- notice it wraps to the first partition
orders.5      orders.5.2
//...

Subscribers can then subscribe to a specific partition like orders.*.1 and it will receive all messages that are mapped to that partition. If another message is sent on behalf of orders.1 for example, the determinstic partitioning ensures it will be delivered to the same partition.

Note this works with core NATS as well as JetStream. If persistence of these partitions is desired, a stream per partition could be created, such as ORDERS-1 which binds the subject orders.*.1, ORDERS-2 that maps to orders.*.2, etc.

When a publish occurs, it will be received by the respective stream. Then consumers can be bound to those stream partitions to consume as normal.

Changing the number of partitions is as simple as changing the mapping rule, e.g. partition(5,1) to increase to 5 partitions, for example. Of course changing the partitions means the messages may be routed to different partitions than they were previously.

When increasing partitions, you want to be sure the subscribers are pre-connected or new stream partitions are created ahead of time so messages are not lost. It may also be desirable to ensure all existing messages in those stream partitions have been consumed before changing the partitions. This may or may not be possible/practical depending on the rate of messages being received.

When decreasing partitions, you can simply change the mapping configuration first and then remove the unused stream partitions once those messages have been fully consumed.

As a quick reminder, although a pull-based consumer or push-based queue consumer support scale out of message processing, each subscriber is get random batches of messages rather than consistently getting the subset relative to the subject. In other words, if I have three subscribers as part of a pull consumer, the first subscriber could get the first message of order.1 while the second subscriber might get the second message on order.1.

With this partitioning strategy, this consistent routing can be acheived since the stream is being pre-partitioned. Do note that even if you have partitions, a pull-based or push-based queue consumer on that partition will still get a random distribution of messages. So if strict ordering is still required, then a single consumer would still be needed. Since the scale has been reduced (due to the partitioning) this may be feasible.


If you would like more in-depth information with examples to any of these questions, please reach out on Slack or Twitter!