
Event sourcing with Kafka Streams

Mateusz Jadczyk


The main goal of this article is to share lessons learnt while creating a product using the Apache Kafka Streams library and based on the Event Sourcing architectural pattern. To give you a proper background, I intend to:

- describe the business and technical context of the project,
- recap the basics of Apache Kafka and Kafka Streams,
- walk through our event sourcing solution,
- share the lessons we learnt along the way.

Let’s go!

Some background (and a disclaimer)

Before this project, none of our team members had any practical experience with DDD, Event Sourcing or Kafka. Thinking in terms of events about everything is a challenge at the beginning, but it tends to solve a lot of problems in certain domains. The complexity of Kafka itself doesn't help either.

For DDD fans: we certainly used various aspects of the DDD approach (and CQRS), but it's not a purely DDD-based solution at its core. The naming may differ, but you'll notice the resemblance.

Business context

I’m not allowed to share too many details in this section, but I can tell you it’s a greenfield FinTech product constituting a part of a larger system. Its main purpose is to be a ledger service tracking users’ money flow. It also handles bank transfers and other financial information.

The service is part of a regulated environment, and therefore the audit trail is of utmost importance. In most user scenarios there were no real-time processing requirements, and some delays or even downtimes were acceptable.

Technical context

As numerous services in the system had already been created by the time we joined, a few system-wide decisions were already made. The whole system was supposed to utilise event sourcing and use Kafka as the Source of Truth. This limited the possible solutions.

Our small team (5 people) felt most comfortable with Java, and we read that the Java Kafka client libraries have the broadest support for Kafka features. Moreover, as we already had lots of new stuff waiting for us, the technology choice was a quick one. The stack is a standard one: Java, Spring Boot and the Kafka Streams library.

As it all started a while back, we could find only limited information about Kafka Streams. Some articles said it's perfect for event sourcing; others claimed it's a no-go.

We were particularly tempted by one feature offered by Kafka Streams: out-of-the-box exactly-once semantics, which work together with Kafka Streams state stores (lots of jargon in a single sentence, we'll get to it 😉 ). This seemed ideal for our use case and easy enough to use compared to other possible solutions. It was indeed, but it brought other challenges along the way.

If you are already familiar with Kafka and Kafka Streams, you may want to skip directly to our event sourcing solution or lessons learnt.

Apache Kafka recap

There’re enough articles already about Kafka and how it operates, so just a quick recap:

If you need more basics before reading further, this is a good explanation with nice images: https://www.cloudkarafka.com/blog/part1-kafka-for-beginners-what-is-apache-kafka.html

Kafka Streams crash course

Even though the Kafka Streams library has been available for more than 5 years, it's not as popular as Kafka itself. Let me do a quick introduction.

Kafka Streams main features:

- it's a plain Java library, not a separate processing cluster: the application is deployed and scaled like any other JVM service,
- it reads from and writes to Kafka topics, processing records one at a time,
- it scales by distributing topic partitions across application instances,
- it supports stateful operations through State Stores,
- it can provide exactly-once processing guarantees.

Processor topology and some code

Kafka Streams uses the notion of a topology for modelling computational logic: a topology is a graph of processor nodes connected by streams of records, where source nodes read from Kafka topics and sink nodes write results back to them:

https://kafka.apache.org/28/documentation/streams/tutorial

Going from concept to code, a simple example borrowed from the Apache Kafka Streams tutorial shows an application counting the occurrences of words split from a source text stream (the streams-plaintext-input topic). It then continuously produces the current counts to the output topic (streams-wordcount-output):
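A minimal sketch of that topology, closely following the official tutorial's WordCount example, looks like this:

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class WordCountApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("streams-plaintext-input")
               // split each line of text into lowercase words
               .flatMapValues(line -> Arrays.asList(line.toLowerCase(Locale.getDefault()).split("\\W+")))
               // re-key the stream by word so counting happens per word
               .groupBy((key, word) -> word)
               // keep a running count per word in a state store
               .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"))
               // emit every updated count to the output topic
               .toStream()
               .to("streams-wordcount-output", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```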

As developers, we define such a topology in code; Kafka Streams then analyses it and turns it into a running, readily scalable application.

Kafka Streams State Stores

For us, this is the defining part of Kafka Streams: State Stores, which enable stateful operations. You've already seen this in the example: the total count of particular words needs to be stored on the go, but should also survive restarts or crashes of the application.

Kafka Streams State Stores are an enabler for all stateful operations: joins, aggregates, grouping, etc.

They are part of Kafka Streams and adhere to similar principles, but provide a few extra features:

- they can be persistent (backed by RocksDB on local disk) or in-memory,
- each application instance holds only the state for the partitions it processes,
- they are fault-tolerant: the state survives restarts and can be rebuilt on another instance,
- they can be queried from outside the topology via Interactive Queries.
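For a flavour of the API, this is roughly how a persistent key-value store can be declared and attached to a topology (the store name and the String serdes here are illustrative):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

public class StateStoreSetup {

    static void addEntityStore(StreamsBuilder builder) {
        // Persistent store: backed by RocksDB locally and by a changelog
        // topic in Kafka (changelogging is enabled by default).
        StoreBuilder<KeyValueStore<String, String>> storeBuilder =
                Stores.keyValueStoreBuilder(
                        Stores.persistentKeyValueStore("entity-state-store"), // illustrative name
                        Serdes.String(),
                        Serdes.String());

        // Once registered, any processor that declares the store by name
        // can read and update it.
        builder.addStateStore(storeBuilder);
    }
}
```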

They achieve persistence and fault-tolerance by simply using custom Kafka topics underneath, called changelog topics, as storage. During startup, the data is loaded from the topic into memory (or onto disk). Thanks to that, all the neat tricks that apply to topics also apply to State Stores. One of these features is…

Processing guarantees

Kafka Streams guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output topics, as well as in the state stores for stateful operations.

This is usually quite hard to achieve in distributed systems, where things can break and migrate at any time. But as Kafka offers transactions, Kafka Streams makes use of them and performs the following operations atomically in a single transaction:

- committing the consumer offsets for the records read from the source topics,
- writing the result records to the output topics,
- writing the state store updates to their changelog topics.

All you need to do is set a single property in the Kafka Streams configuration:

processing.guarantee = "exactly_once"
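In Java configuration that's a single entry; a minimal sketch, with a placeholder application id and bootstrap servers (note that newer Kafka versions recommend the exactly_once_v2 value instead):

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceConfig {

    static Properties streamsProperties() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ledger-app"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // Turns on transactional, exactly-once processing for the topology.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
        return props;
    }
}
```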

Our event sourcing solution

The concept of event sourcing seems pretty simple on the surface:

Commands, which request a change in the system, are issued. They are validated and either accepted or rejected. Once accepted, a command is handled and, as a result, an event is produced. An event is a fact, something that has happened, and cannot be changed or rejected. Because we store the whole history of events, we can calculate the current state of an entity at any point in time.
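To make the vocabulary concrete, here's a minimal sketch; all the type names are illustrative, not taken from our codebase:

```java
import java.math.BigDecimal;
import java.time.Instant;

// A command requests a change; it can still be rejected during validation.
record DepositMoneyCommand(String accountId, BigDecimal amount) {}

// An event is a fact that has already happened; it is immutable and kept forever.
record MoneyDepositedEvent(String accountId, BigDecimal amount, Instant occurredAt) {}

// The current state is derived by applying events in order.
record AccountState(String accountId, BigDecimal balance) {

    AccountState apply(MoneyDepositedEvent event) {
        return new AccountState(accountId, balance.add(event.amount()));
    }
}
```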

How can this concept be implemented using Kafka Streams?

A command is sent to the command topic. A Kafka Streams Application with a defined topology starts to handle the command. The first step is validation; to perform it, we sometimes need to peek at the current state of an entity, so we read the state from the State Store.

Once the validation is completed, we send a reply to the command_reply topic, so that the service requesting a change knows whether it was accepted or rejected.

The processing goes on. Based on the command, an event is created. Sometimes the command doesn't contain all the information needed to build an event, and we need to look into the State Store again to retrieve extra data about the entity. The event is then sent to the event topic. This topic has infinite retention, which means the messages are stored forever.

Finally, based on the event, we update the state held in the State Store. One can say we apply the event on top of the current state.
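As a sketch using the Processor API (and the illustrative types from above), the update is a read-modify-write against the State Store:

```java
import java.math.BigDecimal;

import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

// Illustrative processor applying events on top of the stored state.
class StateUpdater implements Processor<String, MoneyDepositedEvent, Void, Void> {

    private KeyValueStore<String, AccountState> store;

    @Override
    public void init(ProcessorContext<Void, Void> context) {
        // The store must be attached to this processor in the topology.
        store = context.getStateStore("entity-state-store");
    }

    @Override
    public void process(Record<String, MoneyDepositedEvent> record) {
        AccountState current = store.get(record.key());
        if (current == null) {
            // first event for this entity: start from an empty state
            current = new AccountState(record.key(), BigDecimal.ZERO);
        }
        // apply the event on top of the current state and persist the result
        store.put(record.key(), current.apply(record.value()));
    }
}
```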

Additionally, we push the state to the state topic if it’s needed by other domains. The structure of data in this topic differs from the State Store structure — we don’t want internal implementation details to leak outside.

All of the processing happens in a single Kafka transaction. This means the offset commit for the command topic, the 3 produced messages and the state update will all happen exactly once.

At this point, some of you may be wondering why it's called event sourcing if the command is the trigger for the whole flow. What would happen if one needed to change the structure of the internal State Store and replay all the events from the beginning? After all, in event sourcing the state is derived from the events, which are the Source of Truth. It may turn out that a different internal representation of the state is needed at some point.

There’s an alternative path for it:

We have an extra processor consuming from the event topic (the one we just produced an event to) and updating the state. We version both the events and the state. In the everyday flow (when the state is calculated in real time, based on a single new event) these updates are skipped, as the state is already up to date.
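The skip itself can be a simple version guard; a sketch with illustrative names:

```java
import org.apache.kafka.streams.state.KeyValueStore;

// Illustrative sketch: both events and state carry a per-entity version,
// so an event already reflected in the state (the normal case, handled
// in real time by the command path) is simply skipped.
class VersionedEventApplier {

    record VersionedState(long version) {}

    void onEvent(String entityId, long eventVersion,
                 KeyValueStore<String, VersionedState> store) {
        VersionedState state = store.get(entityId);
        if (state != null && eventVersion <= state.version()) {
            return; // the state is already up to date: skip
        }
        store.put(entityId, new VersionedState(eventVersion));
    }
}
```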

In a case where we need to recreate the state, we clear the state store and reset the offsets for the event topic to zero (e.g. using the Kafka Streams Application Reset Tool). This is currently not ideal, because we depend on the consumer algorithm, which prefers to process messages with older timestamps first, and there is no ordering guarantee when subscribing to 2 different topics (command and event in this case). A better solution would involve a flag telling the application not to consume any commands while the state is being recreated. This procedure also implies some downtime, because the Kafka Streams Application needs to be shut down to perform manual work on the offsets/topics.
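For reference, the reset is a one-off CLI call performed while the application is stopped; a sketch following the Kafka 2.8 documentation (newer versions use --bootstrap-server, and the application id here is a placeholder):

```sh
# Resets offsets for the given input topics to the beginning and deletes the
# application's internal topics. Local state directories still need a
# separate cleanup (e.g. KafkaStreams#cleanUp on the next start).
bin/kafka-streams-application-reset.sh \
  --application-id ledger-app \
  --bootstrap-servers localhost:9092 \
  --input-topics event
```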

Simple implementation

The snippet below represents a topology for a single Kafka Streams Application. It implements the flow we discussed earlier. The validation part, as well as the internal implementation of the processor nodes, is skipped.
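A sketch of such a topology built with the Processor API is shown below. The topic names come from the flow described above; the store name and serdes are illustrative, and CommandHandler stands in for a processor whose internals are skipped, just like in the original snippet (StateUpdater was sketched earlier):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.state.Stores;

public class LedgerTopology {

    public static Topology build() {
        return new Topology()
                // command handling: validate against the state, reply,
                // emit an event, update the state, publish the state
                .addSource("commands", "command")
                // CommandHandler is an illustrative Processor that forwards
                // to the sinks below via context.forward(record, childName)
                .addProcessor("command-handler", CommandHandler::new, "commands")
                .addSink("replies", "command_reply", "command-handler")
                .addSink("events-out", "event", "command-handler")
                .addSink("state-out", "state", "command-handler")
                // the extra processor consuming the event topic; in the
                // everyday flow its updates are skipped (version guard)
                .addSource("events-in", "event")
                .addProcessor("state-updater", StateUpdater::new, "events-in")
                // the store holding the current state, shared by both processors
                .addStateStore(
                        Stores.keyValueStoreBuilder(
                                Stores.persistentKeyValueStore("entity-state-store"),
                                Serdes.String(),
                                Serdes.String()),
                        "command-handler", "state-updater");
    }
}
```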

Using output logs from Kafka Streams and this great visualization tool, we can see what the actual topology will look like:
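Getting that output is a one-liner on the built topology (using the LedgerTopology sketch from above):

```java
import org.apache.kafka.streams.Topology;

public class DescribeTopology {

    public static void main(String[] args) {
        // Prints the sources, processors, sinks and state stores of the
        // topology; the output can be fed into the visualization tool.
        Topology topology = LedgerTopology.build();
        System.out.println(topology.describe());
    }
}
```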

Lessons learnt

Advantages

Pitfalls

Summary

Some people ask whether Kafka Streams has been a good choice. Lacking experience with other frameworks for event sourcing, it would be unfair of us to give a definitive answer. It certainly brings a lot of value, but not for free. We nevertheless see the benefits of Kafka Streams for other streaming use cases, so you may consider applying this library in your project.

I had to omit some details so as not to make the article too long, and I encourage you to ask questions in the comments.

You may want to read another story about implementing Kafka consumers health check in Spring Boot Actuator or check out more DNA Technology articles.