Docker Community Forums

Share and learn in the Docker community.

How does a Kafka container store its data between restarts?


I have a running container, johnnypark/kafka-zookeeper:1.1.1.
The docker inspect command shows that there are no volumes:

"VolumeDriver": "",
"VolumesFrom": null,
"Volumes": null,

Despite this, after a restart I still see the topics I created earlier, with their data. In other words, I see data that was created by the previous container of that image.

Where does it store the data if there are no volumes?
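The three fields quoted above are only part of what docker inspect reports. Mounts that are actually attached to a container are listed in the top-level "Mounts" array of the same output, so that array is worth checking too. Below is a minimal sketch, assuming the JSON was saved from `docker inspect <container>`. The function name and the sample data are hypothetical and are not taken from the container in the question.

```python
import json


def summarize_mounts(inspect_output: str) -> list[str]:
    """Return one line per mount found in `docker inspect` JSON output."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    lines = []
    for container in containers:
        for mount in container.get("Mounts") or []:
            # Volumes carry a "Name"; bind mounts only have a host "Source" path.
            origin = mount.get("Name") or mount.get("Source", "?")
            lines.append(
                f'{mount.get("Type", "?")}: {origin} -> {mount.get("Destination", "?")}'
            )
    return lines


# Hypothetical sample data, made up for illustration only.
SAMPLE = json.dumps([{
    "Mounts": [{
        "Type": "volume",
        "Name": "example-volume",
        "Source": "/var/lib/docker/volumes/example-volume/_data",
        "Destination": "/example/data",
    }]
}])

print(summarize_mounts(SAMPLE))
print(summarize_mounts('[{"Mounts": []}]'))
```

If the list comes back empty, the data most likely lives in the container's own writable layer. That layer survives docker stop, docker start, and docker restart, and is only discarded when the container itself is removed with docker rm.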

The short answer is that it’s not insane, people do this all the time, and Kafka was actually designed for this type of usage. But first, why might you want to do this? There are actually a number of use cases; here are a few:

You may be building an application using event sourcing and need a store for the log of changes. Theoretically you could use any system to store this log, but Kafka directly solves a lot of the problems of an immutable log and “materialized views” computed off of that. The New York Times does this for all their article data as the heart of their CMS.
You may have an in-memory cache in each instance of your application that is fed by updates from Kafka. A very simple way of building this is to make the Kafka topic log compacted, and have the app simply start fresh at offset zero whenever it restarts to populate its cache.
Stream processing jobs do computation off a stream of data coming via Kafka. When the logic of the stream processing code changes, you often want to recompute your results. A very simple way to do this is just to reset the offset for the program to zero to recompute the results with the new code. This sometimes goes by the somewhat grandiose name of The Kappa Architecture.
Kafka is often used to capture and distribute a stream of database updates (often called Change Data Capture, or CDC). Applications that consume this data in steady state need only the newest changes, but new applications need to start with a full dump or snapshot of the data. Performing a full dump of a large production database is often a delicate and time-consuming operation. Enabling log compaction on the topic containing the stream of changes allows consumers of this data to simply reload by resetting to offset zero.
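The idea shared by the cache and CDC cases above, a log-compacted topic replayed from offset zero, can be sketched in a few lines. This is a simplified in-memory model and not Kafka's implementation. The record layout, the function names, and the use of None as a delete marker (tombstone) are assumptions made for illustration.

```python
from typing import Optional

# (key, value); a value of None stands for a delete marker (tombstone).
Record = tuple[str, Optional[str]]


def compact(log: list[Record]) -> list[Record]:
    """Keep only the latest record for each key, in original log order."""
    last_index = {key: i for i, (key, _) in enumerate(log)}
    return [record for i, record in enumerate(log) if last_index[record[0]] == i]


def rebuild_cache(log: list[Record]) -> dict[str, str]:
    """Replay the log from offset zero to build the current key/value state."""
    cache: dict[str, str] = {}
    for key, value in log:
        if value is None:
            cache.pop(key, None)  # tombstone: drop the key
        else:
            cache[key] = value
    return cache


log: list[Record] = [
    ("user-1", "alice"),
    ("user-2", "bob"),
    ("user-1", "alicia"),  # newer value replaces "alice"
    ("user-2", None),      # user-2 is deleted
]

compacted = compact(log)
print(compacted)
print(rebuild_cache(compacted) == rebuild_cache(log))
```

Replaying the compacted log yields the same final state as replaying the full log, which is why a restarting consumer can start at offset zero without reading every historical update.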

Thank you for the answer! But it seems you misunderstood me. My question is: how does the data in Kafka topics persist after a container restart if there are no volumes in that container?

The question is still open.