Internal Architecture
Partition Design
Each CDC pipeline topic is configured with 3 partitions and 3 replicas:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
name: cdc.public.customers
namespace: kafka-cdc
labels:
strimzi.io/cluster: cdc-cluster
spec:
partitions: 3
replicas: 3
config:
retention.ms: 604800000
cleanup.policy: compact
Why 3 partitions?
| Factor | Decision |
|---|---|
Parallelism |
3 partitions allow up to 3 concurrent consumers in a consumer group |
Replication |
With RF=3 and |
Ordering |
Debezium uses the primary key as the partition key, preserving per-record ordering |
Scalability |
Partitions can be increased without downtime (only added, never removed) |
Partition Key in CDC
Debezium uses the record’s primary key as the partition key. This ensures all changes for the same record go to the same partition, preserving ordering:
Record with id=1 → Partition 0 Record with id=2 → Partition 1 Record with id=3 → Partition 2 Record with id=4 → Partition 0 (hash wraps)
Consumer Groups and Scalability
The pipeline has two main consumer groups:
| Consumer Group | Component | Replicas |
|---|---|---|
|
KafkaConnect (Debezium + HTTP Sink) |
2 |
|
Apache Camel CDC Processor |
2 |
Partition balancing
With 3 partitions and 2 consumers:
Consumer 0: Partition 0, Partition 1 Consumer 1: Partition 2
If a consumer fails, Kafka rebalances automatically:
Consumer 1: Partition 0, Partition 1, Partition 2
When scaling to 3 replicas, each consumer processes exactly 1 partition (optimal distribution).
Offset and Checkpoint Handling
Kafka offsets
Each consumer group maintains its position (offset) per partition on the internal topic __consumer_offsets:
oc exec -it cdc-cluster-kafka-0 -n kafka-cdc -- bin/kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--describe --group camel-cdc-consumer
Debezium checkpoints
Debezium maintains its own checkpoint via the PostgreSQL replication slot and KafkaConnect offset topics:
config:
offset.storage.topic: cdc-connect-offsets
config.storage.topic: cdc-connect-configs
status.storage.topic: cdc-connect-status
offset.storage.replication.factor: 3
config.storage.replication.factor: 3
status.storage.replication.factor: 3
-
cdc-connect-offsets— current position in the PostgreSQL WAL -
cdc-connect-configs— connector configuration -
cdc-connect-status— task status
All three topics use replication.factor: 3 to survive the loss of one broker.
PostgreSQL replication slot
Debezium creates a replication slot so changes are not lost if the connector restarts:
config:
slot.name: debezium_cdc
publication.name: cdc_publication
snapshot.mode: initial
-
slot.name— name of the logical slot in PostgreSQL -
snapshot.mode: initial— on first startup, Debezium captures a full snapshot of the tables
To verify the slot:
oc exec -it deploy/cdc-postgresql -n kafka-cdc -- psql -U cdcuser -d cdcdb -c \
"SELECT slot_name, plugin, slot_type, active FROM pg_replication_slots;"
Retention and Reprocessing
Retention policies
| Topic | Retention | Cleanup Policy |
|---|---|---|
|
7 days (604800000 ms) |
|
|
7 days (604800000 ms) |
|
|
30 days (2592000000 ms) |
|
|
30 days (2592000000 ms) |
|
What is cleanup.policy: compact?
With compact, Kafka retains only the latest value for each key, removing older versions. For CDC:
-
The current state of each record is preserved
-
Intermediate updates are removed after compaction
-
You can reconstruct the full table state by consuming the topic from the beginning
Reprocessing
To reprocess events, you can reset the consumer group offset:
oc exec -it cdc-cluster-kafka-0 -n kafka-cdc -- bin/kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--group camel-cdc-consumer \
--topic cdc.public.customers \
--reset-offsets --to-earliest \
--execute
| This causes all messages on the topic to be reprocessed. Use with caution in production. |
Replication Configuration
The Kafka cluster uses the following replication parameters to ensure durability:
config:
offsets.topic.replication.factor: 3
transaction.state.log.replication.factor: 3
transaction.state.log.min.isr: 2
default.replication.factor: 3
min.insync.replicas: 2
-
min.insync.replicas: 2— a write is acknowledged only if 2 of the 3 replicas persist it -
default.replication.factor: 3— all new topics are created with RF=3
This means the cluster can tolerate the loss of 1 broker without losing data or availability.
How it Works
Writing to Kafka: from producer to disk
When Debezium (or any producer) sends a message to Kafka, the following happens internally:
-
The producer serializes the key and value, computes
hash(key) % numPartitionsto choose the target partition. -
The producer batches messages per partition (configurable via
batch.sizeandlinger.ms) to optimize I/O. -
The batch is sent to the partition leader broker over TCP.
-
The leader writes the batch to a segment file on disk — an append-only log. Segments rotate when they reach 1GB (configurable).
-
Each message gets a sequential offset: a 64-bit integer, monotonically increasing, unique within the partition.
-
Followers fetch the batch from the leader, write it to their own segments, and send an ACK to the leader.
-
With
acks=all(Strimzi default) andmin.insync.replicas=2, the leader acknowledges the producer only when at least 2 replicas have persisted the data.
Reading in Kafka: consumer groups
-
Each consumer in the group connects to the group coordinator (a broker chosen to manage the group).
-
The coordinator runs a partition assignment protocol (range or round-robin), distributing partitions across consumers.
-
Each consumer periodically `poll()`s the leader of its assigned partitions, receiving message batches from its last offset.
-
The consumer processes messages and then commits the offset to the internal topic
__consumer_offsets. -
If a consumer disconnects (no heartbeat within
session.timeout.ms), the coordinator triggers a rebalance: orphaned partitions are reassigned to the remaining consumers.
Compaction: cleanup.policy=compact
For CDC topics with cleanup.policy: compact:
-
Kafka periodically runs a log cleaner that scans closed segments.
-
For each key, it retains only the message with the highest offset (the most recent).
-
Messages with duplicate keys and lower offsets are physically removed from disk.
-
A message with a
nullvalue (tombstone) marks the key for full removal after thedelete.retention.msperiod. -
Result: the topic contains exactly one message per key — the current state of each database record.
Network Policies
The kafka-cdc namespace is protected with NetworkPolicy rules that restrict ingress traffic:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: kafka-cdc-allow-internal
namespace: kafka-cdc
spec:
podSelector: {}
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: kafka-cdc
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: openshift-cluster-observability-operator
ports:
- protocol: TCP
port: 9404
-
Only pods inside
kafka-cdccan communicate with each other -
The observability namespace can access the metrics port (9404) for Prometheus scraping
Pod Disruption Budgets
PDBs ensure maintenance operations (drains, upgrades) do not leave the pipeline without capacity:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: kafka-brokers-pdb
namespace: kafka-cdc
spec:
minAvailable: 2
selector:
matchLabels:
strimzi.io/cluster: cdc-cluster
strimzi.io/kind: Kafka
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: kafka-connect-pdb
namespace: kafka-cdc
spec:
minAvailable: 1
selector:
matchLabels:
strimzi.io/cluster: cdc-connect
strimzi.io/kind: KafkaConnect
-
kafka-brokers-pdb— at least 2 of the 3 brokers are always available -
kafka-connect-pdb— at least 1 of the 2 Connect workers is always available
Official Documentation
-
Apache Kafka Design — Kafka internal architecture (partitions, replication, storage)
-
Configuring Kafka Topics — Topic management with Strimzi
-
Strimzi Configuration Guide — Advanced operator and CR configuration
-
Kafka Consumer Configurations — Consumer group and offset parameters
-
Pod Scheduling and Disruption Budgets — PDBs and scheduling on OpenShift