Streams for Apache Kafka

What is Streams for Apache Kafka?

Streams for Apache Kafka is a self-managed distribution of Apache Kafka designed to simplify installation, configuration, and management on Red Hat OpenShift.

Based on the open source Strimzi project (a CNCF incubating project), it provides:

  • Container images for Apache Kafka

  • Operators to manage clusters, topics, and users

  • HTTP Bridge for Apache Kafka

  • Console UI (StreamsHub)

Apache Kafka — Fundamentals

Apache Kafka is a distributed data streaming system built on a publish-subscribe model.

Key characteristics:

  • Horizontal scalability

  • Fault tolerance

  • Immutable data (append-only log)

  • Open source (Apache License 2.0)

Use cases:

  • Real-time recommendations

  • IoT applications

  • Data gathering for AI

  • Change Data Capture (CDC)

Strimzi Operators

The Cluster Operator watches the Kafka CR and reconciles the running cluster toward the desired state it declares. The Topic Operator and User Operator run as containers inside the Entity Operator pod.

KRaft deployment (without ZooKeeper)

Starting with Streams for Apache Kafka 3.x, cluster metadata is stored inside Kafka itself using KRaft, which removes the ZooKeeper dependency.

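# With Kafka 4.x (KRaft only), Strimzi takes node count, roles, and storage
# from a KafkaNodePool rather than from spec.kafka.
# The pool name "combined" is illustrative.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: combined
  namespace: kafka-cdc
  labels:
    strimzi.io/cluster: cdc-cluster
spec:
  replicas: 3
  roles:
    - controller
    - broker
  storage:
    type: persistent-claim
    size: 10Gi
    deleteClaim: false
---
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: cdc-cluster
  namespace: kafka-cdc
spec:
  kafka:
    version: "4.0.0"
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
        # Needed by the SCRAM-SHA-512 clients configured later
        authentication:
          type: scram-sha-512
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics-config
          key: kafka-metrics-config.yml
  entityOperator:
    topicOperator: {}
    userOperator: {}
  kafkaExporter:
    topicRegex: ".*"
    groupRegex: ".*"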

KRaft cluster: 3 nodes

How it Works

KRaft consensus (without ZooKeeper)

KRaft (Kafka Raft) has been production-ready since Kafka 3.3, and Kafka 4.0 removed ZooKeeper entirely. Cluster metadata is managed inside Kafka by the KRaft protocol:

  1. Each node has a role: controller (metadata management), broker (data storage and serving), or both in small clusters (combined mode).

  2. Controllers form a quorum using Raft. One controller is elected active controller — it is the only one that writes metadata.

  3. When a topic is created or a partition is reassigned, the active controller writes the change to an internal log (__cluster_metadata), which is replicated to the other controllers.

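  4. Brokers obtain metadata by fetching the metadata log from the active controller, much like a follower fetching from a partition leader. Each broker tracks its own offset in the log and receives only the records it has not yet seen, so the controller no longer has to push full state updates to every broker.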

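  5. If the active controller fails, the remaining controllers elect a new leader via Raft. Because every controller already holds a replica of the metadata log, failover is typically fast and avoids the lengthy state reload that a ZooKeeper-based controller needed on large clusters.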
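
The quorum arithmetic behind steps 2 and 5 can be sketched in a few lines. This is a minimal illustration of majority voting in general, not Kafka's implementation; the function names are made up for this example.

```python
def quorum_size(controllers: int) -> int:
    """Smallest number of controllers that forms a majority."""
    if controllers < 1:
        raise ValueError("a quorum needs at least one controller")
    return controllers // 2 + 1


def tolerated_failures(controllers: int) -> int:
    """How many controllers can fail while a majority is still reachable."""
    return controllers - quorum_size(controllers)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} controllers: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

With three controllers, one can fail and the remaining two still form a majority. A fourth controller raises the quorum size without tolerating any additional failure, which is why odd controller counts are the usual choice.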

Message lifecycle

  1. A producer sends a batch of records to the leader broker of the target partition.

  2. The leader writes the batch to its local log (append-only on disk, sequential I/O).

  3. Followers (replicas) fetch the batch from the leader and write it to their own logs.

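  4. With acks=all, the leader sends an ACK to the producer once every replica currently in the in-sync replica set (ISR) has the batch. min.insync.replicas sets the minimum ISR size, leader included; if the ISR is smaller, the leader rejects the write instead of acknowledging it.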

  5. Offsets are assigned by the leader when it appends the batch. Within a partition they increase monotonically and never change once written.

  6. A consumer that belongs to a consumer group calls poll(), which by default fetches from the leader of each assigned partition. It starts at the group's last committed offset and continues from there on later calls.
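
The write path can be modeled with a toy, in-memory partition leader. This is a sketch under simplifying assumptions, not how a broker is implemented: follower replication is not simulated, the ISR is a plain set of broker ids, and the class and method names (PartitionLeader, append, fetch) are invented for this example.

```python
from dataclasses import dataclass, field


class NotEnoughReplicas(Exception):
    """Raised when the ISR is smaller than min.insync.replicas."""


@dataclass
class PartitionLeader:
    min_insync_replicas: int
    isr: set                      # broker ids currently in sync, leader included
    log: list = field(default_factory=list)

    def append(self, batch: list) -> int:
        """Append a batch with acks=all semantics; return its base offset."""
        if len(self.isr) < self.min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR size {len(self.isr)} < min.insync.replicas "
                f"{self.min_insync_replicas}"
            )
        base_offset = len(self.log)   # next offset is the current log end
        self.log.extend(batch)        # append-only: existing entries never change
        return base_offset

    def fetch(self, offset: int, max_records: int = 100) -> list:
        """Return records starting at `offset`, as a consumer poll would."""
        return self.log[offset:offset + max_records]


if __name__ == "__main__":
    leader = PartitionLeader(min_insync_replicas=2, isr={0, 1, 2})
    first = leader.append(["a", "b"])
    second = leader.append(["c"])
    print(first, second, leader.fetch(offset=1))

    leader.isr = {0}  # two followers fell out of sync
    try:
        leader.append(["d"])
    except NotEnoughReplicas as err:
        print("rejected:", err)
```

The rejected write leaves the log untouched, which mirrors the trade-off of min.insync.replicas: 2 in the cluster configuration. Availability for writes is given up when too few replicas are in sync, in exchange for not acknowledging data that exists on a single broker only.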

Strimzi reconciliation loop

The Strimzi Cluster Operator behaves like a classic Kubernetes controller:

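  1. It watches the custom resources it owns, such as Kafka, KafkaNodePool, KafkaConnect, and KafkaBridge. KafkaTopic and KafkaUser are handled by the Topic and User Operators (step 4).

  2. It computes the difference between desired state (CR) and current state (StrimziPodSets, ConfigMaps, Secrets, Services).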

  3. It applies changes in a rolling fashion — it never stops all brokers at once.

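  4. The Topic and User Operators run as containers in the Entity Operator pod. They reconcile KafkaTopic and KafkaUser CRs against Kafka itself through the Admin API rather than against Kubernetes workloads; the User Operator also writes each user's credentials to a Secret.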
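
The core of any such loop is a diff between desired and actual state followed by the smallest set of changes that closes the gap. The sketch below shows that idea with plain dictionaries standing in for topics. It is a generic illustration, not Strimzi code; the function names and the topic names dlq.customers and old-topic are hypothetical.

```python
def plan_changes(desired: dict, actual: dict) -> list:
    """Return (action, name) pairs that move `actual` toward `desired`."""
    plan = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            plan.append(("create", name))
        elif actual[name] != spec:
            plan.append(("update", name))
    for name in sorted(actual.keys() - desired.keys()):
        plan.append(("delete", name))
    return plan


def reconcile(desired: dict, actual: dict) -> dict:
    """Apply the plan one resource at a time and return the new state."""
    state = dict(actual)
    for action, name in plan_changes(desired, actual):
        if action == "delete":
            del state[name]
        else:
            state[name] = desired[name]
    return state


if __name__ == "__main__":
    desired = {
        "cdc.public.customers": {"partitions": 3},
        "dlq.customers": {"partitions": 1},
    }
    actual = {
        "cdc.public.customers": {"partitions": 1},
        "old-topic": {"partitions": 1},
    }
    print(plan_changes(desired, actual))
    # A second pass over the reconciled state finds nothing left to do.
    print(plan_changes(desired, reconcile(desired, actual)))
```

Running the plan a second time returns an empty list. That idempotence is what lets a controller re-run reconciliation on every event or timer tick without side effects.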

Kafka Bridge (HTTP REST)

The Kafka Bridge lets you produce and consume messages over HTTP REST — ideal for demos and testing without a native Kafka client:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaBridge
metadata:
  name: cdc-bridge
  namespace: kafka-cdc
spec:
  replicas: 1
  bootstrapServers: cdc-cluster-kafka-bootstrap:9092
  http:
    port: 8080

Example — produce a message:

curl -X POST https://kafka-bridge-kafka-cdc.apps.<domain>/topics/cdc.public.customers \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{"records":[{"value":{"first_name":"Test","last_name":"User","email":"test@demo.io"}}]}'
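
The same request can be assembled from a script. The sketch below builds the produce request with the Python standard library and stops short of sending it, so it runs without a cluster. The helper name build_produce_request and the base URL http://localhost:8080 are assumptions for illustration (for example, a port-forward to the bridge); substitute the address of your own bridge.

```python
import json
import urllib.request
from urllib.parse import quote


def build_produce_request(base_url: str, topic: str, values: list) -> urllib.request.Request:
    """Build (but do not send) a Kafka Bridge produce request."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/topics/{quote(topic, safe='')}",
        data=body,
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_produce_request(
        "http://localhost:8080",  # assumption: bridge reachable here
        "cdc.public.customers",
        [{"first_name": "Test", "last_name": "User", "email": "test@demo.io"}],
    )
    print(req.get_method(), req.full_url)
    # To send it, pass the request to urllib.request.urlopen(req)
    # and read the JSON response body.
```

Each entry in the list becomes one record, so several messages can be produced in a single HTTP call.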

Security: Authentication and encryption

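The Kafka cluster exposes two listeners: plain (9092, no TLS) and tls (9093, with TLS). Production clients should use the TLS listener with SCRAM-SHA-512 authentication. For that to work, the tls listener in the Kafka resource must set authentication.type: scram-sha-512, and the ACLs on a KafkaUser only take effect when spec.kafka.authorization.type is set to simple.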

KafkaUser with SCRAM-SHA-512

Strimzi manages credentials automatically when you create a KafkaUser resource. The User Operator generates a Secret with the same name as the user (here cdc-user) and stores the generated password under the password key:

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: cdc-user
  namespace: kafka-cdc
  labels:
    strimzi.io/cluster: cdc-cluster
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: cdc
          patternType: prefix
        operations: [Read, Write, Describe]
      - resource:
          type: topic
          name: dlq
          patternType: prefix
        operations: [Read, Write, Describe]
      - resource:
          type: group
          name: cdc-connect-cluster
          patternType: literal
        operations: [Read]
      - resource:
          type: group
          name: camel-cdc-consumer
          patternType: literal
        operations: [Read]

ACLs restrict access to topics with the cdc and dlq prefixes only, and to the pipeline’s specific consumer groups.
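
To see how prefix and literal patterns behave, the rules above can be replayed in a small matcher. This is a simplified sketch, not Kafka's authorizer: it ignores deny rules, hosts, and the way some operations imply others. The function name is_allowed and the test resource names are invented for this example.

```python
ACLS = [
    # (resource type, name, pattern type, allowed operations)
    ("topic", "cdc", "prefix", {"Read", "Write", "Describe"}),
    ("topic", "dlq", "prefix", {"Read", "Write", "Describe"}),
    ("group", "cdc-connect-cluster", "literal", {"Read"}),
    ("group", "camel-cdc-consumer", "literal", {"Read"}),
]


def is_allowed(resource_type: str, name: str, operation: str, acls=ACLS) -> bool:
    """Return True if any ACL entry grants `operation` on the named resource."""
    for acl_type, acl_name, pattern, operations in acls:
        if acl_type != resource_type or operation not in operations:
            continue
        if pattern == "literal" and name == acl_name:
            return True
        if pattern == "prefix" and name.startswith(acl_name):
            return True
    return False


if __name__ == "__main__":
    checks = [
        ("topic", "cdc.public.customers", "Write"),
        ("topic", "orders", "Read"),
        ("topic", "cdcarchive", "Read"),
        ("group", "camel-cdc-consumer", "Read"),
        ("group", "camel-cdc-consumer-2", "Read"),
    ]
    for check in checks:
        print(check, is_allowed(*check))
```

A prefix pattern is a plain string prefix, so cdc also matches an unrelated topic such as cdcarchive. Using cdc. (with the trailing dot) as the prefix would narrow the rule to the Debezium-style topic names.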

Client configuration — KafkaConnect

KafkaConnect is configured to use the TLS listener (9093) with SCRAM-SHA-512:

spec:
  bootstrapServers: cdc-cluster-kafka-bootstrap:9093
  authentication:
    type: scram-sha-512
    username: cdc-user
    passwordSecret:
      secretName: cdc-user
      password: password
  tls:
    trustedCertificates:
      - secretName: cdc-cluster-cluster-ca-cert
        certificate: ca.crt

Strimzi automatically generates the cdc-cluster-cluster-ca-cert secret with the cluster CA certificate.

Client configuration — Apache Camel

Camel sets the SASL_SSL properties in the kafka component parameters:

parameters:
  brokers: cdc-cluster-kafka-bootstrap.kafka-cdc.svc:9093
  securityProtocol: SASL_SSL
  saslMechanism: SCRAM-SHA-512
  saslJaasConfig: "org.apache.kafka.common.security.scram.ScramLoginModule required username=\"${env:KAFKA_USER}\" password=\"${env:KAFKA_PASSWORD}\";"
  sslTruststoreLocation: /etc/kafka-certs/ca.crt
  sslTruststoreType: PEM

Credentials are injected as environment variables from the Secret generated by Strimzi.
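
Secret values are base64-encoded, which matters when you read the password outside of a pod. The sketch below decodes the password key from a Secret in JSON form and assembles the same JAAS line used above. The helper name jaas_config_from_secret is hypothetical and the sample Secret is dummy data.

```python
import base64
import json


def jaas_config_from_secret(secret_json: str, username: str) -> str:
    """Decode the `password` key of a Secret and build a SCRAM JAAS line."""
    secret = json.loads(secret_json)
    password = base64.b64decode(secret["data"]["password"]).decode("utf-8")
    return (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )


if __name__ == "__main__":
    # Dummy Secret for illustration; a real one would come from
    # `oc get secret cdc-user -n kafka-cdc -o json`.
    sample = json.dumps({
        "kind": "Secret",
        "metadata": {"name": "cdc-user", "namespace": "kafka-cdc"},
        "data": {"password": base64.b64encode(b"not-a-real-password").decode("ascii")},
    })
    print(jaas_config_from_secret(sample, "cdc-user"))
```

Inside a pod this decoding is unnecessary: when a Secret key is mapped to an environment variable, Kubernetes injects the decoded value.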

Streams for Apache Kafka ecosystem

Component            Role in the ecosystem

Kafka Core           Brokers, topics, partitions, replication
Kafka Connect        Connector framework (source/sink)
Kafka Bridge         HTTP REST proxy to produce/consume
Apicurio Registry    Schema registry (Avro, JSON Schema, Protobuf)
Debezium             CDC connectors for PostgreSQL, MySQL, MongoDB, etc.
Streams Console      Web UI to monitor clusters, topics, and consumer groups
Kafka Exporter       Exports consumer group lag metrics to Prometheus
MirrorMaker 2        Cross-cluster replication
Kroxylicious         Kafka proxy: encryption at rest, schema validation

Official Documentation