Observability + Service Mesh
CDC Pipeline Observability
The full pipeline is instrumented with Prometheus metrics, Grafana dashboards, and tracing via the Service Mesh.
Observability stack
| Component | Role |
|---|---|
| Prometheus (via Cluster Observability Operator) | Scrapes JMX metrics from Kafka, Debezium, and Camel |
| Grafana | "Kafka CDC Pipeline" dashboard with throughput, lag, and latency panels |
| Kiali | Visualization of traffic between services in the Service Mesh |
| Kafka Exporter | Exports consumer group lag metrics to Prometheus |
Grafana Dashboard — Kafka CDC Pipeline
Access: https://grafana-observability.
The dashboard is bound to the Grafana instance via instanceSelector. The label must match the Grafana CR instance label:
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: kafka-cdc-pipeline
  namespace: openshift-cluster-observability-operator
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana-observability
  json: |
    { ... }
The label dashboards: grafana-observability must match the label on the Grafana CR exactly. With a mismatched value (for example, connectivity-link), the selector matches no Grafana instance and the dashboard is never loaded.
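The matching rule behind instanceSelector can be illustrated with a minimal sketch. It assumes standard Kubernetes matchLabels semantics (every key/value pair in the selector must be present on the target); the function name and the sample labels are hypothetical and not part of the Grafana operator.

```python
def selector_matches(match_labels: dict[str, str], instance_labels: dict[str, str]) -> bool:
    """Return True when every key/value pair in match_labels is present on the instance."""
    return all(instance_labels.get(key) == value for key, value in match_labels.items())


# Labels assumed to be set on the Grafana CR (illustrative values).
grafana_labels = {"dashboards": "grafana-observability"}

print(selector_matches({"dashboards": "grafana-observability"}, grafana_labels))  # True
print(selector_matches({"dashboards": "connectivity-link"}, grafana_labels))      # False
```

The comparison is an exact string match, which is why a near-miss value behaves the same as a missing label.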
The "Kafka CDC Pipeline" dashboard includes the following panels:
| Panel | Metric |
|---|---|
| Kafka Broker — Messages In/s | |
| Kafka Broker — Bytes In/Out | Byte throughput per second (in and out) |
| Consumer Group Lag | |
| Debezium — Streaming Duration | |
| KafkaConnect — Task Status | Number of active connectors |
| KafkaConnect — Records Processed/s | |
PodMonitors for metrics
PodMonitors are configured for each pipeline component:
apiVersion: monitoring.rhobs/v1
kind: PodMonitor
metadata:
  name: kafka-cluster-metrics
  namespace: openshift-cluster-observability-operator
spec:
  namespaceSelector:
    matchNames:
      - kafka-cdc
  podMetricsEndpoints:
    - interval: 30s
      port: tcp-prometheus
  selector:
    matchLabels:
      strimzi.io/cluster: cdc-cluster
      strimzi.io/kind: Kafka
How it Works
Metrics pipeline: from JMX to Grafana
CDC pipeline metrics pass through four layers:
- Exposure (JMX → Prometheus format): Each Kafka broker and each KafkaConnect pod exposes internal JMX metrics. A JMX Prometheus Exporter agent (configured via metricsConfig in the Strimzi CR) converts the JMX metrics to Prometheus text format on port 9404. Camel uses Micrometer to expose its metrics natively at /q/metrics.
- Scraping (Prometheus): PodMonitor CRs tell Prometheus which pods to scrape, on which port, and how often (interval: 30s). Prometheus stores the time series with labels (topic, partition, consumer group, connector) that enable granular queries.
- Query (PromQL → panels): Each Grafana dashboard panel runs a PromQL query. For example, sum(rate(kafka_server_brokertopicmetrics_messagesin_total[5m])) by (topic) computes message throughput per second grouped by topic, using a 5-minute window to smooth spikes.
- Alerts (PrometheusRule → Alertmanager): Prometheus evaluates alert rules continuously. When an expression holds for the defined for period (e.g. kafka_consumergroup_lag > 1000 for 5 minutes), Prometheus fires an alert to Alertmanager, which can notify via email, Slack, PagerDuty, etc.
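To make the query layer concrete, here is a simplified sketch of what rate() does to a counter such as the broker's messages-in total. It is an approximation for illustration only: the real PromQL rate() also extrapolates to the edges of the window, which this sketch omits. The function name and the sample values are hypothetical.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over the window covered by samples.

    samples: (timestamp_seconds, counter_value) pairs, oldest first.
    A drop in value is treated as a counter reset (for example a broker restart).
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the whole
        # current value counts as increase since the previous sample.
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Four scrapes at the 30s interval used by the PodMonitor (made-up counter values).
scrapes = [(0, 1000), (30, 1600), (60, 2200), (90, 2800)]
print(simple_rate(scrapes))  # 20.0 messages per second
```

Averaging over a window that spans several scrapes is what smooths short spikes; a longer window gives a flatter but slower-reacting panel.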
Kafka Exporter: consumer lag metrics
The kafkaExporter deployed by Strimzi is a dedicated process that:
- Connects to the Kafka cluster and reads offsets for all consumer groups
- Computes lag per partition: lag = highWaterMark - consumerOffset
- Exposes the kafka_consumergroup_lag metric that Prometheus scrapes

This helps detect bottlenecks: if lag keeps growing, consumers are not keeping up with producers.
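The lag formula can be worked through with a small sketch. The offsets below are made-up values, and the function is illustrative only; it is not the Kafka Exporter's actual code.

```python
def partition_lag(high_water_marks: dict[int, int], committed_offsets: dict[int, int]) -> dict[int, int]:
    """Apply lag = highWaterMark - consumerOffset to each partition."""
    lag = {}
    for partition, high_water_mark in high_water_marks.items():
        # Assumption of this sketch: a partition with no committed offset
        # is treated as entirely unread.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(high_water_mark - committed, 0)
    return lag


high_water_marks = {0: 1500, 1: 1200, 2: 900}    # latest offset per partition
committed_offsets = {0: 1500, 1: 700, 2: 850}    # consumer group position per partition

lag = partition_lag(high_water_marks, committed_offsets)
print(lag)                # {0: 0, 1: 500, 2: 50}
print(sum(lag.values()))  # 550
```

Here partition 1 accounts for most of the backlog, which is the kind of skew that a per-partition view exposes and a group-wide total hides.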
Service Mesh — Istio Ambient Mode
The kafka-cdc namespace is enrolled in the Service Mesh using Istio ambient mode (no sidecars):
metadata:
  labels:
    istio.io/dataplane-mode: ambient
    istio-discovery: enabled
Traffic visible in Kiali
In Kiali you can see the service graph for the kafka-cdc namespace:
- PostgreSQL → KafkaConnect (Debezium)
- KafkaConnect → Kafka brokers
- Kafka → Camel CDC Processor
- Camel → Mailpit (HTTP)
Ambient mode provides:
- Automatic mTLS between all pods in the namespace
- L4 metrics without sidecars (via ztunnel); L7 metrics additionally require a waypoint proxy
- Traffic visibility in Kiali without per-sidecar CPU/RAM overhead
Alerts — PrometheusRule
Alerts are deployed as a PrometheusRule resource that Prometheus evaluates automatically:
apiVersion: monitoring.rhobs/v1
kind: PrometheusRule
metadata:
  name: kafka-cdc-alerts
  namespace: openshift-cluster-observability-operator
  labels:
    openshift.io/user-monitoring: "true"
spec:
  groups:
    - name: kafka-cdc
      rules:
        - alert: KafkaConsumerLagHigh
          expr: kafka_consumergroup_lag > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High consumer lag on {{ $labels.consumergroup }}"
        - alert: DebeziumDisconnected
          expr: debezium_postgres_Connected == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Debezium disconnected from PostgreSQL"
        - alert: KafkaConnectTaskFailed
          expr: kafka_connect_worker_connector_failed_task_count > 0
          for: 1m
          labels:
            severity: critical
          annotations:
            summary: "KafkaConnect connector in FAILED state"
To verify that the rule has been created and to inspect its contents:
oc get prometheusrule kafka-cdc-alerts -n openshift-cluster-observability-operator -o yaml
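The effect of the for field on an alert such as KafkaConsumerLagHigh can be sketched as a small state machine. This is a simplified illustration of the inactive, pending, and firing states, not Prometheus's implementation; the function name, the 60-second evaluation interval, and the lag values are assumptions.

```python
def alert_states(samples, threshold, for_seconds):
    """Return (timestamp, state) per evaluation: 'inactive', 'pending', or 'firing'."""
    active_since = None  # when the expression first became true in the current streak
    states = []
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            # The alert fires only once the condition has held for the full period.
            state = "firing" if ts - active_since >= for_seconds else "pending"
        else:
            # Any evaluation below the threshold resets the streak.
            active_since = None
            state = "inactive"
        states.append((ts, state))
    return states


# Lag sampled every 60s; threshold 1000 and for: 5m as in KafkaConsumerLagHigh.
lag_samples = [(0, 800), (60, 1200), (120, 1500), (180, 900), (240, 1100),
               (300, 1300), (360, 1400), (420, 1600), (480, 1700), (540, 1800)]
for ts, state in alert_states(lag_samples, threshold=1000, for_seconds=300):
    print(ts, state)
```

In this run the dip at 180s resets the streak, so the alert only reaches firing at 540s, five minutes after the lag crossed the threshold again at 240s. This is how the for period filters out short bursts.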
Official Documentation
- OpenShift Monitoring: Integrated monitoring with Prometheus and Alertmanager
- Cluster Observability Operator: Multi-signal observability on OpenShift
- Red Hat OpenShift Service Mesh: Service Mesh with Istio, including ambient mode
- Grafana Documentation: Dashboards and metric visualization
- Kiali Documentation: Service Mesh observability console
- Prometheus Documentation: Monitoring and alerting
- Kafka Metrics and Monitoring: Export Kafka metrics to Prometheus