Enterprise Case: Oil and Gas
Enterprise Architecture — IoT/SCADA in Oil and Gas
The same CDC pattern implemented in this workshop applies directly to mission-critical scenarios in the oil and gas industry, where IoT sensors and SCADA systems generate continuous data that must be processed in real time.
Components and Adaptation
| Workshop Component | Oil and Gas Adaptation | Configuration |
|---|---|---|
| PostgreSQL (cdcdb) | Historian DB (OSIsoft PI / Aveva) | Sensor tables with high-precision timestamps |
| Debezium (PostgresConnector) | Debezium (PostgresConnector or custom) | CDC over sensor reading tables |
| Kafka (3 brokers) | Kafka (5+ brokers) | More partitions for high sensor throughput |
| Camel (notifications) | Camel (anomaly processing) | Content-based routing by sensor type and threshold |
| Mailpit (email) | PagerDuty / OpsGenie (critical alerts) | Integration with on-call systems |
Topics for Oil and Gas
The workshop topics, adapted for a sensor monitoring scenario:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: oilgas.sensors.pressure
  namespace: kafka-oilgas
  labels:
    strimzi.io/cluster: oilgas-cluster
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: 2592000000  # 30 days
    cleanup.policy: delete
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: oilgas.sensors.temperature
  namespace: kafka-oilgas
  labels:
    strimzi.io/cluster: oilgas-cluster
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: 2592000000  # 30 days
    cleanup.policy: delete
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: oilgas.alerts.critical
  namespace: kafka-oilgas
  labels:
    strimzi.io/cluster: oilgas-cluster
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 7776000000  # 90 days
    cleanup.policy: delete
Key differences:
- 12 partitions for sensor topics: higher throughput for thousands of concurrent sensors
- 30-day retention for sensor data: regulatory compliance
- 90-day retention for critical alerts: auditing and incident investigation
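The `retention.ms` values are retention periods expressed in milliseconds. As a quick check of the arithmetic, the following sketch converts days to milliseconds; the helper name `retention_ms` is hypothetical and is not part of Strimzi or Kafka.

```python
from datetime import timedelta


def retention_ms(days: int) -> int:
    """Convert a retention period in days to a value for Kafka's retention.ms."""
    # Dividing one timedelta by another with // yields an exact integer.
    return timedelta(days=days) // timedelta(milliseconds=1)


print(retention_ms(30))  # 2592000000 (sensor topics)
print(retention_ms(90))  # 7776000000 (critical alerts topic)
```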
Schema Registry for Sensors
Apicurio Registry stores the versioned sensor event schema, and events are validated against it to keep payloads consistent:
{
  "type": "object",
  "properties": {
    "sensor_id": {"type": "string"},
    "well_id": {"type": "string"},
    "reading_type": {"type": "string", "enum": ["pressure", "temperature", "flow_rate"]},
    "value": {"type": "number"},
    "unit": {"type": "string"},
    "timestamp": {"type": "string", "format": "date-time"},
    "quality": {"type": "integer", "minimum": 0, "maximum": 100}
  },
  "required": ["sensor_id", "well_id", "reading_type", "value", "timestamp"]
}
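To make the schema's constraints concrete, here is a minimal hand-rolled check written with the Python standard library only. It is a sketch of the rules above, not Apicurio's validation logic and not a full JSON Schema implementation; it is also stricter than the schema in one respect, because it enforces the `date-time` format. The function name `validate_reading` and the sample values (`PT-101`, `W-7`, `psi`) are hypothetical.

```python
from datetime import datetime

REQUIRED = ("sensor_id", "well_id", "reading_type", "value", "timestamp")
READING_TYPES = {"pressure", "temperature", "flow_rate"}


def validate_reading(event: dict) -> list[str]:
    """Return a list of violations; an empty list means the event passes."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in event]
    for name in ("sensor_id", "well_id", "unit"):
        if name in event and not isinstance(event[name], str):
            errors.append(f"{name} must be a string")
    if "reading_type" in event and event["reading_type"] not in READING_TYPES:
        errors.append("reading_type must be pressure, temperature, or flow_rate")
    if "value" in event:
        value = event["value"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append("value must be a number")
    if "timestamp" in event:
        try:
            # Accept a trailing "Z" on Python versions older than 3.11.
            datetime.fromisoformat(str(event["timestamp"]).replace("Z", "+00:00"))
        except ValueError:
            errors.append("timestamp must be an ISO 8601 date-time")
    if "quality" in event:
        quality = event["quality"]
        if isinstance(quality, bool) or not isinstance(quality, int) or not 0 <= quality <= 100:
            errors.append("quality must be an integer between 0 and 100")
    return errors


good = {"sensor_id": "PT-101", "well_id": "W-7", "reading_type": "pressure",
        "value": 3120.5, "unit": "psi", "timestamp": "2024-05-01T12:00:00Z", "quality": 98}
bad = {"sensor_id": "PT-101", "reading_type": "vibration", "value": "high",
       "timestamp": "yesterday", "quality": 140}

print(validate_reading(good))  # []
for problem in validate_reading(bad):
    print(problem)
```

The second event fails on every rule it touches: `well_id` is missing, `reading_type` is outside the enum, `value` is not numeric, `timestamp` does not parse, and `quality` is out of range.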
Camel Route for Anomaly Detection
The workshop Camel route, adapted to process sensor data:
- route:
    id: sensor-anomaly-detection
    from:
      uri: kafka:oilgas.sensors.pressure
      parameters:
        brokers: oilgas-cluster-kafka-bootstrap.kafka-oilgas.svc:9093
        groupId: anomaly-detector
        securityProtocol: SASL_SSL
        saslMechanism: SCRAM-SHA-512
      steps:
        # Parse the JSON event into a Map so fields can be read as ${body[field]}
        - unmarshal:
            json:
              library: jackson
        - choice:
            when:
              # Pressure above the critical threshold: publish an alert event
              - simple: "${body[value]} > 3500"
                steps:
                  - setBody:
                      simple: |
                        {"sensor":"${body[sensor_id]}",
                        "well":"${body[well_id]}",
                        "value":${body[value]},
                        "severity":"CRITICAL",
                        "message":"Pressure exceeds critical threshold"}
                  # The body is already a JSON string, so no marshal step is needed;
                  # marshalling it again would wrap it in a quoted JSON string.
                  - to:
                      uri: kafka:oilgas.alerts.critical
            otherwise:
              steps:
                - log:
                    message: "Sensor ${body[sensor_id]} reading normal: ${body[value]}"
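The content-based routing decision in this route can be sketched outside Camel, which is handy for checking the threshold rule against sample readings. The function below is an illustration only: `route_reading` is a hypothetical name, the sample sensor and well IDs are made up, and it mirrors just the branch logic of the route, not its Kafka I/O.

```python
import json

CRITICAL_PRESSURE = 3500  # same threshold as the route's predicate


def route_reading(raw: str, threshold: float = CRITICAL_PRESSURE):
    """Return (topic, payload) for an alert, or (None, log_line) for a normal reading."""
    reading = json.loads(raw)
    if reading["value"] > threshold:
        alert = {
            "sensor": reading["sensor_id"],
            "well": reading["well_id"],
            "value": reading["value"],
            "severity": "CRITICAL",
            "message": "Pressure exceeds critical threshold",
        }
        return "oilgas.alerts.critical", json.dumps(alert)
    return None, f"Sensor {reading['sensor_id']} reading normal: {reading['value']}"


topic, payload = route_reading('{"sensor_id": "PT-101", "well_id": "W-7", "value": 3650}')
print(topic, payload)

# A reading exactly at the threshold is not an alert, because the comparison is strict.
print(route_reading('{"sensor_id": "PT-101", "well_id": "W-7", "value": 3500}'))
```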
How it Works
From sensor to alert: real-time pipeline
The data flow from a field sensor to an operational alert follows this chain:
1. A pressure or temperature sensor connected via Modbus or OPC-UA sends readings to the SCADA system every 1–5 seconds.
2. The SCADA system persists readings in its Historian DB (PostgreSQL or a proprietary database such as OSIsoft PI). Each INSERT is a new datapoint with `sensor_id`, `value`, `timestamp`, and `quality`.
3. Debezium captures the INSERT via the WAL and publishes it to the corresponding topic (`oilgas.sensors.pressure`). With 12 partitions, the system supports thousands of concurrent sensors partitioned by `sensor_id`.
4. Apache Camel consumes the event, deserializes the JSON, and applies content-based routing: if `value > threshold`, it publishes an alert event to the `oilgas.alerts.critical` topic. Otherwise it logs the normal reading.
5. Alert events trigger notifications via PagerDuty/OpsGenie for the on-call team.
6. In parallel, all events (normal and anomalous) are persisted in a Data Lake (S3/ODF) for historical analysis and ML model training.
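Between the Debezium capture and the Camel consumer, each reading travels inside a Debezium change event. By default that event is an envelope with `before`, `after`, `op`, and `source` fields, so a consumer that reads `value` directly assumes the envelope has already been flattened (for example with Debezium's `ExtractNewRecordState` transform). The sketch below shows the unwrapping step on a hand-written sample event; the function name `extract_reading` and the sample values are hypothetical.

```python
def extract_reading(change_event: dict):
    """Return the inserted row from a Debezium change event, or None for other operations."""
    # With the JSON converter's schemas enabled, the envelope sits under "payload".
    envelope = change_event.get("payload", change_event)
    # op codes: c = insert, u = update, d = delete, r = snapshot read
    if envelope.get("op") not in ("c", "r"):
        return None
    return envelope.get("after")


insert_event = {
    "payload": {
        "before": None,
        "after": {"sensor_id": "PT-101", "value": 3650.0,
                  "timestamp": "2024-05-01T12:00:00Z", "quality": 98},
        "op": "c",
        "ts_ms": 1714564800000,
    }
}
delete_event = {"payload": {"before": {"sensor_id": "PT-101"}, "after": None, "op": "d"}}

print(extract_reading(insert_event))  # the row from "after"
print(extract_reading(delete_event))  # None
```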
Key differences vs. the workshop
| Aspect | Workshop (CDC) | Enterprise (Oil & Gas) |
|---|---|---|
| Throughput | ~10 events/min (manual INSERTs) | ~10,000 events/sec (automatic sensors) |
| Partitions | 3 (sufficient for demo) | 12+ (parallelism for high throughput) |
| Retention | 7 days (demo) | 30–90 days (regulatory compliance) |
| Action | Email via Mailpit | PagerDuty + Data Lake + ML pipeline |
| Security | SCRAM + TLS (demo) | SCRAM + TLS + per-plant ACLs + encryption at rest |
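A rough sizing calculation shows what the enterprise column implies for a single sensor topic. The throughput, partition count, retention, and replica count come from this page; the average event size of 200 bytes is an assumption made for illustration, and the estimate ignores compression and index overhead. It also assumes, for simplicity, that all events land on one 12-partition topic.

```python
EVENTS_PER_SEC = 10_000   # enterprise throughput from the table
PARTITIONS = 12           # sensor topic partitions
RETENTION_DAYS = 30       # sensor topic retention
REPLICAS = 3              # replicas per partition
AVG_EVENT_BYTES = 200     # assumption: not stated in the source

SECONDS_PER_DAY = 86_400

per_partition_rate = EVENTS_PER_SEC / PARTITIONS
events_retained = EVENTS_PER_SEC * SECONDS_PER_DAY * RETENTION_DAYS
raw_tb = events_retained * AVG_EVENT_BYTES / 1e12
replicated_tb = raw_tb * REPLICAS

print(f"{per_partition_rate:.0f} events/sec per partition")
print(f"{events_retained:,} events retained")
print(f"{raw_tb:.3f} TB before replication")
print(f"{replicated_tb:.3f} TB across {REPLICAS} replicas")
```

Under these assumptions each partition handles roughly 833 events per second, and 30 days of data comes to about 5.2 TB before replication, or about 15.6 TB across three replicas.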
Enterprise Considerations
| Requirement | Implementation |
|---|---|
| Latency < 1s | Kafka + Debezium typically deliver sub-second end-to-end latency |
| Compliance | Configurable retention per topic, versioned schemas in Apicurio |
| Geo-distribution | MirrorMaker 2 for replication across plants/regions |
| Security | SCRAM-SHA-512 + TLS + ACLs per sensor/plant |
| Scalability | Add partitions and brokers without downtime (adding partitions changes the key-to-partition mapping, so per-sensor ordering is only guaranteed within the new layout) |
| Observability | Grafana dashboards + PrometheusRule alerts + Kiali traffic |
Oil and Gas Specific Alerts
The workshop PrometheusRule, extended with sensor-specific alerts:
- alert: SensorDataLagHigh
  expr: kafka_consumergroup_lag{topic=~"oilgas.sensors.*"} > 5000
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Critical lag in sensor data"
- alert: NoSensorData
  # Aggregate across brokers so that a single idle broker does not fire the alert
  expr: sum by (topic) (rate(kafka_server_brokertopicmetrics_messagesin_total{topic=~"oilgas.sensors.*"}[5m])) == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "No sensor data: possible SCADA disconnection"
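To judge whether a lag threshold of 5000 is strict or lenient, it helps to translate it into seconds of backlog. The sketch below assumes the lag metric is reported per partition (as Kafka Exporter does) and that the 10,000 events/sec load is spread evenly over the 12 partitions of one sensor topic; both are assumptions, and `backlog_seconds` is a hypothetical helper.

```python
def backlog_seconds(lag: int, events_per_sec: float, partitions: int) -> float:
    """Seconds of unprocessed data represented by a per-partition lag value."""
    # Per-partition rate is events_per_sec / partitions, so
    # lag / rate simplifies to lag * partitions / events_per_sec.
    return lag * partitions / events_per_sec


print(backlog_seconds(5000, 10_000, 12))  # 6.0
```

Under these assumptions the alert fires when a consumer has been about six seconds behind on a partition for two minutes.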
Official Documentation
- Red Hat Streams for Apache Kafka: real-time sensor data streaming
- Red Hat build of Debezium: CDC for change capture in SCADA systems
- Red Hat build of Apache Camel: integration with legacy and IoT systems
- Red Hat OpenShift Data Foundation: persistent storage for data lakes
- Red Hat OpenShift Container Platform: edge and cloud deployment platform