Troubleshooting

Production lessons from fleet GitOps, ambient mesh, and centralized observability. See also ebook Ch.15 matrix (adapted below).

RHDP fleets: Start with the RHDP install playbook for install order, spoke token anti-patterns, console-link 503s during the first hour, and operator bootstrap blockers.

Verification scripts — what to run when something fails

Use this order on the hub after install or when the fleet looks unhealthy:

| Step | Command | If it fails |
|---|---|---|
| 1 | `oc get managedclusters` | Spokes not Available → ACM import; do not put tokens in auto-syncing `field-content` |
| 2 | `bash scripts/verify-fleet.sh` | Missing `fleet-values-sync` CronJob → sync clustergroup; missing Skupper links → import spokes first |
| 3 | `bash scripts/argocd-preflight.sh` | Helm lint / path errors → fix chart before `oc apply`; run locally or in CI |
| 4 | `python scripts/verify-gitops-strategies.py` | PUSH/PULL partition broken → check `fleet-spoke-push` ApplicationSet and spoke `field-content` |
| 5 | `MIN_OK_CODE=200 bash scripts/verify-console-links.sh` | 503 → backends still syncing (playbook); 403 on ODS → `oc login` first |
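The ordering above can be sketched as a small runner that stops at the first failing step and prints the matching hint. This is a minimal sketch, not a script that ships in the repo: `STEPS`, `shell_ok`, `first_failure`, and `report` are hypothetical names, and it assumes each command exits non-zero on failure (note that `oc get managedclusters` exits 0 even when a spoke is not Available, so step 1 still needs a human look at the output).

```python
import subprocess
from typing import Callable, List, Optional, Tuple

# (command, hint) pairs in the order given in the table above.
STEPS: List[Tuple[str, str]] = [
    ("oc get managedclusters",
     "Spokes not Available: finish the ACM import"),
    ("bash scripts/verify-fleet.sh",
     "Sync the hub clustergroup; import spokes before expecting Skupper links"),
    ("bash scripts/argocd-preflight.sh",
     "Fix Helm lint or path errors in Git before applying"),
    ("python scripts/verify-gitops-strategies.py",
     "Check the fleet-spoke-push ApplicationSet and spoke field-content"),
    ("MIN_OK_CODE=200 bash scripts/verify-console-links.sh",
     "503: backends still syncing; 403 on ODS: run oc login first"),
]


def shell_ok(command: str) -> bool:
    """Run one command through the shell; True when it exits 0."""
    return subprocess.run(command, shell=True).returncode == 0


def first_failure(steps: List[Tuple[str, str]],
                  run: Callable[[str], bool] = shell_ok) -> Optional[int]:
    """Return the 1-based number of the first failing step, or None."""
    for number, (command, _hint) in enumerate(steps, start=1):
        if not run(command):
            return number
    return None


def report(steps: List[Tuple[str, str]],
           run: Callable[[str], bool] = shell_ok) -> str:
    """Summarize the run as one line suitable for a CI log."""
    failed = first_failure(steps, run)
    if failed is None:
        return "all steps passed"
    command, hint = steps[failed - 1]
    return f"step {failed} failed ({command}): {hint}"
```

Calling `print(report(STEPS))` from the repo root on the hub runs the real commands; passing a different `run` callable makes the ordering logic testable without a cluster.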

`verify-fleet.sh` output

| Line | Healthy | Unhealthy |
|---|---|---|
| `ManagedClusters` | `east`, `west` Available=True | Missing or `False`: finish ACM import |
| `Hub Argo CD applications` | Long list of apps | Empty: not on hub or GitOps not installed |
| `fleet-values-sync` | CronJob present | `not deployed yet`: hub clustergroup not synced |
| `Skupper` | `skupperlinks` rows | Empty until spokes join mesh |

`argocd-preflight.sh`

Runs offline checks before a push: Helm lint on every chart under charts/all/* and on the region bootstrap charts, plus verify-gitops-strategies.py. A FAIL on lint means invalid YAML or an invalid Chart.yaml; fix it in Git, not on the cluster.

`verify-gitops-strategies.py`

Confirms that the east/west values declare an explicit chart path and that the PUSH component IDs match the acm-hub-spoke GitOps strategy config.

Full product matrix: Validation Guide.

Symptom matrix

| Symptom | Likely cause | Fix |
|---|---|---|
| Hub console links 503 (Developer Hub, GitLab, ODS, Skupper) | Backends still syncing or missing deps (catalog CM, SCC, Site) | Wait 60–90 min; see install playbook sections per product |
| OpenShift AI link 403 in curl | OAuth-protected dashboard | Log in with `oc login`; script uses `oc whoami -t` bearer token |
| East/west namespaces Terminating / recreating | Spoke tokens in auto-syncing `field-content` while import fails, or Namespace pre-created with `managedCluster` label before `ManagedCluster` | Remove tokens from GitOps values; import via ACM UI or chart order (`ManagedCluster` first); `fleet-values-sync` = domains only |
| `acm-operator` stuck, no MCH | CRD not ready before PostSync | `helm template acm charts/all/acm-operator \| oc apply -f -` |
| RHODS / COO CSV multiple operatorgroups | Duplicate OG from subscription + `operatorGroup: true` | Remove duplicate OG flag in hub values |
| ArgoCD apps show Unknown sync status | ACM 2.16 CRD schema bug | Add resource exclusion for `clusterview.open-cluster-management.io`, then scale `ocm-proxyserver` to 0 and delete its APIServices (exclusions alone are not enough); see below |
| `upstream connect error` / 503 on mesh routes | HBONE port 15008 not configured (pod before ztunnel) | Restart pods in ambient namespaces; ensure ambient labels at sync-wave 2 after Istio/ZTunnel |
| ApplicationSet Degraded: both name and server | Stale `destination.server` from older template (SSA) | Delete/recreate ApplicationSet or set `server: ""` in template |
| ACM UI: no Argo applications created | ApplicationSet missing `cluster.open-cluster-management.io/placement` label | Label ApplicationSet + child Apps; verify with `oc get applications -n openshift-gitops \| grep spoke` |
| Kiali: `Unauthorized` on east/west | Stale `kiali-multi-cluster-secret` or expired spoke token | Delete aggregate secret; run token-sync job; restart Kiali pod |
| Kafka Console: `/api/kafkas` 404 | External route hits UI only; Next.js does not proxy `/api` | Enable `apiRoute` in `charts/all/kafka-console`; verify HTTP 200 on `/api/kafkas` |
| Strimzi entity-operator CrashLoop | mTLS on 9091 conflicts with ztunnel | Exclude operator namespace from ambient or use documented Strimzi tuning |
| Skupper listener not Ready | Site or token not synced | Check `oc get site,listener -n service-interconnect` on hub and spoke |
| GitOpsCluster: legacy secret not found | ACM hasn’t created cluster secret yet | Wait 5–10 min; check klusterlet on spoke; verify ManagedCluster is Joined |
| Kuadrant `/kuadrant`: failed to fetch APIProducts | K8s RBAC or CRD group | Sync `developer-hub`; ClusterRole `developer-hub-kuadrant` needs `devportal.kuadrant.io` + `gateway.networking.k8s.io` gateways/httproutes |
| AuthPolicy Not Accepted / `MissingDependency` | Wrong `ISTIO_GATEWAY_CONTROLLER_NAMES` or operator started before mesh | Sync `rhcl-operator` subscription config; restart `kuadrant-operator-controller-manager`; see Connectivity Link |
| API key works in console but not httpbin | AuthPolicy selector `app` ≠ APIProduct name | Match `app` label to APIProduct (e.g. `workshop-mcp-gateway`, `workshop-llm-tokens`) |
| API Overview: Expected object at root, got string | Incomplete OpenAPI in catalog entity | Ensure API entities have valid `definition` with `paths`; fix `$text` file refs in `reading.allow` |
| TechDocs tab 404 / builder not local | `techdocs.builder: external` or missing mkdocs | Set `builder: local` in app-config; scaffolded repos need `mkdocs.yml` + `backstage.io/techdocs-ref: dir:.` |
| Quay org-setup Job failing | `/version` redirect, CSRF, or duplicate robot | Use GitOps `setup.py` with `/discovery` + bearer token; see Quay |
| DevSpaces link on hub 404 | DevSpaces is spoke-only | Open `https://devspaces.<east-or-west-domain>` from template output |
| MCP Gateway 503 / `/mcp` 404 | PostSync not run | Refresh `hub-post-install-bootstrap`; check job `hub-post-install-workshop-surfaces` |
| Developer Hub /lightspeed chat 401 | Missing MaaS key | `maas-facilitator-seed` in `vault` + refresh `vault-maas-external-secrets` |
| NeuroFace /api/chat 401 | ESO secret placeholder | RHDP `litemaas.apiKey` or `maas-facilitator-seed`; PostSync `neuroface-maas-key-sync` |
| GitLab UI 503/500 | GitLab still starting or hub undersized | Wait for `GitLab` CR Ready; verify hub 4×16/64; `bash scripts/verify-node-capacity.sh` |
| Argo `gitlab-operator` Missing (runner resources) | `gitlab-runner` namespace removed | Set `runnerEnabled: false`; remove stuck finalizer from Runner CR |
| Argo operation stuck on `Subscription/gitlab-runner-operator` | Old operation not terminated | `oc patch application gitlab-operator -n openshift-gitops --type merge -p '{"operation":null}'` |
| TechDocs `FetchUrlReader does not implement readTree` | GitHub Pages URL in `techdocs-ref` | Use `url:https://github.com/<owner>/<repo>/tree/main/<path>` |
| Developer Hub GitLab API `/repos/` 404 | GitLab host in `integrations.github` | Remove from `github` block; keep it only in `integrations.gitlab` |
| `workshop-kuadrant-sync-plans` job immutable error | Existing Job cannot be updated | `oc delete job workshop-kuadrant-sync-plans -n workshop-kuadrant-apis --ignore-not-found` before apply |
| `unsealvault-cronjob` Init:Error flooding | Vault already initialized, token lacks write permissions | `oc patch cronjob unsealvault-cronjob -n imperative --type merge -p '{"spec":{"suspend":true}}'` |
| hub-post-install-bootstrap workshop-surfaces CrashLoop | SA lacks `bind`/`escalate` RBAC | Chart includes rule; break deadlock with `oc patch application ... --type merge -p '{"operation":null}'` then patch ClusterRole live |
| Orphan apps in `default` | `helm template \| oc apply` without `-n` | Delete orphan stack; always sync via Argo CD (namespace in Application spec) |
| workshop-apis 401 without key | Expected (Kuadrant AuthPolicy) | Request key at Developer Hub `/kuadrant` |
| Developer Hub Kuadrant tab missing / catalog parse error | Catalog ConfigMap truncated to hub domain only | Re-sync `developer-hub` chart ≥ v1.5.1; verify `oc get cm developer-hub-catalog-workshop-kuadrant-apis -n developer-hub -o yaml \| grep 'kind: API'` returns 4 lines |
| Vault console link 307 | href points to route root | Use `/ui/`; see install playbook |
| ESO `ClusterSecretStore` not ready / `context deadline exceeded` | OpenShift ESO netpol allows Vault egress on :443 only; in-cluster Vault listens on :8200 | Chart ships `allow-vault-maas-egress-8200` in `vault-maas-external-secrets`; re-sync app |
| Camel `mqtt-to-kafka` Error, Kafka metadata timeout | Missing advertised EndpointSlice or ambient ztunnel on Kafka TCP | EndpointSlice + `deployment` trait `istio.io/dataplane-mode: none`; see below |
| Stormshift MirrorMaker2 CrashLoop | Empty `clusterName` → `broker-0-.` | Set `clusterName: east\|west` in spoke app values |

ArgoCD Unknown sync status (ACM 2.16)

Symptom: All ArgoCD applications show “Unknown” sync status in the UI, even though they are healthy and syncing correctly.

Error message:

SchemaError(github.com/stolostron/cluster-lifecycle-api/clusterview/v1alpha1.UserPermission.status): 
unknown model in reference

Cause: The MCE ocm-proxyserver publishes an aggregated clusterview OpenAPI document with a broken UserPermission.status reference. Argo CD cannot load the hub OpenAPI cache, so apps show Unknown / ComparisonError. resourceExclusions alone do not fix this.

Verification: Applications still show Healthy health status and operationState.phase: Succeeded:

# Check actual operation state (should show "Succeeded")
oc get application <app-name> -n openshift-gitops \
  -o jsonpath='{.status.operationState.phase}'

# List apps that are not Healthy (no output means all apps are healthy)
oc get applications -n openshift-gitops -o jsonpath='{range .items[*]}{.metadata.name}: {.status.health.status}{"\n"}{end}' | grep -v Healthy
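To confirm that the Unknown status is the OpenAPI cache problem and not a real sync failure, the same three fields can be checked together. The sketch below assumes the input is the parsed output of `oc get applications -n openshift-gitops -o json`; the function name `unknown_but_healthy` is hypothetical.

```python
from typing import List


def unknown_but_healthy(app_list: dict) -> List[str]:
    """Names of apps whose sync status is Unknown although they are
    Healthy and their last operation Succeeded."""
    names = []
    for app in app_list.get("items", []):
        status = app.get("status", {})
        sync = status.get("sync", {}).get("status")
        health = status.get("health", {}).get("status")
        phase = status.get("operationState", {}).get("phase")
        if sync == "Unknown" and health == "Healthy" and phase == "Succeeded":
            names.append(app["metadata"]["name"])
    return sorted(names)
```

If every Unknown app also appears in this list, the symptom matches the schema bug described here. An Unknown app that is missing from the list is more likely a genuine sync or health problem and should be debugged separately.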

Automated fix (Git): charts/all/openshift-gitops, acmArgocdOpenapiFix (enabled by default): a PostSync Job + CronJob scales ocm-proxyserver to 0, deletes its APIServices, and restarts the application controller.

Manual one-shot:

oc scale deployment/ocm-proxyserver -n multicluster-engine --replicas=0
for name in v1.clusterview.open-cluster-management.io \
  v1alpha1.clusterview.open-cluster-management.io \
  v1beta1.proxy.open-cluster-management.io; do
  oc delete apiservice "$name" --ignore-not-found
done
oc rollout restart statefulset openshift-gitops-application-controller -n openshift-gitops

Note: Pair with acm-operator PostSync to disable cluster-proxy-addon. Success: 0 Unknown apps. Trade-off: clusterview UserPermission via proxy is unavailable; direct spoke APIs and ACM fleet inventory still work.

MCE cluster-proxy-addon (ACM 2.16+)

Symptom: Argo CD spoke apps use destination.server = cluster-proxy URL; proxy add-on conflicts with hub cluster-wide proxy or complicates GitOps debugging.

Default in ACM/MCE 2.16: cluster-proxy-addon component is enabled.

Automated fix (new installations): Chart charts/all/acm-operator runs PostSync Job + CronJob (acm-mce-disable-cluster-proxy) that sets MultiClusterEngine/spec.overrides.components[name=cluster-proxy-addon].enabled: false.

Verify:

oc get mce multiclusterengine -o jsonpath='{range .spec.overrides.components[*]}{.name}={.enabled}{"\n"}{end}' | grep cluster-proxy
# expect: cluster-proxy-addon=false

Disable automation: set mceDisableClusterProxyAddon: false in acm-operator values (hub clustergroup override if needed).

Limitation: Disabling the add-on does not always remove ocm-proxyserver in multicluster-engine — that deployment is a separate MCE component. Spoke ManagedClusterAddon/cluster-proxy on local-cluster may also need manual review if pod-log-via-proxy features are required.

Manual one-shot:

oc patch mce multiclusterengine --type=merge -p '{"spec":{"overrides":{"components":[{"name":"cluster-proxy-addon","enabled":false}]}}}'

For a full component list merge without dropping other overrides, use the Job script in charts/all/acm-operator/files/disable-cluster-proxy-addon.py.
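The reason a merge matters: a JSON merge patch replaces the whole `components` list, so the one-shot above drops any other overrides. The sketch below shows the merge idea only. It is not the contents of disable-cluster-proxy-addon.py, `disable_component` is a hypothetical name, and it assumes the MultiClusterEngine object has been loaded as a dict (for example from `oc get mce multiclusterengine -o json`).

```python
import copy


def disable_component(mce: dict, name: str = "cluster-proxy-addon") -> dict:
    """Return a copy of the MCE object with one component disabled and
    every other override entry left untouched."""
    patched = copy.deepcopy(mce)
    components = (patched.setdefault("spec", {})
                         .setdefault("overrides", {})
                         .setdefault("components", []))
    for component in components:
        if component.get("name") == name:
            component["enabled"] = False
            break
    else:
        # No existing entry for this component: append one.
        components.append({"name": name, "enabled": False})
    return patched
```

The resulting `spec.overrides.components` list can then be sent as the full list in a merge patch, which keeps the other entries intact.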

HBONE port 15008 not configured

Symptom: Routes return upstream connect error or 503; ztunnel logs show missing HBONE listener for pod IP.

Cause: Workloads started before ambient enrollment or before ztunnel programmed iptables.

Fix:

- Ensure namespaces get istio.io/dataplane-mode: ambient after Istio + IstioCNI + ZTunnel (wave 2 in servicemeshoperator3, not wave 1 namespaces).
- Restart affected Deployments after the mesh is Ready.
- reconcileIptablesOnStartup: true on IstioCNI helps new nodes but does not retroactively fix pods that are already running.

# charts/all/servicemeshoperator3 — ambient labels (wave 2)
metadata:
  labels:
    istio.io/dataplane-mode: ambient
  annotations:
    argocd.argoproj.io/sync-wave: "2"
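To decide which workloads need the restart, one option is to compare each pod's start time with the time the mesh became Ready. The sketch below is an assumption-laden illustration: `started_before` is a hypothetical helper, the pod times are taken to be `.status.startTime` values, and `mesh_ready_at` is whatever timestamp you treat as "ztunnel ready" (for example the start time of the ztunnel pod on the same node).

```python
from datetime import datetime
from typing import Dict, List

K8S_TIME = "%Y-%m-%dT%H:%M:%SZ"  # format of .status.startTime in pod JSON


def started_before(pods: Dict[str, str], mesh_ready_at: str) -> List[str]:
    """Return the names of pods whose start time predates mesh readiness.

    pods maps pod name -> .status.startTime string.
    """
    ready = datetime.strptime(mesh_ready_at, K8S_TIME)
    return sorted(name for name, start in pods.items()
                  if datetime.strptime(start, K8S_TIME) < ready)
```

Pods returned by this check started before the HBONE listener could be programmed for them and are the candidates for a rollout restart.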

ApplicationSet: both `name` and `server` defined

Symptom:

application destination spec is invalid: application destination can't
have both name and server defined: west https://kubernetes.default.svc

Cause: Older ApplicationSet template set server; Server-Side Apply does not remove fields the new manifest omits.

Fix:

# charts/all/acm-hub-spoke/templates/applicationset.yaml
destination:
  name: ''
  namespace: openshift-gitops
  server: ""   # explicit blank clears stale SSA

Then delete and let Argo CD recreate the ApplicationSet, or patch live spec to remove server.
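For the live-patch route, a JSON patch `remove` fails when the path does not exist, so it helps to build the operation only when both fields are actually set. A minimal sketch, assuming the ApplicationSet has been loaded as a dict from `oc get applicationset <name> -n openshift-gitops -o json`; `stale_server_patch` is a hypothetical name.

```python
from typing import List


def stale_server_patch(appset: dict) -> List[dict]:
    """Return JSON patch operations that drop destination.server when
    the template also sets destination.name; empty list otherwise."""
    destination = (appset.get("spec", {})
                         .get("template", {})
                         .get("spec", {})
                         .get("destination", {}))
    if destination.get("name") and destination.get("server"):
        return [{"op": "remove",
                 "path": "/spec/template/spec/destination/server"}]
    return []
```

A non-empty result, serialized with `json.dumps`, can be passed to `oc patch applicationset <name> -n openshift-gitops --type json -p '<ops>'`.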

Kiali multi-cluster Unauthorized

Symptom: Hub Kiali logs: Error fetching Namespaces for cluster [east]: Unauthorized.

Cause:

- Expired token in spoke kiali-hub-export ConfigMap.
- Legacy kiali-multi-cluster-secret still labeled kiali.io/multiCluster=true alongside kiali-remote-* secrets.

Fix:

# Hub
oc delete secret kiali-multi-cluster-secret -n openshift-cluster-observability-operator --ignore-not-found
oc create job kiali-token-refresh --from=cronjob/kiali-multicluster-token-sync \
  -n openshift-cluster-observability-operator
oc delete pod -n openshift-cluster-observability-operator -l app=kiali

On spokes, confirm export ConfigMap exists:

oc get cm kiali-hub-export -n openshift-cluster-observability-operator -o jsonpath='{.data.updatedAt}'

Kafka Console 404 on `/api/*`

Symptom: Browser or curl to https://kafka-console.<hub-domain>/api/kafkas returns Next.js HTML 404; in-pod console-api returns 200.

Cause: Operator Service targets UI port 3000 only; external route does not split /api to port 8080.

Fix: Deploy supplemental Route (GitOps: charts/all/kafka-console/templates/api-route.yaml):

spec:
  host: kafka-console.apps.hub.example.com
  path: /api
  to:
    kind: Service
    name: kafka-console-api-service
  port:
    targetPort: http   # 8080 on console-api container

Do not set haproxy.router.openshift.io/rewrite-target — the API expects the /api prefix.

Blank UI / NextAuth 404 on `/api/auth/*`

Symptom: Kafka Console page loads partially or stays blank; browser network tab shows 404 on /api/auth/providers; console-api logs show GET /api/auth/providers ... 404.

Cause: The supplemental /api Route sends all /api/* traffic to Quarkus. NextAuth runs in the UI container (Next.js) on port 3000, not in console-api.

Fix: Add a more specific Route /api/auth → kafka-console-console-service with port.targetPort: **3000** (not 80 — the Service’s EndpointSlice exposes pod port 3000). GitOps: charts/all/kafka-console/templates/api-route.yaml (kafka-console-ui-auth).

curl -sk -o /dev/null -w '%{http_code}\n' \
  https://kafka-console.<hub-domain>/api/auth/providers
# Expect 200
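Why the extra Route is needed can be shown with a toy longest-prefix matcher. This is only a model of path-based routing under the assumption that the most specific path wins; it is not the router's implementation, and `ROUTES`, `pick_backend`, and the backend labels are illustrative names built from the ports mentioned in this section.

```python
from typing import Dict

# Route path -> backend, modelling the three Routes described here.
ROUTES: Dict[str, str] = {
    "/": "ui:3000",               # operator-created Route (Next.js UI)
    "/api": "console-api:8080",   # supplemental API Route
    "/api/auth": "ui:3000",       # NextAuth lives in the UI container
}


def pick_backend(path: str, routes: Dict[str, str]) -> str:
    """Choose the backend whose route path is the longest matching prefix."""
    matches = [prefix for prefix in routes
               if path == prefix or path.startswith(prefix.rstrip("/") + "/")]
    return routes[max(matches, key=len)]
```

With all three entries, `/api/kafkas` resolves to console-api while `/api/auth/providers` resolves to the UI. Remove the `/api/auth` entry and the auth request falls through to console-api, which is the 404 described in this section.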

JSON `404` / code `4041` on cluster detail

Symptom: UI shows {"errors":[{"title":"Resource not found","status":"404","code":"4041"}]} when opening a Kafka cluster.

Cause: Valid API route, but the cluster id is unknown or the console-api cannot reach brokers (often west spoke offline → Skupper listener has no connector).

Checks:

# List works?
curl -sk https://kafka-console.<hub-domain>/api/kafkas

# Detail per cluster (replace id from list response)
curl -sk -o /dev/null -w '%{http_code}\n' https://kafka-console.<hub-domain>/api/kafkas/<id>

# West spoke up?
oc config use-context west
oc get applications spoke-interconnect-west -n openshift-gitops
oc get link -n service-interconnect

Fix: Restore west (or east) spoke apps and Skupper link; resync field-content-kafka-console for broker DNS EndpointSlices.
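The two 404 variants in this guide point at different layers, and the response body tells them apart: an HTML body means the request never reached console-api (route problem), while a JSON body with code `4041` means the route works but the cluster is unknown or unreachable. A small sketch of that triage, with `classify_404` as a hypothetical helper name:

```python
import json


def classify_404(body: str) -> str:
    """Return 'route', 'cluster', or 'other' for a 404 response body."""
    try:
        doc = json.loads(body)
    except ValueError:
        # Not JSON (for example the Next.js HTML page): /api was not
        # routed to console-api at all.
        return "route"
    if not isinstance(doc, dict):
        return "other"
    codes = {err.get("code") for err in (doc.get("errors") or [])
             if isinstance(err, dict)}
    return "cluster" if "4041" in codes else "other"
```

A `route` result sends you back to the supplemental Route; a `cluster` result sends you to the spoke and Skupper checks above.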

industrial-edge-tst Degraded (Camel / KServe)

Symptom: Argo CD app industrial-edge-tst-east (or -west) is Degraded with:

Integration/mqtt-to-kafka: dependency camel:mqtt not found in Camel catalog
InferenceService/anomaly-detection: stuck Progressing; sync waits for healthy state

Causes:

- Camel K: Routes use paho: URIs; the catalog dependency is camel:paho, not camel:mqtt.
- KServe: Chart ships InferenceService only when anomalyDetection.enabled: true. Default is false because spokes need ODH RawDeployment (no Serverless Operator), a MinIO model at s3://models/anomaly-detection/model, and a Ready DataScienceCluster. Threshold alerts still work via ie-anomaly-alerter without KServe.

Fix (GitOps):

# charts/all/industrial-edge-tst/templates/camel-integrations.yaml
dependencies:
  - camel:paho
  - camel:kafka

# charts/all/industrial-edge-data-science-cluster — edge RawDeployment
kserve:
  defaultDeploymentMode: RawDeployment
  serving:
    managementState: Removed
modelmeshserving:
  managementState: Removed

Verify Camel integration:

oc get integration mqtt-to-kafka -n industrial-edge-tst-all \
  -o jsonpath='{range .status.conditions[?(@.type=="Ready")]}{.status} {.message}{"\n"}{end}'

Enable ML inference later: upload model to MinIO, set anomalyDetection.enabled: true in spoke app values, sync industrial-edge-data-science-cluster then industrial-edge-tst.

Industrial Edge alerts not in Mailpit

Symptom: ie-anomaly-alerter logs show Failed to send mail: HTTP Error 503 or Mailpit UI is empty while MQTT anomalies appear in pod logs.

Causes:

Wrong hub domain on spokes — MAILPIT_URL must be https://mailpit.<hub-apps-domain>/api/v1/send, not the spoke’s own domain. Check:

oc get deploy ie-anomaly-alerter -n industrial-edge-tst-all \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="MAILPIT_URL")].value}{"\n"}'

ie-anomaly-alerter not deployed — Argo CD app Missing on east/west; apply with correct hubClusterDomain:

helm template ie charts/all/ie-anomaly-alerter \
  --set hubClusterDomain=apps.cluster-<hub-id>.dynamic2.redhatworkshops.io \
  --set clusterName=east | oc apply -f -

fleet-values-sync stale on ACM 2.16 — spoke domains were not derived when the job looked for kube-apiserver instead of apiserverurl.openshift.io. Re-run after chart fix:
```
oc create job --from=cronjob/fleet-values-sync fleet-values-sync-manual -n openshift-gitops
```

Verify: Mailpit route returns 200 on POST /api/v1/send; alerter logs Mail sent [...] -> 200.
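The first cause (wrong domain in MAILPIT_URL) is easy to check mechanically once the value has been read from the Deployment. A minimal sketch, assuming the expected form is exactly `https://mailpit.<hub-apps-domain>/api/v1/send` as stated above; `mailpit_url_problems` is a hypothetical name and the example domains in the usage note are placeholders.

```python
from typing import List
from urllib.parse import urlparse


def mailpit_url_problems(url: str, hub_apps_domain: str) -> List[str]:
    """List the ways a MAILPIT_URL value deviates from the expected form."""
    problems = []
    parsed = urlparse(url)
    expected_host = f"mailpit.{hub_apps_domain}"
    if parsed.scheme != "https":
        problems.append(f"scheme is {parsed.scheme!r}, expected 'https'")
    if parsed.hostname != expected_host:
        problems.append(f"host is {parsed.hostname!r}, expected {expected_host!r}")
    if parsed.path != "/api/v1/send":
        problems.append(f"path is {parsed.path!r}, expected '/api/v1/send'")
    return problems
```

For example, `mailpit_url_problems("https://mailpit.apps.east.example.com/api/v1/send", "apps.hub.example.com")` reports a single host mismatch, which is the spoke-domain mistake described in the first cause.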

Camel K 401 Unauthorized / ImagePullBackOff on internal registry: The PostSync Job camel-k-registry-bootstrap creates camel-k-registry-docker from the builder SA token and patches IntegrationPlatform + pull-secret trait. If the integration kit is stuck in Error, delete the Integration and IntegrationKit, then re-sync the app.

Camel K + Istio ambient (MQTT → Kafka silent failure): With istio.io/dataplane-mode: ambient on industrial-edge-tst-all, ztunnel intercepts Kafka broker TCP and Camel cannot complete metadata fetch. Git fix: deployment trait (not pod) sets istio.io/dataplane-mode: none on the integration Deployment.

# charts/all/industrial-edge-tst/templates/camel-integrations.yaml
traits:
  deployment:
    configuration:
      metadata:
        labels:
          istio.io/dataplane-mode: none

Kafka advertised DNS (EndpointSlice)

Symptom: Camel or MirrorMaker2 logs UnknownHostException for dev-cluster-broker-0-<clusterName>.<namespace>.svc or metadata request timeout.

Cause: Strimzi Kafka CR sets advertisedHost to a custom DNS name; clients resolve it via hub EndpointSlice objects that Skupper/kafka-console charts create. If clusterName is empty in spoke values, broker hostnames are invalid (broker-0-.).

Fix:

- Set clusterName: east|west in charts/region/east|west/values.yaml for IE tst, stormshift, datalake apps.
- Verify EndpointSlices exist on hub for each broker advertised name.
- Re-sync field-content-kafka-console if west/east broker lists are stale.

oc get endpointslices -A | grep kafka-brokers-advertised
oc get kafka -n industrial-edge-tst-all -o yaml | grep -A2 advertisedHost
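The effect of an empty clusterName can be reproduced offline. The sketch below builds the hostname from the pattern quoted in the symptom and checks each DNS label; the chart's real template may differ, and `advertised_host` and `invalid_labels` are hypothetical helpers.

```python
import re
from typing import List

# A DNS label: lowercase alphanumerics and hyphens, no leading or
# trailing hyphen.
LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def advertised_host(broker_id: int, cluster_name: str, namespace: str) -> str:
    """Build the advertised broker name following the pattern in the symptom."""
    return f"dev-cluster-broker-{broker_id}-{cluster_name}.{namespace}.svc"


def invalid_labels(host: str) -> List[str]:
    """Return the labels of host that are not valid DNS labels."""
    return [label for label in host.split(".") if not LABEL.match(label)]
```

With `cluster_name="east"` every label is valid. With an empty cluster name the first label becomes `dev-cluster-broker-0-`, which ends in a hyphen and cannot resolve, matching the `broker-0-.` hostnames seen in the logs.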

MCP Gateway (Argo Unknown)

Symptom: https://mcp-gateway.<hub-domain>/mcp returns 503 or 404; Argo app mcp-gateway sync Unknown.

Cause: ACM 2.16 schema bug blocks Application sync; MCPServerRegistration CRDs and routes never land.

Fix:

oc annotate application hub-post-install-bootstrap -n openshift-gitops argocd.argoproj.io/refresh=hard --overwrite
curl -sk -o /dev/null -w '%{http_code}\n' https://mcp-gateway.<hub-domain>/mcp
# Expect 200

spoke-gateway Degraded (`modelmesh-serving` not found)

Symptom: Argo CD app spoke-gateway-east (on the east cluster) shows HTTPRoute ie-anomaly-detection Degraded.

Cause: Optional KServe/ModelMesh route points at a backend that is not Ready yet (or ML stack not installed).

Fix (GitOps): charts/all/spoke-gateway/values.yaml sets inferenceRoute.enabled: false by default. Enable only after InferenceService is Ready and set backend namespace to redhat-ods-applications when using cluster-scoped ModelMesh.

MaaS / Lightspeed / NeuroFace 401

Symptom: Developer Hub /lightspeed loads but chat fails with 401 or empty response; NeuroFace /api/chat returns 401.

Cause: MaaS API keys not injected — secrets contain CHANGEME-inject-via-RHDP or Lightspeed sync Job skipped.

Fix:

export MAAS_KEY_LLAMA='sk-...'
export MAAS_KEY_GRANITE='sk-...'
oc create secret generic maas-facilitator-seed -n vault --from-literal=api-key='sk-...'
oc rollout restart deployment/developer-hub -n developer-hub
oc rollout restart deployment/neuroface -n neuroface

Verify:

oc get secret kairos-ai-credentials -n kairos-system -o jsonpath='{.data.api-key}' | base64 -d | wc -c
curl -sk -X POST https://neuroface.<hub-domain>/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"hi"}]}'
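The `wc -c` check above only shows that the key is non-empty; a placeholder also has a length. A slightly stricter check decodes the value and looks for the `CHANGEME` marker named in the cause. This is a sketch: `key_is_placeholder` is a hypothetical name, and the input is assumed to be the base64 string from `.data.api-key`.

```python
import base64

PLACEHOLDER_PREFIX = "CHANGEME"  # e.g. CHANGEME-inject-via-RHDP


def key_is_placeholder(encoded: str) -> bool:
    """True when the base64-encoded secret value is empty or still a
    CHANGEME placeholder rather than a real key."""
    value = base64.b64decode(encoded).decode("utf-8", errors="replace").strip()
    return value == "" or value.startswith(PLACEHOLDER_PREFIX)
```

A `True` result means the key was never injected, so restarting the deployments will not help until the seed secret and the sync Job have run.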

Lightspeed model: defaults to MaaS granite-3-2-8b-instruct (plugins.lightspeed.aiModel in developer-hub chart). Requires valid key in llama-stack-secrets / Kairos sync.

Camel Dashboard (spoke console plugin)

Symptom: No Camel tab in the OpenShift console on east/west, or Argo app camel-dashboard-openshift-all-{east,west} OutOfSync.

GitOps: Vendored wrapper charts/all/camel-dashboard-openshift (umbrella 4.20.2 in charts/*.tgz), namespace camel-dashboard, sync wave 3 (see charts/region/east/values.yaml, charts/region/west/values.yaml). Avoids Argo DeadlineExceeded when spokes cannot reach the public Helm repo in time.

Post-sync (cluster-admin, once per spoke): Administration → Cluster settings → Console → enable the Camel Dashboard console plugin. Argo ignores ConsolePlugin.spec.enablement so manual enablement does not fight GitOps.

Camel K vs CamelApp: Industrial Edge uses Camel K Integration resources (e.g. mqtt-to-kafka). The dashboard operator primarily manages CamelApp CRs. Integrations may not appear in the Camel tab until you register them as CamelApp or add a bridge; use Topology/Kamelet views for Camel K workloads in the meantime.

Symptom: Failed to get a valid plugin manifest from /api/plugins/camel-dashboard-console/

Cause: The camel-dashboard-console Service has no endpoints — usually app.kubernetes.io/instance on the Service selector does not match the Deployment pod labels (e.g. after helm template + oc apply with release name camel-dashboard instead of camel-dashboard-openshift-all-{east,west}).

Fix:

# Endpoints must be non-empty
oc get endpointslices -n camel-dashboard -l kubernetes.io/service-name=camel-dashboard-console -o yaml | grep -A3 addresses

# Align selector with running pods (or re-sync Argo with helm.releaseName set in spoke templates)
oc get svc camel-dashboard-console -n camel-dashboard -o jsonpath='selector={.spec.selector}{"\n"}'
oc get pod -n camel-dashboard -l app=camel-dashboard-console -o jsonpath='instance={.items[0].metadata.labels.app\.kubernetes\.io/instance}{"\n"}'

# Test manifest from inside the cluster
oc run curl-camel --rm -i --restart=Never -n camel-dashboard \
  --image=registry.redhat.io/ubi9/ubi-minimal:latest -- \
  curl -sk https://camel-dashboard-console.camel-dashboard.svc:9443/plugin-manifest.json

Prefer Argo CD sync (not manual helm apply) so releaseName: camel-dashboard-openshift-all-{cluster} matches Service and Deployment labels.
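The selector comparison from the commands above can be expressed as a small check: a Service only gets endpoints when every selector key matches the pod's labels. A minimal sketch with a hypothetical helper name, taking the two maps as plain dicts:

```python
from typing import Dict, Optional, Tuple


def selector_mismatches(
        selector: Dict[str, str],
        pod_labels: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each selector key that does not match to
    (value wanted by the Service, value found on the pod or None)."""
    return {key: (wanted, pod_labels.get(key))
            for key, wanted in selector.items()
            if pod_labels.get(key) != wanted}
```

An empty result means the selector matches and the problem lies elsewhere. A mismatch on `app.kubernetes.io/instance` is the release-name case described in the cause.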

Checks:

oc get application camel-dashboard-openshift-all-east -n openshift-gitops -o jsonpath='{.status.sync.status}{" "}{.status.health.status}{"\n"}'
oc get deployment -n camel-dashboard
oc get consoleplugin | grep -i camel

Air-gapped spokes: mirror the Helm repo or chart tgz internally and point repoURL / targetRevision in spoke values.yaml.

Helm template error (Hawtio disabled): If Argo reports index of nil pointer on hawtio-online-console-plugin, ensure spoke valuesObject includes stub plugin.service.port and gateway.service.port (see east/templates/component-applications.yaml).

East spoke Unknown apps: If east-spoke-components was removed from the hub, re-sync acm-hub-spoke so ApplicationSet fleet-spoke-push recreates it (see GitOps deployment chain).

east-spoke-components missing but west exists: Placement includes east but Argo has no east-application-manager-cluster-secret — spoke was imported via ACM UI without KlusterletAddonConfig. Sync acm-hub-spoke (chart creates KAC per managedClusters key) or apply KAC manually; wait for application-manager addon, then refresh ApplicationSet fleet-spoke-push.

east-spoke-components stuck Progressing: Usually waiting on devspaces-east (CheCluster InstallOrUpdateFailed while chePhase: Active). Fixes: delete orphan east-devspaces on the spoke (duplicate of devspaces-east, often with deletionTimestamp); ensure only devspaces from charts/region/east/values.yaml exists. Git: ignoreDifferences on CheCluster status + argocd.argoproj.io/skip-health-check on the CheCluster CR. Then oc patch application east-spoke-components -n openshift-gitops --type json -p='[{"op":"remove","path":"/operation"}]' and re-sync.

Cannot find ApplicationSet in ACM UI: ACM Applications lists Application CRs only. Use oc get applicationset fleet-spoke-push -n openshift-gitops on the hub, or open OpenShift GitOps → ApplicationSets. Child apps like industrial-edge-tst on the east spoke come from charts/region/east/values.yaml (PULL), not from the ApplicationSet template directly.

Argo CD: where applications live

| Cluster | Namespace | Examples |
|---|---|---|
| Hub | `openshift-gitops` | `field-content-*`, `east-spoke-components`, `west-spoke-components` |
| East spoke | `openshift-gitops` | `camel-dashboard-openshift-all-east`, `operators-east`, `spoke-gateway-east`, `spoke-interconnect-east` |
| West spoke | `openshift-gitops` | `camel-dashboard-openshift-all-west`, `operators-west`, `spoke-gateway-west`, `spoke-interconnect-west` |

Parent apps use destination.server = cluster-proxy URL. Child apps on spokes use https://kubernetes.default.svc.

Strimzi entity-operator CrashLoop

Symptom: entity-operator CrashLoopBackOff after enabling ambient on Kafka namespaces.

Cause: Double encryption or ztunnel intercept on internal replication port 9091.

Fix: Keep Kafka control-plane namespaces off ambient where documented, or follow Strimzi + OSSM ambient guidance for your version.

Related docs

- Validation Guide: quick health checks and component validation
- Bill of Materials: operator versions and compatibility
- Service Mesh sync waves
- Architecture sync-wave table
- Getting Started
- Support Policy: community support channels
