This document describes actionable ways to extend the AI Computer Vision pattern for production or proof-of-concept environments. Each scenario includes context, the exact files and values to change, and a validation command.
Per-user OIDC Gateway routing (ApplicationSet)
Context
Workshop users scaffold personal NeuroFace instances via Developer Hub. Each instance gets a unique HTTPRoute path (/user/userN/) on the shared spoke Gateway and an RHCL AuthPolicy with OIDC pointing to the user’s RHBK biometric realm.
What to change
Scaffolding template: charts/all/developer-hub/files/software-templates/ai-computer-vision.yaml
Skeleton manifests: charts/all/developer-hub/files/software-templates/ai-cv-skeleton/k8s/
ApplicationSet: charts/all/developer-hub/templates/applicationset-user-neuroface.yaml (hub vp-gitops, matrix SCM + k8s/target-cluster.yaml → spoke east/west)
Gateway cross-namespace routes: charts/all/spoke-neuroface/templates/gateway.yaml (allowedRoutes.namespaces.from: All)
Validation
$ oc get applicationset -n vp-gitops user-neuroface-apps
$ oc get application -n vp-gitops | grep neuroface-
$ oc get application -n vp-gitops neuroface-user1 -o jsonpath='{.spec.destination.name}{"\n"}'
$ oc get httproute,authpolicy -n neuroface-user1
ws-workshop/neuroface-* repos; each Application targets spoke east or west from k8s/target-cluster.yaml; HTTPRoute and AuthPolicy exist in the user namespace.
Adding a third spoke (south) for a new region
Context
You operate inference in three geographic regions and want a third edge cluster (south) registered in RHACM alongside east and west.
What to change
Create values-south.yaml by copying values-east.yaml:
$ cp values-east.yaml values-south.yaml
Edit values-south.yaml and replace all east-specific references:
main:
  clusterGroupName: south # (1)
clusterGroup:
  name: south # (2)
  applications:
    servicemesh-config:
      overrides:
        - name: clusterName
          value: south # (3)
        - name: clusterRole
          value: spoke
    observability:
      overrides:
        - name: clusterName
          value: south
        - name: clusterSuffix
          value: "-south" # (4)
    spoke-neuroface:
      overrides:
        - name: clusterName
          value: south
| 1 | Sets the cluster group for the VP Operator Pattern CR |
| 2 | Sets the ArgoCD project and namespace prefix |
| 3 | All clusterName overrides must be south |
| 4 | Observability uses the suffix for Thanos and Grafana datasource labels |
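Because callout 3 requires every clusterName override to read south, it can help to scan the copied file for leftover east references once you finish editing. The following is a minimal sketch, not something shipped with the pattern: the function name check_no_leftover is hypothetical, and the plain substring match will also flag unrelated strings that merely contain the old name (for example, a cloud region such as us-east-1), so treat its output as a list to review rather than a verdict.

```shell
# check_no_leftover OLD FILE
# Prints every line of FILE that still contains OLD (with line numbers)
# and returns non-zero if any are found. Hypothetical helper, not part
# of the pattern repository.
check_no_leftover() {
  old=$1
  file=$2
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # grep exits 0 when at least one line matches.
  if grep -n -- "$old" "$file"; then
    echo "$file still references '$old'" >&2
    return 1
  fi
  return 0
}
```

For example, check_no_leftover east values-south.yaml prints nothing and returns 0 once every reference has been replaced.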
Add the south managed cluster group to values-hub.yaml:
managedClusterGroups:
  east:
    name: east
    acmlabels:
      - name: clusterGroup
        value: east
  west:
    name: west
    acmlabels:
      - name: clusterGroup
        value: west
  south: # <-- add this block
    name: south
    acmlabels:
      - name: clusterGroup
        value: south
Update charts/all/neuroface-gateway/values.yaml to add a south backend:
clusters:
  east:
    domain: ""
  west:
    domain: ""
  south: # <-- add
    domain: ""
Update charts/all/neuroface-gateway/templates/httproute.yaml to include the south backend with an appropriate weight (for example, 33/33/34).
Update charts/all/observability/values.yaml to include south cluster token sync.
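The 33/33/34 split for the HTTPRoute generalizes to any number of spokes. Gateway API backendRefs weights are relative integers, so they do not strictly have to sum to 100, but keeping them percent-like makes the route easier to read. A minimal sketch, assuming you want integer weights that sum to exactly 100 with any remainder given to the last backends; split_weights is a hypothetical helper name, not part of the pattern.

```shell
# split_weights N
# Prints N space-separated integer weights that sum to 100.
# The remainder of 100 / N is spread over the last backends, one each.
# Hypothetical helper; variables are global because POSIX sh has no 'local'.
split_weights() {
  n=$1
  [ "$n" -ge 1 ] || return 1
  base=$((100 / n))
  rem=$((100 % n))
  i=1
  out=""
  while [ "$i" -le "$n" ]; do
    w=$base
    if [ "$i" -gt $((n - rem)) ]; then
      w=$((base + 1))
    fi
    out="$out${out:+ }$w"
    i=$((i + 1))
  done
  echo "$out"
}
```

Calling split_weights 3 prints 33 33 34, which matches the example weights for east, west, and south.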
Files affected
values-south.yaml (new)
values-hub.yaml (add managedClusterGroups.south)
charts/all/neuroface-gateway/values.yaml and templates/httproute.yaml
charts/all/observability/values.yaml and templates/multi-cluster-secret.yaml
Validation
$ oc label managedcluster <south_cluster_name> clusterGroup=south --overwrite
$ oc get managedcluster <south_cluster_name> -o jsonpath='{.metadata.labels.clusterGroup}{"\n"}'
south
$ oc get applications.argoproj.io -n vp-gitops | grep south
south-spoke-neuroface      Synced   Healthy
south-spoke-neuroface-cv   Synced   Healthy
south-spoke-interconnect   Synced   Healthy
Replacing YOLO with a custom model served by OVMS
Context
Your organization trained a proprietary object-detection model and wants to serve it through OVMS instead of the default YOLO weights.
What to change
Store your model artifacts in an S3-compatible bucket or PVC accessible from the spoke. OVMS expects the following directory structure:
<model_name>/
  1/
    model.xml
    model.bin
Edit charts/all/spoke-neuroface-cv/values.yaml to point to your model:
ovms:
  modelName: my-custom-model # <-- your model name
  modelPath: my-custom-model/ # <-- path in S3 bucket or PVC
  targetDevice: CPU # or GPU if available
  resources:
    requests:
      memory: 2Gi # adjust based on model size
    limits:
      memory: 4Gi
If your model uses a different input/output format than YOLO, update the NeuroFace application configuration in charts/all/spoke-neuroface/values.yaml to match the new inference API contract.
If you use KServe InferenceService instead of standalone OVMS, create a new template in charts/all/spoke-neuroface-cv/templates/ using the serving.kserve.io/v1beta1 API.
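The directory layout from the first step can be staged locally before you upload it to the bucket or PVC. A minimal sketch, assuming your model is already exported as an OpenVINO IR pair (.xml and .bin); the helper name stage_ovms_model and its argument order are hypothetical.

```shell
# stage_ovms_model NAME XML BIN DEST [VERSION]
# Copies an OpenVINO IR pair into DEST/NAME/VERSION/ as model.xml and
# model.bin, then prints the version directory. Hypothetical helper;
# VERSION defaults to 1, matching the layout shown in this section.
stage_ovms_model() {
  name=$1
  xml=$2
  bin=$3
  dest=$4
  version=${5:-1}
  target="$dest/$name/$version"
  mkdir -p "$target" || return 1
  cp "$xml" "$target/model.xml" || return 1
  cp "$bin" "$target/model.bin" || return 1
  echo "$target"
}
```

For example, stage_ovms_model my-custom-model export/best.xml export/best.bin ./staging creates ./staging/my-custom-model/1/ (the export/ paths are placeholders). Upload the resulting my-custom-model/ directory with whatever tooling you already use for the bucket or PVC.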
Files affected
charts/all/spoke-neuroface-cv/values.yaml (model path, name, resources)
charts/all/spoke-neuroface-cv/templates/ (model mount and environment variables)
charts/all/spoke-neuroface/values.yaml (inference endpoint URL if changed)
Optional: values-east.yaml and values-west.yaml overrides for per-spoke model URLs
Validation
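The final check in this section posts a file named sample-request.json to the inference endpoint. A minimal sketch of creating that file, assuming the endpoint follows the KServe v2 (Open Inference Protocol) REST request format that the /v2/models/<model_name>/infer path suggests. The tensor name, shape, and datatype below are placeholders rather than values taken from this pattern; they must match what your model reports in its metadata, and write_sample_request is a hypothetical helper name.

```shell
# write_sample_request FILE
# Writes a tiny v2 inference request to FILE. The tensor name "images",
# the shape [1, 4], and FP32 are placeholders; replace them with the
# values your model reports at GET /v2/models/<model_name>.
write_sample_request() {
  cat > "$1" <<'EOF'
{
  "inputs": [
    {
      "name": "images",
      "shape": [1, 4],
      "datatype": "FP32",
      "data": [0.0, 0.0, 0.0, 0.0]
    }
  ]
}
EOF
}
```

Run write_sample_request sample-request.json in the directory where you invoke curl. The number of values in data must equal the product of the shape dimensions.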
$ oc get pods -n neuroface-cv
$ oc logs deploy/ovms -n neuroface-cv --tail=20
Model loaded successfully
$ curl -sk -X POST https://neuroface-cv.apps.<hub_domain>/v2/models/<model_name>/infer \
  -H 'Content-Type: application/json' -d @sample-request.json
Enabling GPU workers on spokes
Context
CPU inference meets functional requirements but latency exceeds your SLA. You add GPU worker nodes on east and west spokes for accelerated model serving.
What to change
Provision GPU worker nodes. On AWS, create a MachineSet with GPU instances:
$ oc get machinesets -n openshift-machine-api
Create a new MachineSet (or edit an existing one) with instance type g4dn.xlarge (NVIDIA T4) or g5.2xlarge (NVIDIA A10G). See the OpenShift documentation for AWS MachineSets.
Install the NVIDIA GPU Operator. Add subscriptions to values-east.yaml and values-west.yaml:
subscriptions:
  nfd:
    name: nfd
    namespace: openshift-nfd
    channel: stable
    source: redhat-operators
  gpu-operator:
    name: gpu-operator-certified
    namespace: nvidia-gpu-operator
    channel: v24.9
    source: certified-operators
Add the corresponding namespaces:
namespaces:
  openshift-nfd:
    operatorGroup: true
    targetNamespaces: []
  nvidia-gpu-operator:
    operatorGroup: true
    targetNamespaces: []
Update OVMS resource requests in charts/all/spoke-neuroface-cv/values.yaml:
ovms:
  targetDevice: GPU
  resources:
    limits:
      nvidia.com/gpu: 1 # <-- request one GPU
      memory: 4Gi
    requests:
      memory: 2Gi
Update cluster sizing documentation in pattern-metadata.yaml:
external_requirements:
  cluster_sizing_note: >
    GPU workers: g4dn.xlarge (T4) or g5.2xlarge (A10G) on spokes.
Files affected
values-east.yaml and values-west.yaml (subscriptions + namespaces for NFD and GPU Operator)
charts/all/spoke-neuroface-cv/values.yaml (GPU resource requests)
pattern-metadata.yaml (sizing note)
Spoke cluster MachineSet (platform-specific)
Validation
$ oc get nodes -l nvidia.com/gpu.present=true
Enabling GPU workers on the hub (real local model download and serving)
Context
You want to download and serve real LLM weights locally (not just proxy the external RHDP endpoint) to test Models-as-a-Service end to end, including inference latency and GPU utilization — on a hub-only install (no spokes needed). This is a separate concern from the spoke GPU section above: spoke GPUs accelerate the YOLO PPE model; hub GPUs host real chat-completion LLMs via vLLM.
What to change
This ships as an opt-in overlay (values-hub-gpu.yaml at the repo root) so the default, CPU-only install (values-hub.yaml) is completely unaffected; nothing here changes unless you explicitly load the overlay:
$ EXTRA_HELM_OPTS='-f /path/to/values-hub-gpu.yaml' ./pattern.sh make install
The overlay adds:
openshift-nfd/nvidia-gpu-operator namespaces and subscriptions (same Node Feature Discovery + NVIDIA GPU Operator pattern as the spoke section above, applied to the hub instead)
gpu.enabled: "true" override for charts/all/openshift-ai-hub, which activates templates/gpu-vllm-models.yaml: a ServingRuntime + InferenceService + PVC-backed Hugging Face cache per model, downloading weights directly from Hugging Face on first start (no S3/OCI bucket needed)
Default model list (charts/all/openshift-ai-hub/values.yaml → gpu.models)
uses official, Red Hat-validated FP8-quantized checkpoints from
huggingface.co/RedHatAI (LLM Compressor
+ vLLM, ~50% smaller than fp16 with ~97-101% accuracy recovery per Red Hat’s
published benchmarks) instead of community/upstream repos, sized for a
single AWS g6.12xlarge (4x NVIDIA L4, 24 GiB VRAM each):
| Model | Hugging Face ID (RedHatAI, FP8-dynamic) | GPUs | Notes |
|---|---|---|---|
| | | 1 | ~7 GiB, fits one L4 comfortably |
| | | 1 | ~8 GiB, fits one L4 comfortably |
| | | 1 | ~14 GiB: fp16 (~28 GiB) would need 2 GPUs; FP8 fits a single 24 GiB L4 |
Only 3 of the 4 available GPUs are used, leaving headroom for a larger
--max-model-len, a 4th model, or RedHatAI variants of llama-scout-17b
(really Llama 4 Scout, a 109B-total/17B-active MoE model) — Red Hat does
publish an FP8 checkpoint (RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic),
but its own reference command uses --tensor-parallel-size 8, well beyond a
single g6.12xlarge. This pattern keeps proxying it from the external RHDP
endpoint instead of self-hosting it.
This is a starter template, not a production-hardened deployment. Verify
Files affected
values-hub-gpu.yaml (new, opt-in overlay; never merged into values-hub.yaml)
charts/all/openshift-ai-hub/values.yaml (gpu.* block, default enabled: false)
charts/all/openshift-ai-hub/templates/gpu-vllm-models.yaml (new template, gated on gpu.enabled)
Validation
$ oc get nodes -l nvidia.com/gpu.present=true
$ oc get servingruntime,inferenceservice -n gpu-models
$ oc get pvc -n gpu-models
Single spoke deployment (proof of concept)
Context
For a proof-of-concept environment, you want to deploy only the hub and one spoke (east), omitting the west cluster entirely.
What to change
Remove the west managed cluster group from values-hub.yaml:
managedClusterGroups:
  east:
    name: east
    acmlabels:
      - name: clusterGroup
        value: east
  # west: # <-- comment out or delete
  #   name: west
  #   acmlabels:
  #     - name: clusterGroup
  #       value: west
Update RHCL HTTPRoute weights to send all traffic to east. In charts/all/neuroface-gateway/values.yaml:
gateway:
  weights:
    east: 100
    west: 0
Skip the west spoke installation entirely. Do not create a Pattern CR on the west cluster.
The pattern still functions: inference traffic goes 100% to east, Grafana shows only east metrics, and Skupper operates with a single spoke link.
Files affected
values-hub.yaml (remove west managedClusterGroup)
charts/all/neuroface-gateway/values.yaml (weights)
Validation
$ oc get managedcluster
local-cluster and east spoke listed.
$ oc get httproute -n neuroface-gateway-system -o yaml | grep -A5 "backendRefs"
Integrating Red Hat Trusted Artifact Signer (RHTAS) for image signing
Context
Your supply chain policy requires signed container images before deployment to edge clusters.
What to change
Uncomment the RHTAS, RHTPA, and cert-manager subscription entries in values-hub.yaml (lines 105–120):
rhtas:
  name: rhtas-operator
  namespace: openshift-operators
  channel: alpha
  source: redhat-operators
rhtpa:
  name: rhtpa-operator
  namespace: openshift-operators
  channel: alpha
  source: redhat-operators
cert-manager:
  name: openshift-cert-manager-operator
  namespace: cert-manager-operator
  channel: stable-v1
  source: redhat-operators
Add corresponding namespaces and ArgoCD applications for RHTAS configuration charts.
Configure GitLab CI/CD pipelines in the Developer Hub software templates to sign images with cosign before pushing to the OpenShift internal registry.
Files affected
values-hub.yaml (subscriptions)
charts/all/ (new RHTAS configuration chart if needed)
values-secret.yaml.template (signing credentials)
Validation
$ oc get csv -n openshift-operators | grep rhtas
Succeeded
$ cosign verify --certificate-identity-regexp='.*' \
  --certificate-oidc-issuer-regexp='.*' \
  image-registry.openshift-image-registry.svc:5000/neuroface-user1/neuroface-backend:latest
Verification succeeded
Configuring Grafana alerts for inference latency SLA
Context
Your SLA requires inference p95 latency below 500 ms. You create Grafana alerts that fire when latency exceeds the threshold.
What to change
Create a GrafanaAlertRuleGroup resource in charts/all/observability/templates/:
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaAlertRuleGroup
metadata:
  name: neuroface-inference-sla
  namespace: openshift-cluster-observability-operator
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana
  folderRef: "AI Computer Vision"
  interval: 60s
  rules:
    - title: "NeuroFace inference p95 > 500ms"
      condition: C
      for: 5m
      data:
        - refId: A
          datasourceUid: prometheus
          model:
            expr: histogram_quantile(0.95, rate(neuroface_request_duration_seconds_bucket[5m]))
        - refId: C
          datasourceUid: __expr__
          model:
            type: threshold
            conditions:
              - evaluator:
                  type: gt
                  params: [0.5]
Configure a notification channel (Slack, PagerDuty, email) in the Grafana instance.
Ensure OpenTelemetry collectors on spokes export neuroface_request_duration_seconds histogram metrics.
Files affected
charts/all/observability/templates/ (alert rule manifest)
charts/all/observability/values.yaml (notification channel configuration)
Validation
$ oc get grafanaalertrulegroups -n openshift-cluster-observability-operator
neuroface-inference-sla resource present.
Scaling workshop users
Workshop mode defaults to 30 users. To scale to a different count (for example, 50), update userCount in these application overrides consistently across all three values files:
In values-hub.yaml:
platform-users:
  overrides:
    - name: userCount
      value: "50"
developer-hub:
  overrides:
    - name: userCount
      value: "50"
gitlab-operator:
  overrides:
    - name: userCount
      value: "50"
openshift-ai-hub:
  overrides:
    - name: userCount
      value: "50"
devspaces:
  overrides:
    - name: userCount
      value: "50"
showroom:
  overrides:
    - name: showroom.terminal.userCount
      value: "50"
Apply the same userCount: "50" to platform-users in values-east.yaml and values-west.yaml.
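Because the count has to agree across several overrides in three files, a quick consistency check can catch a missed entry before you push. A minimal sketch, assuming each override is written as a name line followed directly by its value line, as in the snippet above; check_user_count is a hypothetical helper, and it does not parse YAML in general.

```shell
# check_user_count EXPECTED FILE...
# Reports every override whose name ends in userCount and whose value
# differs from EXPECTED. Returns non-zero if any mismatch is found.
# Hypothetical helper; relies on "name:" being followed by "value:".
check_user_count() {
  want=$1
  shift
  awk -v want="$want" '
    # Remember that the next value line belongs to a userCount override.
    /name:[ \t]*[A-Za-z.]*userCount[ \t]*$/ { pending = 1; next }
    pending && /value:/ {
      v = $0
      sub(/.*value:[ \t]*/, "", v)   # drop everything up to the value
      sub(/[ \t]*#.*/, "", v)        # drop a trailing comment
      gsub(/[" \t]/, "", v)          # drop quotes and whitespace
      if (v != want) {
        printf "%s:%d: userCount is %s, expected %s\n", FILENAME, FNR, v, want
        bad = 1
      }
      pending = 0
    }
    END { exit bad }
  ' "$@"
}
```

For example, check_user_count 50 values-hub.yaml values-east.yaml values-west.yaml prints one line per mismatching override. It also returns 0 when a file contains no userCount override at all, so it cannot detect an entry that is missing entirely.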
See Workshop mode for details on what each chart provisions per user.
For installation issues during customization, see Troubleshooting.