This document describes actionable ways to extend the AI Computer Vision pattern for production or proof-of-concept environments. Each scenario includes context, the exact files and values to change, and a validation command.

Per-user OIDC Gateway routing (ApplicationSet)

Context

Workshop users scaffold personal NeuroFace instances via Developer Hub. Each instance gets a unique HTTPRoute path (/user/userN/) on the shared spoke Gateway and an RHCL AuthPolicy with OIDC pointing to the user’s RHBK biometric realm.

What to change

  1. Scaffolding template: charts/all/developer-hub/files/software-templates/ai-computer-vision.yaml

  2. Skeleton manifests: charts/all/developer-hub/files/software-templates/ai-cv-skeleton/k8s/

  3. ApplicationSet: charts/all/developer-hub/templates/applicationset-user-neuroface.yaml (deployed in vp-gitops on the hub; a matrix generator combines SCM repository discovery with k8s/target-cluster.yaml to select the east or west spoke)

  4. Gateway cross-namespace routes: charts/all/spoke-neuroface/templates/gateway.yaml (allowedRoutes.namespaces.from: All)
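
As a rough illustration of the per-user naming this scenario relies on, the following Python sketch derives the route path, namespace, and Application name for workshop user N. The /user/userN/ path and the neuroface-userN names are taken from this section; the helper name user_instance is hypothetical, and the real values are produced by the scaffolding template, not by this code.

```python
from typing import NamedTuple


class UserInstance(NamedTuple):
    namespace: str    # user namespace on the spoke, e.g. neuroface-user1
    application: str  # Argo CD Application name in vp-gitops on the hub
    route_path: str   # HTTPRoute path prefix on the shared spoke Gateway


def user_instance(n: int) -> UserInstance:
    """Derive per-user names for workshop user n (hypothetical helper)."""
    if n < 1:
        raise ValueError("user numbers start at 1")
    user = f"user{n}"
    return UserInstance(
        namespace=f"neuroface-{user}",
        application=f"neuroface-{user}",
        route_path=f"/user/{user}/",
    )
```

For example, user_instance(1).route_path gives /user/user1/, which is the path to look for in the HTTPRoute listed by the validation commands.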

Validation

$ oc get applicationset -n vp-gitops user-neuroface-apps
$ oc get application -n vp-gitops | grep neuroface-
$ oc get application -n vp-gitops neuroface-user1 -o jsonpath='{.spec.destination.name}{"\n"}'
$ oc get httproute,authpolicy -n neuroface-user1
Expected: ApplicationSet on the hub discovers ws-workshop/neuroface-* repos; each Application targets spoke east or west from k8s/target-cluster.yaml; HTTPRoute and AuthPolicy exist in the user namespace.

Adding a third spoke (south) for a new region

Context

You operate inference in three geographic regions and want a third edge cluster (south) registered in RHACM alongside east and west.

What to change

  1. Create values-south.yaml by copying values-east.yaml:

$ cp values-east.yaml values-south.yaml
  2. Edit values-south.yaml and replace all east-specific references:

main:
  clusterGroupName: south          # (1)

clusterGroup:
  name: south                      # (2)

  applications:
    servicemesh-config:
      overrides:
        - name: clusterName
          value: south             # (3)
        - name: clusterRole
          value: spoke

    observability:
      overrides:
        - name: clusterName
          value: south
        - name: clusterSuffix
          value: "-south"          # (4)

    spoke-neuroface:
      overrides:
        - name: clusterName
          value: south

(1) Sets the cluster group for the VP Operator Pattern CR.
(2) Sets the ArgoCD project and namespace prefix.
(3) All clusterName overrides must be south.
(4) Observability uses the suffix for Thanos and Grafana datasource labels.
  3. Add the south managed cluster group in values-hub.yaml:

  managedClusterGroups:
    east:
      name: east
      acmlabels:
        - name: clusterGroup
          value: east
    west:
      name: west
      acmlabels:
        - name: clusterGroup
          value: west
    south:                         # <-- add this block
      name: south
      acmlabels:
        - name: clusterGroup
          value: south
  4. Update charts/all/neuroface-gateway/values.yaml to add a south backend:

clusters:
  east:
    domain: ""
  west:
    domain: ""
  south:                           # <-- add
    domain: ""
  5. Update charts/all/neuroface-gateway/templates/httproute.yaml to include the south backend with appropriate weight (for example, 33/33/34).

  6. Update charts/all/observability/values.yaml to include south cluster token sync.

Files affected

  • values-south.yaml (new)

  • values-hub.yaml (add managedClusterGroups.south)

  • charts/all/neuroface-gateway/values.yaml and templates/httproute.yaml

  • charts/all/observability/values.yaml and templates/multi-cluster-secret.yaml
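
The HTTPRoute weights mentioned in the steps (for example, 33/33/34) have to add up to the same total whenever a spoke is added or removed. A minimal sketch of one way to compute an even split follows; the function name split_weights and the choice to give the remainder to the last backends listed are assumptions made for illustration, not something the chart does for you.

```python
def split_weights(backends: list[str], total: int = 100) -> dict[str, int]:
    """Split `total` evenly; any remainder goes to the last backends listed."""
    if not backends:
        raise ValueError("at least one backend is required")
    base, remainder = divmod(total, len(backends))
    first_bumped = len(backends) - remainder
    return {
        name: base + (1 if index >= first_bumped else 0)
        for index, name in enumerate(backends)
    }
```

Calling split_weights(["east", "west", "south"]) yields 33, 33, and 34, matching the example split above.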

Validation

$ oc label managedcluster <south_cluster_name> clusterGroup=south --overwrite
$ oc get managedcluster <south_cluster_name> -o jsonpath='{.metadata.labels.clusterGroup}{"\n"}'
$ oc get applications.argoproj.io -n vp-gitops | grep south
Expected output
south
south-spoke-neuroface          Synced        Healthy
south-spoke-neuroface-cv       Synced        Healthy
south-spoke-interconnect       Synced        Healthy
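
The expected output above can also be checked mechanically. The sketch below assumes the listing has the application name, sync status, and health status as its first three whitespace-separated columns, as in the expected output; the function name unhealthy_apps is hypothetical.

```python
def unhealthy_apps(listing: str, prefix: str = "south-") -> list[str]:
    """Return names of prefixed applications that are not Synced and Healthy."""
    problems = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and applications of other cluster groups.
        if len(fields) < 3 or not fields[0].startswith(prefix):
            continue
        name, sync, health = fields[0], fields[1], fields[2]
        if (sync, health) != ("Synced", "Healthy"):
            problems.append(name)
    return problems
```

An empty list means every south application in the listing reports Synced and Healthy.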

Replacing YOLO with a custom model served by OVMS

Context

Your organization trained a proprietary object-detection model and wants to serve it through OVMS instead of the default YOLO weights.

What to change

  1. Store your model artifacts in an S3-compatible bucket or PVC accessible from the spoke. OVMS expects the following directory structure:

<model_name>/
  1/
    model.xml
    model.bin
  2. Edit charts/all/spoke-neuroface-cv/values.yaml to point to your model:

ovms:
  modelName: my-custom-model       # <-- your model name
  modelPath: my-custom-model/      # <-- path in S3 bucket or PVC
  targetDevice: CPU                # or GPU if available
  resources:
    requests:
      memory: 2Gi                  # adjust based on model size
    limits:
      memory: 4Gi
  3. If your model uses a different input/output format than YOLO, update the NeuroFace application configuration in charts/all/spoke-neuroface/values.yaml to match the new inference API contract.

  4. If you use KServe InferenceService instead of standalone OVMS, create a new template in charts/all/spoke-neuroface-cv/templates/ using the serving.kserve.io/v1beta1 API.

Files affected

  • charts/all/spoke-neuroface-cv/values.yaml (model path, name, resources)

  • charts/all/spoke-neuroface-cv/templates/ (model mount and environment variables)

  • charts/all/spoke-neuroface/values.yaml (inference endpoint URL if changed)

  • Optional: values-east.yaml and values-west.yaml overrides for per-spoke model URLs
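
Before uploading model artifacts, it can help to check a local copy against the directory structure described in the first step. The sketch below is one possible check, assuming the artifacts sit on a local filesystem (or a mounted PVC) and that version directories are numeric; the function name check_model_layout is hypothetical.

```python
from pathlib import Path


def check_model_layout(root: Path, model_name: str) -> list[str]:
    """Return a list of problems; an empty list means the layout looks right."""
    model_dir = root / model_name
    if not model_dir.is_dir():
        return [f"missing model directory: {model_dir}"]
    # Version directories are expected to be numeric, for example "1".
    versions = [d for d in model_dir.iterdir() if d.is_dir() and d.name.isdigit()]
    if not versions:
        return [f"no numeric version directory under {model_dir}"]
    problems = []
    for version in sorted(versions, key=lambda d: int(d.name)):
        for filename in ("model.xml", "model.bin"):
            if not (version / filename).is_file():
                problems.append(f"missing {version / filename}")
    return problems
```

For example, check_model_layout(Path("."), "my-custom-model") returns an empty list when my-custom-model/1/ contains both model.xml and model.bin.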

Validation

$ oc get pods -n neuroface-cv
$ oc logs deploy/ovms -n neuroface-cv --tail=20
Expected: OVMS log shows Model loaded successfully
$ curl -sk -X POST https://neuroface-cv.apps.<hub_domain>/v2/models/<model_name>/infer \
  -H 'Content-Type: application/json' -d @sample-request.json
Expected: JSON response with detections array

Enabling GPU workers on spokes

Context

CPU inference meets functional requirements but latency exceeds your SLA. You add GPU worker nodes on east and west spokes for accelerated model serving.

What to change

  1. Provision GPU worker nodes. On AWS, create a MachineSet with GPU instances:

$ oc get machinesets -n openshift-machine-api

Create a new MachineSet (or edit an existing one) with instance type g4dn.xlarge (NVIDIA T4) or g5.2xlarge (NVIDIA A10G). See the OpenShift documentation for AWS MachineSets.

  2. Install the NVIDIA GPU Operator. Add subscriptions to values-east.yaml and values-west.yaml:

  subscriptions:
    nfd:
      name: nfd
      namespace: openshift-nfd
      channel: stable
      source: redhat-operators
    gpu-operator:
      name: gpu-operator-certified
      namespace: nvidia-gpu-operator
      channel: v24.9
      source: certified-operators

Add the corresponding namespaces:

  namespaces:
    openshift-nfd:
      operatorGroup: true
      targetNamespaces: []
    nvidia-gpu-operator:
      operatorGroup: true
      targetNamespaces: []
  3. Update OVMS resource requests in charts/all/spoke-neuroface-cv/values.yaml:

ovms:
  targetDevice: GPU
  resources:
    limits:
      nvidia.com/gpu: 1            # <-- request one GPU
      memory: 4Gi
    requests:
      memory: 2Gi
  4. Update cluster sizing documentation in pattern-metadata.yaml:

external_requirements:
  cluster_sizing_note: >
    GPU workers: g4dn.xlarge (T4) or g5.2xlarge (A10G) on spokes.

Files affected

  • values-east.yaml and values-west.yaml (subscriptions + namespaces for NFD and GPU Operator)

  • charts/all/spoke-neuroface-cv/values.yaml (GPU resource requests)

  • pattern-metadata.yaml (sizing note)

  • Spoke cluster MachineSet (platform-specific)

Validation

$ oc get nodes -l nvidia.com/gpu.present=true
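
To go one step beyond listing the labeled nodes, the sketch below counts the GPUs each node actually advertises to the scheduler. It assumes the input is the JSON printed by the same command with -o json appended, and that the GPU Operator exposes GPUs as the nvidia.com/gpu allocatable resource used in the OVMS limits above; the function name allocatable_gpus is hypothetical.

```python
import json


def allocatable_gpus(nodes_json: str) -> dict[str, int]:
    """Map node name to its allocatable nvidia.com/gpu count (GPU nodes only)."""
    nodes = json.loads(nodes_json).get("items", [])
    result = {}
    for node in nodes:
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports extended resources as strings, e.g. "1".
        count = int(allocatable.get("nvidia.com/gpu", "0"))
        if count > 0:
            result[node["metadata"]["name"]] = count
    return result
```

A labeled node that is missing from the result has no schedulable GPU yet, which usually means the operator is still rolling out on that node.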

Enabling GPU workers on the hub (real local model download and serving)

Context

You want to download and serve real LLM weights locally (not just proxy the external RHDP endpoint) on a hub-only install (no spokes needed), so that you can test Models-as-a-Service end to end, including inference latency and GPU utilization. This is a separate concern from the spoke GPU section above: spoke GPUs accelerate the YOLO PPE model; hub GPUs host real chat-completion LLMs via vLLM.

What to change

This ships as an opt-in overlay (values-hub-gpu.yaml at the repo root), so the default CPU-only install (values-hub.yaml) is unaffected unless you explicitly load the overlay:

$ EXTRA_HELM_OPTS='-f /path/to/values-hub-gpu.yaml' ./pattern.sh make install

The overlay adds:

  • openshift-nfd / nvidia-gpu-operator namespaces and subscriptions (same Node Feature Discovery + NVIDIA GPU Operator pattern as the spoke section above, applied to the hub instead)

  • gpu.enabled: "true" override for charts/all/openshift-ai-hub, which activates templates/gpu-vllm-models.yaml: a ServingRuntime, an InferenceService, and a PVC-backed Hugging Face cache per model, downloading weights directly from Hugging Face on first start (no S3/OCI bucket needed)

The default model list (gpu.models in charts/all/openshift-ai-hub/values.yaml) uses official, Red Hat-validated FP8-quantized checkpoints from huggingface.co/RedHatAI instead of community or upstream repos. These checkpoints (LLM Compressor + vLLM) are ~50% smaller than fp16 with ~97-101% accuracy recovery per Red Hat’s published benchmarks. The list is sized for a single AWS g6.12xlarge (4x NVIDIA L4, 24 GiB VRAM each):

Model                         Hugging Face ID (RedHatAI, FP8-dynamic)            GPUs  Notes
qwen2-5-coder-7b-instruct     RedHatAI/Qwen2.5-Coder-7B-Instruct-FP8-dynamic     1     ~7 GiB, fits one L4 comfortably
granite-3-1-8b-instruct       RedHatAI/granite-3.1-8b-instruct-FP8-dynamic       1     ~8 GiB, fits one L4 comfortably
deepseek-r1-distill-qwen-14b  RedHatAI/DeepSeek-R1-Distill-Qwen-14B-FP8-dynamic  1     ~14 GiB; fp16 (~28 GiB) would need 2 GPUs, FP8 fits a single 24 GiB L4

Only 3 of the 4 available GPUs are used, leaving headroom for a larger --max-model-len or a 4th model. That headroom does not stretch to llama-scout-17b (really Llama 4 Scout, a 109B-total/17B-active MoE model): Red Hat does publish an FP8 checkpoint (RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic), but its own reference command uses --tensor-parallel-size 8, well beyond a single g6.12xlarge. This pattern keeps proxying it from the external RHDP endpoint instead of self-hosting it.

g6.16xlarge has the same single L4 GPU as g6.8xlarge — only more CPU/RAM. It is not a useful choice for this section; prefer g6.12xlarge for the GPU count, or g6.8xlarge if you only need one model at a time.
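
The sizing notes in the model list follow from simple arithmetic: roughly 2 bytes per parameter for fp16 and 1 byte for FP8. The sketch below reproduces that estimate under loud assumptions: it treats 1e9 bytes as about 1 GiB (as the rounded figures above do), counts weights only, and ignores KV cache and activation memory, so it gives a lower bound rather than a deployment guarantee. The function names are hypothetical.

```python
import math

# Approximate storage per parameter for each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_gib(params_billion: float, precision: str) -> float:
    """Approximate weight size, treating 1e9 bytes as roughly 1 GiB."""
    return params_billion * BYTES_PER_PARAM[precision]


def gpus_needed(params_billion: float, precision: str, vram_gib: float = 24.0) -> int:
    """Smallest GPU count whose combined VRAM holds the weights alone."""
    return max(1, math.ceil(weight_gib(params_billion, precision) / vram_gib))
```

For the 14B model this gives 2 GPUs at fp16 and 1 GPU at FP8 on 24 GiB L4 cards, which is the trade-off noted for deepseek-r1-distill-qwen-14b.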

This is a starter template, not a production-hardened deployment. Verify storageUri requirements against your cluster’s actual KServe CRD version. Gated Hugging Face models need gpu.models[].hfTokenSecretName wired to a real token Secret. Tune --max-model-len and --gpu-memory-utilization once you see real VRAM headroom on the live GPU.

Files affected

  • values-hub-gpu.yaml (new, opt-in overlay — never merged into values-hub.yaml)

  • charts/all/openshift-ai-hub/values.yaml (gpu.* block, default enabled: false)

  • charts/all/openshift-ai-hub/templates/gpu-vllm-models.yaml (new template, gated on gpu.enabled)

Validation

$ oc get nodes -l nvidia.com/gpu.present=true
$ oc get servingruntime,inferenceservice -n gpu-models
$ oc get pvc -n gpu-models

Single spoke deployment (proof of concept)

Context

For a proof-of-concept environment, you want to deploy only the hub and one spoke (east), omitting the west cluster entirely.

What to change

  1. Remove the west managed cluster group from values-hub.yaml:

  managedClusterGroups:
    east:
      name: east
      acmlabels:
        - name: clusterGroup
          value: east
    # west:                        # <-- comment out or delete
    #   name: west
    #   acmlabels:
    #     - name: clusterGroup
    #       value: west
  2. Update RHCL HTTPRoute weights to send all traffic to east. In charts/all/neuroface-gateway/values.yaml:

gateway:
  weights:
    east: 100
    west: 0
  3. Skip the west spoke installation entirely. Do not create a Pattern CR on the west cluster.

  4. The pattern still functions: inference traffic goes 100% to east, Grafana shows only east metrics, and Skupper operates with a single spoke link.

Files affected

  • values-hub.yaml (remove west managedClusterGroup)

  • charts/all/neuroface-gateway/values.yaml (weights)
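
As a more precise alternative to grepping the route for backendRefs, the sketch below lists the backends that still receive traffic. It assumes the input is one HTTPRoute parsed from oc get httproute ... -o json, and it relies on the Gateway API convention that an omitted weight defaults to 1; the function name active_backends and the backend names used with it are hypothetical.

```python
def active_backends(httproute: dict) -> list[tuple[str, int]]:
    """Return (name, weight) for every backendRef with a non-zero weight."""
    active = []
    for rule in httproute.get("spec", {}).get("rules", []):
        for ref in rule.get("backendRefs", []):
            weight = ref.get("weight", 1)  # Gateway API default when omitted
            if weight > 0:
                active.append((ref["name"], weight))
    return active
```

In a single-spoke setup the result should contain exactly one entry with weight 100, whether the west backend was removed from the route or left in with weight 0.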

Validation

$ oc get managedcluster
Expected: Only local-cluster and east spoke listed.
$ oc get httproute -n neuroface-gateway-system -o yaml | grep -A5 "backendRefs"
Expected: Only one backendRef with weight 100.

Integrating Red Hat Trusted Artifact Signer (RHTAS) for image signing

Context

Your supply chain policy requires signed container images before deployment to edge clusters.

What to change

  1. Uncomment the RHTAS, RHTPA, and cert-manager subscription entries in values-hub.yaml (lines 105–120):

    rhtas:
      name: rhtas-operator
      namespace: openshift-operators
      channel: alpha
      source: redhat-operators
    rhtpa:
      name: rhtpa-operator
      namespace: openshift-operators
      channel: alpha
      source: redhat-operators
    cert-manager:
      name: openshift-cert-manager-operator
      namespace: cert-manager-operator
      channel: stable-v1
      source: redhat-operators
  2. Add corresponding namespaces and ArgoCD applications for RHTAS configuration charts.

  3. Configure GitLab CI/CD pipelines in the Developer Hub software templates to sign images with cosign before pushing to the OpenShift internal registry.

Files affected

  • values-hub.yaml (subscriptions)

  • charts/all/ (new RHTAS configuration chart if needed)

  • values-secret.yaml.template (signing credentials)
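
The validation command in this scenario accepts any signer identity and any OIDC issuer (both regular expressions are .*), which is convenient for a first smoke test. For a policy check you would normally pin both. The sketch below assembles such a command using cosign's --certificate-identity and --certificate-oidc-issuer flags; the function name and the identity and issuer values shown in the usage note are placeholders, not values defined by this pattern.

```python
def cosign_verify_args(image: str, identity: str, issuer: str) -> list[str]:
    """Build a cosign verify command line pinned to one signer identity."""
    if not (image and identity and issuer):
        raise ValueError("image, identity, and issuer are all required")
    return [
        "cosign",
        "verify",
        f"--certificate-identity={identity}",
        f"--certificate-oidc-issuer={issuer}",
        image,
    ]
```

For example, cosign_verify_args(image, "ci@example.com", "https://issuer.example.com") returns an argument list that can be handed to a pipeline step as is; substitute the identity and issuer your signing pipeline actually uses.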

Validation

$ oc get csv -n openshift-operators | grep rhtas
Expected: RHTAS operator CSV Succeeded
$ cosign verify --certificate-identity-regexp=.* \
  --certificate-oidc-issuer-regexp=.* \
  image-registry.openshift-image-registry.svc:5000/neuroface-user1/neuroface-backend:latest
Expected: Verification succeeded

Configuring Grafana alerts for inference latency SLA

Context

Your SLA requires inference p95 latency below 500 ms. You create Grafana alerts that fire when p95 latency exceeds the threshold.

What to change

  1. Create a GrafanaAlertRuleGroup resource in charts/all/observability/templates/:

apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaAlertRuleGroup
metadata:
  name: neuroface-inference-sla
  namespace: openshift-cluster-observability-operator
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana
  folderRef: "AI Computer Vision"
  interval: 60s
  rules:
    - title: "NeuroFace inference p95 > 500ms"
      condition: C
      for: 5m
      data:
        - refId: A
          datasourceUid: prometheus
          model:
            expr: histogram_quantile(0.95, rate(neuroface_request_duration_seconds_bucket[5m]))
        - refId: C
          datasourceUid: __expr__
          model:
            type: threshold
            conditions:
              - evaluator:
                  type: gt
                  params: [0.5]
  2. Configure a notification channel (Slack, PagerDuty, email) in the Grafana instance.

  3. Ensure OpenTelemetry collectors on spokes export neuroface_request_duration_seconds histogram metrics.

Files affected

  • charts/all/observability/templates/ (alert rule manifest)

  • charts/all/observability/values.yaml (notification channel configuration)
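
The alert rule depends on histogram_quantile, which estimates a percentile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The sketch below is a simplified Python rendering of that idea for classic histograms, intended to show why bucket boundaries near 0.5 s matter for this SLA; it is not Prometheus code, and the bucket values in the usage note are made up.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with a +Inf upper bound")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With buckets (0.25, 60), (0.5, 80), (1.0, 100), (+Inf, 100), the 0.95 quantile interpolates to 0.875 s, which would breach the 0.5 threshold in the rule above. The estimate can never be finer than the bucket layout, so the exported histogram should have a boundary at or near 0.5 s.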

Validation

$ oc get grafanaalertrulegroups -n openshift-cluster-observability-operator
Expected: neuroface-inference-sla resource present.

Scaling workshop users

Workshop mode defaults to 30 users. To scale to a different count (for example, 50), update userCount in these application overrides consistently across all three values files:

In values-hub.yaml:

    platform-users:
      overrides:
        - name: userCount
          value: "50"
    developer-hub:
      overrides:
        - name: userCount
          value: "50"
    gitlab-operator:
      overrides:
        - name: userCount
          value: "50"
    openshift-ai-hub:
      overrides:
        - name: userCount
          value: "50"
    devspaces:
      overrides:
        - name: userCount
          value: "50"
    showroom:
      overrides:
        - name: showroom.terminal.userCount
          value: "50"

Apply the same userCount: "50" to platform-users in values-east.yaml and values-west.yaml.
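
Because the same count is repeated in several overrides, a quick consistency check can catch a missed entry. The sketch below assumes you have already parsed the applications mapping of a values file into a Python dict (for example with a YAML library, which is not shown here); the function name user_count_mismatches is hypothetical. It matches any override whose name ends in userCount, so showroom.terminal.userCount is covered as well.

```python
def user_count_mismatches(applications: dict, expected: str) -> list[str]:
    """List 'app: name=value' for userCount overrides that differ from expected."""
    mismatches = []
    for app, spec in applications.items():
        for override in (spec or {}).get("overrides", []):
            name = override.get("name", "")
            value = str(override.get("value"))
            # The last dotted segment covers both userCount and nested keys.
            if name.split(".")[-1] == "userCount" and value != expected:
                mismatches.append(f"{app}: {name}={value}")
    return mismatches
```

Running it once per values file with expected set to "50" should return an empty list for each of the three files.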

See Workshop mode for details on what each chart provisions per user.

For installation issues during customization, see Troubleshooting.