Workload groups
Workload groups let you aggregate usage metrics from related workloads, so StormForge generates appropriate recommendations even when a workload has temporarily low or split traffic.
Overview
StormForge generates recommendations based on observed resource usage. When workloads share a logical traffic pattern but run as separate deployments — blue/green pairs, canary variants, or ephemeral workloads — the inactive variants can report near-zero usage. This can lead to undersized recommendations for a variant that should align with a busier workload.
Workload groups solve this problem by letting you define related workloads. StormForge aggregates the group’s metrics before computing a recommendation, so a workload’s recommendation reflects the group’s combined resource usage rather than just its own observed metrics.
Workload group configuration is per-workload: configuring a group for workload A doesn’t affect workload B’s recommendations unless B is also configured with a group.
How it works
A workload group is configured on a target workload — the workload a recommendation is being generated for. Workloads that match the configuration become members of the group. The target workload is also a member by default. At recommendation time, StormForge aggregates CPU, memory, replica counts, and OOM events across all members, then uses the aggregate to compute the target workload’s recommendation.
Set `workload-group.exclude-target: true` to exclude the target workload from its own group, which is useful when the recommendation should be based entirely on another workload's metrics. In this mirror mode, StormForge computes the recommendation from the other members only. See Mirror another workload's recommendations.
In the StormForge UI, recommendation charts include a toggle between Individual (target workload only) and Aggregated (group-wide) views.
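To make this concrete, here is a minimal sketch of where the configuration lives; the Deployment name and namespace are hypothetical, and only the annotation key and expression come from this page. The group is defined with an annotation on the target workload's manifest:

```yaml
# Hypothetical manifest; only the annotation key/value follow this page.
# Groups checkout-blue with any workload matching "checkout-(blue|green)".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-blue
  namespace: shop
  annotations:
    live.stormforge.io/workload-group.expression: 'candidate.name.matches(target.name.find(".*-") + "(blue|green)")'
spec:
  # ... pod template unchanged; grouping affects only recommendations
```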
Configuration
Configure workload groups using the workload-group settings in your workload configuration.
| Annotation | Description | Default |
|---|---|---|
| `live.stormforge.io/workload-group.selector` | Kubernetes label selector matching workloads to include | - |
| `live.stormforge.io/workload-group.expression` | CEL expression matching workloads to include | - |
| `live.stormforge.io/workload-group.name` | Name of a workload to include | Target workload's name |
| `live.stormforge.io/workload-group.cluster` | Cluster of a workload to include | Target workload's cluster |
| `live.stormforge.io/workload-group.namespace` | Namespace of a workload to include | Target workload's namespace |
| `live.stormforge.io/workload-group.resource` | Resource type of a workload to include | Target workload's resource type |
| `live.stormforge.io/workload-group.exclude-target` | Exclude the target workload from its own group (mirror mode) | `false` |
For most use cases — blue/green pairs, canary deployments, and ephemeral workloads — use `expression` or `selector` to identify related workloads:
- `expression` is the more flexible option. Because a CEL expression can reference the target workload via `target`, the same expression can be applied to many workloads — for example, through a `ClusterOptimizationConfiguration` rule — and each resolves its own group independently.
- `selector` matches by label, so it works best when all workloads in the group share a common label.
The identity fields (`name`, `cluster`, `namespace`, `resource`) are most useful in mirror mode, where a workload's recommendation should come from a specific other workload.
All configured fields combine to narrow the match. `exclude-target` is independent of matching and controls only whether the target workload is also a member.
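As a sketch of how fields combine (the label and namespace values are hypothetical), pairing a selector with an identity field restricts the group to labeled workloads in a single namespace:

```yaml
# Hypothetical values: members must carry app=checkout AND run in the
# "staging" namespace; configured fields combine to narrow the match.
live.stormforge.io/workload-group.selector: "app=checkout"
live.stormforge.io/workload-group.namespace: "staging"
```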
To see which workloads will be aggregated for a given workload’s next recommendation, run:
```shell
stormforge describe workload <workload>
```
CEL expressions
The `workload-group.expression` field accepts a CEL expression for matching workloads by their attributes, with two available variables:

- `target` — the target workload (the workload to generate the recommendation for)
- `candidate` — a candidate workload being evaluated for inclusion in the group

The expression should return `true` to include the candidate. Both variables expose the StormForge representation of the workload, including the fields `name`, `namespace`, `cluster`, and `resource`.
Referencing candidate.cluster, candidate.namespace, candidate.resource, or candidate.name in a CEL expression overrides the corresponding identity field. Where a CEL reference and an identity field would otherwise conflict, the CEL reference wins.
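For example (the cluster names here are hypothetical), an expression that references `candidate.cluster` lifts the default same-cluster restriction and can match the same-named workload across several clusters:

```yaml
# Hypothetical: include same-named workloads from either prod cluster.
# Referencing candidate.cluster overrides the cluster identity default.
live.stormforge.io/workload-group.expression: 'candidate.name == target.name && candidate.cluster in ["prod-us-east", "prod-us-west"]'
```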
Examples
Blue/green deployments
A blue/green deployment runs two variants — typically <app>-blue and <app>-green — and shifts traffic between them. Ungrouped, the inactive variant would receive recommendations sized for near-zero traffic.
Use a CEL expression to match the other variant by name:
```yaml
live.stormforge.io/workload-group.expression: 'candidate.name.matches(target.name.find(".*-") + "(blue|green)")'
```
This expression works on <app>-blue and <app>-green, so a single workload-group configuration applies across every blue/green pair in a cluster.
If your blue/green pairs share a common label, use a selector:
```yaml
live.stormforge.io/workload-group.selector: "app=nginx"
```
Either option results in recommendations that reflect the combined traffic regardless of which variant is currently active.
Canary deployments
Progressive delivery tools like Flagger create an <app>-primary deployment that serves live traffic, with an <app> deployment used as the canary during rollouts. The canary may run at near-zero replicas between rollouts, making its standalone recommendation undersized for production traffic.
Include the paired deployment in the group with a CEL expression:
```yaml
live.stormforge.io/workload-group.expression: 'candidate.name + "-primary" == target.name || target.name + "-primary" == candidate.name'
```
This expression works whether the target workload is the canary (`podinfo`) or the primary (`podinfo-primary`), so the same configuration applies to every pair.
If the canary and primary share a label — Flagger preserves the canary’s labels on the primary by default — you can use a selector:
```yaml
live.stormforge.io/workload-group.selector: "app=podinfo"
```
Ephemeral workloads
Some deployment pipelines produce logically identical workloads with unique suffixes such as hashes, timestamps, or build IDs. Because names change on every deployment, matching by the target workload’s literal name isn’t possible.
When these workloads share a label, define a workload group using a selector. For example, with Knative revisions, each configuration change creates a new Deployment with a unique name, but all service revisions carry the label `serving.knative.dev/service=<service>`:
```yaml
live.stormforge.io/workload-group.selector: "serving.knative.dev/service=my-service"
```
When the workloads don’t share a label, match on a stable name prefix with a CEL expression. For workloads named <app>-<hash>, strip the suffix from the target workload’s name to get the prefix, then match any workload sharing it:
```yaml
live.stormforge.io/workload-group.expression: 'candidate.name.startsWith(target.name.substring(0, target.name.lastIndexOf("-") + 1))'
```
All matching workloads contribute metrics to the group, so the target workload’s recommendation remains stable across deployments.
Mirror another workload’s recommendations
To compute a workload's recommendation from another workload's metrics only, set `exclude-target: true` along with identity fields pointing to the workload to mirror.
Use mirror mode for:
- A newly deployed workload that doesn’t yet have representative usage data but resembles an existing workload.
- A low-traffic or staging workload that should match a busier or production workload.
- Mirroring recommendations from a primary cluster to a standby cluster to keep both environments consistently sized.
Mirror a workload in the same namespace:
```yaml
live.stormforge.io/workload-group.name: "production-api"
live.stormforge.io/workload-group.exclude-target: "true"
```
Mirror a workload with the same name in a different cluster:
```yaml
live.stormforge.io/workload-group.cluster: "prod-us-east"
live.stormforge.io/workload-group.exclude-target: "true"
```
In mirror mode, the aggregate excludes the target workload’s metrics.
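Combining identity fields with `exclude-target` follows the same pattern; for example (the namespace name is hypothetical), a staging workload can mirror its same-named production counterpart:

```yaml
# Hypothetical: base this staging workload's recommendation entirely on
# the workload of the same name in the "production" namespace.
live.stormforge.io/workload-group.namespace: "production"
live.stormforge.io/workload-group.exclude-target: "true"
```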