Concepts

Understand concepts and terminology used in Optimize Live

Cluster

Before Optimize Live can produce recommendations for workloads, the StormForge Agent must be installed on the Kubernetes cluster where the workloads are deployed and then registered with the StormForge Optimize platform.

Workloads

A workload is a component that runs inside one or more Kubernetes pods. In the context of Optimize Live, a workload is a named Workload resource from a namespace in a cluster. Optimize Live observes the resource utilization of workloads in order to produce recommendations for them.

Optimize Live can produce recommendations for the following Workload types: DaemonSet, Deployment, Pod, ReplicaSet, ReplicationController, and StatefulSet.
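
As a quick illustration, the following shell sketch checks whether a given Kubernetes kind appears in the list of supported workload types. The function name `is_supported_kind` is hypothetical and is not part of any StormForge tooling; only the kinds themselves come from the list above.

```shell
# Hypothetical helper (not part of StormForge tooling): succeeds when the
# given kind is one of the workload types listed above.
is_supported_kind() {
  case "$1" in
    DaemonSet|Deployment|Pod|ReplicaSet|ReplicationController|StatefulSet)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Usage: report on a few example kinds.
for kind in Deployment StatefulSet CronJob; do
  if is_supported_kind "$kind"; then
    echo "$kind: supported"
  else
    echo "$kind: not in the supported list"
  fi
done
```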

StormForge Agent

To minimize its footprint, Optimize Live requires only the StormForge Agent to be installed on a cluster. By default, the Agent is installed in the stormforge-system namespace.

The StormForge Agent leverages the Kubernetes view role, granting read-only permissions on all resources in the cluster.

StormForge Applier

The StormForge Applier patches a cluster’s workloads with the recommended resource settings generated by Optimize Live machine learning. The Applier uses the same credentials file as the Agent and runs in the same namespace.

Install the Applier if you plan to:

  • Deploy recommendations automatically on a schedule of your choosing. This option enables you to skip manually reviewing the recommended settings and ensures your settings track closely to actual CPU and memory use.
  • Deploy recommendations on demand. For example, you can apply a single recommendation in any environment as you experiment with recommendations or if you need to quickly deploy a recommendation outside of a schedule.

The Applier uses the Kubernetes edit role, which enables it to update and patch all optimizable workloads (and the HPA, if enabled). You can grant additional permissions by specifying additional RBAC rules in the Helm install command.

Recommendations

An Optimize Live recommendation is the set of resource requests and limits that the machine learning algorithm has determined to be optimal for a workload, based on historical utilization observations.

It typically takes about 7 days’ worth of metrics to generate a recommendation that you can apply, often referred to as a complete recommendation.

During this initial 7-day metrics collection period, you can view preliminary recommendations based on the metrics collected so far:

  • One hour after installation, you’ll see the first preliminary recommendation. You can view preliminary recommendations, but you should not apply them, because they’re not based on a complete set of metrics.
  • For hours 2 to 24 after installation, metrics collection continues and preliminary recommendations are generated hourly. You might notice the recommendations becoming more refined.
  • On days 2 to 7 after installation, metrics collection continues and preliminary recommendations are generated once daily.
  • On day 7, complete recommendations become available to apply and continue to be generated on the schedule of your choosing (or once daily by default if you don’t set a schedule).
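
The timeline above can be sketched as a small shell function that maps hours since installation to a recommendation phase. The function name, the phase labels, and the exact hour boundaries (1, 24, and 168 hours) are assumptions made for illustration; the 7-day period is described as typical, not exact.

```shell
# Hypothetical helper: map hours since Agent installation to the phase
# described in the timeline. The boundaries (1 h, 24 h, 168 h = 7 days)
# are assumptions, not values published by StormForge.
recommendation_phase() {
  hours=$1
  if [ "$hours" -lt 1 ]; then
    echo "collecting metrics, no recommendation yet"
  elif [ "$hours" -le 24 ]; then
    echo "preliminary, generated hourly"
  elif [ "$hours" -lt 168 ]; then
    echo "preliminary, generated daily"
  else
    echo "complete, generated on schedule"
  fi
}

# Usage: 3 hours, 3 days, and a little over 8 days after installation.
recommendation_phase 3
recommendation_phase 72
recommendation_phase 200
```

Only the last phase yields recommendations that are intended to be applied.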

How we generate recommendations

Optimize Live generates recommendations using our patent-pending machine learning. Our machine learning examines the metrics collected* (including CPU and memory requests and usage) and monitors usage patterns and scaling behavior to determine the optimal settings for:

  • CPU requests and limits
  • memory requests and limits
  • HPA target utilization, if the workload is scaled by the HPA

When generating a recommendation, the machine learning generates 3 candidate recommendations, one for each possible “optimization goal”:

  • savings (most aggressive candidate)
  • reliability (least aggressive)
  • balanced (default, falls between the other 2 candidates)

The more data that we collect, the better the recommendation that we generate, and our machine learning weights recent data more heavily. The recommendation schedule defines how often a recommendation is deployed and for how long that recommendation is considered “not stale.” For example, a recommendation with a daily schedule (the default value and best practice) should be deployed daily.
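
One way to read the staleness rule is that a recommendation becomes stale once it is older than its schedule interval. The sketch below encodes that reading; the function name `is_stale` and the comparison itself are assumptions, because the exact rule is not spelled out here.

```shell
# Hypothetical helper: treat a recommendation as stale once its age exceeds
# the schedule interval (both given in hours). This rule is an assumption.
is_stale() {
  age_hours=$1
  interval_hours=$2
  [ "$age_hours" -gt "$interval_hours" ]
}

# Usage with the default daily schedule (24 hours).
if is_stale 30 24; then
  echo "stale: deploy a fresh recommendation"
else
  echo "not stale"
fi
```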

Our machine learning detects spikes in a workload and considers them when generating recommendations. To realize the most savings, consider deploying recommendations frequently (again, a best practice). When the machine learning detects that a workload has been scaled down to zero replicas, it does not provide a recommendation for that workload.

*For the full list of metrics, run:

helm show readme oci://registry.stormforge.io/library/stormforge-agent \
| grep "## Workload Metrics" -A 18

Last modified April 10, 2024