Try our Optimize Live sandbox
If you don’t have a cluster of your own, you can check out Optimize Live running in a view-only sandbox.
The sandbox contains two clusters, each running the same simulated e-commerce application.
This guide walks you through a comparison of optimized and non-optimized workloads, showing you how Optimize Live machine learning rightsizes your workloads based on the metrics it collects.
Log in to the sandbox
To preview Optimize Live (with view-only privileges) without having to install anything, log in to the demo sandbox with user ID sandbox@stormforge.demo.
Applications in the sandbox
Each cluster is running a simulated e-commerce application that has several workloads. One Kubernetes workload is running for each service (shopping cart, ad service, and so on).
The StormForge Optimize Live Agent is installed on both clusters. All workloads in this estate are configured to have a recommendation generated once daily.
- In the dev-us-west2-optimized cluster, recommendations are applied automatically as soon as they’re generated.
- In the dev-us-east1-not-optimized cluster, recommendations are generated but not applied.
Let’s start with the Overview page.
- Note: The sandbox is a live environment, and its data is always changing. Therefore, the data in the sandbox won’t match what you see in this guide because this guide captures a point in time. Although the data differs, the principles are still the same.
Get the estate-level view
In the With Optimization section, you’ll find the proposed rightsizing of the workloads across the estate (which consists of two clusters):
- Total requests is calculated by aggregating per-container recommended request values multiplied by observed replica count, and averaging over the last 7 days.
- Net Impact shows the difference between current and recommended totals.
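The aggregation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Optimize Live implementation: the function names, the sample data, and the sign convention for net impact (current minus recommended, so a positive value means savings) are assumptions made for this example.

```python
from statistics import mean

def average_total_requests(daily_samples):
    """Average, across the sampled days, of sum(request * replicas) over all containers.

    daily_samples: one dict per day mapping a container name to a tuple of
    (recommended request, observed replica count). Hypothetical input shape.
    """
    daily_totals = [
        sum(request * replicas for request, replicas in day.values())
        for day in daily_samples
    ]
    return mean(daily_totals)

def net_impact(current_total, recommended_total):
    """Difference between current and recommended totals (assumed sign: positive = savings)."""
    return current_total - recommended_total

# Hypothetical data: 7 identical days, CPU requests in cores.
recommended_samples = [{"cart": (0.25, 2), "ads": (0.5, 1)}] * 7
recommended_total = average_total_requests(recommended_samples)
print(recommended_total)                   # 1.0
print(net_impact(1.5, recommended_total))  # 0.5
```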
Let’s break down the numbers.
At this point in time in the sandbox, across both clusters, you’d realize a net savings of $5.97 a month by rightsizing your workloads with the recommended settings. Right now that doesn’t sound like much, but imagine an estate with many thousands of workloads.
Let’s look at the math for CPU resources:
- Rightsizing overprovisioned workloads will potentially save you 0.919 cores.
- Rightsizing underprovisioned workloads will consume an additional 0.885 cores.
- Therefore, the net impact of rightsizing CPU requests is a savings of 0.034 cores (0.919 - 0.885).
For memory resources:
- Rightsizing overprovisioned workloads will potentially save you 2.31 GiB, thus reducing costs.
- Rightsizing underprovisioned workloads will consume an additional 647 MiB.
- Therefore, the net impact of rightsizing memory requests is a savings of 1.68 GiB.
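If you want to check the arithmetic yourself, the following short Python sketch reproduces the CPU and memory figures quoted above. It treats savings as positive and converts MiB to GiB by dividing by 1024; the function name is made up for this example.

```python
MIB_PER_GIB = 1024

def net_savings(saved, added):
    """Net savings: what overprovisioned workloads give back minus what
    underprovisioned workloads additionally consume (same unit for both)."""
    return saved - added

# Figures from this guide's point-in-time snapshot of the sandbox.
cpu_net = net_savings(0.919, 0.885)               # cores
mem_net = net_savings(2.31, 647 / MIB_PER_GIB)    # GiB

print(f"CPU net savings: {cpu_net:.3f} cores")    # CPU net savings: 0.034 cores
print(f"Memory net savings: {mem_net:.2f} GiB")   # Memory net savings: 1.68 GiB
```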
Where do the dollar amounts come from?
They’re calculated using the values on the Cost Estimates page. In the left navigation, click Settings > Cost Estimates. You can change them to be more reflective of your environment, or you can hide them if they’re not important to you.
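As a rough illustration of how per-resource rates turn into a monthly dollar figure, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: the formula, the 730-hour month, and the rates are placeholders, not the values or the calculation that Optimize Live uses, so it won’t reproduce the amounts shown in the sandbox.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours

def monthly_cost(cpu_cores, memory_gib, cpu_rate, mem_rate, hours=HOURS_PER_MONTH):
    """Estimated monthly cost for a given amount of requested CPU and memory.

    cpu_rate: placeholder cost per core-hour.
    mem_rate: placeholder cost per GiB-hour.
    """
    return (cpu_cores * cpu_rate + memory_gib * mem_rate) * hours

# Placeholder rates; substitute the values from your own Cost Estimates page.
estimate = monthly_cost(cpu_cores=0.5, memory_gib=1.0, cpu_rate=0.03, mem_rate=0.004)
print(f"${estimate:.2f} per month")
```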
Estate view: At-a-glance cluster, namespace, and workload health
The Top Clusters, Top Namespaces, and Top Overprovisioned/Underprovisioned Workloads sections help you to understand where you’ll get the greatest benefit from applying recommendations.
Typically, you’ll just scan these sections to decide where you need to investigate further, perhaps due to traffic spikes, deployment of new applications, or if total requests somewhere are higher or lower than expected.
From the Overview page, you can either:
- Drill down to the next level of detail by clicking an item. Each section lists up to 10 items.
- Skip directly to an item type by using the left navigation.
In this guide, we’ll walk you through each page section so that when you start using Optimize Live in your environment, you’ll know where to go based on what you want to see.
Let’s start by looking at the Top Clusters section.
Similar to the Net Impact described previously, the Impact column at the right shows the difference between the current requests and proposed optimized requests.
Notice how the dev-us-east1-not-optimized cluster is listed first: The difference between its current total requests and optimized total requests is greater than that of the dev-us-west1-optimized cluster. Optimizing the workloads in this cluster will have the greatest impact on savings (or reliability, if the recommendation is to increase the CPU or memory requests values).
In a real scenario, you might not know why there’s a difference between current and optimized total requests. If the cluster view is too high-level, you can either click into a cluster or scan the other sections of this page for more information.
Let’s see what the Top Namespaces section tells you.
In this example, each cluster has just one namespace, so the total requests and impact values are the same as those in the Top Clusters section.
In a large estate, if several namespaces from the same cluster were listed, you might investigate those namespaces. Or, you could scroll to the Top Overprovisioned Workloads and Top Underprovisioned Workloads sections to assess what’s happening. Let’s take a look at the Top Overprovisioned Workloads section.
Notice that the two most overprovisioned workloads are in the microservices-demo-1 namespace, where you already know recommendations are not applied.
Again, the Impact column in each row shows the difference between the current requests and proposed optimized requests, this time for the specific workload.
Click the sf-hipster-shop-loadgenerator-loadgenerator-workload workload to view its details.
On the workload details page, notice that the Net Impact column shows the difference between the current and optimized total requests, and the estimated cost savings.
We already know that recommendations aren’t applied: In the Recommendation section, Automatic Deployment is Off. Recommendations are generated once daily, but not applied.
Now look at the Impact Overview graphs, starting with the Total CPU usage graph:
Notice how much higher the current total requests (the blue line) are compared to the actual usage (the yellow line) and the recommended total requests (the green line). This workload is overprovisioned with respect to CPU resources, and applying recommendations would rightsize it.
- Tip: To show or hide a line on the graph, click the line’s name in the graph legend. For example, you might deselect Net impact to reduce visual clutter.
Now hover on the graph at a point in time when the recommendation was valid. Notice that the recommended requests are close to the value shown in the With Optimization block at the top of the page (0.320 cores):
Similarly, with total memory usage and requests, the current requests are much higher than total usage, and the recommended requests are much closer to the actual usage. Applying recommendations will rightsize this workload.
Now let’s look at a container in this workload.
Reviewing the recommendation details: container-level recommendations
Before we look at the recommendation details and the container’s usage graphs, let’s look at how this workload and container are configured. On the workload details page, click the Config tab, and in the Containers section, expand the main container:
Remember, a recommendation includes the proposed optimized requests and limits values for each container in a workload. But the optimization policy for the container defines what is applied: requests only, requests and limits, or nothing.
In this example, if recommendations were to be applied, they would adjust both the requests and limits values for the container.
Go to the Recommendation Details tab, which shows the current and proposed requests and limits values for each container. Optimize Live recommends the following adjustments:
- CPU: Reduce requests to 80m and reduce limits to 200m
- Memory: Reduce requests to 100 MiB and reduce limits to 300 MiB
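To make the role of the optimization policy concrete, here is a minimal Python sketch that filters a recommendation down to the settings that would actually be applied. The policy names and the data shape are hypothetical and chosen only for this example; the values are the ones recommended for the main container above, written in Kubernetes quantity notation.

```python
def settings_to_apply(recommendation, policy):
    """Return the part of a recommendation that a given policy allows to be applied.

    policy is one of the hypothetical names "requests_and_limits",
    "requests_only", or "none".
    """
    if policy == "requests_and_limits":
        return {
            "requests": dict(recommendation["requests"]),
            "limits": dict(recommendation["limits"]),
        }
    if policy == "requests_only":
        return {"requests": dict(recommendation["requests"])}
    if policy == "none":
        return {}
    raise ValueError(f"unknown policy: {policy!r}")

# Recommended values for the main container in this guide's snapshot.
recommendation = {
    "requests": {"cpu": "80m", "memory": "100Mi"},
    "limits": {"cpu": "200m", "memory": "300Mi"},
}

print(settings_to_apply(recommendation, "requests_and_limits"))
print(settings_to_apply(recommendation, "requests_only"))
```

The recommendation always carries both requests and limits; the policy only decides which of them reach the container.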
The graphs show the overprovisioning clearly: In the Average CPU Usage graph, notice how much closer the recommended requests and limits (the green and pink lines) are to the actual usage than the current requests and limits:
Notice the same in the Average Memory Usage graph:
When you hover over a point during which this recommendation was valid, you see that the recommended settings in the graph match the most recent recommendation settings. Look at the recommended requests:
The potential rightsizing you see in the graphs should make a compelling argument for enabling recommendations on this workload.
Deploying recommendations automatically
When you’re comfortable with the recommendations that Optimize Live generates, you can let Optimize Live deploy them automatically to save you time and toil:
- Install the Optimize Live Applier.
  helm install stormforge-applier \
    oci://registry.stormforge.io/library/stormforge-applier \
    -n stormforge-system
- Enable automatic deployment on the Config tab of the workload details page (see the Configure topic if you need details).
By default, a new recommendation is applied once daily. You can change the schedule based on how closely you want to track CPU and memory utilization and your tolerance for churn.
To reduce pod churn, you can define thresholds that apply only recommendations that propose a reasonable amount of change (as defined by you). Learn more in the Configure topic.
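The idea behind a threshold can be sketched as a simple check on how much a recommendation differs from the current setting. The Python below is a hypothetical illustration that uses a percentage-change threshold; the function name and the way the threshold is expressed are assumptions, so see the Configure topic for the thresholds that Optimize Live actually supports.

```python
def exceeds_threshold(current, recommended, min_percent_change):
    """Return True if the recommended value differs from the current value
    by at least min_percent_change percent (hypothetical threshold style)."""
    if current == 0:
        # No baseline to compare against: treat any non-zero recommendation as a change.
        return recommended != 0
    percent_change = abs(recommended - current) / current * 100
    return percent_change >= min_percent_change

# CPU requests in millicores, with a hypothetical 10% threshold.
print(exceeds_threshold(current=100, recommended=80, min_percent_change=10))  # True
print(exceeds_threshold(current=100, recommended=95, min_percent_change=10))  # False
```

With a check like this, a recommendation that barely moves the requests is skipped, so pods aren’t restarted for a negligible gain.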
Configuring recommendations
You don’t need to configure much in Optimize Live because the machine learning does the work. But you can adjust several settings at the workload and container levels, including:
- Recommendation schedule: This defines how often Optimize Live generates (and optionally applies) recommendations.
- Optimization goals for CPU and memory: Do you want more “aggressive” recommendations that optimize cost savings, less aggressive recommendations that ensure reliability, or a balance of the two?
- Optimization policy: For each container, you can choose what to optimize: requests, requests and limits, or neither.
- HPA target utilization: You can set this when an HPA is detected.
- Container-level thresholds: As mentioned above, thresholds ensure that only the most impactful recommendations are applied, which can help to reduce pod churn.
You can configure these settings by using the following methods:
- UI: Best for viewing, since you can configure only one workload at a time. In the left navigation, click Workloads, and in the list, find and then click the workload. On the workload details page, click the Config tab near the middle of the page.
- Annotations or kubectl annotate: Configure workloads individually, or at the namespace and cluster level.
- StormForge CLI: Change workload settings individually, or at the namespace or cluster level.
For details about what you can configure and the different methods, see the Configure topic.
Recap
The Optimize Live UI gives you the flexibility to review the net impact of recommendations at the estate, cluster, namespace, workload, and container level.
Typically, scanning the Overview page gives you an idea of where to investigate further when you see an unusually large impact or unexpected results. You can drill down to the level you want to view and decide if it’s necessary to change any optimization settings.
If you decide to change any optimization settings, you can do so using the UI, CLI, or annotations, and you can change settings at the cluster, namespace, or workload levels.
Key points to remember:
- Rightsizing is more than just cost savings: It’s about making sure workloads run optimally.
- You’ll get the most out of Optimize Live when you let Optimize Live apply recommendations automatically.
By letting the machine learning in Optimize Live rightsize the workloads in your estate, you’ll see fast time to value and remove the toil of thinking about, and setting, the optimal Kubernetes requests values.