Try our Optimize Live sandbox

Explore Optimize Live with view-only privileges - no installation required

If you don’t have a cluster of your own, you can check out Optimize Live running in a view-only sandbox.

The sandbox contains two clusters, each running the same simulated e-commerce application.

This guide walks you through a comparison of optimized and non-optimized workloads, showing you how Optimize Live machine learning rightsizes your workloads based on the metrics it collects.

Log in to the sandbox

To preview Optimize Live (with view-only privileges) without having to install anything, click Enter Sandbox on the sandbox landing page.

Applications in the sandbox

Each cluster is running a simulated e-commerce application that has several workloads. One Kubernetes workload is running for each service (shopping cart, ad service, and so on).

The StormForge Optimize Live Agent is installed on both clusters. All workloads in this estate are configured to have a recommendation generated once daily.

  • In the dev-us-west2-optimized cluster, recommendations are applied automatically as soon as they’re generated.
  • In the dev-us-east1-not-optimized cluster, recommendations are generated but not applied.

Let’s start with the Overview page.

  • Note: The sandbox is a live environment, and its data is always changing. Therefore, the data in the sandbox won’t match what you see in this guide because this guide captures a point in time. Although the data differs, the principles are still the same.

Get the estate-level view

In the With Optimization section, you’ll find the proposed rightsizing of the workloads across the estate (which consists of two clusters):

  • Total requests is calculated by multiplying each container’s recommended request values by the observed replica count, aggregating the results, and averaging them over the last 7 days (see the sketch after this list).
  • Net Impact shows the difference between current and recommended totals.
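
To make the Total requests calculation concrete, here’s a minimal sketch using hypothetical container names and values (not taken from the sandbox):

    # Hypothetical per-container recommended CPU requests, in cores
    RECOMMENDED_CPU_MAIN=0.080      # "main" container (hypothetical)
    RECOMMENDED_CPU_SIDECAR=0.020   # "sidecar" container (hypothetical)
    REPLICAS=3                      # observed replica count
    # Workload total = sum of per-container recommendations x observed replicas
    echo "Workload total: $(echo "($RECOMMENDED_CPU_MAIN + $RECOMMENDED_CPU_SIDECAR) * $REPLICAS" | bc) cores"
    # Optimize Live aggregates these workload totals across the estate and averages them over the last 7 days.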

Optimize Live web application overview page

Let’s break down the numbers.

At this point in time in the sandbox, across both clusters, you’d realize a net savings of $5.97 a month by rightsizing your workloads with the recommended settings. Right now that doesn’t sound like much, but imagine an estate with many thousands of workloads.

Let’s look at the math for CPU resources:

  • Rightsizing overprovisioned workloads will potentially save you 0.919 cores.
  • Rightsizing underprovisioned workloads will consume an additional 0.885 cores.
  • Therefore, the net impact of rightsizing CPU resources is a savings of 0.034 cores (0.919 - 0.885).

For memory resources:

  • Rightsizing overprovisioned workloads will potentially save you 2.31 GiB, thus reducing costs.
  • Rightsizing underprovisioned workloads will consume an additional 647 MiB.
  • Therefore, the net impact of rightsizing memory requests is a savings of 1.68 GiB.

Where do the dollar amounts come from?
They’re calculated using the values on the Cost Estimates page. In the left navigation, click Settings > Cost Estimates. You can change them to be more reflective of your environment, or you can hide them if they’re not important to you.
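
To see roughly how the dollar figure falls out of those values, here’s a sketch using made-up unit prices (they won’t match the sandbox’s Cost Estimates settings, so the result won’t reproduce the $5.97 shown above):

    # Hypothetical monthly unit prices; the real values live under Settings > Cost Estimates.
    CPU_PRICE_PER_CORE=15.00   # USD per core per month (assumed)
    MEM_PRICE_PER_GIB=2.00     # USD per GiB per month (assumed)
    NET_CPU_SAVED=0.034        # cores, from the CPU math above
    NET_MEM_SAVED=1.68         # GiB, from the memory math above
    echo "Estimated net savings per month: \$$(echo "$NET_CPU_SAVED * $CPU_PRICE_PER_CORE + $NET_MEM_SAVED * $MEM_PRICE_PER_GIB" | bc)"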

Estate view: At-a-glance cluster, namespace, and workload health

The Top Clusters, Top Namespaces, and Top Overprovisioned/Underprovisioned Workloads sections help you to understand where you’ll get the greatest benefit from applying recommendations.

Typically, you’ll just scan these sections to decide where to investigate further, perhaps because of traffic spikes, newly deployed applications, or total requests that are higher or lower than expected.

From the Overview page, you can either:

  • Drill down to the next level of detail by clicking an item. Each section lists up to 10 items.
  • Skip directly to an item type by using the left navigation.

In this guide, we’ll walk you through each page section so that when you start using Optimize Live in your environment, you’ll know where to go based on what you want to see.

Let’s start by looking at the Top Clusters section.

Similar to the Net Impact described previously, the Impact column at the right shows the difference between the current requests and proposed optimized requests.

Optimize Live overview page, clusters section

Notice how the dev-us-east1-not-optimized cluster is listed first: The difference between current total requests and optimized total requests is greater than in the dev-us-west2-optimized cluster. Optimizing the workloads in this cluster will have the greatest impact on savings (or reliability, if the recommendation is to increase the CPU or memory requests values).

In a real scenario, you might not know why there’s a difference between current and optimized total requests. If the cluster view is too high-level, you can either click into a cluster or scan the other sections of this page for more information.

Let’s see what the Top Namespaces section tells you.

Optimize Live overview page, namespaces section

In this example, each cluster has just one namespace, so the total requests and impact values are the same as those in the Top Clusters section.

In a large estate, if several namespaces from the same cluster were listed, you might investigate those namespaces. Or, you could scroll to the Top Overprovisioned Workloads and Top Underprovisioned Workloads sections to assess what’s happening. Let’s take a look at the Top Overprovisioned Workloads section.

Optimize Live overview page, top overprovisioned workloads section

Notice that the two most overprovisioned workloads are in the microservices-demo-1 namespace, where you already know recommendations are not applied.

Again, the Impact column in each row shows the difference between the current requests and proposed optimized requests, this time for the specific workload.

Click the sf-hipster-shop-loadgenerator-loadgenerator-workload workload to view its details.

On the workload details page, notice that the Net Impact column shows the difference between the current and optimized total requests, and the estimated cost savings.

Optimize Live overview page, top overprovisioned workloads section

We already know that recommendations aren’t applied: In the Recommendation section, Automatic Deployment is Off. Recommendations are generated once daily, but not applied.

Optimize Live overview page, recommendation schedule

Now look at the Impact Overview graphs, starting with the Total CPU usage graph:

Optimize Live overview page, Total CPU usage graph

Notice how much higher the current total requests (the blue line) are than the actual usage (the yellow line) and the recommended total requests (the green line). This workload is overprovisioned with respect to CPU resources, and applying recommendations would rightsize it.

  • Tip: To show and hide lines on the graph, click a line’s name in the graph legend. For example, you might deselect Net impact to reduce visual clutter.

Now hover over the graph at a point in time when the recommendation was valid. Notice that the recommended requests are close to the value shown in the With Optimization block at the top of the page (0.320 cores):

Optimize Live overview page, Total CPU usage graph

Similarly, with total memory usage and requests, the current requests are much higher than total usage, and the recommended requests are much closer to the actual usage. Applying recommendations will rightsize this workload.

Optimize Live overview page, Total memory usage graph

Now let’s look at a container in this workload.

Reviewing the recommendation details: container-level recommendations

Before we look at the recommendation details and the container’s usage graphs, let’s look at how this workload and container are configured. On the workload details page, click the Config tab, and in the Containers section, expand the main container:

Optimize Live overview page, container configuration

Remember, a recommendation includes the proposed optimized requests and limits values for each container in a workload. But the optimization policy for the container defines what is applied: requests only, requests and limits, or nothing.

In this example, if recommendations were to be applied, they would adjust both the requests and limits values for the container.

Go to the Recommendation Details tab, which shows the current and proposed requests and limits values for each container. Optimize Live recommends the following adjustments:

Optimize Live overview page, container recommendation

  • CPU: Reduce requests to 80m and reduce limits to 200m
  • Memory: Reduce requests to 100MiB and reduce limits to 300MiB (see the sketch below)
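
If you were applying this recommendation by hand instead of letting Optimize Live deploy it, the change amounts to updating the container’s requests and limits. Here’s a minimal sketch that assumes the workload is a Deployment with the name shown above, that its container is named main, and that it lives in the microservices-demo-1 namespace (verify all three before running anything like this):

    # Set the recommended requests and limits on the container (resource, container, and namespace names assumed)
    kubectl set resources deployment/sf-hipster-shop-loadgenerator-loadgenerator-workload \
      --containers=main \
      --requests=cpu=80m,memory=100Mi \
      --limits=cpu=200m,memory=300Mi \
      -n microservices-demo-1

Letting Optimize Live apply the recommendation (described in the next sections) avoids this manual step and keeps the values current as usage changes.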

The graphs show the overprovisioning clearly: In the Average CPU Usage graph, notice how much closer the recommended requests and limits (the green and pink lines) are to the actual usage than the current requests and limits:

Optimize Live overview page, container CPU usage

Notice the same in the Average Memory Usage graph:

Optimize Live overview page, container memory usage

When you hover over a point in time when this recommendation was valid, you see that the recommended settings in the graph match the most recent recommendation - look at the recommended requests:

Optimize Live overview page, container memory usage

The potential rightsizing you see in the graphs should make a compelling argument for enabling recommendations on this workload.

Deploying recommendations automatically

When you’re comfortable with the recommendations that Optimize Live generates, you can let Optimize Live deploy them automatically to save you time and toil:

  1. Install the Optimize Live Applier.
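    # Assumes the stormforge-system namespace already exists; add --create-namespace if it doesn't.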
    helm install stormforge-applier \
    oci://registry.stormforge.io/library/stormforge-applier \
    -n stormforge-system
    
  2. Enable automatic deployment on the Config tab of the workload details page (see the Configure topic if you need details).

By default, a new recommendation is applied once daily. You can change the schedule based on how closely you want to track CPU and memory utilization and your tolerance for churn.

To reduce pod churn, you can define thresholds to ensure only impactful recommendations are applied. Learn more in the Configure topic.
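
If you’d rather enable automatic deployment outside the UI, the annotation-based method described in the next section can do the same thing. As a rough sketch only, with a hypothetical annotation key (the Configure topic documents the real keys and values):

    # The annotation key below is illustrative, not confirmed; check the Configure topic for the real one.
    kubectl annotate deployment/sf-hipster-shop-loadgenerator-loadgenerator-workload \
      live.stormforge.io/auto-deploy="true" \
      -n microservices-demo-1 --overwrite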

Configuring recommendations

You don’t need to configure much in Optimize Live because the machine learning does the work. But you can adjust several settings at the workload and container levels, including:

  • Recommendation schedule: This defines how often Optimize Live generates (and optionally applies) recommendations.
  • Optimization goals for CPU and memory: Do you want more “aggressive” recommendations that optimize cost savings, less aggressive recommendations that ensure reliability, or a balance of the two?
  • Optimization policy: For each container, you can choose what to optimize: requests, requests and limits, or neither.
  • HPA target utilization: You can set this when an HPA is detected.
  • Container-level thresholds: As mentioned above, thresholds ensure that only the most impactful recommendations are applied, which can help to reduce pod churn.

You can configure these settings by using the following methods:

  • UI: Best for viewing, since you can configure only one workload at a time. In the left navigation, click Workloads, and in the list, find and then click the workload. On the workload details page, click the Config tab near the middle of the page.
  • Annotations or kubectl annotate: Configure workloads individually, or at the namespace and cluster level.
  • StormForge CLI: Change workload settings individually, or at the namespace or cluster level.

For details about what you can configure and the different methods, see the Configure topic.
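
As one example of the annotation-based method at the namespace level, here’s a sketch with a hypothetical key and value (again, see the Configure topic for the real ones):

    # Hypothetical annotation, for illustration only.
    kubectl annotate namespace microservices-demo-1 \
      live.stormforge.io/schedule="P1D" --overwrite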

Recap

The Optimize Live UI gives you the flexibility to review the net impact of recommendations at the estate, cluster, namespace, workload, and container level.

Typically, scanning the Overview page gives you an idea of where to investigate further when you see an unusually large impact or unexpected results. You can drill down to the level you want to view and decide if it’s necessary to change any optimization settings.

If you decide to change any optimization settings, you can do so using the UI, CLI, or annotations, and you can change settings at the cluster, namespace, or workload levels.

Key points to remember:

  • Rightsizing is more than just cost savings: It’s about making sure workloads run optimally.
  • You’ll get the most out of Optimize Live when you let Optimize Live apply recommendations automatically.

By letting the machine learning in Optimize Live rightsize the workloads in your estate, you’ll see fast time to value and remove the toil of thinking about — and setting — the optimal Kubernetes requests values.
