Experiments

How to define Experiment resources for Optimize Pro

What is an Experiment?

An Experiment is the basic unit of organization in StormForge Optimize Pro. The purpose of an experiment is to try different configurations of an application’s parameters and measure their impact. An experiment is composed of three primary concepts: Parameters, Metrics, and Trials.

How is an Experiment created?

An Experiment is created by applying an Experiment .yaml file with kubectl apply -f, as with any other Kubernetes manifest.

What is in an Experiment .yaml file?

Experiments are Kubernetes objects. This means they follow a similar pattern to other Kubernetes resources in that they include TypeMeta data, ObjectMeta data, and a spec.

apiVersion: optimize.stormforge.io/v1beta2
kind: Experiment
metadata:
  name: example
  labels:
    stormforge.io/application: example
    stormforge.io/scenario: default
spec:
  [...]

When creating Experiments, the apiVersion is optimize.stormforge.io/v1beta2, and the kind is Experiment.

There are two required labels for Experiments: stormforge.io/application and stormforge.io/scenario. These labels are used to help organize and view experiments in the StormForge Optimize Pro app.

Spec

The spec defines each of the three primary concepts – Parameters, Metrics, and Trials – as well as optional Optimize Pro settings.

Optimize Pro Settings

A small number of settings adjust how Optimize Pro conducts the experiment.

Optimize Pro settings for an experiment are configured through the .spec.optimization field.

The most notable setting is experimentBudget, which tells Optimize Pro how many trials it is permitted to conduct before the experiment is considered finished.

Example of an experiment setting:

spec:
  optimization:
  - name: "experimentBudget"
    value: "20"

Optimize Pro settings are specified in a list, with each setting having a name and value. The name and value should both be strings.

Parameters

Parameters are the things an experiment is trying to find the optimal values for. You can define parameters as either numeric or categorical. Numeric parameters let Optimize Pro search for the best numeric value in a range, while categorical parameters let you give Optimize Pro a list of valid settings, and it will experiment within the set of values you give it.

Parameters are configured through the .spec.parameters field.

Example of a numeric parameter:

spec:
  parameters:
  - name: maxHeapSize
    baseline: 256
    min: 64
    max: 8192

Example of a categorical parameter:

spec:
  parameters:
  - name: garbageCollector
    baseline: "G1"
    items:
    - "Parallel"
    - "Serial"
    - "ConcMarkSweep"
    - "G1"
    - "Z"

You specify your parameters in a list, and you can specify as many parameters as you want. While there is not a hard limit here, the more parameters you want to experiment with, the higher you should set your experimentBudget so that Optimize Pro has enough trials to explore the larger problem space.

As a rule of thumb, budget about 20 trials per parameter. For a five-parameter experiment, that equates to an experimentBudget of about 100.
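
To make the arithmetic concrete, here is a minimal sketch of a five-parameter experiment with a budget of 5 × 20 = 100 trials. It only uses fields shown earlier on this page. The parameter names cpu, memory, and replicas, and all of their ranges, are hypothetical and chosen purely for illustration.

```yaml
# Sketch only: five parameters at roughly 20 trials each gives a budget of 100.
spec:
  optimization:
  - name: "experimentBudget"
    value: "100"          # settings values are strings
  parameters:
  - name: maxHeapSize
    baseline: 256
    min: 64
    max: 8192
  - name: garbageCollector
    baseline: "G1"
    items:
    - "Parallel"
    - "G1"
  - name: cpu             # hypothetical parameter
    baseline: 500
    min: 100
    max: 2000
  - name: memory          # hypothetical parameter
    baseline: 1024
    min: 256
    max: 4096
  - name: replicas        # hypothetical parameter
    baseline: 2
    min: 1
    max: 5
```

The 20-per-parameter figure is a starting point rather than a guarantee; a larger search space may need a larger budget.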

For more information, see the parameters reference.

Metrics

If parameters are what Optimize Pro is trying to find the optimal values for, metrics are how Optimize Pro decides how well it’s doing.

After choosing some experimental parameter values and running a trial, Optimize Pro will query for all the metrics you define in order to evaluate the results of its trial parameter choices. Metrics can be outputs from the trial itself, including calculations or aggregations thereof. They can be recorded by built-in tooling, or they can be recorded by your existing monitoring solution. They can also include simple recorded information from Kubernetes about how the trial ran.

Metrics are configured through the .spec.metrics field.

Example of a Prometheus metric:

spec:
  metrics:
  - name: throughput
    type: prometheus
    query: 'scalar( throughput{ job="trialRun", instance="{{ .Trial.Name }}" } )'

Example of a Kubernetes metric:

spec:
  metrics:
  - name: duration
    minimize: true
    type: kubernetes
    query: "{{ duration .StartTime .CompletionTime }}"

Like parameters, you can specify a list of metrics that you care about. As a general strategy, metrics should be selected with opposing goals. For example, choosing to minimize resource usage by itself will result in an application that does not start and therefore does not use any CPU or memory at all. An example of opposing goals would be to minimize overall resource usage (a combined metric for both CPU and memory) while maximizing throughput of some part of the application.
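
A pair of opposing metrics might look like the sketch below. It reuses the throughput metric from the earlier example and adds a resource-usage metric to minimize. The metric name resourceUsage and the resource_usage series in its query are hypothetical placeholders, and the sketch assumes that a metric without minimize: true is treated as one to maximize.

```yaml
# Sketch only: two metrics that pull in opposite directions.
spec:
  metrics:
  # Goal 1: keep combined resource usage low.
  - name: resourceUsage          # hypothetical metric name
    minimize: true
    type: prometheus
    # resource_usage is a hypothetical series; substitute a query
    # that your own Prometheus instance can answer.
    query: 'scalar( resource_usage{ job="trialRun", instance="{{ .Trial.Name }}" } )'
  # Goal 2: keep throughput high.
  # Assumption: omitting "minimize" means the metric is maximized.
  - name: throughput
    type: prometheus
    query: 'scalar( throughput{ job="trialRun", instance="{{ .Trial.Name }}" } )'
```

With both metrics in place, a configuration that never starts scores well on resource usage but poorly on throughput, so it no longer looks optimal.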

For more information, see the metrics reference.

Trials

Running trials is how Optimize Pro conducts the experimentation phase of searching a problem space for an optimal solution. When you define a trial in an experiment, you are defining the work Optimize Pro will repeatedly do to prepare, run, and record metrics for your application when given a set of parameter assignments.

Trials are configured through the .spec.patches and .spec.trialTemplate fields.

Patches

If your application is external to the experiment and not part of the trial template itself (as is often the case), patches tell Optimize Pro how to modify your application between trial runs in order to set parameter values.

Patches are applied before the trial job defined in the trial template runs.

Example of a patch:

spec:
  patches:
  - targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: example-nginx
    patch: |
      spec:
        template:
          spec:
            containers:
            - name: nginx
              env:
              - name: VARIABLE
                value: "{{ .Values.trialInput }}"      

Patches are specified as a list. Each patch entry must supply a target reference, which tells Optimize Pro which existing Kubernetes resource the patch applies to, and a patch template in one of the formats kubectl patch accepts (strategic, json, or merge). Go template expressions may be used in the patch definition; this is how parameter values are injected, as with {{ .Values.trialInput }} in the example above.
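
To show how a parameter and a patch fit together, the following sketch wires the maxHeapSize parameter from the Parameters section into a Deployment. It assumes that .Values.<parameter name> resolves to the value chosen for the trial, following the .Values.trialInput pattern above. The Deployment name, container name, and the choice to interpret the value as megabytes are hypothetical.

```yaml
# Sketch only: pass a numeric parameter to the application through a patch.
spec:
  parameters:
  - name: maxHeapSize
    baseline: 256
    min: 64
    max: 8192
  patches:
  - targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: example-app          # hypothetical Deployment name
    patch: |
      spec:
        template:
          spec:
            containers:
            - name: app          # hypothetical container name
              env:
              # JAVA_TOOL_OPTIONS is read by the JVM at startup.
              # Assumption: the value is interpreted as megabytes.
              - name: JAVA_TOOL_OPTIONS
                value: "-Xmx{{ .Values.maxHeapSize }}m"
```

Each trial would then restart the application with a different heap limit before the trial job runs.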

For more information, see the patches reference.

Trial Template

The core of a trial is a Kubernetes job, which will be created based on the trial template defined here. Typical kinds of trial jobs include:

  • A batch job which performs work directly and is measured on its performance
  • A load test job which sends traffic to another application, measuring that application’s performance
  • An API client job which calls out to external systems to do work, and receives performance indicators back
  • A simple “sleep” job to give time for an external metrics system to collect data

Minimized example of a trial template:

spec:
  trialTemplate:
    spec:
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: main
                image: docker.io/curlimages/curl:latest
                command:
                - '/bin/sh'
                - '-c'
                - |
                  # Wait for external activity to complete, then fetch a value,
                  # and push it to the metrics service
                  sleep 120
                  curl --data-binary @- "$PUSHGATEWAY_URL" <<EOF
                    value $(curl http://example-nginx:80/metric)
                  EOF

Trial templates can be further customized using setup tasks (helpers), readiness gates, and other features.

For more information, see the trial template reference.

A Complete Example

This is an example of a complete, though trivial, experiment. Consider it a “hello world” sample, for educational purposes.

This experiment asks Optimize Pro to explore which number of sleepSeconds (our parameter) results in the shortest trialDuration (our metric), within the bounds of 0 and 20.

---
apiVersion: optimize.stormforge.io/v1beta2
kind: Experiment
metadata:
  name: 'hello-world'
  labels:
    stormforge.io/application: 'hello-world'
    stormforge.io/scenario: 'example'

spec:
  optimization:
  - name: "experimentBudget"
    value: "10"

  parameters:
  - name: sleepSeconds
    baseline: 10
    min: 0
    max: 20

  metrics:
  - name: trialDuration
    minimize: true
    query: "{{ duration .StartTime .CompletionTime }}"

  trialTemplate:
    spec:
      jobTemplate:
        spec:
          backoffLimit: 1
          template:
            spec:
              activeDeadlineSeconds: 60 # Running longer than 1min is a failure
              restartPolicy: Never
              containers:
              - name: trial
                image: docker.io/library/busybox:latest
                command: ['sleep', '$(SLEEPSECONDS)']
Last modified November 16, 2022