Trials

A trial is a single run of an experiment. Each trial applies a given set of parameter values, which are then measured against the metrics defined in the experiment.

Each trial consists of two components: the trial job and an optional set of setup tasks.

Trial Spec

The trial spec is composed of a list of setup tasks and a trial job. Although the trial spec itself is optional, defining a trial job is strongly recommended.

Additional details can be found in the trial reference.

Trial Job

A trial job controls how long a given set of parameter values is tested. It is commonly a load testing tool such as StormForge, Locust, or pgbench, but it can also be a simple script that waits for another resource to complete.

A trial job is launched after all patches have been applied and all setup tasks are complete.

If no trial job is specified, a default trial job will be launched that will sleep for a period of seconds.

The following example will run a pgbench pod for each trial run.

spec:
  trialTemplate:
    spec:
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - image: crunchydata/crunchy-pgbench:centos7-11.4-2.4.1
                name: pgbench
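
As noted earlier, a trial job can also be a simple script that waits for another resource to complete. The following is a minimal sketch of such a script, not a built-in feature: `wait_for` is a hypothetical helper name, and the attempt count and delay are illustrative values.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: wait_for <attempts> <delay_seconds> <command...>
# The command is tried at least once and at most <attempts> times,
# sleeping <delay_seconds> between tries.
wait_for() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while :; do
    # Succeed as soon as the command succeeds.
    "$@" && return 0
    # Give up once the attempt budget is spent.
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}
```

For instance, a trial job container could run `wait_for 60 5 curl -sf http://locust/stats/requests` to poll an HTTP endpoint for up to roughly five minutes before exiting; the URL here is borrowed from the example later on this page and is only a placeholder.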

Setup Tasks

Setup tasks let you perform additional work that is needed for each trial run, such as deploying a Helm chart, launching a Prometheus instance, or deploying an operator.

Setup task pods also receive additional environment variables for each trial run. These environment variables contain all of the current parameter values and the current mode (create or delete). Parameter environment variables are presented as uppercase names of their respective parameters. For example, if there is a parameter named memory, it will be presented to the pod as MEMORY.
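
To illustrate how a custom setup task image might consume these variables, here is a minimal sketch of an entrypoint script. It assumes the mode is exposed as `MODE` (the Prometheus example on this page passes `$(MODE)` as an argument) and that the experiment defines a parameter named memory. The function name `run_setup_task` and the echoed messages are hypothetical placeholders for real provisioning commands.

```shell
#!/bin/sh
# Sketch of a setup task entrypoint (hypothetical, for illustration only).
# Assumes MODE is "create" or "delete" and that a parameter named
# "memory" is exposed to the pod as MEMORY.
run_setup_task() {
  case "${MODE:-}" in
    create)
      # Replace this line with the commands that provision resources.
      echo "create: provisioning with memory=${MEMORY:-unset}"
      ;;
    delete)
      # Replace this line with the commands that tear resources down.
      echo "delete: cleaning up"
      ;;
    *)
      echo "unsupported MODE: '${MODE:-}'" >&2
      return 1
      ;;
  esac
}

# A real entrypoint would end by calling the function:
#   run_setup_task
```

Branching on the mode keeps creation and cleanup in one image, so the same task definition can serve both phases of a trial.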

Helm

A Helm setup task allows you to deploy a Helm chart as part of each trial run.

The Helm setup task should include the chart name, a repository, and a set of Helm values, supplied either inline or through a ConfigMap. When Helm values are supplied inline, the value field is evaluated as a Go template, which allows you to consume the supplied parameters through the .Values struct.

An example of a Helm setup task can be seen in our Elasticsearch example:

spec:
  trialTemplate:
    spec:
      setupTasks:
      - name: elasticsearch
        helmChart: elasticsearch
        helmChartVersion: 7.9.2
        helmRepository: https://helm.elastic.co
        helmValues:
        - name: cluster.name
          value: rally-demo
        - name: replicas
          value: "{{ .Values.replicas }}"
        - name: resources.limits.cpu
          value: "{{ .Values.cpu }}m"
        - name: resources.limits.memory
          value: "{{ .Values.memory }}Mi"
        - name: resources.requests.cpu
          value: "{{ .Values.cpu }}m"
        - name: resources.requests.memory
          value: "{{ .Values.memory }}Mi"
        - name: esJavaOpts
          value: "-Djava.net.preferIPv4Stack=true -Xms{{ percent .Values.memory .Values.heap_percent }}m -Xmx{{ percent .Values.memory .Values.heap_percent }}m"
        - name: persistence.enabled
          value: "false"
        - name: antiAffinity
          value: soft
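
The esJavaOpts value above uses a percent template function to derive the JVM heap size from the memory parameter. Assuming percent returns a truncated integer percentage of its first argument (an assumption; this page does not define it), the arithmetic can be sketched in shell as follows. The sample values 2048 and 50 are illustrative, not taken from the experiment.

```shell
# Assumed behavior of the template's percent function:
# value * pct / 100 using integer arithmetic (fractions are truncated).
percent() {
  echo $(( $1 * $2 / 100 ))
}

# Illustrative values: memory in Mi and heap_percent as a whole number.
memory=2048
heap_percent=50

heap="$(percent "$memory" "$heap_percent")"
echo "-Xms${heap}m -Xmx${heap}m"
```

Under these assumptions, a memory value of 2048 with a heap_percent of 50 yields a 1024m heap for both -Xms and -Xmx.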

Prometheus

The Prometheus setup task deploys a Prometheus instance for each trial run. This includes a Prometheus server, kube-state-metrics, a Pushgateway, and a ConfigMap reloader.

The bundled Prometheus instance scrapes a very limited set of targets: nodes, kube-state-metrics, and the Pushgateway. The included configuration is enough to capture resource requests and limits and to measure CPU and memory utilization.

spec:
  trialTemplate:
    spec:
      setupTasks:
      - name: monitoring
        args:
        - prometheus
        - $(MODE)

Readiness Gates

Readiness gates provide the ability to wait for additional conditions to be present before creating the trial job.

spec:
  trialTemplate:
    spec:
      readinessGates:
      - kind: Deployment
        apiVersion: apps/v1
        name: postgres
        conditionTypes:
        - Available
        periodSeconds: 5
        failureThreshold: 10
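
To get a feel for how long a gate like this can hold up a trial, the sketch below estimates a rough upper bound. It assumes that checks run every periodSeconds and that the gate is treated as failed after failureThreshold unsuccessful checks, similar to Kubernetes probes; that reading is an assumption, and `max_gate_wait_seconds` is a hypothetical helper name.

```shell
# Rough upper bound, in seconds, on how long a readiness gate can block.
# Assumption: the first check happens after initial_delay seconds and the
# gate gives up after failure_threshold checks spaced period seconds apart.
max_gate_wait_seconds() {
  initial_delay="$1"
  period="$2"
  failure_threshold="$3"
  echo $(( initial_delay + period * failure_threshold ))
}

# Values from the postgres gate above; it sets no initial delay, so 0 is assumed.
max_gate_wait_seconds 0 5 10
```

With periodSeconds of 5 and failureThreshold of 10, the estimate is 50 seconds.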

Example

We’ll use the Locust load tester included with the Online Boutique example.

We’ve added a Readiness Gate to ensure that the load tester is available before starting the trial.

Our trial job consists of a small shell script. We’ll reset the load tester stats, kick off a new load test, wait 3 minutes, and then stop the load test.

apiVersion: redskyops.dev/v1beta1
kind: Experiment
metadata:
  name: shopping
spec:
  parameters:
  - name: frontendCpu
    min: 50
    max: 1000
    baseline: 100
  - name: frontendMemory
    min: 16
    max: 512
    baseline: 64
  - name: catalogCpu
    min: 50
    max: 1000
    baseline: 100
  - name: catalogMemory
    min: 16
    max: 512
    baseline: 64
  patches:
  - targetRef:
      kind: Deployment
      apiVersion: apps/v1
      name: frontend
    patch: |
      spec:
        template:
          spec:
            containers:
            - name: server
              resources:
                limits:
                  cpu: "{{ .Values.frontendCpu }}m"
                  memory: "{{ .Values.frontendMemory }}Mi"
                requests:
                  cpu: "{{ .Values.frontendCpu }}m"
                  memory: "{{ .Values.frontendMemory }}Mi"      
  - targetRef:
      kind: Deployment
      apiVersion: apps/v1
      name: productcatalogservice
    patch: |
      spec:
        template:
          spec:
            containers:
            - name: server
              resources:
                limits:
                  cpu: "{{ .Values.catalogCpu }}m"
                  memory: "{{ .Values.catalogMemory }}Mi"
                requests:
                  cpu: "{{ .Values.catalogCpu }}m"
                  memory: "{{ .Values.catalogMemory }}Mi"      
  metrics:
  - name: latency
    minimize: true
    type: jsonpath
    query: '{.current_response_time_percentile_95}'
    path: '/stats/requests'
    max: "700"
    port: 80
    selector:
      matchLabels:
        app: loadgenerator
  - name: throughput
    optimize: false
    type: jsonpath
    query: '{.total_rps}'
    path: '/stats/requests'
    port: 80
    selector:
      matchLabels:
        app: loadgenerator
  trialTemplate:
    spec:
      readinessGates:
      - kind: Deployment
        apiVersion: apps/v1
        name: loadgenerator
        conditionTypes:
        - redskyops.dev/app-ready
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 30
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: main
                image: ghcr.io/thestormforge/setuptools:edge
                command:
                - /bin/sh
                args:
                - -c
                - |
                  curl http://locust/stats/reset && \
                  sleep 5 && \
                  curl -X POST -F 'user_count=20' -F 'hatch_rate=10' http://locust/swarm && \
                  sleep 180 && \
                  curl http://locust/stop && \
                  curl http://locust/stats/requests                  

Now that we’ve got our trial defined, we’re ready to try things out!

Bringing it all together

Next we’ll deploy the microservices demo and start the experiment.

We’ll use kustomize to handle the deployment so we can adjust the included load generator. Save the experiment in your current working directory as experiment.yaml, then create a kustomization.yaml in the same directory with the following content:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: default

bases:
- https://raw.githubusercontent.com/GoogleCloudPlatform/microservices-demo/master/release/kubernetes-manifests.yaml
- experiment.yaml

patchesJson6902:
- target:
    group: apps
    version: v1
    kind: Deployment
    name: loadgenerator
  patch: |
    - op: replace
      path: /spec/template/spec/containers/0/command
      value: [ "/usr/local/bin/locust" ]
    - op: replace
      path: /spec/template/spec/containers/0/args
      value: [ "--host=http://$(FRONTEND_ADDR)", "--users=$(USERS)" ]    

After creating the kustomization, we’ll deploy it to our cluster.

$ kustomize build . | kubectl apply -f - && \
  kubectl expose deployment loadgenerator --name locust --port 80 --target-port 8089 --labels app=loadgenerator
service/adservice created
service/cartservice created
service/checkoutservice created
service/currencyservice created
service/emailservice created
service/frontend created
service/frontend-external created
service/paymentservice created
service/productcatalogservice created
service/recommendationservice created
service/redis-cart created
service/shippingservice created
deployment.apps/adservice created
deployment.apps/cartservice created
deployment.apps/checkoutservice created
deployment.apps/currencyservice created
deployment.apps/emailservice created
deployment.apps/frontend created
deployment.apps/loadgenerator created
deployment.apps/paymentservice created
deployment.apps/productcatalogservice created
deployment.apps/recommendationservice created
deployment.apps/redis-cart created
deployment.apps/shippingservice created
experiment.redskyops.dev/shopping created
service/locust exposed

Now we should see the experiment in a running state and new trials created.

NAME                                STATUS
experiment.redskyops.dev/shopping   Running

NAME                                         READY   STATUS        RESTARTS   AGE
pod/adservice-5f6f7c76f5-rvtwf               1/1     Running       0          25m
pod/cartservice-675b6659c8-h8jc2             1/1     Running       0          25m
pod/checkoutservice-85d4b74f95-hz8gx         1/1     Running       0          25m
pod/currencyservice-6d7f8fc9fc-rk5c8         1/1     Running       0          25m
pod/emailservice-798f4f5575-cb6nw            1/1     Running       0          25m
pod/frontend-56d87bd5d8-4dtkj                0/1     Running       0          11s
pod/frontend-69b9bfb898-n2qv9                1/1     Running       0          5m12s
pod/loadgenerator-56db8dcc67-qllmn           1/1     Running       0          25m
pod/paymentservice-98cb47fff-6s87x           1/1     Running       0          25m
pod/productcatalogservice-577b7dc56-hfcn2    1/1     Running       0          11s
pod/productcatalogservice-bd46bcf5b-9nxz7    0/1     Terminating   0          5m12s
pod/recommendationservice-5bf5bcbbdf-pc2bq   1/1     Running       0          25m
pod/redis-cart-74594bd569-4pqg7              1/1     Running       0          25m
pod/shippingservice-75f7f9dc6c-hbcfq         1/1     Running       0          25m

NAME                               STATUS    ASSIGNMENTS                                                            VALUES
trial.redskyops.dev/shopping-000   Waiting   frontendCpu=100, frontendMemory=64, catalogCpu=100, catalogMemory=64

Congratulations, you’ve created your first experiment!


Last modified February 3, 2021