Chaos Monkey Alternatives

Kubernetes

Last Updated October 17, 2018

Kube Monkey

Kube-monkey is an open-source implementation of Chaos Monkey for Kubernetes clusters, written in Go. Like the original Chaos Monkey, Kube-monkey performs just one task: it randomly deletes Kubernetes pods within the cluster as a means of injecting failure into the system and testing the stability of the remaining pods. It follows pseudo-random rules: at a pre-defined hour on weekdays it builds a schedule of random pod targets, and each target is then attacked and killed at a random time later that same day. The time range in which attacks may occur is configurable.
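
The scheduling behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not Kube-monkey's actual Go code: the function names, the 10:00 to 16:00 default window, and the simplification that every target is scheduled every weekday are all assumptions made for the example.

```python
import random
from datetime import date, datetime, time, timedelta


def random_kill_time(day, start_hour, end_hour, rng):
    """Pick a uniformly random moment on `day` inside [start_hour, end_hour)."""
    window_seconds = (end_hour - start_hour) * 3600
    offset = rng.randrange(window_seconds)
    return datetime.combine(day, time(hour=start_hour)) + timedelta(seconds=offset)


def build_schedule(targets, day, start_hour=10, end_hour=16, rng=None):
    """Return (kill_time, target) pairs for one day, earliest first.

    Hypothetical helper: the default window is an assumption, and weekends
    produce an empty schedule because attacks only run on weekdays.
    """
    rng = rng or random.Random()
    if day.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return []
    schedule = [
        (random_kill_time(day, start_hour, end_hour, rng), target)
        for target in targets
    ]
    return sorted(schedule, key=lambda entry: entry[0])
```

Calling `build_schedule(["monkey-victim"], date.today())` returns one randomly timed entry on a weekday and an empty list on a weekend.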

Kube-monkey will only terminate pods that have explicitly opted in by specifying certain Kube-monkey metadata labels. The following illustrates the basic labels that can be specified to allow Kube-monkey to kill pods within the application.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: monkey-victim
  namespace: app-namespace
  labels:
    kube-monkey/enabled: enabled
    kube-monkey/identifier: monkey-victim
    kube-monkey/mtbf: '2'
    kube-monkey/kill-mode: 'fixed'
    kube-monkey/kill-value: '1'
spec:
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim
# ...
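
To make the opt-in model concrete, here is a small Python sketch of how a tool could read these labels and choose victims. It is not Kube-monkey's implementation: the helper names are hypothetical, and it assumes that a kill-mode of 'fixed' means "kill exactly kill-value pods", which is one plausible reading of the labels shown above.

```python
import random

ENABLED_LABEL = "kube-monkey/enabled"
KILL_MODE_LABEL = "kube-monkey/kill-mode"
KILL_VALUE_LABEL = "kube-monkey/kill-value"


def is_opted_in(labels):
    """A workload is a candidate only if it carries the enabled label."""
    return labels.get(ENABLED_LABEL) == "enabled"


def choose_victims(pods, labels, rng=None):
    """Pick which pods to kill for one opted-in workload.

    Only the 'fixed' mode from the example is handled (assumed to mean
    "kill this many pods"); any other mode raises ValueError.
    """
    rng = rng or random.Random()
    if not is_opted_in(labels):
        return []
    mode = labels.get(KILL_MODE_LABEL)
    if mode != "fixed":
        raise ValueError(f"unsupported kill-mode in this sketch: {mode!r}")
    # Label values are always strings, so convert before using as a count.
    count = int(labels[KILL_VALUE_LABEL])
    return rng.sample(pods, min(count, len(pods)))
```

Workloads without the enabled label are never touched, which is the safety property the opt-in labels exist to provide.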

Check out the GitHub repository for more information on installing and using Kube-monkey.

Engineering Chaos In Kubernetes with Gremlin

Gremlin Free simplifies your Chaos Engineering workflow for Kubernetes by making it safe and effortless to execute Chaos Experiments across all nodes. As a distributed architecture, Kubernetes is particularly sensitive to instability and unexpected failures. Gremlin Free can overload the CPU and shut down nodes on your Kubernetes clusters.

Check out this tutorial over on our community site to get started!

Kubernetes Pod Chaos Monkey

Kubernetes Pod Chaos Monkey is a Chaos Monkey-style tool for Kubernetes. The code itself is a simple shell script that issues kubectl commands to locate and then delete Kubernetes pods at regular intervals. It targets pods in the configurable NAMESPACE and attempts to delete one randomly chosen pod every DELAY seconds (defaulting to 30).

Since Kubernetes Pod Chaos Monkey is essentially a simple shell script, it can be modified quite easily.
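
For readers who would rather adapt the idea than the script, the same loop can be sketched in Python. This is a re-implementation under assumptions, not the project's code: the function names are made up for the example, the "default" namespace fallback is an assumption, and the injectable `run` argument exists only so the logic can be exercised without a cluster.

```python
import os
import random
import subprocess
import time


def kubectl(args):
    """Run kubectl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["kubectl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def delete_random_pod(namespace, run=kubectl, rng=None):
    """Delete one randomly chosen pod in `namespace`; return its name or None."""
    rng = rng or random.Random()
    # List pod names as a single space-separated line.
    names = run([
        "get", "pods", "--namespace", namespace,
        "-o", "jsonpath={.items[*].metadata.name}",
    ]).split()
    if not names:
        return None
    victim = rng.choice(names)
    run(["delete", "pod", victim, "--namespace", namespace])
    return victim


def run_forever():
    """Loop indefinitely, reading NAMESPACE and DELAY from the environment."""
    namespace = os.environ.get("NAMESPACE", "default")  # fallback is an assumption
    delay = int(os.environ.get("DELAY", "30"))
    while True:
        print("deleted:", delete_random_pod(namespace))
        time.sleep(delay)
```

Calling `run_forever()` starts the loop against whatever cluster the local kubectl context points at, so try it only on a cluster where losing pods is acceptable.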

The Chaos Toolkit

The Chaos Toolkit is an open-source and extensible tool that is written in Python. It uses platform-specific drivers to connect to your Kubernetes cluster and execute Chaos Experiments. Every experiment performed by Chaos Toolkit is written in JSON using a robust API. Experiments are made up of a few key elements that are executed sequentially and allow the experiment to bail out if any step in the process fails.

  • Steady State Hypothesis: This element defines the normal or "steady" state of the system before the Method element is applied. Here we've defined a basic application with a steady state hypothesis titled "Service should have nodes."

    {
    "version": "1.0.0",
    "title": "Gremlin EKS App",
    "description": "Gremlin EKS App",
    "tags": ["service", "kubernetes"],
    "steady-state-hypothesis": {
      "title": "Service should have nodes.",
      "probes": [
        {
          "type": "probe",
          "name": "nodes_found",
          "tolerance": true,
          "provider": {
            "type": "python",
            "module": "chaosk8s.node.probes",
            "func": "get_nodes",
            "arguments": {
              "label_selector": "eks-gremlin-chaos"
            }
          }
        }
      ]
    }
    }

  • Probe: A Probe is an element that collects system information, such as checking the health status of a node. Here we define a Probe element, which we've added to our steady state Probes list above, that calls the get_nodes function and retrieves the list of nodes for the specified label-selector.

    {
    "type": "probe",
    "name": "nodes_found",
    "tolerance": true,
    "provider": {
      "type": "python",
      "module": "chaosk8s.node.probes",
      "func": "get_nodes",
      "arguments": {
        "label_selector": "eks-gremlin-chaos"
      }
    }
    }

  • Action: An Action element performs an operation against the system, such as draining or deleting a node. In this example we call the delete_nodes function, passing the label_selector argument and setting all to true so that every node matching the selector is deleted.

    {
    "type": "action",
    "name": "delete_all_nodes",
    "provider": {
      "type": "python",
      "module": "chaosk8s.node.actions",
      "func": "delete_nodes",
      "arguments": {
        "all": true,
        "label_selector": "eks-gremlin-chaos"
      }
    }
    }

  • Method: A Method element defines the series of Probe and Action elements that make up the experiment. Here we're first using the nodes_found Probe to make sure nodes exist, executing the delete_all_nodes Action to delete all nodes in the cluster, then performing another explicit Probe to verify that no nodes remain.

    "method": [
      {
          "ref": "nodes_found"
      },
      {
          "type": "action",
          "name": "delete_all_nodes",
          "provider": {
              "type": "python",
              "module": "chaosk8s.node.actions",
              "func": "delete_nodes",
              "arguments": {
                  "all": true,
                  "label_selector": "eks-gremlin-chaos"
              }
          }
      },
      {
          "type": "probe",
          "name": "nodes_not_found",
          "tolerance": false,
          "provider": {
              "type": "python",
              "module": "chaosk8s.node.probes",
              "func": "get_nodes",
              "arguments": {
                  "label_selector": "eks-gremlin-chaos"
              }
          }
      }
    ]
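
The fragments above only become a runnable experiment once they are combined into a single JSON document. The Python sketch below assembles them and checks that every "ref" in the method points at an activity declared earlier. The helper names are hypothetical, and the sketch assumes delete_nodes accepts the same label_selector keyword that get_nodes is given above.

```python
import json

LABEL = "eks-gremlin-chaos"


def node_probe(name, tolerance):
    """Build a probe that lists nodes matching LABEL."""
    return {
        "type": "probe",
        "name": name,
        "tolerance": tolerance,
        "provider": {
            "type": "python",
            "module": "chaosk8s.node.probes",
            "func": "get_nodes",
            "arguments": {"label_selector": LABEL},
        },
    }


def build_experiment():
    """Combine the steady state hypothesis and method into one experiment."""
    delete_all_nodes = {
        "type": "action",
        "name": "delete_all_nodes",
        "provider": {
            "type": "python",
            "module": "chaosk8s.node.actions",
            "func": "delete_nodes",
            # Keyword name assumed to match the probe's label_selector.
            "arguments": {"all": True, "label_selector": LABEL},
        },
    }
    return {
        "version": "1.0.0",
        "title": "Gremlin EKS App",
        "description": "Gremlin EKS App",
        "tags": ["service", "kubernetes"],
        "steady-state-hypothesis": {
            "title": "Service should have nodes.",
            "probes": [node_probe("nodes_found", True)],
        },
        "method": [
            {"ref": "nodes_found"},
            delete_all_nodes,
            node_probe("nodes_not_found", False),
        ],
    }


def unresolved_refs(experiment):
    """Return the "ref" names in the method that no earlier activity defines."""
    defined = {p["name"] for p in experiment["steady-state-hypothesis"]["probes"]}
    missing = []
    for step in experiment["method"]:
        if "ref" in step:
            if step["ref"] not in defined:
                missing.append(step["ref"])
        else:
            defined.add(step["name"])
    return missing
```

Writing `json.dumps(build_experiment(), indent=2)` to a file such as experiment.json produces a document that can be passed to `chaos run experiment.json`, provided the Kubernetes driver is installed.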

Those are the basics of experimenting with the Chaos Toolkit. Chaos Toolkit also has a fault injection plugin for Gremlin, so you can easily perform attacks while relying on the safety and security of the Gremlin platform.
