Kube Monkey
Kube-monkey is an open-source implementation of Chaos Monkey for Kubernetes clusters, written in Go. Like the original Chaos Monkey, Kube-monkey performs just one task: it randomly deletes Kubernetes pods within the cluster as a means of injecting failure into the system and testing the stability of the remaining pods. It follows pseudo-random rules: at a predefined hour on weekdays it builds a schedule of random pod targets, and each target is then attacked and killed at a random time during that same day. The time range is configurable.
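The scheduling idea can be sketched in a few lines of Python. This is only an illustration of "build a schedule on weekdays, then kill each target at a random time inside a configurable window"; it is not Kube-monkey's Go implementation. The function name `build_schedule`, its parameters, and the 10:00 to 16:00 default window are assumptions made for the example, and the sketch ignores the `mtbf` label entirely.

```python
import random
from datetime import date, datetime, time, timedelta


def build_schedule(targets, day, start_hour=10, end_hour=16, rng=None):
    """Assign each target a random kill time on `day` within [start_hour, end_hour).

    Hypothetical helper: the names and default hours are illustrative only.
    """
    rng = rng or random.Random()
    # weekday() returns 5 for Saturday and 6 for Sunday: no schedule on weekends.
    if day.weekday() >= 5:
        return {}
    window_start = datetime.combine(day, time(hour=start_hour))
    window_seconds = (end_hour - start_hour) * 3600
    schedule = {}
    for name in targets:
        # Pick a uniformly random second inside the kill window.
        offset = rng.randrange(window_seconds)
        schedule[name] = window_start + timedelta(seconds=offset)
    return schedule
```

Calling `build_schedule(["monkey-victim"], date.today())` returns a mapping from target name to a kill time for today, or an empty mapping on a weekend.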
Kube-monkey will only terminate pods that have explicitly opted in by specifying certain Kube-monkey metadata labels. The following illustrates the basic labels that can be specified to allow Kube-monkey to kill pods within the application.
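# extensions/v1beta1 Deployments were removed in Kubernetes 1.16;
# newer clusters need apps/v1, which also requires a spec.selector.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: monkey-victim
  namespace: app-namespace
  labels:
    kube-monkey/enabled: enabled
    kube-monkey/identifier: monkey-victim
    kube-monkey/mtbf: '2'
    kube-monkey/kill-mode: 'fixed'
    # Label values must be strings, so the number is quoted.
    kube-monkey/kill-value: '1'
spec:
  template:
    metadata:
      labels:
        kube-monkey/enabled: enabled
        kube-monkey/identifier: monkey-victim
    # ...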
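As a rough illustration of the opt-in rule, the following Python sketch inspects a manifest that has already been loaded into a dictionary. The helpers `is_opted_in` and `non_string_labels` are hypothetical names invented for this example and are not part of Kube-monkey; the label keys are the ones shown in the manifest. The second helper relies on the fact that Kubernetes label values must be strings, so an unquoted YAML number in a label would be flagged.

```python
def _labels(manifest):
    """Return the metadata.labels mapping of a manifest dict (empty if absent)."""
    return (manifest.get("metadata") or {}).get("labels") or {}


def is_opted_in(manifest):
    """True if the manifest carries the opt-in labels used in the example above.

    Hypothetical check: it only looks at the two labels shown in this article.
    """
    labels = _labels(manifest)
    return (
        labels.get("kube-monkey/enabled") == "enabled"
        and bool(labels.get("kube-monkey/identifier"))
    )


def non_string_labels(manifest):
    """List label keys whose values are not strings (Kubernetes rejects these)."""
    return sorted(k for k, v in _labels(manifest).items() if not isinstance(v, str))
```

A manifest without the `kube-monkey/enabled` label is simply reported as not opted in, which matches the behavior described above: pods are left alone unless they ask to be targeted.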
Check out the GitHub repository for more information on installing and using Kube-monkey.
Engineering Chaos In Kubernetes with Gremlin
Gremlin Free simplifies your Chaos Engineering workflow for Kubernetes by making it safe and effortless to execute Chaos Experiments across all nodes. As a distributed architecture, Kubernetes is particularly sensitive to instability and unexpected failures. Gremlin Free can overload the CPU and shut down nodes on your Kubernetes clusters.
Check out this tutorial over on our community site to get started!
Kubernetes Pod Chaos Monkey
Kubernetes Pod Chaos Monkey is a Chaos Monkey-style tool for Kubernetes. The code itself is a local shell script that issues kubectl commands to occasionally locate and then delete Kubernetes pods. It targets a cluster based on the configurable NAMESPACE and attempts to destroy a pod every DELAY seconds (defaulting to 30).
Since Kubernetes Pod Chaos Monkey is essentially a simple shell script, it can be modified quite easily.
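To show how little logic is involved, here is a minimal Python sketch of the same loop: list the pods in a namespace, pick one at random, delete it, and sleep. It is a re-expression of the idea rather than the project's actual shell script. The function names are hypothetical, and the `"default"` namespace is an assumption made for the example; only the 30-second delay comes from the description above.

```python
import random
import subprocess
import time


def list_pods(namespace):
    """Return the pod names in `namespace` by shelling out to kubectl."""
    result = subprocess.run(
        ["kubectl", "--namespace", namespace, "get", "pods",
         "-o", "jsonpath={.items[*].metadata.name}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def pick_victim(pods, rng=random):
    """Choose one pod name at random, or None when there are no pods."""
    return rng.choice(pods) if pods else None


def delete_command(namespace, pod):
    """Build the kubectl command that deletes a single pod."""
    return ["kubectl", "--namespace", namespace, "delete", "pod", pod]


def run(namespace="default", delay=30):
    """Delete one random pod every `delay` seconds until interrupted."""
    while True:
        victim = pick_victim(list_pods(namespace))
        if victim:
            subprocess.run(delete_command(namespace, victim), check=False)
        time.sleep(delay)
```

Calling `run("app-namespace", delay=30)` starts the loop. It requires kubectl on the PATH and credentials for the target cluster, so try it only against a cluster you are prepared to disrupt.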
The Chaos Toolkit
The Chaos Toolkit is an open-source and extensible tool that is written in Python. It uses platform-specific drivers to connect to your Kubernetes cluster and execute Chaos Experiments. Every experiment performed by Chaos Toolkit is written in JSON using a robust API. Experiments are made up of a few key elements that are executed sequentially and allow the experiment to bail out if any step in the process fails.
- Steady State Hypothesis: This element defines the normal or "steady" state of the system before the Method element is applied. Here we've defined a basic application with a steady state hypothesis titled "Service should have nodes."

    {
      "version": "1.0.0",
      "title": "Gremlin EKS App",
      "description": "Gremlin EKS App",
      "tags": ["service", "kubernetes"],
      "steady-state-hypothesis": {
        "title": "Service should have nodes.",
        "probes": [
          {
            "type": "probe",
            "name": "nodes_found",
            "tolerance": true,
            "provider": {
              "type": "python",
              "module": "chaosk8s.node.probes",
              "func": "get_nodes",
              "arguments": {
                "label_selector": "eks-gremlin-chaos"
              }
            }
          }
        ]
      }
    }
- Probe: A Probe is an element that collects system information, such as checking the health status of a node. Here we define a Probe element, which we've added to our steady state Probes list above, that calls the get_nodes function and retrieves the list of nodes for the specified label_selector.

    {
      "type": "probe",
      "name": "nodes_found",
      "tolerance": true,
      "provider": {
        "type": "python",
        "module": "chaosk8s.node.probes",
        "func": "get_nodes",
        "arguments": {
          "label_selector": "eks-gremlin-chaos"
        }
      }
    }
- Action: An Action element performs an operation against the system, such as draining or deleting a node. In the example we call the delete_nodes function, passing the label_selector argument and setting all to true so we delete all matching nodes in the cluster.

    {
      "type": "action",
      "name": "delete_all_nodes",
      "provider": {
        "type": "python",
        "module": "chaosk8s.node.actions",
        "func": "delete_nodes",
        "arguments": {
          "all": true,
          "label_selector": "eks-gremlin-chaos"
        }
      }
    }
- Method: A Method element defines the series of Probe and Action elements that make up the experiment. Here we're first using the nodes_found Probe to make sure nodes exist, executing the delete_all_nodes Action to delete all nodes in the cluster, then performing another explicit Probe to verify that no nodes remain.

    "method": [
      {
        "ref": "nodes_found"
      },
      {
        "type": "action",
        "name": "delete_all_nodes",
        "provider": {
          "type": "python",
          "module": "chaosk8s.node.actions",
          "func": "delete_nodes",
          "arguments": {
            "all": true,
            "label_selector": "eks-gremlin-chaos"
          }
        }
      },
      {
        "type": "probe",
        "name": "nodes_not_found",
        "tolerance": false,
        "provider": {
          "type": "python",
          "module": "chaosk8s.node.probes",
          "func": "get_nodes",
          "arguments": {
            "label_selector": "eks-gremlin-chaos"
          }
        }
      }
    ]
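The way these elements fit together can be modeled with a small Python sketch: check the steady state hypothesis, then run the method steps in order, and bail out as soon as anything fails. This is a simplified model for illustration, not Chaos Toolkit's implementation. The `run` key, the `ExperimentAborted` exception, and the plain equality check against `tolerance` are all assumptions made for the example, and the "cluster" is just an in-memory list.

```python
class ExperimentAborted(Exception):
    """Raised when the steady state hypothesis does not hold (hypothetical name)."""


def check_hypothesis(probes):
    """Run every probe and compare its result with the declared tolerance."""
    for probe in probes:
        value = probe["run"]()
        # Simplified tolerance rule: the probe result must equal the tolerance.
        if value != probe["tolerance"]:
            raise ExperimentAborted(
                f"probe {probe['name']!r} out of tolerance: {value!r}"
            )


def run_experiment(hypothesis_probes, method):
    """Verify the steady state, then execute the method steps sequentially.

    Any exception raised by a probe or step propagates and stops the run,
    which models the experiment bailing out.
    """
    journal = []
    check_hypothesis(hypothesis_probes)
    journal.append("steady-state ok")
    for step in method:
        step["run"]()
        journal.append(step["name"])
    return journal


# Usage with a fake in-memory cluster instead of a real Kubernetes API.
nodes = ["node-1", "node-2"]
probes = [{"name": "nodes_found", "tolerance": True, "run": lambda: len(nodes) > 0}]
method = [{"name": "delete_all_nodes", "run": nodes.clear}]
first_run = run_experiment(probes, method)
```

After the first run the fake cluster is empty, so running the same experiment again raises `ExperimentAborted` before any action executes. That is the point of checking the hypothesis first: an experiment should not inject failure into a system that is already unhealthy.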
Those are the basics for experimenting with the Chaos Toolkit. Chaos Toolkit also has a fault injection plugin for Gremlin, so you can easily perform attacks while utilizing the safety and security of the Gremlin platform.