Attacks
Gremlin is a simple, safe, and secure way to use Chaos Engineering to improve system resilience. The Gremlin Platform provides a range of attacks which you can run against your infrastructure. This includes Resource Gremlins, Network Gremlins and State Gremlins. It is also possible to schedule regular attacks, create attack templates, and view attack reports.
Gremlin provides a library of possible failure modes to test. You can impact system resources, delay or drop network traffic to your dependencies, shut down your hosts, and much more!
Visit the attack creation page to start testing your infrastructure today. Go to the attacks list page to monitor ongoing and historic attacks.
Each attack, or "gremlin", tests your resilience in a different way:
Resource Gremlins
Resource gremlins are a great starting point -- simple to run and understand. They reveal how your service degrades when starved of CPU, memory, IO, or disk.
Gremlin | Impact |
---|---|
CPU | Generates high load for one or more CPU cores. |
Memory | Allocates a specific amount of RAM. |
IO | Puts read/write pressure on I/O devices such as hard disks. |
Disk | Writes files to disk to fill it to a specific percentage. |
State Gremlins
State gremlins introduce chaos into your infrastructure so that you can observe whether your service handles it gracefully or fails.
Gremlin | Impact |
---|---|
Shutdown | Reboots or halts the host operating system to test how your system behaves when losing one or more cluster machines. |
Time travel | Changes the host's system time, which can be used to simulate adjusting to daylight saving time and other time-related events. |
Process killer | Kills the specified process, which can be used to simulate application or dependency crashes. |
Network Gremlins
Network gremlins allow you to see the impact of lost or delayed traffic to your application. Test how your service behaves when you are unable to reach one of your dependencies, internal or external. Limit the impact to only the traffic you want to test by specifying ports, hostnames, and IP addresses.
Gremlin | Impact |
---|---|
Blackhole | Drops all matching network traffic. |
Latency | Injects latency into all matching egress network traffic. |
Packet loss | Induces packet loss into all matching egress network traffic. |
DNS | Blocks access to DNS servers. |
Warning: Important considerations for targeting Kubernetes Pods with Network Attacks
Attack Stage Progression
Every Attack on Gremlin is composed of one or more Executions, where each Execution is an instance of the attack running on a specific target.
The Stage progression of an Attack is derived from the Stage progression of all of an Attack's Executions. Gremlin weighs the Importance of Stages so as to mark an Attack with the most important Stage of its executions.
Example
An Attack with three Executions will derive its final reported stage by picking the most important stage from among its executions. So, if the three Execution Stages are TargetNotFound, Running, TargetNotFound, the resulting stage for the Attack will be Running.
You can see Stages ordered by their importance below.
Stages
Stages are sorted in descending order of importance (the Running Stage holds the highest importance).
Stage | Description |
---|---|
Running | Attack running on the host |
Halt | Attack told to halt |
RollbackStarted | Code to rollback has started |
RollbackTriggered | Daemon started a rollback of client |
InterruptTriggered | Daemon issued an interrupt to the client |
HaltDistributed | Distributed to the host but not yet halted |
Initializing | Attack is creating the desired impact |
Distributed | Distributed to the host but not yet running |
Pending | Created but not yet distributed |
Failed | Client reported unexpected failure |
HaltFailed | Halt on client did not complete |
InitializationFailed | Creating the impact failed |
LostCommunication | Client never reported finishing/receiving execution |
ClientAborted | Something on the client/daemon side stopped the Gremlin and it was aborted without user intervention |
UserHalted | User issued a halt, and that is now complete |
Successful | Completed running on the Host |
TargetNotFound | Attack not scoped to any current targets |
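The derivation rule described in this section can be sketched in a few lines of shell. This is only an illustration under stated assumptions, not Gremlin's implementation: the `attack_stage` function and the `STAGES` variable are hypothetical names, and the ordering is copied from the Stages table.

```shell
# Stages in descending order of importance, copied from the Stages table.
STAGES="Running Halt RollbackStarted RollbackTriggered InterruptTriggered HaltDistributed Initializing Distributed Pending Failed HaltFailed InitializationFailed LostCommunication ClientAborted UserHalted Successful TargetNotFound"

# attack_stage STAGE... (hypothetical helper)
# Prints the most important stage among the execution stages given as arguments.
attack_stage() {
  for candidate in $STAGES; do
    for stage in "$@"; do
      if [ "$stage" = "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
      fi
    done
  done
  return 1  # none of the arguments is a known stage
}

attack_stage TargetNotFound Running TargetNotFound   # prints Running
```

Walking the list from most to least important and stopping at the first match means the result never depends on the order in which Executions report their Stages.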
Parameter Reference
Resource Gremlins
CPU
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 0.0.1 | The length of the attack (seconds). |
Cores | -c int | False | 1 | 0.0.1 | The number of cores to try to utilize. |
Percent | -p <0-100> | False | 100 | 2.11.0 | The percent of each core to utilize. |
All Cores | -a | False | False | 2.11.0 | If set, consume all available cores (cannot be used with -c parameter). |
When the CPU Gremlin runs, the OS scheduler decides where the process will run, and your application (and all other processes) will compete with it for CPU time. There is no guarantee that the Gremlin process will block all others.
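As a rough illustration of how the CPU flags combine, the sketch below assembles a command that loads two cores at 50% for two minutes. It assumes the subcommand is named `cpu`, following the `gremlin attack <name>` pattern used in the examples at the end of this page, and the values are arbitrary. It only prints the command; run the printed line on a host with the Gremlin client installed to launch the attack.

```shell
# Illustrative values only.
length=120   # -l: attack length in seconds
cores=2      # -c: number of cores to load (cannot be combined with -a)
percent=50   # -p: percent of each core to use (listed as version 2.11.0 in the table)

# Assumption: the CPU attack subcommand is "cpu".
cmd="gremlin attack cpu -l $length -c $cores -p $percent"

# Dry run: print the command instead of executing it.
printf '%s\n' "$cmd"
```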
Disk
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 1.8.0 | The length of the attack (seconds). |
Dir | -d path | False | /tmp | 1.8.0 | The root directory for the disk attack. |
Workers | -w int | False | 1 | 1.8.0 | The number of disk write workers to run concurrently. |
Block Size | -b int | False | 4 | 1.8.0 | Number of kilobytes (KB) that are read/written at a time. |
Volume Percentage | -p <0-100> | False | 100 | 1.8.0 | Percent of volume to fill (0-100). |
IO
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 1.4.0 | The length of the attack (seconds). |
Dir | -d path | False | /tmp | 1.4.0 | The root directory for the IO attack. |
Workers | -w int | False | 1 | 1.4.0 | The number of IO workers to run concurrently. |
Mode | -m <r,w,rw> | False | rw | 1.4.0 | Do only reads, only writes, or both. |
Block Size | -s int | False | 4 | 1.4.0 | Number of kilobytes (KB) that are read/written at a time. |
Block Count | -c int | False | 1 | 1.4.0 | The number of blocks read/written by workers. |
Memory
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 0.0.1 | The length of the attack (seconds). |
MB | -m int | False | | 0.0.1 | The number of megabytes to allocate. |
GB | -g float | False | 0.5 | 0.0.1 | The number of gigabytes to allocate. |
Percentage* | -p <0-100> | False | 100 | 2.8.30 | The percentage of total memory to allocate. |
State Gremlins
Process Killer
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 1.8.0 | The length of the attack (seconds). |
Interval | -i int | False | 1 | 1.8.0 | The number of seconds to delay before kills. |
Process | -p regex or int | True | | 1.8.0 | The process name to match (allows regex) or the process ID. |
Group | -g char | False | | 1.8.0 | The group name or ID to match against (name matches only). |
User | -u char | False | | 1.8.0 | The user name or ID to match against (name matches only). |
Newest | -n | False | False | 1.8.0 | If set, the newest matching process will be killed (name matches only, cannot be used with -o). |
Oldest | -o | False | False | 1.8.0 | If set, the oldest matching process will be killed (name matches only, cannot be used with -n). |
Exact | -e | False | False | 1.8.0 | If set, the match must be exact and not just a substring match (name matches only). |
Kill Children | -c | False | False | 1.8.0 | If set, the process's children will also be killed. |
Full Match | -f | False | False | 1.8.0 | If set, the process name match will occur against the full command line string that the process was launched with. |
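The matching flags are easier to follow in combination. The sketch below assembles a command that, every 5 seconds for one minute, kills the newest process whose name is exactly `nginx`. The subcommand name `process_killer` and the process name are assumptions made for illustration; the flags come from the table above. The command is printed rather than executed.

```shell
process="nginx"   # hypothetical process name to match

# -e requires an exact name match; -n picks the newest match (cannot be combined with -o).
# Assumption: the Process Killer subcommand is "process_killer".
cmd="gremlin attack process_killer -l 60 -i 5 -p $process -e -n"

# Dry run: print the command instead of executing it.
printf '%s\n' "$cmd"
```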
Shutdown
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Delay | -d int | False | 1 | 0.0.1 | The number of minutes to delay before shutting down. |
Reboot | -r | False | True | 0.0.1 | Indicates the host should reboot after shutting down. |
Time Travel
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 1.5.0 | The length of the attack (seconds). |
NTP | -n | False | False | 1.5.0 | Prevent NTP from correcting the system time. |
Offset | -o int | False | 86400 | 1.5.0 | The offset to the current time (seconds). |
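Because Offset is expressed in seconds, the default of 86400 corresponds to one day (24 × 60 × 60). Shell arithmetic keeps other offsets readable. The sketch below builds a five-minute attack that shifts the clock by one hour, roughly what a daylight saving change looks like, with NTP correction disabled. It assumes the subcommand is named `time_travel` and that a positive offset moves the clock forward; the command is printed rather than executed.

```shell
one_hour=$((60 * 60))          # 3600 seconds
one_day=$((24 * $one_hour))    # 86400 seconds, the default Offset in the table

# Assumption: the Time Travel subcommand is "time_travel".
cmd="gremlin attack time_travel -l 300 -n -o $one_hour"

# Dry run: print the command instead of executing it.
printf '%s\n' "$cmd"
```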
Network Gremlins
Blackhole
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 0.0.1 | The length of the attack (seconds). |
IP Addresses [M] [W] | -i IP address | False | | 0.0.1 | Only impact traffic to these IP addresses. Also accepts CIDR values (e.g. 10.0.0.0/24). |
Device | -d interface | False | eth0 | 0.0.1 | Impact traffic over this network interface. |
Hostnames [M] [W] | -h hostnames | False | ^api.gremlin.com | 0.0.1 | Only impact traffic to these hostnames. |
Egress Ports [M] [W] | -p port numbers | False | ^53 | 0.0.1 | Only impact egress traffic to these destination ports. Also accepts port ranges (e.g. 8080-8085). |
Ingress Ports [M] [W] | -n port numbers | False | | 0.0.1 | Only impact ingress traffic to these destination ports. Also accepts port ranges (e.g. 8080-8085). |
Protocol | -P {TCP, UDP, ICMP} | False | all | 1.5.3 | Only impact a specific protocol. |
Latency
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 0.0.1 | The length of the attack (seconds). |
IP Addresses [M] [W] | -i IP address | False | | 0.0.1 | Only impact traffic to these IP addresses. Also accepts CIDR values (e.g. 10.0.0.0/24). |
Device | -d interface | False | eth0 | 0.0.1 | Impact traffic over this network interface. |
Hostnames [M] [W] | -h hostnames | False | ^api.gremlin.com | 0.0.1 | Only impact traffic to these hostnames. |
Egress Ports [M] [W] | -p port numbers | False | ^53 | 0.0.1 | Only impact egress traffic to these destination ports. Also accepts port ranges (e.g. 8080-8085). |
Source Ports [M] | -s port numbers | False | | 0.0.1 | Only impact egress traffic from these source ports. Also accepts port ranges (e.g. 8080-8085). |
MS | -m int | False | 100 | 0.0.1 | How long to delay egress packets (milliseconds). |
Protocol | -P {TCP, UDP, ICMP} | False | all | 1.5.3 | Only impact a specific protocol. |
DNS
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 1.4.7 | The length of the attack (seconds). |
IP Addresses [M] [W] | -i IP address | False | | 1.4.7 | Only impact traffic to these IP addresses. Also accepts CIDR values (e.g. 10.0.0.0/24). |
Device | -d interface | False | eth0 | 1.4.7 | Impact traffic over this network interface. |
Protocol | -P {TCP, UDP, ICMP} | False | all | 1.4.7 | Only impact a specific protocol. |
Packet Loss
Parameter | Flag | Required | Default | Version | Description |
---|---|---|---|---|---|
Length | -l int | False | 60 | 0.0.1 | The length of the attack (seconds). |
IP Addresses [M] [W] | -i IP address | False | | 0.0.1 | Only impact traffic to these IP addresses. Also accepts CIDR values (e.g. 10.0.0.0/24). |
Device | -d interface | False | eth0 | 0.0.1 | Impact traffic over this network interface. |
Hostnames [M] [W] | -h hostnames | False | ^api.gremlin.com | 0.0.1 | Only impact traffic to these hostnames. |
Egress Ports [M] [W] | -p port numbers | False | ^53 | 0.0.1 | Only impact egress traffic to these destination ports. Also accepts port ranges (e.g. 8080-8085). |
Source Ports [M] | -s port numbers | False | | 0.0.1 | Only impact egress traffic from these source ports. Also accepts port ranges (e.g. 8080-8085). |
Percent | -r <0-100> | False | 1 | 0.0.1 | Percentage of packets to drop (10 is 10%). |
Protocol | -P {TCP, UDP, ICMP} | False | all | 1.5.3 | Only impact a specific protocol. |
Corrupt | -c | False | False | 0.0.1 | Corrupt the packets instead of just dropping them. |
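To show how the scoping flags narrow a network attack, the sketch below assembles a Packet Loss command that drops 10% of TCP egress packets sent to one hostname on port 443 and on ports 8080 through 8085. The subcommand name `packet_loss` is an assumption, the hostname is borrowed from the examples on this page, and the values are arbitrary. The command is printed rather than executed.

```shell
percent=10                     # -r: percentage of packets to drop
host="database.mydomain.org"   # -h: only impact traffic to this hostname

# -p is repeated to list a single port and a port range; -P limits the attack to TCP.
# Assumption: the Packet Loss subcommand is "packet_loss".
cmd="gremlin attack packet_loss -l 60 -r $percent -h $host -p 443 -p 8080-8085 -P TCP"

# Dry run: print the command instead of executing it.
printf '%s\n' "$cmd"
```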
[M] Multiple Values
Port and address options can be used multiple times in a single command.
# Attack both DynamoDB and database.mydomain.org
$ gremlin attack latency -h dynamodb.us-west-1.amazonaws.com -h database.mydomain.org
Alternatively, a comma (,) can be used to specify multiple values.
$ gremlin attack latency -p 8080,443
[W] Whitelisting
A caret (^) can be used before a port or address to whitelist that parameter.
Note: If only a whitelist is supplied, all other traffic is impacted.
# Slow down all ports except DNS port
$ gremlin attack latency -p ^53
This can be particularly useful for whitelisting a specific IP from a range.
# Blackhole all hosts in 10.0.0.0/24 except for 10.0.0.11
$ gremlin attack blackhole -i 10.0.0.0/24 -i ^10.0.0.11