We've collected and curated well over 100 resources to help you with every aspect of your journey into Chaos Engineering. Learn about Chaos Engineering's origins and principles to understand what it's all about, or dive right into one of the dozens of in-depth tutorials to start experimenting right away. You might also be interested in subscribing to some of the best Chaos Engineering blogs on the net or installing one of the many tools designed to inject failure into your applications, no matter the platform.
Chaos Engineering Best Practices & Principles
Without proper practices and principles, Chaos Engineering becomes little more than unstructured Chaos. This section features a collection of some of the most fundamental Chaos Engineering articles, practices, and principles ever devised.
- Chaos Engineering: The History, Principles, and Practice
- FIT: Failure Injection Testing
- ChAP: Chaos Automation Platform
- Chaos Engineering - O'Reilly Media
- The Limitations of Chaos Engineering
- 12 Factor Applications with Docker and Go
- Exploring Multi-level Weaknesses Using Automated Chaos Experiments
- Lineage Driven Failure Injection
- Failure Testing for Your Private Cloud
- Chaos Monkey for the Enterprise Cloud
- Chaos Engineering 101
- Chaos Monkey for Fun and Profit
- Chaos Monkey: Increasing SDN Reliability Through Systematic Network Destruction
- Agile DevOps: Unleash the Chaos Monkey
- Automating Failure Testing Research at Internet Scale
- The Netflix Simian Army
- Chaos Monkey Whitepaper
- Your First Chaos Experiment
Chaos Engineering Blogs
One-off articles and tutorials have their place, but staying abreast of the latest Chaos Engineering news and technologies requires a constant drip of relevant information. The following blogs and community sites provide some of the most up-to-date SRE and Chaos Engineering content on the web.
- Gremlin Blog
- The Netflix Tech Blog
- Microsoft Azure Blog
- Spinnaker Blog
- AWS Open Source Blog
- SRE Weekly Newsletter
- LaunchDarkly Blog
- Coding Horror Blog
- Hut 8 Labs Blog
Chaos Engineering Community & Culture
A day doesn't go by without multiple people joining the Chaos Engineering Slack Channel! It's an exciting time to hop onto the Chaos Engineering train, but that journey wouldn't be nearly as interesting without the incredible culture and community that has built up around Chaos Engineering. This collection of resources contains just some of the many awesome aspects of the Chaos Engineering community.
- The Chaos Engineering Slack Channel
- Chaos Engineering - Companies, People, Tools & Practices
- Chaos Conf - The Chaos Engineering Community Conference
- Gremlin Community
- Inside Azure Search: Chaos Engineering
- Breaking Things on Purpose
- Planning Your Own Chaos Day
- Business Continuity Plan & Disaster Recovery is Too Old
- Kafka in a Nutshell
- Can Spark Streaming survive Chaos Monkey?
- The Cloudcast #299 - The Discipline of Chaos Engineering
- Who is this Chaos Monkey and why did he crash my server?
- Netflix Chaos Monkey Upgraded
- Chaos Monkey and Resilience Testing - Insights From the Professionals
- Bees And Monkeys: 5 Cloud Lessons NAB Learned From AWS
- Working with the Chaos Monkey
- You've Heard of the Netflix Chaos Monkey? We Propose, for Cyber-Security, an "Infected Monkey"
- Building Your own Chaos Monkey
- Automated Failure Testing
- Active-Active for Multi-Regional Resiliency
- Post-Mortem of October 22, 2012 AWS Degradation
- Netflix to Open Source Army of Cloud Monkeys
- Chaos Engineering Upgraded
- When it Comes to Chaos, Gorillas Before Monkeys
- Continuous Chaos: Never Stop Iterating
- Oh Noes! The Sky is Falling!
Chaos Engineering Talks
As more people take up the banner of Chaos Engineering, we're treated to even more incredible presentations from some of the most brilliant minds in the field. We've gathered a handful of the most groundbreaking and influential of these talks below.
- Intro to Chaos Engineering
- Testing In Production, The Netflix Way
- The Case for Chaos: Thinking About Failure Holistically
- 1000 Actors, One Chaos Monkey and... Everything OK
- Orchestrating Mayhem: Functional Chaos Engineering
- Using Chaos Engineering to Level Up Kafka Skills
- Chaos Engineering for vSphere
- Unbreakable: Learning to Bend But Not Break at Netflix
- Automating Chaos Experiments in Production
- Resiliency Through Failure - Netflix's Approach to Extreme Availability in the Cloud
Chaos Engineering Tools
Proper tooling is the backbone of thoughtful and well-executed Chaos Engineering. As we showed in the Chaos Monkey Alternatives chapter, no matter what technology or platform you prefer, there are tools out there to begin injecting failure and to help you learn how to create more resilient systems.
- Awesome Chaos Engineering: A curated list of Chaos Engineering resources
- Gremlin: Break things on purpose.
- Chaos Toolkit: Chaos Engineering Experiments, Automation, & Orchestration
- Marathon: A container orchestration platform for Mesos and DC/OS
- WazMonkey: A simple tool for testing resilience of Windows Azure cloud services
- Pumba: Chaos testing and network emulation tool for Docker
- Docker Simian Army: Docker image of Netflix's Simian Army
- Docker Chaos Monkey: A Chaos Monkey system for Docker Swarm
- Chaos Monkey - Elixir: Kill Elixir processes randomly
- Chaos Spawn: Chaotic spawning for Elixir
- GoogleCloudChaosMonkey: Google Cloud Chaos Monkey tool
- Chaos Toolkit - Google Cloud: Chaos Extension for the Google Cloud Engine platform
- Kube Monkey: An implementation of Netflix's Chaos Monkey for Kubernetes clusters
- Pod Reaper: Rule-based pod-killing Kubernetes controller
- Powerful Seal: A powerful testing tool for Kubernetes clusters.
- Monkey Ops: Chaos Monkey for OpenShift V3.X
- GomJabbar: Chaos Monkey for your private cloud
- Toxiproxy: A TCP proxy to simulate network and system conditions for chaos and resiliency testing
- Chaos Lemur: An adaptation of the Chaos Monkey concept to BOSH releases
- Chaos Monkey: A resiliency tool that helps applications tolerate random instance failures
- Vegeta: HTTP load testing tool and library.
- Simian Army: Tools for keeping your cloud operating in top form
- Security Monkey: Monitors AWS, GCP, OpenStack, and GitHub orgs for assets and their changes over time
- The Chaos Monkey Army
- Chaos Monkey Engine: A Chaos Engineering Swiss Army knife
- 10 open-source Kubernetes tools for highly effective SRE and Ops Teams
- Chaos Lambda: Randomly terminate ASG instances during business hours
- Byte Monkey: Bytecode-level fault injection for the JVM
- Blockade: Docker-based utility for testing network failures and partitions in distributed applications
- Muxy: Chaos Engineering tool for simulating real-world distributed system failures
- Chaos Hub: A Chaos Engineering Control Plane
- Chaos Toolkit Demos
- OS Faults: An OpenStack fault injection library
- Curated list of resources on testing distributed systems
- Anarchy Ape: Fault injection tool for Hadoop clusters
- Hadoop Killer: A process-based fault injector for Java
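If you're wondering what "injecting failure" looks like at its very simplest, here is a minimal Python sketch. It is not taken from any of the tools listed above; every name in it (`with_faults`, `InjectedFailure`, `call_with_fallback`, `fetch_price`) is hypothetical and exists only for illustration. It wraps a function so that calls can be delayed or made to fail, then checks that the calling code falls back gracefully.

```python
import random
import time


class InjectedFailure(Exception):
    """Raised in place of a real error when a fault is injected (hypothetical)."""


def with_faults(func, failure_rate=0.0, added_latency=0.0, rng=None):
    """Wrap func so each call may be delayed and/or fail.

    failure_rate is the probability (0.0 to 1.0) that a call raises
    InjectedFailure; added_latency is a delay in seconds added to every call.
    """
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        if added_latency > 0:
            time.sleep(added_latency)
        # rng.random() returns a float in [0.0, 1.0)
        if rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def call_with_fallback(func, fallback):
    """The behavior under test: return a fallback when the dependency fails."""
    try:
        return func()
    except InjectedFailure:
        return fallback


def fetch_price():
    """Stand-in for a call to a real dependency."""
    return 42


# Experiment: the dependency fails on every call, so the fallback is used.
always_failing = with_faults(fetch_price, failure_rate=1.0)
# Control: no faults injected, so the real value comes back.
never_failing = with_faults(fetch_price, failure_rate=0.0)

print(call_with_fallback(always_failing, fallback=-1))
print(call_with_fallback(never_failing, fallback=-1))
```

The tools above do the same thing at the level of processes, containers, instances, and networks rather than a single function, and they add the scheduling and safety controls you'll want before running experiments against real systems.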
Chaos Engineering Tutorials
Before you can swim in the deep end of Chaos Engineering, you'll need to start by getting your feet wet. We've accumulated a number of tutorials covering just about every platform and technology you could be using, all of which provide a great jumping-off point to start executing Chaos Experiments in your own systems.
- How to Install and Use Gremlin on Ubuntu 16.04
- How to Deploy - Chaos Monkey
- 4 Chaos Experiments to Start With
- How to Setup and Use the Gremlin Datadog Integration
- Chaos Engineering and Mesos
- Create Chaos and Failover Tests for Azure Service Fabric
- Induce Chaos in Service Fabric Clusters
- How to Install and Use Gremlin with Kubernetes
- Chaos Testing for Docker Containers
- How to Install and Use Gremlin with Docker on Ubuntu 16.04
- Pumba - Chaos Testing for Docker
- Running Chaos Monkey on Spinnaker/Google Compute Engine
- Observing the Impact of Swapping Nodes in GKE with Chaos Engineering
- Chaos Monkey for Spring Boot
- How to Install and Use Gremlin on CentOS 7
- Improve Your Cloud Native DevOps Flow with Chaos Engineering
- Chaos Experiment: Split Braining Akka Clusters
- Kubernetes Chaos Monkey on IBM Cloud Private
- Introduction to Chaos Monkey
- Using Chaos Monkey Whenever You Feel Like It
- SimianArmy Wiki
- Continuous Delivery with Spinnaker
- Sailing with Spinnaker on AWS
- Chaos on OpenShift Clusters
- Automating a Chaos Engineering Environment on AWS with Terraform
- Gremlin Gameday: Breaking DynamoDB