Overview
Gremlin is a simple, safe and secure service for performing Chaos Engineering experiments through a SaaS-based platform.
This tutorial will show you how to use Gremlin Scenarios to reproduce the AWS S3 Outage.
If you are interested in learning more about the outage, we have shared a detailed analysis of the 2017 S3 Outage on our Gremlin Blog. A good question to ask yourself at every postmortem is “how do we ensure this never happens again?”
- Step 1 - Create a Sample App
- Step 2 - Build and run your Sample app using Docker and NGINX
- Step 3 - View your Sample app in your browser
- Step 4 - View your Sample app via VNC
- Step 5 - Running the Gremlin Unavailable Dependencies Scenario to reproduce the S3 outage
- Step 6 - Viewing the result of the Gremlin Unavailable Dependencies Scenario
Prerequisites
- An Ubuntu 18.04 host with the Gremlin agent and Docker
- A Gremlin Account (sign up here)
Step 1 - Create a Sample App
Connect to your host with ssh and create a Dockerfile:
ssh username@your_server_ip
vim Dockerfile
Add the following contents to the Dockerfile:
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/index.html
Create the following index.html file:
vim index.html
<html lang="en">
<head>
<title>Mythical Mysfits</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.5.6/angular.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.3/umd/popper.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.1.1/js/bootstrap.min.js"></script>
<link
rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.1/css/bootstrap.min.css"
/>
</head>
<body ng-app="mysfitsApp">
<style>
@media (max-width: 800px) {
img {
max-width: 300px;
}
}
</style>
<div style="text-align: center;">
<img
src="https://www.mythicalmysfits.com/images/aws_mythical_banner.png"
width="800px"
align="center"
/>
</div>
<div class="container" ng-controller="mysfitsFilterController">
<div id="filterMenu">
<ul class="nav nav-pills">
<li
class="nav-item dropdown"
ng-repeat="filterCategory in filterOptionsList.categories"
>
<a
class="nav-link dropdown-toggle"
data-toggle="dropdown"
href="#!"
role="button"
aria-haspopup="true"
aria-expanded="false"
>
{{filterCategory.title}}
</a>
<div class="dropdown-menu">
<button
class="dropdown-item"
ng-repeat="filterCategorySelection in filterCategory.selections"
ng-click="queryMysfits(filterCategory.title, filterCategorySelection)"
>
{{filterCategorySelection}}
</button>
</div>
</li>
<li class="nav-item ">
<button
type="button"
class="btn btn-success"
ng-click="removeFilter()"
>
View All
</button>
</li>
</ul>
</div>
</div>
<br />
<div class="container">
<div id="mysfitsGrid" class="row" ng-controller="mysfitsListController">
<div
class="col-md-4 border border-warning"
ng-repeat="mysfit in mysfits"
>
<br />
<p align="center">
<strong> {{mysfit.name}}</strong>
<br />
<img ng-src="{{mysfit.thumbImageUri}}" alt="{{mysfit.name}}" />
</p>
<p>
<br />
<b>Species:</b> {{mysfit.species}}
<br />
<b>Age:</b> {{mysfit.age}}
<br />
<b>Good/Evil:</b> {{mysfit.goodevil}}
<br />
<b>Lawful/Chaotic:</b> {{mysfit.lawchaos}}
</p>
</div>
</div>
</div>
<p>
<br />
<br />
This site was created for use in the AWS Modern Application Workshop.
<a href="https://github.com/aws-samples/aws-modern-application-workshop"
>Please see details here.</a
>
</p>
</body>
<script>
var mysfitsApiEndpoint =
'http://mysfits-nlb-9c8e61c17ef3cd1d.elb.us-east-1.amazonaws.com';
var app = angular.module('mysfitsApp', []);
var gridScope;
var filterScope;
app.controller('clearFilterController', function($scope) {});
app.controller('mysfitsFilterController', function($scope) {
filterScope = $scope;
// The possible options for Mysfits to populate the dropdown filters.
$scope.filterOptionsList = {
categories: [
{
title: 'Good/Evil',
selections: ['Good', 'Neutral', 'Evil'],
},
{
title: 'Lawful/Chaotic',
selections: ['Lawful', 'Neutral', 'Chaotic'],
},
],
};
$scope.removeFilter = function() {
// Reload the unfiltered list; getAllMysfits returns nothing, so no assignment is needed.
getAllMysfits(applyGridScope);
};
$scope.queryMysfits = function(filterCategory, filterValue) {
var filterCategoryQS = '';
if (filterCategory === 'Good/Evil') {
filterCategoryQS = 'GoodEvil';
} else {
filterCategoryQS = 'LawChaos';
}
var mysfitsApi =
mysfitsApiEndpoint +
'/mysfits?' +
'filter=' +
filterCategoryQS +
'&value=' +
filterValue;
$.ajax({
url: mysfitsApi,
type: 'GET',
success: function(response) {
applyGridScope(response.mysfits);
},
error: function(response) {
console.log('could not retrieve mysfits list.');
},
});
};
});
app.controller('mysfitsListController', function($scope) {
gridScope = $scope;
getAllMysfits(applyGridScope);
});
function applyGridScope(mysfitsList) {
gridScope.mysfits = mysfitsList;
gridScope.$apply();
}
function getAllMysfits(callback) {
var mysfitsApi = mysfitsApiEndpoint + '/mysfits';
$.ajax({
url: mysfitsApi,
type: 'GET',
success: function(response) {
callback(response.mysfits);
},
error: function(response) {
console.log('could not retrieve mysfits list.');
},
});
}
</script>
</html>
Save the index.html file.
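Because the Scenario in Step 5 targets the app's external dependencies, it can help to list the hosts the page loads from before running it. The following is a minimal sketch, assuming grep, cut, and sort are available on the host. The function name list_external_hosts is hypothetical and is not part of Gremlin or the workshop.

```shell
#!/bin/sh
# list_external_hosts FILE
# Prints each distinct host named in an http(s) URL inside FILE.
# (Hypothetical helper for illustration; not part of Gremlin.)
list_external_hosts() {
  grep -oE "https?://[^/\"' >]+" "$1" | cut -d/ -f3 | sort -u
}

# Run against the sample page if it is in the current directory.
if [ -f index.html ]; then
  list_external_hosts index.html
fi
```

This only finds URLs written literally in the file. The thumbnail image URLs are returned by the Mysfits API at runtime, so they will not appear in the output.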
Step 2 - Build and run your Sample app using Docker and NGINX
Next, build a Docker image from the Dockerfile by running the following:
docker build -t simple-nginx .
Now run the image in the background, publishing the container's port 80 on port 8080 of the host:
docker run -d -p 8080:80 simple-nginx
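To confirm the container is serving requests before moving on, you can poll the published port from the host. The following is a minimal sketch, assuming curl is installed on the host. The function wait_for_http and its arguments are hypothetical names chosen for this example.

```shell
#!/bin/sh
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it answers with HTTP 200, or gives up after ATTEMPTS tries.
# (Hypothetical helper for illustration.)
wait_for_http() {
  url=$1
  attempts=${2:-10}
  delay=${3:-1}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the status code; curl reports 000 if no response arrived.
    status=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
    if [ "$status" = "200" ]; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready after $attempts attempts: $url"
  return 1
}
```

For this tutorial you would call it as wait_for_http http://localhost:8080/ on the host. If it reports that the app is not ready, docker ps and docker logs are the usual places to look next.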
Step 3 - View your Sample app in your browser
You can now view the sample app in your browser at http://your_server_ip:8080
Step 4 - View your Sample app via VNC
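The Gremlin Unavailable Dependencies Scenario uses a Blackhole attack, which drops network traffic to the destinations you specify. We will use this attack to prevent images stored in S3 from loading. Because the attack runs on the host, a browser on your local computer would not be affected, so to see the results we will view the app in a browser running on the host itself, using VNC for remote desktop access.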
On your host, install the Xfce and TightVNC packages:
sudo apt-get update
sudo apt install xfce4 xfce4-goodies tightvncserver
To complete the VNC installation, run the following command. You will be prompted to set a password:
vncserver
Next, test the VNC connection from your local computer. Run the following command, which forwards local port 5901 to port 5901 on the host over SSH:
ssh -L 5901:127.0.0.1:5901 -N -f -l username your_server_ip
Now you can use a VNC client to connect to the VNC server at localhost:5901. You’ll be prompted to authenticate. Use the password you set up earlier. To view your Xfce desktop, you can use VNC Viewer or Screen Sharing, the program built into macOS.
You will also need a browser installed on your host. Install Firefox by running the following command:
sudo apt-get install firefox
Before you move on to the next step, ensure that you are able to view the sample app using VNC. Connect to localhost:5901 and click on Applications > Internet > Firefox Web Browser:
Using Firefox, navigate to localhost:8080. You should see the following:
Now we’re ready to run the Gremlin Unavailable Dependency Scenario to reproduce the S3 Outage.
Step 5 - Running the Gremlin Unavailable Dependencies Scenario to reproduce the S3 outage
First, navigate to Recommended Scenarios within the Gremlin UI and choose the Unavailable Dependency Scenario:
Next, select Add Targets and run. Then select your host using the local-hostname option:
Then click customize:
Next, click Add Attacks. This will take you to the Gremlin Attack configuration. To reproduce the S3 outage, modify the default scenario to include one Blackhole attack impacting AWS S3 us-east-1. You will need to make these changes in the Providers section of the Blackhole attack, as shown in the screenshot below:
Next, click Unleash Scenario. Your Gremlin Scenario will start running and begin to reproduce the S3 Outage:
Step 6 - Viewing the result of the Gremlin Unavailable Dependencies Scenario
To view the S3 Outage being reproduced, open your VNC viewer and reload your Firefox tab. You will notice that the images stored in us-east-1 on S3 no longer load.
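The same effect can be checked from a shell on the host while the Scenario is running. The sketch below assumes curl is installed and that the Blackhole attack applies to all matching traffic leaving the host, not only the browser's. The function probe_url is a hypothetical helper written for this example.

```shell
#!/bin/sh
# probe_url URL [TIMEOUT_SECONDS]
# Prints "reachable" if the request completes with any HTTP response
# before the timeout, otherwise "unreachable".
# (Hypothetical helper for illustration; not part of Gremlin.)
probe_url() {
  if curl -s -o /dev/null --max-time "${2:-5}" "$1"; then
    echo "reachable $1"
  else
    echo "unreachable $1"
  fi
}
```

You could call it with the URL of one of the images shown in the app, for example probe_url https://www.mythicalmysfits.com/images/aws_mythical_banner.png. Whether that particular image is served from S3 in us-east-1 is an assumption, so substitute a thumbnail URL taken from the page if the result does not change during the attack. Running the probe before, during, and after the Scenario gives a simple record of when the dependency was unavailable.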
Conclusion
This tutorial has demonstrated how you can use the Gremlin Recommended Scenario “Unavailable Dependency” to reproduce the AWS S3 Outage. The result shows that our sample app is not resilient to this outage.
To confirm that the app can handle this failure, we could run this Gremlin Scenario again after mitigating it with one of several options:
- S3 failover
- Multi-cloud storage
- Multi-CDN
- Handling image errors in the browser using React