New Relic Infrastructure is the infrastructure monitoring tool in New Relic’s observability suite. Gremlin is a simple, safe and secure SaaS platform for running Chaos Engineering experiments. Gremlin Free is the free version of Gremlin; it can run on up to five hosts and supports two types of Chaos Engineering attacks.
Prerequisites
To complete this tutorial you will need:
- A host running Ubuntu 18.04 to run the Chaos Engineering experiments on. This host will run the Gremlin agent. You need to have permissions to run commands as root with sudo on this host.
- A Gremlin Free account (sign up here).
- A New Relic account (sign up for a free trial here).
Overview
This tutorial will show you how to use New Relic’s Infrastructure monitoring tool along with Gremlin Free for your Chaos Engineering experiments. Observability is an important part of Chaos Engineering, as it’s how we view the results of the experiments.
- Step 1 - Install the Gremlin agent
- Step 2 - Install the New Relic agent
- Step 3 - Run a CPU attack
- Step 4 - Run a Shutdown attack
Step 1 - Install the Gremlin agent
First, SSH into your host and add the Gremlin apt repository:
ssh username@your_server_ip
echo "deb https://deb.gremlin.com/ release non-free" | sudo tee /etc/apt/sources.list.d/gremlin.list
Import the GPG key:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C81FC2F43A48B25808F9583BDFF170F324D41134 9CDB294B29A5B1E2E00C24C022E8EF3461A50EF6
Then install the Gremlin client and daemon:
sudo apt-get update && sudo apt-get install -y gremlin gremlind
After you have created your Gremlin Free account, you will need to find your Gremlin Daemon credentials. Log in to the Gremlin App using your Company name and sign-on credentials. These were emailed to you when you signed up to start using Gremlin.
Navigate to Team Settings and click on your Team. Make a note of your Gremlin Secret and Gremlin Team ID.
Then initialise Gremlin and follow the prompts:
gremlin init
You are now ready to create attacks using the Gremlin App.
Step 2 - Install the New Relic agent
Install the New Relic Infrastructure agent on your Ubuntu host. The first step is to create a configuration file and add your license key. Replace YOUR_LICENSE_KEY with the license key from your own New Relic account:
echo "license_key: YOUR_LICENSE_KEY" | sudo tee -a /etc/newrelic-infra.yml
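Because a blank or mistyped key produces a config file that installs cleanly but never reports data, it can help to validate the value before writing it. The following is a minimal sketch, not part of New Relic's tooling: the function name `license_config_line` and the variable `NEW_RELIC_LICENSE_KEY` are assumptions made for illustration.

```shell
# Hypothetical helper: print the config line for a license key,
# or fail if the key is empty or contains whitespace.
license_config_line() {
  key="$1"
  case "$key" in
    '' | *[[:space:]]*)
      echo "license key is empty or contains whitespace" >&2
      return 1
      ;;
  esac
  printf 'license_key: %s\n' "$key"
}

# Example use on the host (NEW_RELIC_LICENSE_KEY is an assumed variable name):
#   license_config_line "$NEW_RELIC_LICENSE_KEY" | sudo tee /etc/newrelic-infra.yml
```

Keeping the key in a variable rather than typing it inline also keeps it out of the tutorial text you may share with teammates.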
Next, add New Relic’s GPG key.
curl https://download.newrelic.com/infrastructure_agent/gpg/newrelic-infra.gpg | sudo apt-key add -
Create the agent’s apt repo:
printf "deb [arch=amd64] http://download.newrelic.com/infrastructure_agent/linux/apt bionic main" | sudo tee -a /etc/apt/sources.list.d/newrelic-infra.list
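Note that `tee -a` appends, so running the repository and config commands a second time leaves duplicate lines in the target files. If you expect to rerun these steps, one option is to append a line only when it is not already present. This is a sketch with a hypothetical helper name (`append_line_once`); it writes directly to the file, so for files under /etc it would have to run as root.

```shell
# Hypothetical helper: append a line to a file only if the exact
# line is not already there (fixed-string, whole-line match).
append_line_once() {
  line="$1"
  file="$2"
  if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
    return 0
  fi
  printf '%s\n' "$line" >> "$file"
}

# Example use (as root), with the repo line from this step:
#   append_line_once "deb [arch=amd64] http://download.newrelic.com/infrastructure_agent/linux/apt bionic main" \
#     /etc/apt/sources.list.d/newrelic-infra.list
```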
Update your apt cache.
sudo apt-get update
Install the agent package.
sudo apt-get install newrelic-infra -y
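Before moving on, you may want to confirm that both agents are actually running on the host. The sketch below assumes the systemd unit names are `gremlind` and `newrelic-infra`, matching the package names installed above; adjust them if your system differs. The helper name `service_state` is made up for this example.

```shell
# Hypothetical helper: print the systemd state of a unit,
# or "unknown" if systemctl is unavailable or reports nothing.
service_state() {
  if command -v systemctl >/dev/null 2>&1; then
    # is-active exits non-zero for inactive units, so ignore its status
    state=$(systemctl is-active "$1" 2>/dev/null) || true
    printf '%s\n' "${state:-unknown}"
  else
    printf 'unknown\n'
  fi
}

# Unit names are assumptions based on the package names in this tutorial.
for unit in gremlind newrelic-infra; do
  echo "$unit: $(service_state "$unit")"
done
```

A state of `active` for both units suggests the host is ready for the attacks in the next steps.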
Step 3 - Run a CPU attack
Log in at newrelic.com and click the Infrastructure link.
You should see metrics for the Ubuntu host that you installed the agent on. If they don’t appear immediately, you might need to wait a few minutes for data from the new agent to display. You can also try refreshing your browser.
Next, we’ll change the time range of the graphs that are displayed. By default they show a 60 minute view, but we want to see the results of our experiments more quickly, so we’ll change that to 5 minutes. Click Time Picker in the menu above the graphs and select 5m.
Log into your Gremlin Free account. Click the Attack link in the left menu and then New Attack. As the target, select the Ubuntu host that you installed the Gremlin agent on.
Scroll down and click Choose a Gremlin. The CPU attack should already be selected by default. If not, click on Resource and then CPU.
Scroll down and change the number of seconds for the attack to 120. Set the number of cores to the number you provisioned when you created the host. Then, click Unleash Gremlin. That will begin the CPU attack.
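If you don't remember how many cores the host was provisioned with, you can ask the host itself. A minimal sketch, assuming a standard Linux userland with `nproc` (coreutils) or `getconf` available:

```shell
# Count the processing units available on this host.
# Falls back to getconf if nproc is not installed.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "This host reports ${cores} processing unit(s)"
```

Use the reported number as the core count for the CPU attack.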
Switch to your New Relic browser window or tab and view the results. You should see a spike in the CPU usage.
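To cross-check the graph from the host itself, you can sample the kernel's CPU counters while the attack runs. The sketch below reads the aggregate `cpu` line of /proc/stat twice and reports the share of non-idle time in between. The function names `cpu_sample` and `cpu_busy_percent` are made up for this example, and the one-second interval is an arbitrary choice.

```shell
# Print "total idle" jiffies from the aggregate cpu line of /proc/stat.
# Idle time here includes iowait.
cpu_sample() {
  read -r label user nice system idle iowait irq softirq steal rest < /proc/stat
  echo "$(( user + nice + system + idle + iowait + irq + softirq + ${steal:-0} )) $(( idle + iowait ))"
}

# Given two samples (total1 idle1 total2 idle2), print the busy percentage.
cpu_busy_percent() {
  dt=$(( $3 - $1 ))
  didle=$(( $4 - $2 ))
  if [ "$dt" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( 100 * (dt - didle) / dt ))
}

# Take two samples one second apart (Linux only).
if [ -r /proc/stat ]; then
  before=$(cpu_sample)
  sleep 1
  after=$(cpu_sample)
  # $before and $after are intentionally unquoted so each splits into two arguments
  echo "CPU busy: $(cpu_busy_percent $before $after)%"
fi
```

During the attack the reported value should be close to what the New Relic CPU graph shows for the same period, although the two will not match exactly because they sample over different intervals.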
Step 4 - Run a Shutdown attack
In the Gremlin UI click on Attack in the left menu and New Attack, as we did before. Select your Ubuntu host as the target.
Scroll down and click Choose a Gremlin. Select State, and the Shutdown attack should be selected by default. If not, click on it. Leave the Delay set to 1 minute and leave Reboot selected. Then click Unleash Gremlin.
Go back to the New Relic UI and click on Events in the menu right above the graphs. You should see some new events start streaming in after the host reboots. If you don’t see anything new after a minute or two, you might try refreshing your browser.
Eventually you should see notifications from services that stopped and started when the host rebooted, as well as some other events.
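Once the host is back up, you can SSH in again and confirm that it did reboot by checking how long it has been running. This is a sketch for Linux hosts that reads /proc/uptime; the helper names and the ten-minute window are illustrative choices, not anything Gremlin or New Relic prescribes.

```shell
# Whole seconds since the host booted, read from /proc/uptime.
uptime_seconds() {
  cut -d' ' -f1 /proc/uptime | cut -d. -f1
}

# Succeeds if an uptime (in seconds) is shorter than the given window.
booted_within() {
  [ "$1" -lt "$2" ]
}

if [ -r /proc/uptime ]; then
  if booted_within "$(uptime_seconds)" 600; then
    echo "Host booted within the last 10 minutes"
  else
    echo "Host has been up for more than 10 minutes"
  fi
fi
```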
Conclusion
We’ve seen how we can use Gremlin Free to perform CPU and Shutdown attacks, and how we can use New Relic’s Infrastructure tool to view metrics and events related to those attacks. There’s more you can do, like setting up alerts to let you know when a host reboots, or when CPU usage passes a certain threshold. You could also create custom dashboards for your Chaos Engineering experiments with New Relic’s Insights product.
As we mentioned earlier, having observability tools is important for Chaos Engineering, as they give us the feedback we need about what happens in the experiments. New Relic’s Infrastructure tool is very flexible and provides the visibility we need to perform Chaos Engineering experiments.