October 16, 2018

Looking back on a day of chaos

Wow. Last month was a whirlwind. We hosted our first conference, launched a brand new product, and raised some funding to increase product velocity and continue to educate and grow the chaos engineering community.

We heard from many of you about your experience with Chaos Conf and we appreciate the feedback! Having a single-day, one-track conference around a focused topic (in this case Chaos Engineering) featuring world-class speakers was the best way we could think of to provide maximum value for the community. If you are interested in watching recorded video of the talks, scroll down a bit in this post to find them; you can also follow us on Twitter (@gremlininc) where we will be releasing transcripts once a week.

Conferences like this don’t happen without a strong community (all of you!). It was awesome to meet engineers from Europe, Asia, and South America who came out to San Francisco to attend. They also don’t happen without someone behind the scenes. In our case, that’s Dina. As a one-woman events team, she organized a truly community-driven event at an incredible venue (shoutout to the Alamo Drafthouse in San Francisco!) where everyone had fun and was taken care of.

Chaos Conf Speakers

We also have to thank our incredible speakers. They come from some of the best technology companies in the world, and we are humbled that they took time out of their busy schedules to participate in our event.

Adrian Cockcroft, VP of Cloud Architecture @ AWS

Adrian kicked off the conference with an incredible opening keynote. With his breadth of experience in topics ranging from distributed systems to observability to resilience, he was the perfect speaker to get the day started.


Kriss Rochefolle, Director, Operational Excellence @ Oui.sncf

Kriss gave a charming, funny, straightforward presentation about how to convince your organization to adopt Chaos Engineering. We love any talk that starts off “Let’s break things on purpose in production — it’s going to be fun!”


Vilas Veeraraghavan, Director of Engineering @ Walmart Labs

Vilas is an expert practitioner of Chaos Engineering. The results he’s been able to achieve at JET and Walmart speak for themselves. For big e-commerce websites, avoiding downtime is a top priority!


Ronnie Chen, Engineering Manager @ Twitter

Ronnie gave a fascinating talk about her life as a deep-sea diver. When you are swimming at great depths, it is incredibly important to do proactive failure testing. And the same is true for your applications: the more critical an application is to your business, the more important it is to identify failures before they impact customers.


Mark McBride, CEO & Founder @ Turbine Labs

Chaos is endemic in software engineering — many things are unpredictable. Mark’s talk is full of insight and practical advice for getting out of the loop of: a system becomes destabilized → everyone drops what they’re doing to firefight → incident is hopefully resolved and things return to square one.


Mikolaj Pawlikowski, Software Engineer @ Bloomberg

Mikolaj gives some great background on why distributed systems are complicated and how the rise of cloud and microservices has fueled the need for chaos engineering. As Leslie Lamport says, “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.”


Charity Majors, CEO @ Honeycomb

This was a highly anticipated talk. As the CEO of a product that offers better observability into systems, Charity emphasizes a recurring concern: if you can’t observe it, just don’t do it. But with the proper visibility, proactively testing for failure is a recipe for success.


Jessie Frazelle, Software Engineer @ Microsoft

Jessie is a rockstar best known for her work in the Docker ecosystem. If you’re interested in some of the more technical details about the development of containers, and how breaking them on purpose can lead to wonderful and surprising insights, then this is the talk for you.


Tammy Butow & Ana Medina, Principal SRE & Chaos Engineer @ Gremlin

In this joint talk, Tammy and Ana share their collective experience as SREs focused on avoiding downtime and reducing incidents at companies like Dropbox, Uber, DigitalOcean…and now Gremlin.


Kolton Andrus, CEO & Co-Founder @ Gremlin

Kolton’s talk gave an overview of Chaos Engineering and the history that led us to today: from Jesse Robbins pulling out plugs in datacenters, to Chaos Monkey at Netflix randomly shutting down servers, to more refined and disciplined approaches to Chaos Engineering. To give a glimpse of the future, Kolton announced the availability of Application Level Fault Injection (ALFI) to all Gremlin customers, allowing DevOps teams to be much more targeted with their attacks, including on serverless environments.

The Community

And, of course, we have to thank the incredible Chaos Engineering Community. Whether you were able to make it in person or watched the livestream, it was humbling and inspiring to hear so many of your stories, and they gave us plenty of ideas on how to improve our own service.

And if you haven’t already joined the Chaos Engineering Community on Slack, please do! Whether you’re just getting started or have decades of experience breaking things, your voice would be appreciated. Since we launched in 2017 we’ve felt the love, and hopefully we can make you proud and give it back just as well 💚
