William Ting

Bio

William Ting is a longtime FOSS advocate with contributions in various projects (Pelican, autojump, pyramid_swagger, Rust, GNOME). He's currently an infrastructure engineer at Reddit, and previously on the Yelp Transaction Platform team.

Self-Healing Systems: The Road to 99.99% Uptime

Scalable Python, Intermediate

Description 

Stop firefighting and start fireproofing! There are many tools that make oncall easier and increase availability, but we'll be mostly focusing on a few principles and design patterns that help make your systems more robust.

Abstract

Feature velocity is typically a higher priority early in a software's lifecycle, but as the system matures there is an effort to start fireproofing the system. On the Yelp Transactions Platform team we've used a combination of circuit breakers, queues, and idempotent operations to minimize downtime and waking up in the middle of the night.

We'll take a look at how these design patterns help us in a distributed system, when they should be used, and common pitfalls associated.

  • Meetup_square
  • Black Facebook Icon
  • Black Twitter Icon