Heroku Operations
Heroku has had hellish periods of downtime, resulting in quality-of-service numbers that frustrated staff and customers alike. Hear horror stories about the incidents behind those outages, and what the company had to do to fix the root causes. The changes have been vast: improving service architecture, sunsetting legacy services, revamping monitoring and alerting, rethinking testing for distributed systems, and, most importantly, hacking our own engineering culture. The result is a team that now operates 24/7 mission-critical services with minimal interruptions.