AWS network load balancers @ Ably: Ably is a platform other developers can use to provide realtime push notifications at scale to their users. They have to handle lots of persistent connections, and a variable connection rate that can spike dramatically. Sounds like the NLB isn’t quite delivering the extreme levels of service it claims to be able to. Note: It’s an amazing box for the rest of us running applications without those constraints (Probably the vast majority of us?!)
Devops practice @ Algolia: Nice write up about what the team does and their process for getting things done. Work buckets: projects, operations, on call. Meetings: Once weekly Production Meetup discussing what happened last week in on-call + project statuses. Priorities: Answer customer questions, answer internal team questions, incident response, infra provisioning + management
20m on how to think about defining slis, and slos. Very practicle and well researched. He references this:
Cross posting notes from my production page:
SLIs, SLOs, SLAs, Error Budgets … Oh My!
Want to be able to answer the question “When should we slow down a bit to work on making our application more reliable, or performant?”
This is a framework to define availability in concrete terms, determine acceptable levels of it, and then come to a consensus with product, development, and the business about what we do when we don’t have enough of it.
SLI – Service level indicator. A system metric that can be used to classify a request as either good or bad. eg 95th should be < 100ms for requests to our web application
SLO – Service level objective. A measure of how well we’re doing over time with respect to an agreement we make internally with ourselves of what good / bad even means. eg In the previous 30d, 99% of the time we should hit our stated latency goals. Note: A corollary is we don’t want to exceed our goals either. It’s time + energy taken away from work that could be put into making a better product
SLA – Service level agreement. A promise we’re comfortable making to customers about how much availability we’re willing to guarantee. There may even be financial penalties associated with failure to meet goals