We’re planning to put a CDN in front of our web application at work for well-known, good reasons (performance, security, availability, etc). Here’s a tech talk from AWS about how CloudFront works:
- Scalability, and load testing VALORANT: Nice discussion of how to set up a load-testing harness. “Simulated player”, “scenario”, and “player pool” are the basic abstractions they settled on. Architectural concerns for the game server they thought about up front were microservices, sharding their data store, and caching
- https://blog.cloudflare.com/computing-euclidean-distance-on-144-dimensions/: I love a good story about optimizing algorithms and data structures in order to make specific work tractable given dataset sizes
- Strava, The Boring Option: A story about a schema design decision (the width of an id field in one of their tables) that worked great from 2009 to 2020 but then needed to change. A 32-bit unsigned, monotonically increasing id field is good for about 4 billion unique values before it wraps around. Depending on how quickly you’re consuming these, it could last a long time. It did for Strava. The COVID-19 pandemic meant all their users were using the service way more than normal, which accelerated the need for rework here. They were pragmatic about what they did: they considered different datastores for this data (a huge table, lots of read/write activity on it) but in the end decided they knew MySQL and were comfortable with it. They found a way using their current datastore (and reserved the right to consider different ones in the future, but they had a problem to solve today). Great story!
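The 4-billion ceiling is just arithmetic on the field width. A quick sketch of how fast the runway shrinks when usage spikes (only the 32-bit width comes from the story; the write rates and current max id below are made-up numbers):

```python
# A 32-bit unsigned id wraps after 2**32 distinct values.
MAX_UINT32 = 2**32 - 1  # 4,294,967,295 -- about 4.29 billion ids

# Hypothetical: suppose the table's max id is already at 3 billion.
remaining = MAX_UINT32 - 3_000_000_000

# If a usage spike triples the write rate, the remaining runway
# shrinks by the same factor.
days_at_1m_per_day = remaining / 1_000_000
days_at_3m_per_day = remaining / 3_000_000

print(f"{days_at_1m_per_day:.0f} days at 1M/day, "
      f"{days_at_3m_per_day:.0f} days at 3M/day")
```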
Such a great talk. “Don’t think because you have an alert for something that you’re protected.” “The more alerts you have, the more information overload you may have for the operator.”
- Respect distributed systems
- Debuggability in production
- Debugging == new knowledge about the way a system works
This may very well be my favourite conference talk ever. So many big ideas presented in 1.5 hours and articulated well and with enthusiasm!
Watching this stream: https://www.twitch.tv/videos/821587009
Watching LizTheGrey’s stream. She’s fantastic. She’s going through computer science principles that factor into the choices she’s making
Today I learned
- Thinking about how much work we’re doing is an important exercise. Avoiding repeatedly computing the same thing makes a lot of sense (relates to algorithmic complexity)
- If you are making an assumption about input (e.g. no negative values, no zeros) you can make this explicit by adding an assertion so that your program fails fast (hopefully with a helpful error) when that assumption is invalidated
- This one is a bit of a bugger. My solution uses a binary search strategy with low/high pointers that shift as you get closer to your goal. Not hard to write, but it was a bit fiddly
- Liz (and others I can see) did something much simpler. Set ‘B’ -> 1 and ‘R’ -> 1 in the input, ignored ‘F’ and ‘L’ (or set these to zero) and somehow with only a little more energy got the answer. What the hell is going on here?
Have to think about this one a bit more
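A minimal sketch of the “avoid repeatedly computing the same thing” idea, using Fibonacci as a stand-in problem (my example, not one from the stream):

```python
from functools import lru_cache

# Naive recursive Fibonacci recomputes the same subproblems
# exponentially many times. Caching each result once turns that
# into linear work -- the essence of avoiding repeated computation.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(90))  # instant; the uncached version would take ages
```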
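The fail-fast assertion idea might look like this; the function and its input rule are invented for illustration:

```python
def mean_of_positive(values):
    # Make the input assumptions explicit: fail fast with a clear
    # message instead of silently producing a wrong answer later.
    assert values, "expected a non-empty list"
    assert all(v > 0 for v in values), "expected strictly positive values"
    return sum(values) / len(values)

print(mean_of_positive([2, 4, 6]))  # 4.0
```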
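My low/high pointer approach was roughly shaped like this (a reconstruction, not my actual solution; the sample seat string is the puzzle’s own worked example):

```python
def narrow(chars, low, high, lower_char):
    # Halve the [low, high] range once per character, keeping the
    # lower half for lower_char and the upper half otherwise.
    for c in chars:
        mid = (low + high) // 2
        if c == lower_char:
            high = mid      # keep lower half
        else:
            low = mid + 1   # keep upper half
    return low  # low == high by now

def seat_id(s):
    row = narrow(s[:7], 0, 127, "F")  # F/B narrow the row
    col = narrow(s[7:], 0, 7, "L")    # L/R narrow the column
    return row * 8 + col

print(seat_id("FBFBBFFRLR"))  # 357
```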
Had to remind myself today not to over-anticipate what part b will ask me to do. More often than not, the complexity of part a just goes up, and what I think is coming next doesn’t materialize.
Keeping it simple is an important principle in systems design!
Alright, so I don’t understand the answer for part 2. I have to think about this one a bit more. It’s only a few lines long …
- Revisit day 13, part b
- DNS load balancing: This company is using DNS load balancing to good effect for some of its traffic. Not machine-to-machine API traffic, it sounds like (it works OK for human clients that honour TTLs). Two big problems with DNS load balancing are 1) uneven distribution of load (a problem for load balancers too, but there you at least have some say in how requests are forwarded), and 2) how are failed servers removed from the pool?
- Cloudflare postmortem (Byzantine failure in etcd cluster): Interesting. A few distributed systems bolted together to create a bigger one. Each individual component is “fault tolerant” on its own, but new kinds of failures emerge when they are connected to each other. Keep it boring for as long as you possibly can! This is usually a lot longer than you think
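The uneven-load problem with DNS round-robin is easy to see in a toy simulation; the addresses and per-client request counts below are invented for illustration:

```python
import itertools
from collections import Counter

# Hypothetical A records an authoritative server rotates through.
records = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = itertools.cycle(records)

# Each "client" resolves the name once, then re-uses the cached
# answer for all of its requests until the TTL expires. Request
# volume per client varies wildly.
requests_per_client = [1, 40, 3, 2, 25, 5]

counts = Counter()
for n in requests_per_client:
    ip = next(rotation)  # the one lookup this client makes
    counts[ip] += n      # TTL caching pins all n requests to that ip

# Heavily skewed despite a perfect round-robin rotation.
print(counts)

# And problem 2: a dead server keeps receiving traffic until every
# cached answer expires -- plain DNS has no health check to pull it.
```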
Lots to think about here. The way I think about organizing logic, process management, deferred code execution, …
One of my favourite tech talks I found in 2020
Jeff talked about this today and I like the framework! Coming up with OKRs and high level goals is so hard. I’ve only been involved in the process once or twice so I don’t have a lot of experience but I remember it being fairly abstract (a bunch of people get in a room and talk about what they’ve learned in the past year or so – in terms of customers, the business, competition, etc)
Happy that our exec team are running out in front for the rest of us and that they know what they’re doing! 🙂