- The author describes debugging an issue that caused CPU utilization on a web server to increase continuously and eventually crash it. A great story about homing in on a bug and the reward that comes after perseverance. And learning!

      # load averages
      uptime
      # kernel messages can be helpful sometimes
      dmesg | tail
      # rolling process, memory stats
      vmstat 1
      # rolling cpu states on multicore systems
      mpstat -P ALL 1
      # rolling ps aux (only shows non-idle processes)
      pidstat 1
      # rolling disk performance metrics
      iostat -xz 1
      # available memory, used, swap
      free -m
      # network visibility
      sar -n DEV 1
      sar -n TCP,ETCP 1
      # running processes in a system
      top
- Marc Brooker talking about how multi-threaded programs can run more slowly than single-threaded ones. (Certainly they often don’t behave the way we might initially think.) Some good usage of perf as well. The culprits, as I understand it, are context switching and tasks serializing (synchronizing?) on access to a lock. Eliminating shared (global) state and the need for coordination is helpful when you’re parallelizing programs.
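
To make the point about shared state concrete, here is a minimal Python sketch. It is not taken from Marc Brooker's material; the function names and the counting workload are made up for illustration. The first version has every thread take a lock around a shared counter on each step, so the threads serialize on that lock. The second gives each worker its own local counter and merges the results once at the end, so no coordination is needed while the work is running. Because of CPython's global interpreter lock this will not demonstrate a real speedup; it only shows the difference in structure.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N_WORKERS = 4      # illustrative values, not from the source
N_STEPS = 10_000


def count_with_shared_state() -> int:
    """Every increment takes the same lock, so workers run one at a time."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(N_STEPS):
            with lock:  # all threads contend on this single lock
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(N_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_without_shared_state() -> int:
    """Each worker counts locally; results are merged once at the end."""

    def worker(worker_id: int) -> int:
        local = 0
        for _ in range(N_STEPS):
            local += 1  # no lock needed: nothing else touches `local`
        return local

    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        return sum(pool.map(worker, range(N_WORKERS)))
```

Both functions return the same total; the second simply never asks the threads to wait on each other.
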
A short talk by Shelby Spees on the transition from Intel to arm64 instance types at AWS. She talks about how they did it safely and why. Having clearly defined goals up front was important. (For Honeycomb this included consistently low latency for users and reliable, fast storage of data coming from collectors.) For any change you’re making, there should be measurable value for the business or customer.
Here’s a nice slide reminding me that taking care of people is super important
- Service degradation at Honeycomb while migrating Kafka cluster nodes to a new instance type: An excellent writeup of what their goals were, and some issues that came up along the way. There’s a longer form article I’d like to dig into as well. Interesting takeaway is hitting limits you didn’t know were there.
- Garbage collection in JDK 16: ZGC enhancements reduce GC time. Memory relocation during heap collections is more efficient, and scanning of the heap root object set is avoided entirely.
- Name your thread pools: Being able to trace back to the origin of work in a system doesn’t happen on its own. You have to plan for it. So important.
- Serverless app: Lenskart built a system with simple components that performs well given the current feature set at a reasonable cost
- Client tracing at Slack: Talks about how Slack is able to visualize what happens when a request is sent from a client (browser, application) to the backend. Really neat. Mentions Honeycomb.
- Lightstep distributed tracing guide: High-level guide that speaks to tracing, sampling, and when you need to think about this stuff. It contrasts head-based sampling (i.e. the decision to start tracing, which can use a non-trivial amount of server resources, is made up front in a request) with tail-based sampling, where you’ve done the buffering and can decide to keep or throw away data based on whether there’s anything interesting contained in it.
- Performance improvements @ Wix: HTTP/2, Brotli compression, and CDNs
- Teach yourself to program in 10 years: An article by Peter Norvig with tips for getting better at programming. Improvement happens over time. (Years in fact.) Working collaboratively on a larger project with others is something he advocates for, which feels right to me.
- The ultimate code kata: A post in a similar vein to Peter’s above by Jeff Atwood.
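
On the "name your thread pools" item above: the linked post is not reproduced here, so this is only a small Python sketch of the general idea, with a made-up pool name (`ingest`) and a made-up `handle` function. Giving the pool a `thread_name_prefix` means every log line or stack trace that includes the thread name points back to the pool the work came from.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(item: str) -> str:
    """Hypothetical unit of work that tags its output with the thread name."""
    return f"[{threading.current_thread().name}] handled {item}"


# `ingest` is an illustrative name; pick one that identifies the pool's purpose.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="ingest") as ingest_pool:
    ingest_results = list(ingest_pool.map(handle, ["a", "b", "c"]))

for line in ingest_results:
    print(line)
```

Without the prefix, the threads get a generic default name, which tells you nothing about which part of the system scheduled the work.
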
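
On the head-based vs. tail-based sampling distinction from the tracing guide item above, here is a rough Python sketch. It is not Lightstep's implementation or API; the `Span` type, the function names, and the 500 ms threshold are assumptions chosen for illustration. Head-based sampling decides from the trace id alone before any work is recorded, while tail-based sampling looks at the buffered spans and keeps the trace only if something in it looks interesting.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical minimal span record."""
    trace_id: str
    name: str
    duration_ms: float
    error: bool = False


def head_sample(trace_id: str, rate: float) -> bool:
    """Decide up front, from the trace id alone, whether to trace a request."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Map the first 4 bytes of the hash to a stable value in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < rate


def tail_sample(spans: list[Span], slow_ms: float = 500.0) -> bool:
    """Decide after buffering: keep the trace if any span errored or was slow."""
    return any(s.error or s.duration_ms >= slow_ms for s in spans)
```

The trade-off shows up in the signatures: `head_sample` needs nothing but an id, so it is cheap but blind to outcomes, while `tail_sample` needs the whole trace held in memory before it can decide.
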
We’re planning to put a CDN out in front of our web application at work for well-known, good reasons (performance, security, availability, etc). Here’s a tech talk from AWS about how CloudFront works: