Categories
links

Links

  • Garbage collection in JDK 16: ZGC enhancements reduce GC time. Memory relocation during heap collections is more efficient, and heap root object set scanning is avoided entirely.
  • Name your thread pools: Being able to trace work in a system back to its origin doesn’t happen on its own. You have to plan for it. So important.
  • Serverless app: Lenskart built a system from simple components that performs well for the current feature set at a reasonable cost.
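
To make the thread pool point concrete, here is a minimal sketch of one way to name pool threads in Java using a custom `ThreadFactory`. The class name `NamedPools`, the helper `named`, and the prefix `order-import` are all made up for illustration; the linked post may do this differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not taken from the linked post.
class NamedPools {
    // Returns a factory whose threads are named "<prefix>-1", "<prefix>-2", ...
    static ThreadFactory named(String prefix) {
        AtomicInteger counter = new AtomicInteger(1);
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
            t.setDaemon(true); // assumption: pool threads should not block JVM exit
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2, named("order-import"));
        // The task reports the name of the thread it ran on.
        String name = pool.submit(() -> Thread.currentThread().getName()).get();
        System.out.println(name); // prints "order-import-1"
        pool.shutdown();
    }
}
```

With names like these, a thread dump or a log line that includes the thread name points straight back to the pool that did the work.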

Categories
links

Links

  • Client tracing at Slack: Talks about how Slack is able to visualize what happens when a request is sent from a client (browser, application) to the backend. Really neat. Mentions Honeycomb.
  • Lightstep distributed tracing guide: High-level guide that speaks to tracing, sampling, and when you need to think about this stuff. Contrasts head-based sampling (the decision to trace is made up front in a request, and tracing can use a non-trivial amount of server resources) with tail-based sampling (you’ve done the buffering and can decide to keep or throw away the data based on whether there’s anything interesting in it).
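
The difference between the two sampling styles is easier to see in code. This is a rough sketch under my own assumptions, not Lightstep’s implementation: the `Span` record, the 10% rate, and the 500 ms "slow" threshold are all hypothetical, and it needs Java 16+ for records.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of head-based vs tail-based sampling decisions.
class SamplingSketch {
    // Made-up span shape: just enough fields to decide what is "interesting".
    record Span(String name, long durationMillis, boolean error) {}

    // Head-based: decide at the start of the request, before anything is known
    // about it. `roll` is a random draw in [0, 1); `rate` is the sampled fraction.
    static boolean headSample(double roll, double rate) {
        return roll < rate;
    }

    // Tail-based: the whole trace was buffered, so the decision can look at it.
    // Here a trace is kept if any span failed or was slower than `slowMillis`.
    static boolean tailSample(List<Span> trace, long slowMillis) {
        return trace.stream()
                .anyMatch(s -> s.error() || s.durationMillis() >= slowMillis);
    }

    public static void main(String[] args) {
        boolean traceThisRequest =
                headSample(ThreadLocalRandom.current().nextDouble(), 0.10);
        System.out.println("head decision: " + traceThisRequest);

        List<Span> buffered = List.of(
                new Span("auth", 12, false),
                new Span("db-query", 840, false));
        System.out.println("tail decision: " + tailSample(buffered, 500));
    }
}
```

The head decision is cheap but blind, so it will drop most slow or failing requests along with the boring ones. The tail decision sees everything, at the cost of buffering every trace until it completes.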

Categories
links

Links

  • The Document Culture of Amazon: Team members can cancel meetings that don’t have a document. The first 5-10 minutes of every meeting, sometimes more, are spent with everyone in the room reading about the issue under discussion, so everyone starts with similar context and can participate. 1-pagers, press releases, FAQs, and 6-pagers are the different formats. I’m going to try doing this in my meetings. Let’s see what people think 🙂

Categories
links

Links

  • HL7 FHIR: A standard for health care information exchange between disparate software systems. RESTful. Well specified. The problem we’re trying to solve is to allow controlled, specific movement of health data between systems, to open up possibilities for collaboration and other use cases we haven’t even considered yet.

Categories
links

Links

  • Constant work pattern (Amazon Builders’ Library): Certain aspects of Route 53 and the ELB control plane have been designed so that they always do the amount of work that would handle peak load. This reduces variance in the system (they also have to understand limits well and use cells to partition traffic, to keep individual clusters within those limits).
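
Here is a toy sketch of the constant work idea as I understand it, not AWS’s actual code. The class name, the `CAPACITY` of 8, and the health-table framing are assumptions for illustration: every cycle builds and ships the full fixed-size table, whether nothing changed or everything did.

```java
import java.util.Set;

// Hypothetical illustration of "constant work": the per-cycle cost does not
// depend on how many targets changed state.
class ConstantWorkSketch {
    // Assumed hard limit on targets per cell; real limits would be tuned.
    static final int CAPACITY = 8;

    // Always produces a full table of CAPACITY entries, one per target slot.
    static boolean[] fullSnapshot(Set<Integer> unhealthyIds) {
        boolean[] healthy = new boolean[CAPACITY];
        for (int id = 0; id < CAPACITY; id++) {
            healthy[id] = !unhealthyIds.contains(id);
        }
        return healthy;
    }

    public static void main(String[] args) {
        boolean[] quietDay = fullSnapshot(Set.of());
        boolean[] badDay = fullSnapshot(Set.of(0, 1, 2, 3, 4, 5, 6, 7));
        // Same amount of data either way, so a mass failure adds no extra load.
        System.out.println(quietDay.length + " entries vs " + badDay.length + " entries");
    }
}
```

A delta-based design would send nothing on the quiet day and a burst on the bad day, which is exactly when extra load hurts most.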

Categories
links

Links

  • The Case for and Against Cognito: Building a user management and authentication system can be hard (user directory, identity provider federation (SAML), …). Third parties can help out quite a bit here. Discusses Cognito pros, cons, and sources of confusion.
  • Stuck? Do Something!: Timely post by Jamis Buck. I feel anxious when I’m asked to do something I haven’t done before, or to solve a problem that’s new to me. A reminder to take a breath and pick something to start on. It’s ok if what I try doesn’t work. I’ll have learned something, and I’ll probably have another experiment waiting in the wings because of it.

Categories
links

Links

  • Async task framework design doc from Dropbox: Nice discussion of the design of their job scheduler service. At-least-once execution, priorities, no concurrent execution of a task, and guaranteed start times for most jobs, at a scale of 10,000 jobs per second (at least at time of writing).

Categories
systems

An Availability Story

Marc Brooker from AWS talks about availability. About 20 minutes, and very relevant stuff.

  • Availability is personal
  • Correlated failure limits availability
    • Redundancy isn’t always perfect (e.g. single points of failure)
  • Blast radius is critical to availability
  • My availability depends on the availability of my dependencies
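
To put a number on the dependency point, here is a back-of-the-envelope sketch. It assumes every dependency is required for every request and that failures are independent, which the correlated-failure point above warns is often not true. The class and method names are mine, not from the talk.

```java
// Hypothetical arithmetic helper: availability of a service that needs all of
// its dependencies to be up, under an independence assumption.
class AvailabilitySketch {
    // Multiplies the service's own availability by that of each hard dependency.
    static double serialAvailability(double own, double... dependencies) {
        double result = own;
        for (double d : dependencies) {
            result *= d;
        }
        return result;
    }

    public static void main(String[] args) {
        // A 99.95% service that calls two 99.95% dependencies on every request.
        double combined = serialAvailability(0.9995, 0.9995, 0.9995);
        System.out.println(combined);
    }
}
```

Three 99.95% components in series come out to roughly 99.85%, so a service cannot hit a 99.95% goal by its own efforts if it hard-depends on services that only just meet the same goal.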

The purpose of our system is not to hit an availability goal (99.95% uptime). It’s to serve our customers. (People!) An uptime goal is a proxy for this.

Source