- Definitions for logs vs. events, traces and spans: a good high-level overview of concepts that developers and production support engineers talk about a lot in the current generation of observability tools.
- A log line is an unstructured or semi-structured string of characters emitted by an application. Events are similar but structured (e.g. JSON). Spans are a particular kind of event that represents a duration of time within an application flow.
- Taking control of complex systems is what we do. Whether or not we are engineers isn't really the right question: we work with dynamic, complex systems, controlling processes we don't understand.
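To make the log/event/span distinction concrete, here is a minimal sketch of the same fact expressed three ways. The field names (`trace_id`, `span_id`, `parent_id`) are assumptions loosely modelled on common tracing systems, not the schema of Edgar or any particular library.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Log line: free-form text. Any structure (level, order id) has to be parsed back out.
log_line = "2024-01-01T12:00:00Z INFO payment accepted for order 42"

# Event: the same fact as structured key/value pairs, serialised here as JSON.
event = {
    "timestamp": "2024-01-01T12:00:00Z",
    "level": "INFO",
    "message": "payment accepted",
    "order_id": 42,
}
event_json = json.dumps(event)

# Span: an event that also covers a duration and records its place in a request flow.
# Hypothetical field names; real tracing systems differ in the details.
@dataclass
class Span:
    trace_id: str             # shared by every span that belongs to one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # None for the root span of the trace
    name: str
    start: float              # seconds
    end: float                # seconds
    attributes: dict = field(default_factory=dict)

    @property
    def duration(self) -> float:
        return self.end - self.start

span = Span("t1", "s1", None, "charge card", start=10.0, end=10.25,
            attributes={"order_id": 42})
print(json.loads(event_json)["order_id"])  # 42
print(span.duration)                        # 0.25
```

The event can be queried by field without parsing; the span adds timing plus the IDs needed to place it in a larger request.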
Annoyingly, QCon links aren't embeddable. This was a great talk about Edgar, an internal tool built at Netflix that developers and production support engineers (SRE, operations, customer support) use to learn about errors.
Tracing becomes especially important when many services are involved in processing a single request. Piecing together what happened is hard when logs and metrics are scattered across log categories and dashboards (in the worst case, one per service).
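To illustrate what tracing buys you here, the following is a minimal sketch of reassembling one request's path from spans emitted by several services. It assumes each span carries a trace ID and a parent span ID; the `Span` fields, service names and the `build_trace` helper are hypothetical, not taken from Edgar or any tracing library.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional

class Span(NamedTuple):
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    name: str

def build_trace(spans: List[Span], trace_id: str) -> List[str]:
    """Return one trace's spans as indented lines, walking the tree depth-first from each root."""
    mine = [s for s in spans if s.trace_id == trace_id]
    children = defaultdict(list)  # parent span_id -> child spans
    roots = []
    for s in mine:
        if s.parent_id is None:
            roots.append(s)
        else:
            children[s.parent_id].append(s)

    lines: List[str] = []

    def walk(span: Span, depth: int) -> None:
        lines.append("  " * depth + f"{span.service}: {span.name}")
        for child in children[span.span_id]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines

# Spans arrive from different services, in arbitrary order, mixed with other traces.
spans = [
    Span("t1", "c", "b", "payments", "charge card"),
    Span("t1", "a", None, "api-gateway", "POST /checkout"),
    Span("t2", "x", None, "api-gateway", "GET /home"),
    Span("t1", "b", "a", "orders", "create order"),
]
for line in build_trace(spans, "t1"):
    print(line)
```

Grouping by trace ID replaces the manual cross-referencing of per-service logs and dashboards. Note that in this sketch a span whose parent never arrived is silently dropped, which is one way missing telemetry shows up as holes in a trace.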
Edgar has a global view. It was important that all telemetry sources fed into Edgar: people couldn't have relied on the tool if there were gaps.
Another important design decision was the sampling rate. Collecting traces is expensive: it is resource-intensive, particularly in terms of memory. But tracing less than 100% of requests means that when you go looking for a particular trace, there's a chance it won't be there. The suggestion was to collect 100% of traces for a small, critical subset of traffic (e.g. /checkout).
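The suggestion above can be sketched as a sampling decision. This is a minimal sketch, not Edgar's implementation: the route list, the 1% default rate and the `should_trace` function are hypothetical. It uses deterministic head-based sampling, hashing the trace ID instead of drawing a random number, so every service that sees the same trace ID makes the same keep/drop decision and sampled traces stay complete.

```python
import hashlib

CRITICAL_PREFIXES = ("/checkout",)  # hypothetical: routes that are always traced
DEFAULT_RATE = 0.01                 # hypothetical: keep 1% of all other traffic

def should_trace(trace_id: str, path: str, rate: float = DEFAULT_RATE) -> bool:
    """Decide whether to record a trace; the result depends only on its inputs."""
    # Critical traffic is traced 100% of the time, so those traces are always there.
    if path.startswith(CRITICAL_PREFIXES):
        return True
    # Map the trace ID to a uniform value in [0, 1) using the top 53 bits of a hash,
    # which fit exactly in a float; keep the trace if that value falls below the rate.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate

print(should_trace("abc123", "/checkout/confirm"))  # True
```

Plain prefix matching would also match a route such as `/checkoutfoo`; a real service would more likely match on named routes.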