Spring Framework

I spent part of 2019 learning about Spring at work. We have a couple of microservices that make good use of it in the app tier. There are so many good ideas in here worth studying. Here are my notes:

-Dspring.profiles.active: Used to selectively load (or skip) components defined in a Spring application. eg We can activate capabilities per environment with @Profile("dev")

# Profile tagging with annotations
@Component
@Profile("dev")
public class DataSourceConfig {...}

# Profile tagging in XML
<beans profile="dev">...</beans>
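
To make the selection mechanism concrete for myself, here is a toy model in plain Java. It is not Spring's implementation, just a sketch of the idea: each component is tagged with a profile (or with none, meaning "always load"), and only the ones matching the comma-separated value of the spring.profiles.active system property get loaded. The class name ProfileGate and the component names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of profile gating. NOT Spring's code; all names here are hypothetical.
class ProfileGate {

    /** A component name plus the profile it is tagged with; null means "always load". */
    static final class Component {
        final String name;
        final String profile;

        Component(String name, String profile) {
            this.name = name;
            this.profile = profile;
        }
    }

    /** Parse a comma-separated profile list such as "dev,metrics"; null yields an empty set. */
    static Set<String> parseActive(String raw) {
        Set<String> active = new LinkedHashSet<>();
        if (raw == null) {
            return active;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                active.add(trimmed);
            }
        }
        return active;
    }

    /** Keep untagged components and those whose profile is currently active. */
    static List<String> select(List<Component> all, Set<String> active) {
        List<String> loaded = new ArrayList<>();
        for (Component c : all) {
            if (c.profile == null || active.contains(c.profile)) {
                loaded.add(c.name);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        List<Component> all = List.of(
                new Component("auditLog", null),
                new Component("devDataSource", "dev"),
                new Component("prodDataSource", "prod"));
        // Same property name that the -D flag above sets on the JVM.
        Set<String> active = parseActive(System.getProperty("spring.profiles.active"));
        System.out.println(select(all, active));
    }
}
```

Running it with java -Dspring.profiles.active=dev ProfileGate loads auditLog and devDataSource and skips prodDataSource; with no flag only the untagged auditLog is loaded.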

Disney 2019

Characters

Food

Galaxy’s Edge

Rise of the Resistance

Smuggler’s Run

Parades

And they all lived happily ever after …

SREcon 2019: Growing Infrastructure @ Stripe

Things we care about : scaling people + processes, avoiding burnout, working with highly skilled people with lots of autonomy

eg Infra

  • developer tools : build, tests, ci, cd
  • data infra
  • core libraries & frameworks
  • model training and evaluation
  • “Tools used by 3+ teams that is business critical”

A couple of dimensions we care about

Forced work vs discretionary (continuum): Forced : scaling Mongo, lowering costs, GDPR. Discretionary : server to service (no containers), deep learning

Short term and long term (another continuum): Short : critical remediation, support launch. Long : QoS strategy, “bend the cost curve”?, rewrite the monolith

Ideal : towards long term, towards discretionary (not fire fighting). BUT NOT TOO MUCH!

Hacks

Reduce WIP: Doing lots of things == not finishing any one thing. Not deriving any value. FINISH SOMETHING USEFUL!

If you’re never doing anything but firefighting, you have to hire. Once there’s progress, stay the course. Lol, don’t fall in love with firefighting. 🙂

What do you do? How do you learn?

tl;dr. Listen (talk?) to your users more.

Discovery tools

  • Benchmark with peer companies (similarly sized, what are they working on / struggling with)
  • Coffee chats with users
  • SLOs
  • Developer surveys

An eg from Stripe: Sorbet

(A project Stripe invested in to improve life)

Invested in static typing for Ruby to create more stability, safety, speed

THIS: Learning a new piece of tech is literally never what the business wants from you. Might be incidental to something more important but not the first thing.

Pull a user into the room when you’re talking about priorities

Innovation problem: right opportunity, wrong solution. Guh.

eg “Let’s rewrite it?” Need context

Approach validation : start by violently trying to disprove that your thing will work (like Google moonshot programs). Try the hardest cases early.

Embed with teams (people who will use your thing)

Investment lanes for infra @stripe

  • Security: invest in better security
  • Reliability: and make the platforms and tools around it better. Deployment, environments, monitoring
  • Usability: lead time and cycle time for features, from idea to getting data in prod
  • Efficiency
  • Latency
  • *in priority order

eg Investment weights, completely arbitrary, change from cycle to cycle (eg sprint, quarter, half, …)

  • 40% user asks <-Whatever they want. If we don’t understand the ask, that’s our fault
  • 30% platform quality
  • 30% key initiatives

The container operator’s manual

  • good for
    • good for reproducing perfectly certain environments
    • good for stateless apps
    • good for test environments because they’re easy to clean up after
    • good for upgrades <-very easy to do
  • bad for stateful apps … don’t containerize your database <-this is hard
    • you’re not google
    • don’t
  • takes longer than you’d expect … containerization is hard
  • give it to ops … LOL! they already do deployment && config management && incident response && monitoring && infrastructure management && security … why not give the container project to them

SREcon 2019: Building a scalable monitoring system

Molly Struve

https://www.usenix.org/conference/srecon19emea/presentation/struve

The monitoring platform grew organically over time (and the number of different tools can be overwhelming): New Relic, Honeybadger (exception reporting), PagerDuty, cron, dashboards, ElastAlert

How alerts were delivered to engineers: slack notifications, sms, email, phone

Alerts inconsistent

  • some reported data but didn’t suggest action
  • some needed immediate action

Eventually overhauled. Goals of alerting system:

  • consolidate monitoring to a single place (what does this mean?)
    • Kenna used Datadog for this. It hooks into all the other tools.
    • does she mean this in the alert manager sense? ie still using different tools for logs, metrics, traces, etc
  • alerts are actionable (no alert should be allowed to be ignored). Non-actionable things can be kept away from actionable things.
  • alerts are muteable (turn them off when needed … eg when we’ve already acknowledged a problem)
    • for a set period of time (should come back on)
  • track alert history. does this condition happen regularly?
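
A minimal sketch of the last two goals (mute for a set period, track history), in plain Java. This is my own illustration, not what Kenna or Datadog actually do; the class AlertBook, its method names, the alert name and the timestamps are all made up. A mute stores an expiry time so the alert comes back on by itself, and every firing is recorded even while muted so you can still ask "does this happen regularly?".

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: time-boxed muting plus firing history for named alerts.
class AlertBook {
    private final Map<String, Instant> mutedUntil = new HashMap<>();
    private final Map<String, List<Instant>> history = new HashMap<>();

    /** Silence an alert for a fixed window; after that it notifies again without manual action. */
    void mute(String alert, Instant now, Duration window) {
        mutedUntil.put(alert, now.plus(window));
    }

    /** An alert is muted only while `now` is strictly before its stored expiry. */
    boolean isMuted(String alert, Instant now) {
        Instant until = mutedUntil.get(alert);
        return until != null && now.isBefore(until);
    }

    /** Record the firing (muted or not) and return true if someone should be notified. */
    boolean fire(String alert, Instant now) {
        history.computeIfAbsent(alert, k -> new ArrayList<>()).add(now);
        return !isMuted(alert, now);
    }

    /** Count firings at or after `since`, to spot conditions that recur. */
    long firingsSince(String alert, Instant since) {
        return history.getOrDefault(alert, List.of()).stream()
                .filter(t -> !t.isBefore(since))
                .count();
    }

    public static void main(String[] args) {
        AlertBook book = new AlertBook();
        Instant t0 = Instant.parse("2019-01-01T00:00:00Z"); // arbitrary start time

        System.out.println(book.fire("disk-full", t0));                              // true
        book.mute("disk-full", t0, Duration.ofHours(1));
        System.out.println(book.fire("disk-full", t0.plus(Duration.ofMinutes(30)))); // false
        System.out.println(book.fire("disk-full", t0.plus(Duration.ofHours(2))));    // true
        System.out.println(book.firingsSince("disk-full", t0));                      // 3
    }
}
```

Passing the clock in as a parameter (instead of calling Instant.now() inside) keeps the sketch easy to test.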

Behaviours:

  • if an alert goes off you have to acknowledge
  • here’s how you mute, and when, and how long, and how to dump alerts that aren’t helpful
    • CASE: context heavy, actionable, symptom based, evaluated
  • here’s where the monitoring tool is and how to use it
  • developers should help make monitoring better