SREcon 2019 : Growing Infrastructure @ Stripe

Things we care about : scaling people + processes, avoiding burnout, working with highly skilled people with lots of autonomy

eg Infra

  • developer tools : build, tests, ci, cd
  • data infra
  • core libraries & frameworks
  • model training and evaluation
  • “Tools used by 3+ teams that is business critical”

A couple of dimensions we care about

Forced work vs discretionary (continuum): Forced : scaling mongo, lower costs, gdpr. Discretionary : server to service (no containers), deep learning

Short term and long term (another continuum): Short : critical remediation, support launch. Long : QoS strategy, “bend the cost curve”?, rewrite the monolith

Ideal : towards long term, towards discretionary (not fire fighting). BUT NOT TOO MUCH!

Hacks

Reduce WIP: Doing lots of things == not finishing any one thing. Not deriving any value. FINISH SOMETHING USEFUL!

If you’re never doing anything but firefighting, you have to hire. Once there’s progress, stay the course. Lol, don’t fall in love with firefighting. 🙂

What do you do? How do you learn?

tl;dr. Listen (talk?) to your users more.

Discovery tools

  • Benchmark with peer companies (similarly sized, what are they working on / struggling with)
  • Coffee chats with users
  • SLOs
  • Developer surveys

An eg from Stripe: Sorbet

(A project Stripe invested in to improve life)

Invested in static typing for Ruby to create more stability, safety, speed

THIS: Learning a new piece of tech is literally never what the business wants from you. Might be incidental to something more important but not the first thing.

Pull a user into the room when you’re talking about priorities

Innovation problem: right opportunity, wrong solution. Guh.

eg “Let’s rewrite it?” Need context

Approach validation : start by violently trying to disprove that your thing will work. (… Like Google moonshot programs). Try the hardest cases early.

Embed with teams (people who will use your thing)

Investment lanes for infra @stripe

  • Security: invest in better security
  • Reliability: and make the platforms and tools around it better. Deployment, environments, monitoring
  • Usability: lead time and cycle time for features, from idea to getting data in prod
  • Efficiency
  • Latency
  • *in priority order

eg Investment weights, completely arbitrary, change from cycle to cycle (eg sprint, quarter, half, …)

  • 40% user asks <-Whatever they want. If we don’t understand the ask, that’s our fault
  • 30% platform quality
  • 30% key initiatives
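
To make the weights concrete, here is a minimal sketch of splitting one cycle's capacity across the three lanes. The 40/30/30 split comes from the list above; the `allocate` helper and the 60 engineer-weeks of capacity are made-up illustrations, not anything from the talk.

```python
from fractions import Fraction

# Weights from the notes above. They are arbitrary and change per cycle.
WEIGHTS = {
    "user asks": Fraction(40, 100),
    "platform quality": Fraction(30, 100),
    "key initiatives": Fraction(30, 100),
}


def allocate(capacity_weeks, weights=WEIGHTS):
    """Split a cycle's capacity (engineer-weeks) across investment lanes.

    Hypothetical helper for illustration only.
    """
    if sum(weights.values()) != 1:
        raise ValueError("weights must sum to 100%")
    return {lane: float(capacity_weeks * share) for lane, share in weights.items()}


# Assumed capacity: 60 engineer-weeks in the cycle.
budget = allocate(60)
for lane, weeks in budget.items():
    print(f"{lane}: {weeks:.1f} engineer-weeks")
```

Using exact fractions keeps the "must sum to 100%" check free of floating point surprises when the weights get tweaked between cycles.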

Seems like an important thing to keep in mind about complexity …

The container operator’s manual

  • good for
    • good for reproducing perfectly certain environments
    • good for stateless apps
    • good for test environments because they’re easy to clean up after
    • good for upgrades <-very easy to do
  • bad for stateful apps … don’t containerize your database <-this is hard
    • you’re not google
    • don’t
  • takes longer than you’d expect … containerization is hard
  • give it to ops … LOL! they already do deployment && config management && incident response && monitoring && infrastructure management && security … why not give the container project to them

State of DevOps Report 2019

Metrics from previous years

  • Throughput
    • # daily deploys
    • Cycle time
  • Stability
    • Time to recover
    • # change related failures
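
A rough sketch of how these four numbers could be computed from a deploy log. The `Deploy` record, its field names, and the sample data are all hypothetical, and the definitions are simplified assumptions (cycle time is measured here from commit to production deploy, time to recover from deploy to restore); the report's own definitions may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    """Hypothetical deploy record, not a real tool's schema."""
    committed_at: datetime
    deployed_at: datetime
    caused_failure: bool = False
    restored_at: Optional[datetime] = None  # set only when caused_failure is True


def hours(delta):
    return delta.total_seconds() / 3600


def summarize(deploys, days):
    """Compute throughput and stability metrics over a window of `days`."""
    failures = [d for d in deploys if d.caused_failure]
    return {
        # Throughput
        "deploys_per_day": len(deploys) / days,
        "median_cycle_time_h": median(
            hours(d.deployed_at - d.committed_at) for d in deploys
        ),
        # Stability
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_recover_h": (
            median(hours(d.restored_at - d.deployed_at) for d in failures)
            if failures else None
        ),
    }


# Made-up sample: four deploys over two days, two of which caused failures.
t0 = datetime(2019, 9, 2, 9, 0)
h, day = timedelta(hours=1), timedelta(days=1)
log = [
    Deploy(t0, t0 + 2 * h),
    Deploy(t0 + 3 * h, t0 + 7 * h, True, t0 + 8 * h),
    Deploy(t0 + day, t0 + day + 6 * h),
    Deploy(t0 + day + h, t0 + day + 9 * h, True, t0 + day + 12 * h),
]
metrics = summarize(log, days=2)
print(metrics)
```

Medians are used instead of means so that one unusually slow deploy or long outage does not dominate the summary.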

Here’s a great capability summary diagram for SDO performance (software delivery and operational performance)

Psychological safety: can team members take calculated risks (What does this mean in practice? What are the limits to risk taking and how do people know they might be taking a larger leap than maybe they should at this point in time?), and can they be transparent with each other without worry of reprisal? [Culture]

Software delivery org’s effectiveness as a contributor to an organization’s overall goals / outcomes

Eg Organization metrics

  • Customer satisfaction
  • Profitability
  • Market share
  • Number of customers

Productivity

There’s a big focus on individual, team productivity in this year’s report. Neat.

A definition of productivity that isn’t a count of lines of code or story points completed in a sprint. Rather, it relates to a measure of our ability to dig in to complex tasks with minimal interruption. Hrm. Interesting. There are several contributing factors:

Productivity, burnout, and work juggling sidebar: as a general goal work to limit work in progress and context switching. Do this through process improvement and automation largely. Want … repeatable, reliable, fast, auditable systems changes always. <- These added capabilities that come from our tools dramatically increase our capacity to take on new work.

Internal search: how is the information from day to day conversations between teammates, diagrams, etc captured in a way that is easily retrievable in the future, maybe by somebody who didn’t participate in the original conversation? A knowledge base or radiator is hugely important. People join teams all the time. How long does it take them to get up to speed?

Cute. Technical debt as defined by Ward Cunningham is more nuanced than I guess I gave it credit for. Architecture, coding practice, complexity were in but not much else. There’s a nice list of stuff here though …

“Work Recovery” A thing I have struggled with in the past. You can create a context where people can leave work at work and achieve balance in life that contributes to overall wellbeing. Or not. Discourage this at your peril.

How new practices, ideas, tools spread through an org: community of practice, grassroots. (How teams transform.)

Privacy Law in Canada

Data privacy has always been and will always be important regardless of the industry I work in, but I can’t deny there’s a different feel to it working at an eHealth startup handling what I consider to be deeply personal and private data about the people our tools are built to help.

So I have some learning to do about my role and responsibility to our users, who can be doctors or patients. Notes to follow:

Laws

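  • PIPEDA : Canadian federal privacy law (Personal Information Protection and Electronic Documents Act)
  • PHIPA : Ontario health privacy law (Personal Health Information Protection Act. Deemed substantially similar to PIPEDA, so it applies in place of the federal one to health information custodians in Ontario.)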