Things we care about : scaling people + processes, avoiding burnout, working with highly skilled people with lots of autonomy
eg Infra
- developer tools : build, tests, ci, cd
- data infra
- core libraries & frameworks
- model training and evaluation
- “Tools used by 3+ teams that is business critical”
A couple of dimensions we care about
Forced work vs discretionary (continuum): Forced : scaling mongo, lower costs, gdpr. Discretionary : server to service (no containers), deep learning
Short term and long term (another continuum): Short : critical remediation, support launch. Long : QoS strategy, “bend the cost curve”?, rewrite the monolith
Ideal : towards long term, towards discretionary (not fire fighting). BUT NOT TOO MUCH!
Hacks
Reduce WIP: Doing lots of things == not finishing any one thing. Not deriving any value. FINISH SOMETHING USEFUL!
If you’re never doing anything but firefighting, you have to hire. Once there’s progress, stay the course. Lol, don’t fall in love with firefighting. 🙂
What do you do? How do you learn?
tl;dr. Listen (talk?) to your users more.
Discovery tools
- Benchmark with peer companies (similarly sized, what are they working on / struggling with)
- Coffee chats with users
- SLOs
- Developer surveys
An eg from Stripe Sorbet
(A project stripe invested in to improve life)
Invested in static typing for ruby to create more stability, safety, speed
THIS: Learning a new piece of tech is literally never what the business wants from you. Might be incidental to something more important but not the first thing.
Pull a user into the room when you’re talking about priorities
Innovation problem: right opportunity, wrong solution. Guh.
eg “Let’s rewrite it?” Need context
Approach validation : start by violently trying to disprove that your thing will work. (… Like google moonshot programs). Try hardest cases early.
Embed with teams (people who will use your thing)
Investment lanes for infra @stripe
- Security: invest in better security
- Reliability: and make the platforms and tools around it better. Deployment, environments, monitoring
- Usability: Lead, cycle time for features from idea to getting data in prod
- Efficiency
- Latency
- *in priority order
eg Investment weights, completely arbitrary, change from cycle to cycle (eg spring, quarter, half, …)
- 40% user asks <-Whatever they want. If we don’t understand the ask, that’s our fault
- 30% platform quality
- 30% key initiatives