What are they
A couple of my favourite definitions:
- A group of computers working together on a problem, communicating over a shared, unreliable network, and
- You know you have a distributed system when some computer somewhere you didn’t know existed can take your program out :) (paraphrasing Leslie Lamport)
Why we need them
- Reliability: single points of failure can be avoided. How much fun is it when the one web server responsible for running a critical piece of software goes down in the middle of the night?
- Scalability: they let us add capacity incrementally. This works best when components are stateless, so that any instance can handle any request.
- Durability: if your bits only exist on a single disk and that disk fails, you’re going to have a bad day. How well documented is your process for restoring your data? When was the last time you practiced it?
- Performance efficiency: how well are the compute, network, disk and memory you have being utilized? Well-designed distributed systems let you add or remove capacity as needed without disrupting work in flight.
- Further reading: Marc Brooker’s blog is great
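
To make the reliability and scalability points a bit more concrete, here is a minimal in-process sketch, not a real deployment. Every name in it (`SharedStore`, `Replica`, `LoadBalancer`, `ReplicaDown`) is hypothetical and invented for illustration. It assumes all state lives in one shared store, so the replicas themselves are stateless: the balancer can skip a replica that is down, and a new replica can be added without losing anything.

```python
from dataclasses import dataclass


class SharedStore:
    """Stand-in for an external database or cache that holds all state."""

    def __init__(self):
        self._counts = {}

    def increment(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]


class ReplicaDown(Exception):
    """Raised when a request reaches a replica that is not healthy."""


@dataclass
class Replica:
    name: str
    store: SharedStore
    healthy: bool = True

    def handle(self, user):
        if not self.healthy:
            raise ReplicaDown(self.name)
        # The replica keeps nothing locally; the count lives in the store.
        visits = self.store.increment(user)
        return f"{user} visit {visits} served by {self.name}"


class LoadBalancer:
    """Round-robin over replicas, skipping any that are down."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._next = 0

    def add(self, replica):
        # Adding capacity is just adding another stateless replica.
        self.replicas.append(replica)

    def handle(self, user):
        # Try each replica at most once per request.
        for _ in range(len(self.replicas)):
            replica = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            try:
                return replica.handle(user)
            except ReplicaDown:
                continue
        raise RuntimeError("no healthy replicas")


if __name__ == "__main__":
    store = SharedStore()
    lb = LoadBalancer([Replica("a", store), Replica("b", store)])
    print(lb.handle("ann"))
    lb.replicas[1].healthy = False   # simulate the 3am failure
    print(lb.handle("ann"))          # still served, by the other replica
    lb.add(Replica("c", store))      # add capacity incrementally
    print(lb.handle("ann"))
```

Note that the sketch just moves the single point of failure into `SharedStore`. In a real system that store would itself need to be replicated, which is where the durability point comes in.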