# Notes from Reading 'Designing Data-Intensive Applications'

Key insights from Martin Kleppmann's book on building reliable, scalable, and maintainable systems

## Why I Read This

I was struggling with consistency issues in Cropnest's order management system. Multiple users updating the same order simultaneously led to lost updates and confusing state. I needed to understand the fundamentals of concurrent data access.

## Key Ideas

### The Three Main Concerns

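- Reliability: Systems should continue to work correctly even when things go wrong (hardware faults, software bugs, human error)
- Scalability: Systems should have reasonable ways to cope with growth in data volume, traffic, or complexity
- Maintainability: Systems should be operable, simple, and evolvable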

### Consistency Models Are Not Binary

I used to think in terms of "consistent or not." The book showed me a spectrum:

- Linearizability (strongest)
- Sequential consistency
- Causal consistency
- Eventual consistency (weakest)
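
To see where the weak end of that spectrum bites, here is a toy sketch of asynchronous leader-follower replication. The `Leader` and `Replica` classes and the `order:42:status` key are hypothetical, made up for illustration, not taken from the book. A read from a lagging follower returns stale data, which eventual consistency allows and linearizability forbids.

```python
from __future__ import annotations

# Toy model of asynchronous leader-follower replication (hypothetical classes).


class Replica:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def read(self, key: str) -> str | None:
        return self.data.get(key)


class Leader(Replica):
    def __init__(self, followers: list[Replica]) -> None:
        super().__init__()
        self.followers = followers
        self.pending: list[tuple[str, str]] = []  # writes not yet shipped

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.pending.append((key, value))  # replicated later, asynchronously

    def replicate(self) -> None:
        # Ship every pending write to every follower (the "eventually" part).
        for key, value in self.pending:
            for follower in self.followers:
                follower.data[key] = value
        self.pending.clear()


follower = Replica()
leader = Leader([follower])

leader.write("order:42:status", "shipped")
stale = follower.read("order:42:status")      # None: not replicated yet
fresh = leader.read("order:42:status")        # "shipped": read from the leader
leader.replicate()
converged = follower.read("order:42:status")  # "shipped": replicas converged
print(stale, fresh, converged)                # None shipped shipped
```

In this model, sending reads that must be fresh to the leader avoids the stale read; the book discusses this among its read-after-write techniques.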

What changed: I now choose consistency models based on business requirements, not technical idealism.

### Transactions Are Not Just ACID

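The ACID acronym oversimplifies, and databases implement it differently. What each letter actually covers:

- Atomicity: All or nothing; a failed transaction leaves no partial writes
- Consistency: Invariants preserved, which is really the application's responsibility rather than the database's
- Isolation: Concurrent transactions don't interfere, though weaker isolation levels still allow some anomalies
- Durability: Committed data survives crashes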
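
Atomicity is the easiest letter to demonstrate. A minimal sketch using Python's built-in `sqlite3` module as a stand-in for a real database; the `orders`/`order_items` schema, the `create_order` helper, and the failing item are hypothetical. If any statement fails mid-transaction, none of the transaction's writes survive.

```python
from __future__ import annotations

import sqlite3

# In-memory database; the orders/order_items schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE order_items ("
    "order_id INTEGER NOT NULL, sku TEXT NOT NULL, "
    "qty INTEGER NOT NULL CHECK (qty > 0))"
)
conn.commit()


def create_order(order_id: int, items: list[tuple[str, int]]) -> bool:
    """Insert an order and all its items atomically; return True if committed."""
    try:
        # As a context manager, the connection commits if the block succeeds
        # and rolls back every statement in it if the block raises.
        with conn:
            conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
            for sku, qty in items:
                conn.execute(
                    "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
                    (order_id, sku, qty),
                )
        return True
    except sqlite3.IntegrityError:
        return False


ok = create_order(1, [("seed-corn", 2), ("fertilizer", 1)])
failed = create_order(2, [("seed-corn", 3), ("bad-item", 0)])  # qty 0 fails CHECK
orders = conn.execute("SELECT id FROM orders ORDER BY id").fetchall()
orphan_items = conn.execute(
    "SELECT COUNT(*) FROM order_items WHERE order_id = 2"
).fetchone()[0]
print(ok, failed, orders, orphan_items)  # True False [(1,)] 0
```

Order 2's header row and its first item were written before the failure, yet neither is left behind.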

What changed: I now think in terms of isolation levels (Read Committed, Snapshot Isolation, Serializable) rather than just "transactions."
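
The lost-update problem that sent me to this book is easy to reproduce without a database. A deterministic sketch in plain Python: the `Store` class and its `compare_and_set` method are hypothetical stand-ins for a database row. It interleaves two read-modify-write cycles. The naive version silently drops one write, while compare-and-set, one of the lost-update defenses the book describes, detects the conflict so the losing client can re-read and retry.

```python
from __future__ import annotations


class Store:
    """Hypothetical single-row store holding a value and a version number."""

    def __init__(self, value: int) -> None:
        self.value = value
        self.version = 0

    def read(self) -> tuple[int, int]:
        return self.value, self.version

    def write(self, value: int) -> None:
        # Blind write: overwrites whatever is there.
        self.value = value
        self.version += 1

    def compare_and_set(self, expected_version: int, value: int) -> bool:
        # Write only if nobody else has written since we read.
        if self.version != expected_version:
            return False
        self.value = value
        self.version += 1
        return True


# Naive read-modify-write: two clients both raise the quantity on one order.
naive = Store(10)
a_val, _ = naive.read()   # client A reads 10
b_val, _ = naive.read()   # client B reads 10
naive.write(a_val + 5)    # A writes 15
naive.write(b_val + 3)    # B writes 13, silently discarding A's +5
print(naive.value)        # 13, not 18

# Compare-and-set: B's stale write is rejected, so B re-reads and retries.
cas = Store(10)
a_val, a_ver = cas.read()
b_val, b_ver = cas.read()
a_ok = cas.compare_and_set(a_ver, a_val + 5)     # True: A writes 15
b_first = cas.compare_and_set(b_ver, b_val + 3)  # False: B's version is stale
b_retry = False
if not b_first:
    b_val, b_ver = cas.read()                    # B re-reads 15
    b_retry = cas.compare_and_set(b_ver, b_val + 3)
print(cas.value)          # 18
```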

### The CAP Theorem Is Often Misunderstood

CAP isn't "pick two of three." It says that during a network partition, you must choose between consistency (in CAP, specifically linearizability) and availability. When there is no partition, you can have both.
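
A toy sketch of that choice, with a hypothetical `Node` class and `mode` flag invented for illustration: when a node is cut off from its peer, a CP node refuses writes it cannot replicate, while an AP node accepts them and lets the replicas diverge. With a healthy network, both modes behave identically.

```python
from __future__ import annotations


class Node:
    """Hypothetical replica that must pick C or A when partitioned from its peer."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value: str | None = None
        self.peer: Node | None = None
        self.partitioned = False  # True when this node cannot reach its peer

    def write(self, value: str) -> bool:
        if self.partitioned:
            if self.mode == "CP":
                return False      # stay consistent: reject, i.e. unavailable
            self.value = value    # stay available: accept locally, peer diverges
            return True
        # Healthy network: apply to both replicas, so we get C and A.
        self.value = value
        if self.peer is not None:
            self.peer.value = value
        return True


def pair(mode: str) -> tuple[Node, Node]:
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


cp_a, cp_b = pair("CP")
cp_a.write("v1")                       # healthy: both replicas get v1
cp_a.partitioned = True                # cut node a off from node b
cp_ok = cp_a.write("v2")               # rejected during the partition
print(cp_ok, cp_a.value, cp_b.value)   # False v1 v1

ap_a, ap_b = pair("AP")
ap_a.write("v1")
ap_a.partitioned = True
ap_ok = ap_a.write("v2")               # accepted, but replicas now disagree
print(ap_ok, ap_a.value, ap_b.value)   # True v2 v1
```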

What changed: I stopped using "CAP theorem" as an excuse for design decisions.

## What Changed My Mind

### On Database Choice

I used to default to PostgreSQL for everything. Now I ask:

- What are my query patterns?
- What consistency requirements do I have?
- What's my scale?

Sometimes the answer is still PostgreSQL. Sometimes it's something else.

### On Distributed Systems

I used to think distributed systems were "hard but necessary." Now I think:

- Avoid them if possible
- If unavoidable, use proven patterns
- Never build your own consensus algorithm

## Where This Breaks

### Small Teams

The book assumes you have resources to implement proper distributed systems. For a 2-person team, some recommendations are impractical.

### Rapid Prototyping

When validating an idea, perfect consistency matters less than speed of iteration. The book is written for mature systems.

## How I Might Apply This

- Audit Cropnest's consistency requirements
  - Which operations need strong consistency?
  - Where can we relax requirements?
- Implement proper isolation levels
  - Currently using the default, Read Committed
  - Should some operations use Repeatable Read? (In PostgreSQL, Repeatable Read is snapshot isolation and aborts a transaction that would otherwise cause a lost update.)
- Design for failure modes
  - What happens when the database is unavailable?
  - How do we handle partial failures?
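
For the order-update problem itself, one pattern I could try under the default Read Committed level is optimistic concurrency control: a compare-and-set on a version column, expressed as a single conditional `UPDATE`. A minimal sketch, with Python's built-in `sqlite3` standing in for the real database; the `orders` schema, the `status`/`version` columns, and the `read_order`/`update_status` helpers are hypothetical. Whether this is safe in production depends on the database re-checking the `WHERE` clause against the latest row version, which is worth confirming for our setup.

```python
from __future__ import annotations

import sqlite3

# Hypothetical orders table with a version column for optimistic locking.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO orders (id, status, version) VALUES (1, 'pending', 0)")
conn.commit()


def read_order(order_id: int) -> tuple[str, int]:
    row = conn.execute(
        "SELECT status, version FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return row[0], row[1]


def update_status(order_id: int, new_status: str, expected_version: int) -> bool:
    """Write only if the row still has the version we read; True on success."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE orders SET status = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_status, order_id, expected_version),
        )
    # rowcount == 0 means someone else updated the order since we read it.
    return cur.rowcount == 1


# Two users load the same order, then both try to change it.
_, v_alice = read_order(1)
_, v_bob = read_order(1)
alice_ok = update_status(1, "packed", v_alice)  # succeeds, version becomes 1
bob_ok = update_status(1, "cancelled", v_bob)   # rejected: stale version 0
print(alice_ok, bob_ok, read_order(1))          # True False ('packed', 1)
```

On a rejected update, the application would re-read the order and either retry or surface the conflict to the user; which one is right is a business question, part of the consistency audit.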

## Questions Still Open

- How do you balance "proper distributed systems design" with "startup velocity"?
- When is it worth investing in custom consistency solutions vs. using managed services?
- How do you communicate consistency tradeoffs to non-technical stakeholders?
