1) Databases have been around for a long time but newer non-relational databases are gaining popularity. However, databases still have issues that can "kill you" like fragile replication and poor failover support.
2) The CAP theorem states that a distributed system cannot simultaneously provide consistency, availability, and partition tolerance. This introduced needed realism that different data stores solve different problems depending on their focus on two of these areas.
3) New non-relational databases like MongoDB provide better solutions for issues like replication and failover that were weaknesses of relational databases. Document stores in particular are good alternatives when data is not truly relational.
16. ACID
16
The set of rules relational databases follow to assure the data gets where it needs to go and
is consistent.
They’re fine for a certain kind of workload.
17. Atomicity
17
Transactions are all or nothing. If any part of the transaction fails, the WHOLE thing has to
fail and roll back.
That means a lot of locking, which can become a performance problem.
18. Consistency
18
Any transaction brings the database from one valid “state” to another - which means you can
have a bunch of rules inside the database to judge the validity of data, and any transaction
that doesn’t pass fails and rolls back.
Again, not great for performance.
19. Isolation
19
Transactions executed concurrently have to result in the same state of the database as if they
had been executed serially.
Requires partially applied transactions to NOT be visible to other transactions.
23. Replication
23
It’s evil, and almost all RDBMS’s do it wrong.
It’s so fragile that you spend more time redoing it than actually getting any benefit from it.
MySQL can do master/master. PostgreSQL ships binary logs via scp.
It’s all horrible and gives me grey hairs.
Because it was an afterthought and not designed from the beginning.
Add-on replication is almost always horrible.
24. Failover
24
This is even worse than replication. Because it was even more of an afterthought.
Most of the time it fails over on accident and breaks replication.
And then someone gets woken up to clean up a steaming pile of bad data.
And that person isn’t very happy about it.
32. Partition Tolerance
32
The system will continue to operate despite arbitrary message loss or a failure of part of the
system.
Also known as “split brain” - which happens to me if I don’t get enough coffee.
33. But, you can never have
all three. It’s impossible.
33
Finally, some reality! Stop trying to be everything to everyone and solve all types of problems
with the same hammer.
So when you’re looking at a data store, see which two it can do and which you need for your
data!
34. Enter all the NoSQL!
34
Stands for either “NO SQL” or “Not Only SQL” - but it’s really a bunch of different data stores
that aren’t relational and solve different kinds of problems.
And provide some solutions for old school reliability problems.
35. Document Stores
35
-
MongoDB, Riak, CouchDB, etc
Not relational (though you can convince mongodb to do it, you shouldn’t)
Usually have really good replication stories
Let’s look at MongoDB vs traditional MySQL
36. MySQL Replication
36
That’s typical master/master.
Each can take writes (but you shouldn’t)
They ship bin logs back and forth
Fragile
Easy to break replication by having conflicting writes committed near the same time on both
sides - so split-brain is always a possibility.
37. MongoDB Replica Set
37
- There’s an election, and one node is picked as the primary.
- It takes all writes, distributes to the secondaries
- If the primary goes down, there’s an election and a new primary is chosen (usually less than
1 second).
- New nodes join the replica set and get all the data, then can be elected primary
38. Benefits of Replica Sets
38
- Replication and failover designed into the system as core functionality!
- Much better failover
- Much better reliability
- I get to sleep more
- Easy to add capacity as the replica set grows (either shard by adding new replica sets or
add more nodes to scale reads).
39. Riak & the “Ring”
39
Riak is crazy town
Document store with very light querying (though the new search stuff is badass)
Super scalable via the “Ring”
Data is automagically replicated around the ring based on configuration
- Number of copies
40. The Ring
40
-
All nodes “gossip” to confirm they’re up.
Any node can take a query and will gather the results from the other nodes.
Nodes dropping out are “noticed” by the ring and data gets shuffled around.
New news automatically join the node and get their “share” of the data.
Theoretically infinitely scalable (though the gossip gets REALLY noisy)
Useful as a file store (see Riak CS)
I think that drawing can be used to summon Beetlejuice.
41. What’s Old is New
41
-
MariaDB + Galera Cluster = MySQL replica sets! (kind of)
row-based replication is much more reliable
automatic failover and syncing of new nodes
can be load balanced for reads and writes!
still the same sql everyone’s used to
theoretically any node can take writes - but I don’t trust it
42. My MariaDB
42
- Yes, this is the mongodb diagram
- I use haproxy to send all the writes to a single primary, with the others as backups in case
it goes down.
- I have a separate haproxy frontend that load balances across all three for reads.
- so far, i love it to pieces
43. Here’s HAProxy
43
-
rmcom_backend - app servers
mariadb_read_backend - the leastconn balanced pool of readers
mariadb_write_backend - db1 is the primary unless it goes down, then db2 is “promoted”
rails, mariadb_read and mariadb_write are the frontends
45. If you query it, index it.
45
- As your data grows, you’ll see query speed decrease.
- Add indexes for your common queries!
- Don’t forget compound indexes.
46. As data increases,
flexibility decreases.
46
- You’ll need to limit the types of queries you allow people to perform because they’ll lock
things up and stop everyone from accessing it.
- You’ll need to find other ways to “protect” the database, like.
47. Cache it!
47
- Use memcached or other caching technologies to keep common queries away from the
database.
- If it can be read, it can be cached.
- Saves you a ton of money in vertically scaling your database.
- You may also need to add other ways to access your data, like say, elasticsearch or solr.
48. Scale vertically
48
- Throw hardware at it until it’s too expensive, then shard it.
- Because sharding is almost always horrible.
49. What does it all mean?
49
-
Don’t default to RDBMS!
Use RDBMS if you need transactions and your data truly is relational.
If it’s a document, use a document store
Understand the tradeoffs
Understand how your data will be queried
Don’t forget you can combine technologies to build whatever you need
50. If We Have Time...
•
•
•
Key/Value Stores
•
Questions!
Elasticsearch
Why you shouldn’t use
Redis... ever.
50