Slides from my talk at ACCU 2011 in Oxford on 16th April 2011. A whirlwind tour of the non-relational database families, with a little more detail on Redis, MongoDB, Neo4j and HBase.
3. Me
• Director of Engineering at MyDrive
• Hands-on coding in Ruby, C++ & others
• Big data, SW architecture, robustness, TDD,
DevOps, data analysis
• Background of SW for telecoms, mobile,
embedded
• @gavinheavyside
4. MyDrive Solutions
• Driver behaviour analysis and scoring for
telematics-based insurance
• Large-scale geospatial processing of GPS
and map data
• Relational DBs - PostgreSQL, MySQL
• Non-relational DBs - Redis, HBase
• Big Data tools - Hadoop
• Built on Linux and open-source stack
45. redis
• By Salvatore Sanfilippo (@antirez)
• Sponsored by VMware
• data-structure server
• strings, hashes, lists
• sets, sorted sets
• All operations in memory, backed by disk
13 rules, numbered 0 to 12 (Codd's 12 rules)
No popular DBMS is actually ‘relational’ by the 12 rules - they all break some of them
Leading commercial - Oracle, Microsoft (SQL Server), IBM (DB2)
Leading open-source - MySQL, PostgreSQL, SQLite
If one part of a transaction fails, it all fails and the DB is left unchanged.
Failures: HW, system, DB (disk etc.), application (violating constraints on the data)
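A minimal sketch of the all-or-nothing behaviour described above, using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the names in it are illustrative assumptions, not something from the talk. The second update violates a constraint, so the first one is rolled back too.

```python
import sqlite3

# Hypothetical schema for illustration: balances may never go negative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # An overdraft fails the CHECK constraint (an application-level failure).
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling `transfer(conn, "alice", "bob", 500)` fails on the debit, and the credit to `bob` that ran just before it is undone as well.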
The DB will enforce the consistency rules and relationships/constraints that have been specified in the schema - everything else is the responsibility of the application
Dirty reads - other transactions are allowed to read, but not modify, uncommitted data - this improves performance
The DB creates a new version of the data for a TX
Other TXes read the old version until the TX has completed.
MVCC is used by some non-relational databases
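The MVCC idea above can be sketched as a toy in-memory store. This is an illustration under simplifying assumptions (a single process, no write-conflict detection), and `MVCCStore` and its methods are hypothetical names, not any real database's API.

```python
class MVCCStore:
    """Toy multi-version store: every commit adds a new version of a key."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # timestamp of the most recent commit

    def begin(self):
        """Start a transaction that sees the data as of the latest commit."""
        return {"snapshot": self._clock, "writes": {}}

    def read(self, tx, key):
        """Return the tx's own pending write, else the newest visible version."""
        if key in tx["writes"]:
            return tx["writes"][key]
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= tx["snapshot"]:
                return value
        return None

    def write(self, tx, key, value):
        """Buffer a write; other transactions cannot see it until commit."""
        tx["writes"][key] = value

    def commit(self, tx):
        """Publish the buffered writes as new versions (no conflict checks)."""
        self._clock += 1
        for key, value in tx["writes"].items():
            self._versions.setdefault(key, []).append((self._clock, value))
```

A reader that began before a writer commits keeps seeing the old version, while a transaction started afterwards sees the new one.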
Usually uses a transaction log that can be replayed to rebuild the data in the event of failure.
What most of these companies have in common is scale
How would an RDBMS handle the size of data they deal with?
Most of the big companies have built their own solutions.
Most of them also use RDBMSes - Facebook is a huge MySQL user.
Scaling - RDBMSes don’t scale linearly - big box == $$$$
e.g. graph relationships don’t map to tables & rows easily
Semi-structured/unstructured data - lots of columns, lots of nulls
Caching - e.g. memcached, store the results of common queries in memory
Denormalise - add redundant or grouped data to reduce table joins, reduce load on physical hardware and improve locality of reference
So... you choose a fancy, modern, distributed NoSQL DB
Not really...
C - all nodes see the same data at the same time
A - survivors continue to operate when nodes fail
P - the system continues to operate despite message loss between nodes
Many systems relax consistency
Also by Eric Brewer
A BASE system relaxes the C in CAP
BA (basically available) - might lose access to some data if nodes fail
SS (soft state) - system state might change over time without input (eventual consistency, propagation)
There are different ways to decide whether a write has succeeded, and whether a subsequent read returns the new value.
Consistent Smashing - video from Basho/Riak
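Assuming the video referenced here is about consistent hashing (the key-distribution scheme Riak uses), a minimal hash-ring sketch might look like the following. The `HashRing` class, the virtual-node count and the choice of SHA-256 are illustrative assumptions, not Riak's actual implementation.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes  # points placed on the ring per physical node
        self._ring = []        # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        # Any well-distributed hash works; SHA-256 is an arbitrary choice here.
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        """Place the node's virtual points on the ring."""
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        """Drop all of the node's points; other nodes' points are untouched."""
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        """Return the owner of the first ring point at or after the key's hash."""
        if not self._ring:
            raise KeyError("ring is empty")
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around the ring
```

The property that matters for scaling is that removing a node only reassigns the keys that node owned; every other key stays where it was.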
Lots of overlap between families - esp. column & key-value/DHT
Schema-less way of looking at data as documents rather than fields - all related data is in the document.
Maps very well to a lot of applications
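As a rough illustration of the document view described above, the sketch below keeps all related data for a blog post in one nested document and stores a differently shaped document alongside it. The field names and the `find` helper are made up for this example; they are not a real document-database API.

```python
import json

# One document holds the post together with its tags and comments,
# where a relational schema would spread them over several tables.
post = {
    "_id": "post-1",
    "type": "post",
    "title": "Non-relational databases",
    "tags": ["nosql", "accu"],
    "comments": [
        {"author": "ann", "text": "Nice overview"},
        {"author": "bob", "text": "What about graphs?"},
    ],
}

# Schema-less: a document with different fields lives in the same collection.
user = {"_id": "user-1", "type": "user", "name": "ann"}

collection = [post, user]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Documents serialise naturally to JSON, the usual wire format for such stores.
serialised = json.dumps(post)
```

Fetching the post returns its comments with it, with no joins involved.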
huMONGOus
10gen
Can be ACID if using replication for durability
Object mapper - not ORM
FlockDB - Twitter, social graph - simpler than Neo4j
Neo4j - dual open-source/commercial license
Hama - Apache project
Tokyo Tyrant - network access protocol for Tokyo Cabinet DB
Voldemort - LinkedIn
Can be ACID if the AOF is fsynced on every command
Replication is non-blocking on the master. Writes will work even if a slave is blocked.
Replication for scaling (read-only slaves) or for redundancy.
AOF (append-only file) log - records everything that changes the dataset.
If the server crashes, Redis replays the AOF.
BGREWRITEAOF optimizes the AOF - minimum steps to rebuild the dataset in memory.
Configurable fsync options - every command, every second, never.
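The append-only-file idea above can be sketched in a few lines. This is a toy model under loud assumptions: the log is a Python list standing in for the file on disk, there is no fsync, and `AppendOnlyStore` is a hypothetical class rather than how Redis is implemented.

```python
class AppendOnlyStore:
    """Toy key-value store that logs every change, in the spirit of an AOF."""

    def __init__(self):
        self.data = {}
        self.log = []  # in-memory stand-in for the append-only file

    def set(self, key, value):
        self.log.append(("SET", key, value))  # log the change...
        self.data[key] = value                # ...then apply it in memory

    def delete(self, key):
        self.log.append(("DEL", key, None))
        self.data.pop(key, None)

    @staticmethod
    def replay(log):
        """Rebuild the dataset from a log, as would happen after a crash."""
        data = {}
        for op, key, value in log:
            if op == "SET":
                data[key] = value
            elif op == "DEL":
                data.pop(key, None)
        return data

    def rewrite(self):
        """Compact the log to the minimum commands that rebuild the dataset."""
        self.log = [("SET", key, value) for key, value in self.data.items()]
```

After `rewrite()`, overwritten and deleted keys no longer take up space in the log, yet replaying it still yields the same dataset.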
Oracle Berkeley DB, Berkeley DB Java, Berkeley DB XML
Memcache + Berkeley DB = MemcacheDB, a bit like Redis, for KV
OSDI 2006 (MapReduce was 2004)
Bigtable - column families, distributed, scale
Consider a whiteboard overview of Hadoop here.
Real-time (low-latency) as opposed to Hadoop & MapReduce batch jobs.
Not ACID - effect of distributed writes on consistency and isolation of views
Relaxes the A of CAP - consistent & partition tolerant
Partitioned on row count/size
A region is the basic unit of availability
Queries - no support for complex queries
Compute the query in the application (MapReduce, etc.)
All necessary data is denormalised in the row - a wide table with lots of columns.
“Versioned get” returns an older version of a row
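A small sketch of the wide-row, versioned-cell model described above. The `WideTable` class, the explicit integer timestamps and the column names are assumptions made for illustration; this is not the HBase client API.

```python
from collections import defaultdict


class WideTable:
    """Toy wide-column table: each cell keeps several timestamped versions."""

    def __init__(self):
        # row key -> "family:qualifier" column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        """Store a new version of a cell without overwriting older ones."""
        self._rows[row][column][ts] = value

    def get(self, row, max_ts=None):
        """Return the newest value per column, optionally as of max_ts."""
        result = {}
        for column, versions in self._rows.get(row, {}).items():
            visible = [ts for ts in versions if max_ts is None or ts <= max_ts]
            if visible:
                result[column] = versions[max(visible)]
        return result
```

Everything a query needs sits in the one row, and passing `max_ts` gives the "versioned get" behaviour of reading the row as it looked at an earlier time.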
Couchbase - combination of CouchDB, Membase, Memcached
Kyoto Cabinet - C++ implementation by the Tokyo Cabinet author.