Non-Relational Databases at ACCU2011

SELECT * FROM databases WHERE query_language <> 'SQL';
Gavin Heavyside - ACCU Conference - 16 April 2011

SELECT * FROM databases WHERE query_language <> 'SQL' LIMIT 4;

Me
• Director of Engineering at MyDrive
• Hands-on coding in Ruby, C++ & others
• Big data, SW architecture, robustness, TDD, devops, data analysis
• Background in SW for telecoms, mobile and embedded
• @gavinheavyside

MyDrive Solutions
• Driver behaviour analysis and scoring for telematics-based insurance
• Large-scale geospatial processing of GPS and map data
• Relational DBs - PostgreSQL, MySQL
• Non-relational DBs - Redis, HBase
• Big Data tools - Hadoop
• Built on Linux and open-source stack

What is an RDBMS?

• Codd’s relational model, 1970 (“Codd’s 12 Rules”, 1985)
• Relations
  • e.g. tables, rows, columns
• Relational operators
  • Manipulate data in tabular form

ACID

• Atomicity
• Consistency
• Isolation
• Durability

Atomicity

• All or nothing
• Maintain atomicity across failures
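All-or-nothing can be seen in a few lines using Python’s built-in `sqlite3` module as a stand-in RDBMS. This is a minimal sketch: the `accounts` table and the `transfer` helper are hypothetical, and `fail_midway` simulates a crash between the debit and the credit. Using the connection as a context manager commits if the block completes and rolls back if it raises.

```python
import sqlite3


def make_db():
    # Hypothetical schema: two accounts with opening balances.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    # `with conn:` wraps both updates in one transaction:
    # commit if the block finishes, roll back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between debit and credit")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = make_db()
try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)   # the debit was rolled back: nothing changed
transfer(conn, "alice", "bob", 30)
after_success = balances(conn)   # both updates applied together
```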

Consistency

• DB moves from one consistent state to another
• Only valid data is written to DB
• It can only enforce rules it knows about
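The last point is easy to demonstrate with `sqlite3`: a rule declared as a `CHECK` constraint is enforced on every write, while a rule that lives only in someone’s head is not. A minimal sketch; the table, `try_write` and the “names are lower-case” business rule are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is a rule the database knows about, so it enforces it on every write.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
conn.commit()


def try_write(sql):
    """Run one statement in its own transaction; return False if the DB rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Violates the declared rule: rejected, and alice keeps her 100.
overdraw_ok = try_write("UPDATE accounts SET balance = -50 WHERE name = 'alice'")
# Hypothetical business rule "names are lower-case" was never declared, so this is accepted.
odd_name_ok = try_write("INSERT INTO accounts VALUES ('ALICE', 10)")
```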

Isolation

• Transactions can’t see data from other incomplete transactions
• Blocking & Deadlocks
• Dirty reads
• MVCC

Locking

• Row locking
• Whole table locking
• TX might require lots of locks
• Blocking

MVCC

• Multi-Version Concurrency Control
• Maintain several versions of objects
• Read & write timestamps on transactions
• Reads never blocked
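A toy in-memory sketch of the idea, assuming logical integer timestamps and a single process; the `MVCCStore` and `Txn` names are hypothetical. Writers append new versions instead of overwriting, and a reader only looks at versions committed up to its snapshot, so it never waits on a writer. Real MVCC engines such as PostgreSQL’s also detect write-write conflicts and garbage-collect old versions, which this sketch omits.

```python
import itertools
from dataclasses import dataclass, field


class MVCCStore:
    """Every committed write adds a new version; nothing is overwritten in place."""

    def __init__(self):
        self._clock = itertools.count(1)   # logical commit timestamps
        self._versions = {}                # key -> [(commit_ts, value), ...] in ts order
        self.last_commit_ts = 0

    def begin(self):
        # A transaction's snapshot is "everything committed so far".
        return Txn(self, snapshot_ts=self.last_commit_ts)

    def read(self, key, snapshot_ts):
        # Newest version visible at snapshot_ts; never waits for a writer.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None

    def commit(self, writes):
        ts = next(self._clock)
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((ts, value))
        self.last_commit_ts = ts
        return ts


@dataclass
class Txn:
    store: MVCCStore
    snapshot_ts: int
    writes: dict = field(default_factory=dict)

    def get(self, key):
        # A transaction sees its own pending writes, otherwise its snapshot.
        if key in self.writes:
            return self.writes[key]
        return self.store.read(key, self.snapshot_ts)

    def put(self, key, value):
        self.writes[key] = value   # buffered, invisible to others until commit

    def commit(self):
        return self.store.commit(self.writes)


store = MVCCStore()
setup = store.begin()
setup.put("x", 1)
setup.commit()

reader = store.begin()               # snapshot includes x == 1
writer = store.begin()
writer.put("x", 2)
before_commit = reader.get("x")      # writer's change is invisible (no dirty read)
writer.commit()
after_commit = reader.get("x")       # reader still sees its snapshot
fresh = store.begin().get("x")       # a new transaction sees the new version
```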

Durability

• Data from a committed TX is never lost

What’s wrong with relational DBs?

http://www.flickr.com/photos/exfordy/4734358134/

All the cool kids use non-relational DBs...
Facebook, LinkedIn, Twitter, Google

What’s wrong with relational DBs?

• Nothing
• Object-relational ‘impedance mismatch’
• Scaling

Scaling an RDBMS
• Launch a successful service
• Read saturation - add caching
• Write saturation - add hardware (£££)
• Queries slow - denormalise
• Reads still too slow - pre-materialise common queries, stop joining
• Writes too slow - drop secondary indexes and triggers

Denormalising
• Normalise logical data design
  • Joins
  • Materialised views can optimise queries
• Denormalise logical data design
  • Eliminate joins
  • Application must ensure data consistency
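A small illustration with `sqlite3`, assuming hypothetical `users`/`orders` tables: the normalised design joins at read time, while the denormalised `orders_wide` table copies the name into every order so reads need no join. The cost shows up on writes: `rename_user` (also hypothetical) has to update every copy, and if it forgot one, the database would not notice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised: the customer's name is stored once and joined in at query time.
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), total INTEGER);
    INSERT INTO users  VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 25), (11, 1, 40);

    -- Denormalised: the name is copied into every order row, so reads need no join.
    CREATE TABLE orders_wide (id INTEGER PRIMARY KEY, user_id INTEGER, user_name TEXT, total INTEGER);
    INSERT INTO orders_wide SELECT o.id, o.user_id, u.name, o.total
        FROM orders o JOIN users u ON u.id = o.user_id;
""")


def rename_user(user_id, new_name):
    # With copies, the application (not the DB) must keep every one in step.
    with conn:
        conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        conn.execute("UPDATE orders_wide SET user_name = ? WHERE user_id = ?", (new_name, user_id))


rename_user(1, "Ada Lovelace")
joined = conn.execute(
    "SELECT o.id, u.name FROM orders o JOIN users u ON u.id = o.user_id ORDER BY o.id"
).fetchall()
wide = conn.execute("SELECT id, user_name FROM orders_wide ORDER BY id").fetchall()
```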

Scaling a distributed DB

• Just add more commodity servers...
• ...we wish

CAP Theorem

• Eric Brewer, 2000
• A distributed system can’t simultaneously be all three of:
  • Consistent
  • Available
  • Partition-tolerant

BASE

• Basically Available
• Soft state
• Eventually consistent
• Relaxation of the C in CAP

Eventual Consistency

• All nodes eventually see the same data
• Different consistency levels (how many replicas must respond):
  • One
  • Quorum
  • All
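The usual rule of thumb for Dynamo-style stores is that with N replicas, a read of R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. A minimal sketch of that arithmetic, assuming a replication factor of 3; the function names are hypothetical and the level names follow the slide.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def replicas(level, n):
    # How many of the n replicas must answer at each level.
    return {"one": 1, "quorum": quorum(n), "all": n}[level]


def read_overlaps_write(n, r, w):
    # If R + W > N, every read set shares at least one replica with every write set,
    # so some replica the read contacts holds the latest acknowledged write.
    return r + w > n


N = 3  # assumed replication factor
for read_level in ("one", "quorum", "all"):
    for write_level in ("one", "quorum", "all"):
        r, w = replicas(read_level, N), replicas(write_level, N)
        print(f"read={read_level:<6} write={write_level:<6} overlap={read_overlaps_write(N, r, w)}")
```

With N = 3, quorum reads plus quorum writes (2 + 2 > 3) overlap, while one/one (1 + 1) does not, which is where “eventually” comes in. Overlap alone is not enough: replicas still need version information (timestamps or vector clocks) to tell which answer is newest.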

Horizontal Scaling

• Partitioning
• Sharding
• Dynamo-style
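Dynamo-style partitioning is usually built on consistent hashing. A minimal sketch, assuming SHA-256 as the ring hash and 64 virtual nodes per server (both arbitrary choices); `HashRing` and the server names are hypothetical. The point it illustrates: adding a server only moves the keys the new server takes over, rather than reshuffling nearly everything as `hash(key) % n_servers` would.

```python
import bisect
import hashlib


def ring_hash(key):
    # Stable across processes (Python's built-in hash() of str is randomised per run).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hashing: a key belongs to the first server point at or after its hash."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Each server appears at many "virtual node" positions to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (ring_hash(f"{server}#{i}"), server))

    def server_for(self, key):
        idx = bisect.bisect_left(self._points, (ring_hash(key), ""))
        return self._points[idx % len(self._points)][1]  # wrap round the ring


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["db1", "db2", "db3"])
before = {k: ring.server_for(k) for k in keys}
ring.add("db4")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]  # every moved key now lives on db4
```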

Non-relational Database Families
• Document-oriented
• Graph
• Column-oriented
• Key-value & DHT
• Others

Document Databases

• IBM Lotus
• CouchDB
• MongoDB
• Riak

MongoDB

• JSON-style documents
• Indexes on any field
• Replication, auto-sharding
• Map/Reduce
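MongoDB’s map/reduce takes JavaScript `map` and `reduce` functions. The shape of the computation can be sketched in plain Python without a server; this is not the MongoDB API, and the trip documents are invented sample data. MongoDB may call `reduce` more than once for the same key, feeding earlier results back in, so a reduce function should be associative; summing is.

```python
from collections import defaultdict

# JSON-style documents; the fields are made up and need not match across documents.
trips = [
    {"_id": 1, "driver": "d1", "km": 12.5, "tags": ["commute"]},
    {"_id": 2, "driver": "d2", "km": 3.0},
    {"_id": 3, "driver": "d1", "km": 40.0, "tags": ["motorway", "night"]},
]


def map_fn(doc):
    # Emit (key, value) pairs: here, distance per driver.
    yield doc["driver"], doc["km"]


def reduce_fn(key, values):
    # Summing is associative, so it is safe to re-reduce partial results.
    return sum(values)


def map_reduce(docs, mapper, reducer):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


km_per_driver = map_reduce(trips, map_fn, reduce_fn)
```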

Other Features

• Document linking & embedding
• GridFS - store large files
• Geospatial indexes and searches

Graph DBs

http://www.flickr.com/photos/thefangmonster/2301364418/

Graph Databases

• Nodes, relationships & properties
• Query by traversing graph
• Natural fit for recommendations, shortest paths, social graph
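Query by traversal, sketched in plain Python over an adjacency list rather than through any graph database’s API: a breadth-first search answers the shortest-path question, and a two-hop walk gives naive friend-of-friend recommendations. The graph and names are hypothetical; properties on nodes and relationships are left out to keep it short.

```python
from collections import deque

# Hypothetical social graph: people are nodes, KNOWS relationships are edges.
knows = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve"), ("eve", "dan")]


def adjacency(edges):
    adj = {}
    for a, b in edges:  # treat KNOWS as symmetric
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first traversal: the first time goal is reached is via a fewest-hops path."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def friends_of_friends(adj, person):
    # Simple recommendation: people two hops away who aren't already friends.
    direct = set(adj.get(person, []))
    return {fof for f in direct for fof in adj.get(f, [])
            if fof != person and fof not in direct}


adj = adjacency(knows)
route = shortest_path(adj, "ann", "dan")
suggestions = friends_of_friends(adj, "ann")
```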

Graph DBs

• FlockDB
• Neo4j
• Apache Hama
• Google Pregel

Neo4j

• Embedded
• Server
• REST
• Components - indexing, management, RDF, geospatial

Key-Value & DHT

• Amazon Dynamo
• Project Voldemort
• Redis
• Tokyo Cabinet
• Amazon SimpleDB

redis
• By Salvatore Sanfilippo (@antirez)
• Sponsored by VMware
• Data-structure server
  • strings, hashes, lists
  • sets, sorted sets
• All operations in memory, backed by disk
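As an example of a data-structure operation, a leaderboard in Redis would use the sorted-set commands `ZADD board 10 alice`, `ZINCRBY board 20 alice` and `ZREVRANGE board 0 1`. The pure-Python class below only imitates their semantics so the example runs without a server; `SortedSet` and the leaderboard data are hypothetical, and unlike Redis this toy re-sorts on every read.

```python
class SortedSet:
    """Pure-Python imitation of a few Redis sorted-set commands (not a Redis client)."""

    def __init__(self):
        self._scores = {}  # member -> score

    def zadd(self, score, member):
        self._scores[member] = score

    def zincrby(self, increment, member):
        self._scores[member] = self._scores.get(member, 0) + increment
        return self._scores[member]

    def _ordered(self, reverse=False):
        # Order by score; equal scores fall back to the member name.
        return sorted(self._scores, key=lambda m: (self._scores[m], m), reverse=reverse)

    @staticmethod
    def _inclusive(items, start, stop):
        # Redis ranges include `stop` and accept negative indices (-1 is the last item).
        n = len(items)
        start = max(n + start, 0) if start < 0 else start
        stop = n + stop if stop < 0 else stop
        return items[start:stop + 1]

    def zrange(self, start, stop):
        return self._inclusive(self._ordered(), start, stop)

    def zrevrange(self, start, stop):
        return self._inclusive(self._ordered(reverse=True), start, stop)


board = SortedSet()
for member, score in [("alice", 10), ("bob", 25), ("carol", 17)]:
    board.zadd(score, member)
board.zincrby(20, "alice")        # alice now has 30
top_two = board.zrevrange(0, 1)
everyone = board.zrange(0, -1)
```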

Interactive Documentation

Other features

• Replication (master/slaves)
• Persistence
  • Snapshotting
  • Append-only log file
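Redis’s append-only file logs each write command and replays the log on restart; how often it calls fsync is configurable (`appendfsync always`, `everysec` or `no`). The sketch below illustrates the same idea in Python, with a JSON-lines log standing in for Redis’s own command format; `AppendOnlyKV` and the file name are hypothetical, and it fsyncs on every write (the equivalent of `always`). Snapshotting instead writes the whole dataset out periodically, which is more compact but can lose writes made since the last snapshot.

```python
import json
import os
import tempfile


class AppendOnlyKV:
    """Toy store: every write is appended to a log before it changes memory."""

    def __init__(self, path, fsync=True):
        self.path = path
        self.fsync = fsync
        self.data = {}
        self._replay()
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self):
        # On restart, rebuild memory by re-applying the logged writes in order.
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, op):
        if op["cmd"] == "SET":
            self.data[op["key"]] = op["value"]
        elif op["cmd"] == "DEL":
            self.data.pop(op["key"], None)

    def _write(self, op):
        self._log.write(json.dumps(op) + "\n")
        self._log.flush()
        if self.fsync:
            os.fsync(self._log.fileno())  # slower writes, but a crash loses nothing acknowledged
        self._apply(op)

    def set(self, key, value):
        self._write({"cmd": "SET", "key": key, "value": value})

    def delete(self, key):
        self._write({"cmd": "DEL", "key": key})

    def close(self):
        self._log.close()


path = os.path.join(tempfile.mkdtemp(), "toy.aof")
db = AppendOnlyKV(path)
db.set("a", "1")
db.set("b", "2")
db.delete("a")
db.close()

reopened = AppendOnlyKV(path)   # state comes back from the log alone
restored = dict(reopened.data)
reopened.close()
```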

Object Hash Mappers

• cf. ORM (Object-Relational Mapper)
• OHM
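An OHM plays the role an ORM plays for tables, but maps each object onto a key-value hash (for example one Redis hash per object). A minimal Python sketch of the mapping, using a plain dict where a real OHM would talk to Redis; `User`, `to_hash`, `from_hash` and the `user:<id>` key scheme are hypothetical. Everything is stored as strings, as Redis hash values are, and converted back using the declared field types.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    id: int
    name: str
    age: int


def to_hash(obj):
    # One hash per object, under a "<class>:<id>" key; all values become strings.
    key = f"{type(obj).__name__.lower()}:{obj.id}"
    return key, {f.name: str(getattr(obj, f.name)) for f in fields(obj)}


def from_hash(cls, mapping):
    # Rebuild the object, converting each string with the field's declared type.
    return cls(**{f.name: f.type(mapping[f.name]) for f in fields(cls)})


store = {}  # stands in for Redis: key -> hash
key, fields_as_strings = to_hash(User(1, "Ada", 36))
store[key] = fields_as_strings
loaded = from_hash(User, store["user:1"])
```

This only round-trips simple scalar fields; references between objects, indexes and collections are the parts a real OHM library adds on top.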

Other KV Stores

• Berkeley DB
• Memcached
• Microsoft Dynomite

Column-Oriented DBs

http://www.flickr.com/photos/nationalmediamuseum/3588099765/

Column-Oriented Databases
• Google Bigtable
• Cassandra
• Hypertable
• HBase

HBase

http://www.flickr.com/photos/negativz/14470756/

• Apache top-level project
• Implementation of Google Bigtable
• Distributed
• High write throughput
• ‘real-time’ read/write

HBase

• Automatic partitioning
• Scale linearly and automatically
• Commodity HW
• Fault tolerant
• MapReduce

Data Model

• Schema-less
• Versioned cells
• row key/column family/column qualifier/timestamp
• Column Families
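The HBase/Bigtable data model can be pictured as a nested map: row key → column family → column qualifier → timestamp → value. The toy in-memory version below is only a sketch of that picture, not the HBase client API, and unlike HBase it does not keep rows sorted. `VersionedTable`, the `driver#42` row and the `info` family are hypothetical, and `max_versions` stands in for HBase’s per-family limit on retained versions. Column families are fixed when the table is created, while qualifiers can be added freely, which is the sense in which HBase is “schema-less”.

```python
import time
from collections import defaultdict


class VersionedTable:
    """Toy model: row key -> column family -> qualifier -> {timestamp: value}."""

    def __init__(self, families, max_versions=3):
        # Column families are declared up front; qualifiers within them are free-form.
        self.families = set(families)
        self.max_versions = max_versions
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row, column, value, ts=None):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        cell = self.rows[row][family][qualifier]
        cell[ts if ts is not None else time.time_ns()] = value
        # Keep only the newest max_versions versions of this cell.
        for old in sorted(cell)[:-self.max_versions]:
            del cell[old]

    def get(self, row, column, ts=None):
        # Latest version, or the latest at or before ts.
        family, qualifier = column.split(":", 1)
        cell = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        candidates = [t for t in cell if ts is None or t <= ts]
        return cell[max(candidates)] if candidates else None


t = VersionedTable(families=["info"], max_versions=2)
t.put("driver#42", "info:score", "71", ts=1)
t.put("driver#42", "info:score", "74", ts=2)
t.put("driver#42", "info:score", "80", ts=3)      # the ts=1 version is evicted
latest = t.get("driver#42", "info:score")
as_of_2 = t.get("driver#42", "info:score", ts=2)
evicted = t.get("driver#42", "info:score", ts=1)
```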

http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html


http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html

Other DBs

• Couchbase
• Kyoto Cabinet
• Many more I’ve omitted

Wrap Up

• RDBMS vs non-relational
• Distributed DBs
• Non-relational families

The End

@gavinheavyside
gavin.heavyside@mydrivesolutions.com
