NOSQL, COUCHDB AND THE CLOUD
Brad Anderson
Cloudant

BRAD ANDERSON

• BS Hotel Management
• Restaurant Chain Data - econometric modeling, BI/DW
• Open Source - trac, dsource.org, couchdb
• NOSQLEast 2009
• Cloudant
• http://twitter.com/boorad

AGENDA

• NOSQL
• COUCHDB
    • Erlang
• Cloud
    • Dynamo
    • MapReduce

YOU’RE SCREWED

http://www.bigfatmoneybags.com/blog/wp-content/uploads/2009/12/screwed.jpg

RELATIONAL DATABASES

• Rigid Schema / ORM fun
• Scalability
• Everything is a Nail

[Image text: RDBMS 1970-2009]
http://www.flickr.com/photos/36041246@N00/3419197777/

MASTER-SLAVE
•   Master-Slave Replication
    •   One (and only one) master
    •   One or more slaves
    •   All writes go to the master, replicated to slaves
    •   Reads balanced among master and slaves
•   Issues
    •   single point of failure
    •   single point of bottleneck
    •   static topology

MASTER-MASTER

•   Master-Master Replication
    •   One or more masters
    •   Writes and reads can go to any master
    •   Writes are replicated among masters
•   Issues
    •   limited performance and scalability (typically due to 2PC)
    •   complexity
    •   static topology

VERTICAL PARTITION

•   Vertical Partitioning
    •   Put tables belonging to different functional areas on different
        database nodes
        •   Scale data & load by function
        •   Move joins to the application level
•   Issues
    •   no longer truly relational
    •   a functional area grows too much

HORIZONTAL PARTITION

•   Horizontal Partitioning
    •   Split tables by key and put partitions (shards) on different nodes
        •   Scale data & load by key
        •   Move joins to the application level
•   Issues
    •   no longer truly relational
    •   a partition grows too much
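
As a rough sketch of splitting by key, the Python below routes each row to one of a fixed number of shards by hashing its key. The shard count, the `shard_for`, `put`, and `get` helpers, and the choice of MD5 are illustrative assumptions, not something the slides prescribe.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for a table partition living on a separate node.
shards = {i: {} for i in range(NUM_SHARDS)}


def put(key: str, row: dict) -> None:
    """Store a row on the shard that owns its key."""
    shards[shard_for(key)][key] = row


def get(key: str):
    """Look a row up on the one shard that can hold its key."""
    return shards[shard_for(key)].get(key)


put("user:1", {"name": "alice"})
put("user:2", {"name": "bob"})
```

With modulo placement like this, changing the shard count moves most keys to a different shard, so splitting an overgrown partition is not a cheap operation.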

CACHING

•   Put a cache in front of your database
    •   Distribute
    •   Write-through for scaling reads
    •   Write-behind for scaling reads and writes
•   Issues
    •   “only” scales read/write load
    •   invalidation
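
The difference between the two write policies can be sketched in a few lines of Python. The class names and the plain dict standing in for the database are assumptions made for the example; distribution and invalidation are left out.

```python
class WriteThroughCache:
    """Every write goes to the database first, then to the cache."""

    def __init__(self, db: dict):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value      # database is updated synchronously
        self.cache[key] = value

    def read(self, key):
        if key not in self.cache:             # cache miss: load from the database
            self.cache[key] = self.db[key]
        return self.cache[key]


class WriteBehindCache(WriteThroughCache):
    """Writes land in the cache and reach the database later, in a batch."""

    def __init__(self, db: dict):
        super().__init__(db)
        self.dirty = set()

    def write(self, key, value):
        self.cache[key] = value   # database is not touched yet
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

Write-through only takes read load off the database, while write-behind also absorbs writes, at the cost of a window in which the database is stale.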

OKAY, NOT SCREWED




http://www.bigfatmoneybags.com/blog/wp-content/uploads/2009/12/screwed.jpg

NOSQL

NOT ONLY SQL
A moniker for different data storage systems solving very different problems, all where a relational database is not the right fit.

RIGHT FIT

• Google indexes 400 Pb / day (2007)
• CERN, LHC generates 100 Pb / sec
• Unique data created each year (IDC, 2007)
    • 2007: 40 Eb
    • 2010: 988 Eb (exponential growth)

FOUR CATEGORIES

• Key/Value Stores
    • Dynomite, Voldemort, Tokyo
• Document Stores
    • CouchDB, MongoDB
• Column Stores / BigTable
    • HBase, Hypertable
• Graph Databases
    • Neo4j, AllegroGraph, VertexDB

BIG TAKEAWAY

[Diagram: on one side, a single function working over a pile of separate data items; on the other, a ring of nodes, each pairing a function with its own data]

Bring the function to the data

HUH? ERLANG?


• Programming Language created at Ericsson (20 yrs old now)
• Designed for scalable, long-lived systems
• Compiled, Functional, Dynamically Typed, Open Source

3 BIGGIES

• Massively Concurrent
    • green threads, very lightweight != os threads
• Seamlessly Distributed
    • node = os process = VM, processes can live anywhere
• Fault Tolerant
    • 99.9999999% = 32ms downtime per year - AXD301
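
The downtime figure follows directly from the availability percentage. A small Python check, assuming a year of 365.25 days (the function name is made up for the example):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 seconds


def downtime_per_year_ms(availability_percent: float) -> float:
    """Expected downtime per year, in milliseconds, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR * 1000.0


# "Nine nines": the unavailable fraction is 1e-9 of the year,
# which works out to roughly 31.6 ms and rounds to the 32 ms quoted above.
nine_nines_ms = downtime_per_year_ms(99.9999999)
```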



Apache CouchDB
Official Beta!


COUCHDB

• Schema-free document database server
• Robust, highly concurrent, fault-tolerant
• RESTful JSON API
• Futon web admin console
• MapReduce system for generating custom views
• Bi-directional incremental replication
• couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON

FROM INTEREST TO ADOPTION




• 100+ production users
• 3 books being written
• Vibrant, open community
• Active commercial development
• Rapidly maturing

OF THE WEB

Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP ... this is what the software of the future looks like.

Jacob Kaplan-Moss, October 17 2007
http://jacobian.org/writing/of-the-web/

DOCUMENTS




• Documents are JSON Objects
• Underscore-prefixed fields are reserved
• Documents can have binary attachments
• MVCC _rev deterministically generated from doc content

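
To make the reserved-field rule concrete, here is a small Python sketch of a document as a JSON object. The field values, including the `_rev` string, are made-up placeholders, and `split_reserved` is a hypothetical helper, not part of CouchDB.

```python
import json

# A document is a plain JSON object. `_id` and `_rev` are reserved fields
# managed by the database; the values here are placeholders.
doc = {
    "_id": "mydocid",
    "_rev": "1-abc123",
    "title": "NOSQL, CouchDB and the Cloud",
    "tags": ["nosql", "couchdb"],
}


def split_reserved(document: dict):
    """Separate underscore-prefixed (reserved) fields from application fields."""
    reserved = {k: v for k, v in document.items() if k.startswith("_")}
    user = {k: v for k, v in document.items() if not k.startswith("_")}
    return reserved, user


reserved_fields, user_fields = split_reserved(doc)
as_json = json.dumps(doc)  # the form in which the document travels over HTTP
```
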
ROBUST

• Never overwrite previously committed data
• In the event of a server crash or power failure, just restart CouchDB -- there is no “repair”
• Take snapshots with “cp”
• Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput

CONCURRENT

• Erlang approach: lightweight processes to model the natural concurrency in a problem
• For CouchDB that means one process per TCP connection
• Lock-free architecture; each process works with an MVCC snapshot of a DB.
• Performance degrades gracefully under heavy concurrent load


REST API

• Create
    PUT /mydb/mydocid
• Retrieve
    GET /mydb/mydocid
• Update
    PUT /mydb/mydocid
• Delete
    DELETE /mydb/mydocid
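
The four calls can be built with Python's standard library as a sketch. The base URL assumes a CouchDB server on its default local port, the revision strings are placeholders, and `build_request` is a helper invented for the example. Nothing is sent over the network here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed address of a local CouchDB


def build_request(method: str, path: str, body: dict = None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


create = build_request("PUT", "/mydb/mydocid", {"title": "hello"})
retrieve = build_request("GET", "/mydb/mydocid")
# Update and delete must name the revision they replace;
# "1-abc" and "2-def" are placeholder revision ids.
update = build_request("PUT", "/mydb/mydocid", {"_rev": "1-abc", "title": "hello again"})
delete = build_request("DELETE", "/mydb/mydocid?rev=2-def")

# To actually send one: urllib.request.urlopen(create)
```

Create and update share the same verb and URL; the presence of the current `_rev` is what lets the server tell an update from a conflicting create.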

VIEWS

• Custom, persistent representations of document data
• “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting
• Generated using MapReduce functions written in JavaScript (and other languages)
• Each view must have a map function and may also have a reduce function
• Leverages view collation, rich view query API

DOCUMENTS BY AUTHOR
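
A view of this kind needs only a map function. CouchDB view functions are normally written in JavaScript; the Python below merely simulates the idea, and the sample documents, `map_by_author`, and `build_view` are assumptions made for the sketch.

```python
def map_by_author(doc):
    """Map function: emit (author, title) for documents that have an author."""
    if "author" in doc:
        yield doc["author"], doc.get("title")


def build_view(docs, map_fn):
    """Run the map function over all documents and order the rows by key."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    rows.sort(key=lambda row: (row["key"], row["id"]))
    return rows


docs = [
    {"_id": "a", "author": "smith", "title": "Intro"},
    {"_id": "b", "author": "jones", "title": "Notes"},
    {"_id": "c", "title": "Untitled draft"},       # no author: emits nothing
    {"_id": "d", "author": "smith", "title": "Follow-up"},
]

view = build_view(docs, map_by_author)
by_smith = [row["value"] for row in view if row["key"] == "smith"]
```

Because the rows are kept sorted by key, fetching everything by one author is a contiguous range of the view.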




WORD COUNT
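
Word count adds a reduce step on top of the map. Again this is a Python simulation rather than CouchDB's JavaScript view code; the sample documents, the `text` field, and the function names are assumptions.

```python
import re
from collections import defaultdict


def map_words(doc):
    """Map function: emit (word, 1) for every word in the document text."""
    for word in re.findall(r"[a-z']+", doc.get("text", "").lower()):
        yield word, 1


def reduce_sum(values):
    """Reduce function: add up the emitted counts for one key."""
    return sum(values)


def word_count(docs):
    """Group map output by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_words(doc):
            grouped[key].append(value)
    return {key: reduce_sum(values) for key, values in sorted(grouped.items())}


docs = [
    {"_id": "1", "text": "Relax, relax"},
    {"_id": "2", "text": "Time to relax"},
]
counts = word_count(docs)
```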




INCREMENTAL
• Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date
• Leaf nodes store map results, inner nodes store reductions of children

http://horicky.blogspot.com/2008/10/couchdb-implementation.html

REPLICATION
• Peer-based, bi-directional replication using normal HTTP calls
• Mediated by a replicator process which can live on the source, target, or somewhere else entirely
• Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon)
• Applications (_design documents) replicate along with the data
• Ideal for offline applications -- “ground computing”

CLOUD




SHOWROOM
A cluster of couches




ARCHITECTURE

• Each cluster is a ring of nodes (Dynamo, Dynomite)
• Any node can handle request (consistent hashing)
    • O(1), with a hop
• Nodes own partitions (ring is divided)
• Data are distributed evenly across partitions and replicas
• MapReduce functions are passed to nodes for execution

RESEARCH


• Google’s MapReduce, http://bit.ly/bJbyq5
• Amazon’s Dynamo, http://bit.ly/b7FlsN
• CAP theorem, http://bit.ly/bERr2H

CLUSTER CONTROLS

• N - Replication
• Q - Partitions = 2^Q
• R - Read Quorum
• W - Write Quorum
• These constants define the cluster

N


[Diagram: Throughput on one side; Consistency and Durability on the other]

N = Number of replicas per item stored in cluster

Q


[Diagram: Throughput on one side; Scalability on the other]

2^Q = Number of partitions (shards) in cluster
T = Number of nodes in cluster
2^Q / T = Number of partitions per node

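
The three formulas above are easy to check with a few lines of Python. The values of Q and T below are hypothetical, chosen only to show one even and one uneven split.

```python
def num_partitions(q: int) -> int:
    """2^Q: number of partitions (shards) in the cluster."""
    return 2 ** q


def partitions_per_node(q: int, t: int) -> float:
    """2^Q / T: average number of partitions each of the T nodes owns."""
    return num_partitions(q) / t


even_split = partitions_per_node(6, 16)    # 64 partitions over 16 nodes
uneven_split = partitions_per_node(8, 24)  # 256 partitions over 24 nodes
```

When T does not divide 2^Q, as in the second case, the result is an average and some nodes must own one partition more than others.
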
R


[Diagram: Latency on one side; Consistency on the other]

R = Number of successful reads before returning value(s) to client

W


[Diagram: Latency on one side; Durability on the other]

W = Number of successful writes before returning ‘success’ to client

[Diagram sequence: a load balancer in front of a ring of 24 nodes. Each node holds a run of consecutive partitions: Node 24 holds X Y Z A, Node 1 holds A B C D, Node 2 holds B C D E, Node 3 holds C D E F, Node 4 holds D E F G, and so on around the ring. Successive slides add the labels below.]

• request: PUT http://boorad.cloudant.com/dbname/blah?w=2
• hash(blah) = E
• N=3, W=2, R=2
• node down

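
The request path suggested by the diagram (hash the document id to a partition, send the write to that partition's N replicas, report success once W of them acknowledge) can be modelled in a few lines of Python. This is a toy model under stated assumptions, not Cloudant's implementation: the 8-node ring, the 8 partitions, SHA-1 as the hash, and the rule that a partition's replicas sit on N consecutive nodes are all invented for the sketch. Only N=3, W=2, R=2 come from the slides.

```python
import hashlib

N, W, R = 3, 2, 2       # replicas, write quorum, read quorum (from the slides)
NUM_PARTITIONS = 8      # 2^Q with Q = 3, an assumed size
NODES = [f"node{i}" for i in range(1, 9)]  # assumed 8-node ring


def partition_for(doc_id: str) -> int:
    """Hash a document id onto one of the ring's partitions."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def replica_nodes(partition: int) -> list:
    """The N consecutive ring nodes assumed to hold copies of a partition."""
    return [NODES[(partition + i) % len(NODES)] for i in range(N)]


def quorum_write(doc_id: str, down=frozenset(), w: int = W):
    """Return (success, acknowledging nodes) for a write with quorum w."""
    replicas = replica_nodes(partition_for(doc_id))
    acks = [node for node in replicas if node not in down]
    return len(acks) >= w, acks


replicas = replica_nodes(partition_for("blah"))
ok_all_up, _ = quorum_write("blah")
ok_one_down, acks = quorum_write("blah", down={replicas[0]})
ok_two_down, _ = quorum_write("blah", down=set(replicas[:2]))

# With R + W > N, every read quorum overlaps every write quorum.
quorums_overlap = R + W > N
```

With one replica down the write still gathers two acknowledgements and succeeds; with two down it cannot reach W=2.
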
RESULT

• For standalone or cluster
    • one REST API
    • one URL
• For cluster
    • redundant data
    • distributed queries
    • scale out

QUESTIONS?

CREDITS



• Emil Eifrem, http://bit.ly/5D40WQ
• Sergio Bossa, http://bit.ly/c9UoRZ
• Cliff Moon, http://bit.ly/bX887c




Large scale computing with mapreduce
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social Web
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
NoSQL on the move
NoSQL on the moveNoSQL on the move
NoSQL on the move
 
Drop acid
Drop acidDrop acid
Drop acid
 
Anti-social Databases
Anti-social DatabasesAnti-social Databases
Anti-social Databases
 
Wmware NoSQL
Wmware NoSQLWmware NoSQL
Wmware NoSQL
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
Discover MongoDB - Israel
Discover MongoDB - IsraelDiscover MongoDB - Israel
Discover MongoDB - Israel
 
Lviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQLLviv EDGE 2 - NoSQL
Lviv EDGE 2 - NoSQL
 
A peek into the future
A peek into the futureA peek into the future
A peek into the future
 

Plus de boorad

Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solrboorad
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013boorad
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkboorad
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batchboorad
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batchboorad
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Toolsboorad
 
DevNexus 2011
DevNexus 2011DevNexus 2011
DevNexus 2011boorad
 
Why Erlang? - Bar Camp Atlanta 2008
Why Erlang?  - Bar Camp Atlanta 2008Why Erlang?  - Bar Camp Atlanta 2008
Why Erlang? - Bar Camp Atlanta 2008boorad
 

Plus de boorad (11)

Big Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and SolrBig Data Analysis Patterns with Hadoop, Mahout and Solr
Big Data Analysis Patterns with Hadoop, Mahout and Solr
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Hadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talkHadoop and Storm - AJUG talk
Hadoop and Storm - AJUG talk
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
PhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond BatchPhillyDB Talk - Beyond Batch
PhillyDB Talk - Beyond Batch
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batch
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Tools
 
DevNexus 2011
DevNexus 2011DevNexus 2011
DevNexus 2011
 
Why Erlang? - Bar Camp Atlanta 2008
Why Erlang?  - Bar Camp Atlanta 2008Why Erlang?  - Bar Camp Atlanta 2008
Why Erlang? - Bar Camp Atlanta 2008
 

Dernier

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Dernier (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

NOSQL, CouchDB, and the Cloud

  • 1. NOSQL, COUCHDB AND THE CLOUD Brad Anderson Cloudant
  • 2. BRAD ANDERSON • BS Hotel Management • Restaurant Chain Data - econometric modeling, BI/DW • Open Source - trac, dsource.org, couchdb • NOSQLEast 2009 • Cloudant • http://twitter.com/boorad
  • 3. AGENDA • NOSQL • COUCHDB • Erlang • Cloud • Dynamo • MapReduce
  • 5. RELATIONAL DATABASES (RDBMS, 1970-2009) • Rigid Schema / ORM fun • Scalability • Everything is a Nail http://www.flickr.com/photos/36041246@N00/3419197777/
  • 6. MASTER-SLAVE • Master-Slave Replication • One (and only one) master • One or more slaves • All writes go to the master, replicated to slaves • Reads balanced among master and slaves • Issues • single point of failure • single point of bottleneck • static topology
  • 7. MASTER-MASTER • Master-Master Replication • One or more masters • Writes and reads can go to any master • Writes are replicated among masters • Issues • limited performance and scalability (typically due to 2PC) • complexity • static topology
  • 8. VERTICAL PARTITION • Vertical Partitioning • Put tables belonging to different functional areas on different database nodes • Scale data & load by function • Move joins to the application level • Issues • no longer truly relational • a functional area grows too much
  • 9. HORIZONTAL PARTITION • Horizontal Partitioning • Split tables by key and put partitions (shards) on different nodes • Scale data & load by key • Move joins to the application level • Issues • no longer truly relational • a partition grows too much
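
As a rough illustration of "split tables by key", the sketch below routes each row key to one of a fixed set of shards with a stable hash. The node names and the choice of MD5 are assumptions made for the example, not something the slides specify.

```python
import hashlib

# Hypothetical deployment: four database nodes, one shard each.
NODES = ["db0", "db1", "db2", "db3"]


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index using a hash that is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def node_for(key: str) -> str:
    """Pick the node that stores the row with this key."""
    return NODES[shard_for(key, len(NODES))]


for user_id in ("alice", "bob", "carol"):
    print(user_id, "->", node_for(user_id))
```

Python's built-in `hash()` is avoided here because it is randomized per process for strings, so two application servers could disagree about where a key lives. Note that with plain modulo placement, changing the number of shards moves most keys, which is one way "a partition grows too much" becomes painful.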
  • 10. CACHING • Put a cache in front of your database • Distribute • Write-through for scaling reads • Write-behind for scaling reads and writes • Issues • “only” scales read/write load • invalidation
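
The difference between write-through and write-behind can be sketched with two small classes. This is a toy model under stated assumptions: the "database" is a plain dict, and the class and method names are invented for illustration.

```python
class WriteThroughCache:
    """Every write goes to the database immediately, so only reads are offloaded."""

    def __init__(self, store: dict):
        self.store = store  # stand-in for the backing database
        self.cache = {}

    def put(self, key, value):
        self.store[key] = value  # synchronous write to the database
        self.cache[key] = value

    def get(self, key):
        if key not in self.cache:
            self.cache[key] = self.store[key]  # cache miss: load from the database
        return self.cache[key]


class WriteBehindCache(WriteThroughCache):
    """Writes land in the cache first and reach the database later, in batches."""

    def __init__(self, store: dict):
        super().__init__(store)
        self.dirty = set()  # keys written to the cache but not yet persisted

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

Write-behind absorbs write load as well, at the cost of a window in which the database is stale and unflushed data can be lost if the cache node fails.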
  • 12. NOSQL: NOT ONLY SQL • A moniker for different data storage systems solving very different problems, all where a relational database is not the right fit.
  • 13. RIGHT FIT • Google indexes 400 Pb / day (2007) • CERN, LHC generates 100 Pb / sec • Unique data created each year (IDC, 2007) • 2007 40 Eb • 2010 988 Eb (exponential growth)
  • 14. FOUR CATEGORIES • Key/Value Stores • Dynomite, Voldemort, Tokyo • Document Stores • CouchDB, MongoDB • Column Stores / BigTable • HBase, Hypertable • Graph Databases • Neo4j, AllegroGraph, VertexDB
  • 15. BIG TAKEAWAY • Bring the function to the data
  • 17. HUH? ERLANG? • Programming Language created at Ericsson (20 yrs old now) • Designed for scalable, long-lived systems • Compiled, Functional, Dynamically Typed, Open Source
  • 18. 3 BIGGIES • Massively Concurrent • green threads, very lightweight != os threads • Seamlessly Distributed • node = OS process = VM, processes can live anywhere • Fault Tolerant • 99.9999999% availability = 32 ms downtime per year - AXD301
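
The "nine nines" figure can be checked with a few lines of arithmetic. The sketch assumes a year of 365.25 days; the function name is made up for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_ms_per_year(availability_percent: float) -> float:
    """Convert an availability percentage into expected downtime per year, in ms."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR * 1000.0


print(round(downtime_ms_per_year(99.9999999), 1))  # nine nines
print(round(downtime_ms_per_year(99.999) / 1000 / 60, 1), "minutes at five nines")
```

Nine nines works out to roughly 31.6 ms per year, which the slide rounds to 32 ms. For comparison, the more commonly quoted five nines allows a little over five minutes per year.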
  • 19. Apache CouchDB: Official Beta!
  • 20. COUCHDB • Schema-free document database server • Robust, highly concurrent, fault-tolerant • RESTful JSON API • Futon web admin console • MapReduce system for generating custom views • Bi-directional incremental replication • couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON
  • 21. FROM INTEREST TO ADOPTION • 100+ production users • Active commercial development • 3 books being written • Rapidly maturing • Vibrant, open community
  • 22. OF THE WEB • “Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP ... this is what the software of the future looks like.” Jacob Kaplan-Moss, October 17 2007 http://jacobian.org/writing/of-the-web/
  • 23. DOCUMENTS • Documents are JSON Objects • Underscore-prefixed fields are reserved • Documents can have binary attachments • MVCC _rev deterministically generated from doc content
  • 24. ROBUST • Never overwrite previously committed data • In the event of a server crash or power failure, just restart CouchDB -- there is no “repair” • Take snapshots with “cp” • Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput
  • 25. CONCURRENT • Erlang approach: lightweight processes to model the natural concurrency in a problem • For CouchDB that means one process per TCP connection • Lock-free architecture; each process works with an MVCC snapshot of a DB. • Performance degrades gracefully under heavy concurrent load
  • 26. REST API • Create: PUT /mydb/mydocid • Retrieve: GET /mydb/mydocid • Update: PUT /mydb/mydocid • Delete: DELETE /mydb/mydocid
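
The four calls can be sketched with Python's standard library. The example only builds the requests and does not send them, so it runs without a server. The base URL (CouchDB's default port on localhost), the document bodies, and the revision strings are placeholders; the `build_request` helper is hypothetical. In CouchDB an update must carry the current `_rev` in the body and a delete passes it as a `rev` query parameter.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed local CouchDB instance


def build_request(method, db, doc_id, body=None, rev=None):
    """Build (but do not send) an HTTP request for one document operation."""
    url = f"{BASE_URL}/{db}/{doc_id}"
    if rev is not None:
        url += f"?rev={rev}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


create = build_request("PUT", "mydb", "mydocid", {"title": "hello"})
retrieve = build_request("GET", "mydb", "mydocid")
# "1-abc" and "2-def" are placeholder revision ids, not real values.
update = build_request("PUT", "mydb", "mydocid", {"_rev": "1-abc", "title": "hello again"})
delete = build_request("DELETE", "mydb", "mydocid", rev="2-def")

for req in (create, retrieve, update, delete):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would perform the call against a running server.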
  • 28. VIEWS • Custom, persistent representations of document data • “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting • Generated using MapReduce functions written in JavaScript (and other languages) • Each view must have a map function and may also have a reduce function • Leverages view collation, rich view query API
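
To make the map/reduce split concrete, here is a small in-memory simulation. CouchDB view functions are normally written in JavaScript and run by the server; this Python stand-in, with invented documents and function names, only mirrors the shape of the computation: map emits key/value rows per document, rows are sorted by key, and reduce folds the values for each key.

```python
from collections import defaultdict

# Hypothetical documents; the field names are made up for the example.
docs = [
    {"_id": "a", "type": "post", "author": "brad", "words": 120},
    {"_id": "b", "type": "post", "author": "jan", "words": 50},
    {"_id": "c", "type": "comment", "author": "jan"},
    {"_id": "d", "type": "post", "author": "brad", "words": 80},
]


def map_fn(doc):
    """Yield (key, value) rows for one document, like calling emit(key, value)."""
    if doc.get("type") == "post":
        yield doc["author"], doc["words"]


def reduce_fn(values):
    """Combine all values that share a key."""
    return sum(values)


def build_view(documents, map_fn, reduce_fn=None):
    """Run map over every document, sort rows by key, optionally reduce per key."""
    rows = sorted((key, value) for doc in documents for key, value in map_fn(doc))
    if reduce_fn is None:
        return rows
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


print(build_view(docs, map_fn))             # map-only view: sorted rows
print(build_view(docs, map_fn, reduce_fn))  # grouped and reduced per author
```

The map-only call corresponds to a view without a reduce function; the second call is closer to querying a reduced view grouped by key.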
  • 31. INCREMENTAL • Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date • Leaf nodes store map results, inner nodes store reductions of children http://horicky.blogspot.com/2008/10/couchdb-implementation.html
  • 32. REPLICATION • Peer-based, bi-directional replication using normal HTTP calls • Mediated by a replicator process which can live on the source, target, or somewhere else entirely • Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon) • Applications (_design documents) replicate along with the data • Ideal for offline applications -- “ground computing”
  • 34. SHOWROOM • A cluster of couches
  • 35. ARCHITECTURE • Each cluster is a ring of nodes (Dynamo, Dynomite) • Any node can handle a request (consistent hashing) • O(1), with a hop • Nodes own partitions (the ring is divided) • Data are distributed evenly across partitions and replicas • MapReduce functions are passed to nodes for execution
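
A minimal consistent-hashing ring, in the spirit of the Dynamo design referenced on this slide, might look like the sketch below. It is not Cloudant's implementation: the `Ring` class, the node names, and the use of SHA-1 are all assumptions, and real systems usually add virtual nodes to even out load.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Place a node name or document id on the ring via a stable hash."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes):
        self._points = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._points]

    def preference_list(self, key: str, n: int):
        """Return the n distinct nodes found walking clockwise from the key."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        count = min(n, len(self._points))
        return [self._points[(start + i) % len(self._points)][1] for i in range(count)]


ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.preference_list("blah", 3))  # the three replicas for document "blah"
```

Because any node can compute the same preference list locally, it can forward a request to an owner in a single hop. Adding a node only takes over the arc of keys just before its position, so most keys stay where they were.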
  • 36. RESEARCH • Google’s MapReduce, http://bit.ly/bJbyq5 • Amazon’s Dynamo, http://bit.ly/b7FlsN • CAP theorem, http://bit.ly/bERr2H
  • 37. CLUSTER CONTROLS • N - Replication • Q - Partitions = 2^Q • R - Read Quorum • W - Write Quorum • These constants define the cluster
  • 38. N (Consistency, Throughput, Durability) • N = Number of replicas per item stored in cluster
  • 39. Q (Throughput, Scalability) • 2^Q = Number of partitions (shards) in cluster • T = Number of nodes in cluster • 2^Q / T = Number of partitions per node
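
The partition arithmetic is easy to check numerically. The values Q=3, T=4 and N=3 below are made up for the example, and the second function assumes replicas are spread evenly over the nodes, which the slides state as a goal rather than a guarantee.

```python
def partitions_per_node(q: int, t: int) -> float:
    """2^Q partitions spread over T nodes."""
    return 2 ** q / t


def partition_copies_per_node(q: int, n: int, t: int) -> float:
    """Each partition is stored N times, so 2^Q * N copies are spread over T nodes."""
    return 2 ** q * n / t


print(partitions_per_node(q=3, t=4))             # 2.0
print(partition_copies_per_node(q=3, n=3, t=4))  # 6.0
```

Since Q is fixed when the cluster is defined, choosing it generously leaves room to add nodes later: each new node simply takes over some of the existing partitions.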
  • 40. R (Latency, Consistency) • R = Number of successful reads before returning value(s) to client
  • 41. W (Latency, Durability) • W = Number of successful writes before returning ‘success’ to client
  • 42. [Diagram: a load balancer in front of a ring of nodes (Node 1, 2, 3, 4, ... 24), each holding lettered partitions (A, B, C, ...)]
  • 43. request PUT http://boorad.cloudant.com/dbname/blah?w=2 [Diagram: load balancer and ring of nodes]
  • 44. request PUT http://boorad.cloudant.com/dbname/blah?w=2 [Diagram: load balancer and ring of nodes]
  • 45. request PUT http://boorad.cloudant.com/dbname/blah?w=2 • hash(blah) = E [Diagram: load balancer and ring of nodes]
  • 46. request PUT http://boorad.cloudant.com/dbname/blah?w=2 • N=3 W=2 R=2 • hash(blah) = E [Diagram: load balancer and ring of nodes]
  • 47. request PUT http://boorad.cloudant.com/dbname/blah?w=2 • N=3 W=2 R=2 • hash(blah) = E [Diagram: load balancer and ring of nodes]
  • 48. request PUT http://boorad.cloudant.com/dbname/blah?w=2 • N=3 W=2 R=2 • hash(blah) = E • node down [Diagram: load balancer and ring of nodes, one node marked down]
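
The walkthrough ends with N=3, W=2, R=2 and one node down. A tiny model of that situation is sketched below; the function names are invented, and the `R + W > N` overlap rule comes from the Dynamo design the talk cites, not from the slides themselves.

```python
def quorum_write(replica_up, w: int) -> bool:
    """A write succeeds once at least W of the N replicas acknowledge it."""
    acks = sum(1 for up in replica_up if up)
    return acks >= w


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """With R + W > N, every read quorum overlaps every write quorum."""
    return r + w > n


# N=3 replicas for partition E, one of them down, W=2 as in ?w=2.
print(quorum_write([True, False, True], w=2))   # True: two acks are enough
print(quorum_write([True, False, False], w=2))  # False: only one replica answered
print(read_sees_latest_write(n=3, r=2, w=2))    # True
```

Lowering W or R reduces latency because fewer replicas must respond, at the cost of durability or consistency respectively.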
  • 49. RESULT • For standalone or cluster • one REST API • one URL • For cluster • redundant data • distributed queries • scale out
  • 51. CREDITS • Emil Eifrem, http://bit.ly/5D40WQ • Sergio Bossa, http://bit.ly/c9UoRZ • Cliff Moon, http://bit.ly/bX887c

Editor's notes

  1. 20 yrs old, open source since the late '90s (1998). Built for systems like a mobile telephone grid. Compiled (but to bytecode for a VM).
  2. Why Erlang? Here are my three big-ticket items: massively concurrent; seamlessly distributed into multi-machine clusters; extremely fault tolerant. Great for my projects: data storage & retrieval, scalable web apps. Maybe not so hot for computationally intensive projects, unless they lend themselves to parallelism.