Migrating to MongoDB
Why we moved from MySQL to Mongo
Getting to know Mongo
Demo app using Mongo with PHP
Reasons we looked for an alternative to our RDBMS setup
Issues with our RDBMS setup

Architecture was highly distributed, and the number of
databases was becoming an issue
Storing similar objects with different structure
Options for scalability
Storing files
Many DBs
In a MySQL server (with MyISAM)...
  1 database = 1 directory
  1 table = more than 1 file in the DB directory
The filesystem limits the number of entries per directory, and
the limit is not that high
We had a mix of MySQL and SQLite databases spread
across a directory hierarchy
Many DBs
In a Mongo server ...
  No 1:1 relation between databases and files
  Stores data in a set of pre-allocated files of increasing
  size
  Number of files grows as needed
Using many collections within a single database
allowed us to move everything into the DB server
A “collection”?

 RDBMS model:
   Database has tables which hold records
   Records in a table all have the same structure
 Document-oriented storage
   Database has collections which hold documents
Objects with differing structure

 For example, events where attributes vary based on
 type of event
   Event A: from, att1
   Event B: from, att1, att2
   Event C: from, att3, att4
 What’s your schema for this?
tbl_events_A
id   from   Att1
1    Jim    1237
2    Dave   362
3    Bob    9283

tbl_events_B
id   from   Att1    Att2
1    Bill   2938    23
2    Jim    632     9
3    Hugh   12832   14

tbl_events_C
id   from   Att3      Att4
1    Bob    hello     7249
2    Bill   goodbye   23091
3    Jim    testing   2334
tbl_events
id   type   from   Att1     Att2    Att3     Att4
1     A     Jim    1237    NULL     NULL     NULL
2     A     Dave   362     NULL     NULL     NULL
3     B     Bill   2938     23      NULL     NULL
4     C     Bob    NULL    NULL     hello    7249
5     A     Bob    9283    NULL     NULL     NULL
6     C     Bill   NULL    NULL    goodbye   23091
7     B     Jim    632       9      NULL     NULL
8     B     Hugh   12832    14      NULL     NULL
9     C     Jim    NULL    NULL    testing   2334
tbl_events
id   type   from                    Attributes
1     A     Jim                  “{‘att1’:1237}”
2     A     Dave                  “{‘att1’:362}”
3     B     Bill            “{‘att1’:2938, ‘att2’:23}”
4     C     Bob           “{‘att3’:‘hello’, ‘att4’:7249}”
5     A     Bob                  “{‘att1’:9283}”
6     C     Bill        “{‘att3’:‘goodbye’, ‘att4’:23091}”
7     B     Jim              “{‘att1’:632, ‘att2’:9}”
8     B     Hugh           “{‘att1’:12832, ‘att2’:14}”
9     C     Jim          “{‘att3’:‘testing’, ‘att4’:2334}”
tbl_events
id   type   from
1    A      Jim
2    A      Dave
3    B      Bill
4    C      Bob
5    A      Bob
6    C      Bill
7    B      Jim
8    B      Hugh
9    C      Jim

tbl_events_attributes
id   eventId   name   value
1    1         att1   1237
2    2         att1   362
3    3         att1   2938
4    3         att2   23
5    4         att3   hello
6    4         att4   7249
7    5         att1   9283
8    6         att3   goodbye
9    6         att4   23091
10   7         att1   632
11   7         att2   9
...
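The cost of the attribute-table layout shows up on every read: the attribute rows have to be regrouped into one object per event. A minimal sketch in plain JavaScript, with a few of the rows above held in arrays. The `assembleEvents` helper is a hypothetical name, and the sketch assumes the `value` column is a string column, since it has to hold both numbers and text.

```javascript
// Rows as they might come back from tbl_events and tbl_events_attributes.
const eventRows = [
  { id: 3, type: 'B', from: 'Bill' },
  { id: 4, type: 'C', from: 'Bob' },
];
const attributeRows = [
  { id: 3, eventId: 3, name: 'att1', value: '2938' },
  { id: 4, eventId: 3, name: 'att2', value: '23' },
  { id: 5, eventId: 4, name: 'att3', value: 'hello' },
  { id: 6, eventId: 4, name: 'att4', value: '7249' },
];

// Hypothetical helper: merge each event row with its attribute rows.
function assembleEvents(events, attributes) {
  const byId = new Map(events.map((e) => [e.id, { ...e }]));
  for (const row of attributes) {
    const event = byId.get(row.eventId);
    // Values stay strings (assumption: one column, one type).
    if (event) event[row.name] = row.value;
  }
  return [...byId.values()];
}

const assembled = assembleEvents(eventRows, attributeRows);
console.log(assembled);
```

Every attribute comes back as a string, so the application also has to remember which ones were numbers.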
Objects with differing structure

 Document-oriented storage like Mongo is schema-less
   1 collection for all events
   Each document has the structure applicable for its
   type
   Can index common attributes for queries
events collection :

{id:1,   type:'A',   from:'Jim', att1:1237}
{id:2,   type:'A',   from:'Dave', att1:362}
{id:5,   type:'A',   from:'Bob', att1:9283}
{id:3,   type:'B',   from:'Bill', att1:2938, att2:23}
{id:7,   type:'B',   from:'Jim', att1:632, att2:9}
{id:8,   type:'B',   from:'Hugh', att1:12832, att2:14}
{id:4,   type:'C',   from:'Bob', att3:'hello', att4:7249}
{id:6,   type:'C',   from:'Bill', att3:'goodbye', att4:23091}
{id:9,   type:'C',   from:'Jim', att3:'testing', att4:2334}
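To illustrate why one collection is enough, here is a small sketch in plain JavaScript (no server involved) that filters documents shaped like the ones above. It only mimics the idea; in the mongo shell the corresponding queries would be along the lines of `db.events.find({from: 'Jim'})` and `db.events.find({att2: {$exists: true}})`.

```javascript
// A few documents shaped like those in the events collection.
const eventsCollection = [
  { id: 1, type: 'A', from: 'Jim', att1: 1237 },
  { id: 2, type: 'A', from: 'Dave', att1: 362 },
  { id: 3, type: 'B', from: 'Bill', att1: 2938, att2: 23 },
  { id: 7, type: 'B', from: 'Jim', att1: 632, att2: 9 },
];

// Filter on a common attribute; type-specific attributes come along untouched.
const fromJim = eventsCollection.filter((doc) => doc.from === 'Jim');

// Filter on an attribute only some documents have; the others simply do not match.
const withAtt2 = eventsCollection.filter((doc) => 'att2' in doc);

console.log(fromJim.map((doc) => doc.id));  // [ 1, 7 ]
console.log(withAtt2.map((doc) => doc.id)); // [ 3, 7 ]
```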
Options for scalability


 MySQL - Master-slave replication
 Mongo - Supports master-slave, replica pairs, master-master
 and ... auto-sharding
Storing files

 In MySQL, you can use a table with a BLOB field and
 other fields for file metadata
 Mongo has GridFS
   Built for storage of large objects
   Split into chunks, also stores metadata
> db.fs.files.findOne();
{
    "_id" : ObjectId("4b9525096b00bd59b95f791f"),
    "filename" : "user.png",
    "length" : 43717,
    "chunkSize" : 262144,
    "uploadDate" : "Mon Mar 08 2010 11:25:45 GMT-0500 (EST)",
    "md5" : "3f6fcd4c0a51655d392fe95a99c29140",
    "mimeType" : "image/png"
}
> db.fs.chunks.findOne();
{
    "_id" : ObjectId("4b952509c568bb9fc8e3cddb"),
    "files_id" : ObjectId("4b9525096b00bd59b95f791f"),
    "n" : 0,
    "data" : BinData type: 2 len: 43721
}
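The chunking idea can be sketched in a few lines of plain JavaScript. This is not the driver's implementation, only an illustration of how a file's bytes map to chunk documents carrying `files_id` and `n`; the `toChunks` helper and the string ids are hypothetical.

```javascript
// Hypothetical helper: split a file's bytes into GridFS-style chunk documents.
function toChunks(filesId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    // subarray clamps the end index, so the last chunk may be shorter.
    chunks.push({ files_id: filesId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

const chunkSize = 262144;              // 256 KB, as in the fs.files document above
const userPng = new Uint8Array(43717); // stand-in for the bytes of user.png
const small = toChunks('file-1', userPng, chunkSize);
const large = toChunks('file-2', new Uint8Array(600000), chunkSize);

console.log(small.length);             // 1: the whole file fits in one chunk
console.log(large.length);             // 3: two full chunks plus a remainder
console.log(large[2].data.length);     // 75712
```

The 4-byte difference between `length` (43717) and the chunk's `len` (43721) in the shell output is consistent with the length prefix carried by BinData subtype 2.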
Getting to know MongoDB
Basic concepts
A database has collections which hold documents
Documents in a collection can have any structure
Documents are JSON objects, stored as BSON
Data types:
  all basic JSON types: string, integer, boolean,
  double, null, array, object
  Special types: date, object id, binary, regexp, code
Important differences

 Collections instead of tables
 ObjectID instead of primary keys
 References instead of foreign keys
 JavaScript code execution instead of stored
 procedures
 [NULL] instead of joins
Inserting data
> doc = { author: 'joe',
  created : new Date('03-28-2009'),
  title : 'Yet another blog post',
  text : 'Here is the text...',
  tags : [ 'example', 'joe' ],
  comments : [
    { author: 'jim', comment: 'I disagree' },
    { author: 'nancy', comment: 'Good post' }
  ]
}
> db.posts.insert(doc);
Querying data
> db.posts.find();
> db.posts.find({'author':'joe'});
> db.posts.find({'comments.author':'nancy'});
> db.posts.find({'comments.comment': /disagree/i });

> db.posts.findOne({'comments.author':'nancy'});
> db.posts.find({'comments.author':'nancy'}).limit(5);

> db.posts.find({},{'author':true, 'tags':true});

> db.posts.find({'author':'nancy'}).sort({'created':1});
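A dotted key such as 'comments.author' reaches into embedded documents and arrays. The sketch below imitates that matching in plain JavaScript for the blog post inserted earlier. It is a simplification under stated assumptions (exact-match and regular-expression conditions only, no numeric array indexes), and `valuesAt` and `matches` are hypothetical helper names, not server functions.

```javascript
const post = {
  author: 'joe',
  tags: ['example', 'joe'],
  comments: [
    { author: 'jim', comment: 'I disagree' },
    { author: 'nancy', comment: 'Good post' },
  ],
};

// Hypothetical helper: collect every value reachable through a dotted path,
// descending into arrays of embedded documents along the way.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split('.')) {
    const next = [];
    for (const value of current) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === 'object' && key in item) next.push(item[key]);
      }
    }
    current = next;
  }
  // A final array value (such as tags) matches on any of its elements.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// Hypothetical helper: true if any reachable value equals or matches the condition.
function matches(doc, path, expected) {
  return valuesAt(doc, path).some((v) =>
    expected instanceof RegExp ? typeof v === 'string' && expected.test(v) : v === expected);
}

console.log(matches(post, 'comments.author', 'nancy'));     // true
console.log(matches(post, 'comments.comment', /disagree/i)); // true
console.log(matches(post, 'tags', 'joe'));                   // true
console.log(matches(post, 'comment.author', 'nancy'));       // false: misspelled path
```

Note the last line: a misspelled path is not an error, it just matches nothing.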
Querying - advanced features
  Support of OR conditions
  $ modifiers to introduce conditions
> db.posts.find({timestamp: {$gte:1268149684}});

  $where modifiers
> db.pictures.find({$where: function() { return this.creationTimestamp >= 1268149684; }})

  MapReduce
  Server-side code execution
> function getUniques() {
...   var uniques = [];
...   db.pictures.find({},{tags:true}).forEach(function(pic) {
...     pic.tags.forEach(function(tag) {
...       if (uniques.indexOf(tag) == -1) uniques.push(tag);
...     });
...   });
...   return uniques;
... }
> db.eval(getUniques);
[
    "firstTag",
    "thirdTag",
    "toto",
    "test",
    "comic",
    "secondTag"
]
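MapReduce, listed among the advanced features, follows the same idea of running JavaScript against the data. The sketch below simulates it in memory in plain JavaScript to count pictures per tag. The sample pictures and the `mapReduce` driver are made up for illustration; on the server the map function reads the document from `this` and calls a global `emit`, whereas here both are passed as arguments to keep the example self-contained.

```javascript
// Made-up sample data standing in for the pictures collection.
const pictures = [
  { title: 'one', tags: ['comic', 'toto'] },
  { title: 'two', tags: ['comic', 'test'] },
  { title: 'three', tags: ['toto', 'comic'] },
];

// map: called once per document, emits (key, value) pairs.
function map(pic, emit) {
  pic.tags.forEach((tag) => emit(tag, 1));
}

// reduce: folds all values emitted for one key into a single value.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Minimal in-memory driver standing in for the server.
function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => mapFn(doc, emit));
  const result = {};
  for (const [key, values] of groups) result[key] = reduceFn(key, values);
  return result;
}

const tagCounts = mapReduce(pictures, map, reduce);
console.log(tagCounts); // { comic: 3, toto: 2, test: 1 }
```

The keys of the result are the unique tags, so this also covers what getUniques() computes.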
Updating data
update( criteria, objNew, upsert, multi )
> db.myColl.update( { name: "Joe" }, { name: "Joe", age:
20 }, true, false );


save(object) - insert or update if _id exists
Update modifier operators

  $inc, $set, $unset, $push, $pushAll, $addToSet, $pop,
  $pull, $pullAll
> db.myColl.update({name:"Joe"}, { $set:{age:20}});

> db.posts.update({author:"Joe"},{$push:{tags:'hockey'}});

> db.posts.update({},{$addToSet:{tags:'hockey'}});
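The difference between passing a plain object and passing modifier operators is easy to miss. The sketch below imitates it in plain JavaScript for a small subset of operators ($set, $inc, $push, $addToSet). `applyUpdate` is a hypothetical helper, not the server's logic, and the `city` field is invented for the example.

```javascript
// Hypothetical helper: apply an update to a copy of a document.
function applyUpdate(doc, update) {
  const isModifier = Object.keys(update).some((key) => key.startsWith('$'));
  if (!isModifier) {
    // Plain object: the whole document is replaced, only _id survives.
    return { _id: doc._id, ...update };
  }
  const result = JSON.parse(JSON.stringify(doc)); // deep copy, fine for plain JSON data
  for (const [field, value] of Object.entries(update.$set || {})) result[field] = value;
  for (const [field, value] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + value;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    result[field] = [...(result[field] || []), value];
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const current = result[field] || [];
    result[field] = current.includes(value) ? current : [...current, value];
  }
  return result;
}

const joe = { _id: 1, name: 'Joe', city: 'Montreal', tags: ['hockey'] };

const replaced = applyUpdate(joe, { name: 'Joe', age: 20 });      // city and tags are gone
const modified = applyUpdate(joe, { $set: { age: 20 } });         // city and tags are kept
const pushed = applyUpdate(joe, { $push: { tags: 'hockey' } });   // duplicate is appended
const added = applyUpdate(joe, { $addToSet: { tags: 'hockey' } }); // no duplicate

console.log(replaced, modified, pushed.tags, added.tags);
```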
Removing data
> db.things.remove({});    // removes all
> db.things.remove({n:1}); // removes all where n == 1
> db.things.remove({_id: myobject._id});
References
> p = db.postings.findOne();
{
    "_id" : ObjectId("4b866f08234ae01d21d89604"),
    "author" : "jim",
    "title" : "Brewing Methods"
}
> // get more info on author
> db.users.findOne( { _id : p.author } )
{ "_id" : "jim", "email" : "jim@gmail.com" }
> x = { name : 'Biology' }
{ "name" : "Biology" }
> db.courses.save(x)
> x
{ "name" : "Biology", "_id" : ObjectId("4b0552b0f0da7d1eb6f126a1") }

> stu = { name : 'Joe', classes : [ new DBRef('courses', x._id) ] }
> db.students.save(stu)
> stu
{
        "name" : "Joe",
        "classes" : [
                 {
                        "$ref" : "courses",
                        "$id" : ObjectId("4b0552b0f0da7d1eb6f126a1")
                 }
        ],
        "_id" : ObjectId("4b0552e4f0da7d1eb6f126a2")
}
> stu.classes[0]
{ "$ref" : "courses", "$id" : ObjectId("4b0552b0f0da7d1eb6f126a1") }

> stu.classes[0].fetch()
{ "_id" : ObjectId("4b0552b0f0da7d1eb6f126a1"), "name" : "Biology" }
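Since there are no joins, following a reference means one extra lookup per reference. A minimal sketch of that lookup in plain JavaScript, using an in-memory object as a stand-in for the database. The `fetchRef` helper and the short string ids are hypothetical; the shell session above uses ObjectIds and the built-in fetch().

```javascript
// In-memory stand-in for a database: collection name -> array of documents.
const database = {
  courses: [{ _id: 'c1', name: 'Biology' }],
  students: [{ _id: 's1', name: 'Joe', classes: [{ $ref: 'courses', $id: 'c1' }] }],
};

// Hypothetical helper: resolve a { $ref, $id } pair with one lookup.
function fetchRef(db, ref) {
  const collection = db[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const student = database.students[0];
const classes = student.classes.map((ref) => fetchRef(database, ref));
console.log(classes); // [ { _id: 'c1', name: 'Biology' } ]
```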
Limitations to keep in mind


 Namespace limit (about 24,000 collections and indexes per database)
 Database size is limited to about 2 GB on 32-bit systems ... use
 a 64-bit production system!
Licensing

   MongoDB is licensed under GNU AGPL 3.0; the supported drivers are
   under Apache License v2.0
   From www.mongodb.org/display/DOCS/Licensing :
If you are using a vanilla MongoDB server from either source or binary packages you
have NO obligations. You can ignore the rest of this page.
Hands-on example
SQL schema
pictures
  pictureId           int
  title               varchar
  creationTimestamp   int
  content             blob

tags
  pictureId           int
  tag                 varchar

users
  userId              int
  name                varchar

comments
  pictureId           int
  userId              int
  txt                 varchar
  creationTimestamp   int
let’s see some code ...
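As a rough idea of where such a migration can end up, the sketch below folds rows from the four SQL tables into one document per picture, with tags and comments embedded. This is one possible document shape and an assumption, not the schema used in the demo; `toPictureDocument` and the sample rows are hypothetical, and the `content` blob is left out since it would more likely be stored through GridFS.

```javascript
// Hypothetical helper: build one picture document from rows of the four tables.
function toPictureDocument(picture, tagRows, commentRows, userRows) {
  const userNames = new Map(userRows.map((u) => [u.userId, u.name]));
  return {
    _id: picture.pictureId,
    title: picture.title,
    creationTimestamp: picture.creationTimestamp,
    // The tags table becomes an embedded array.
    tags: tagRows.filter((t) => t.pictureId === picture.pictureId).map((t) => t.tag),
    // The comments table becomes an array of embedded documents.
    comments: commentRows
      .filter((c) => c.pictureId === picture.pictureId)
      .map((c) => ({
        author: userNames.get(c.userId),
        txt: c.txt,
        creationTimestamp: c.creationTimestamp,
      })),
  };
}

// Made-up sample rows.
const pictureRow = { pictureId: 1, title: 'Sunset', creationTimestamp: 1268149684 };
const tagRows = [
  { pictureId: 1, tag: 'firstTag' },
  { pictureId: 1, tag: 'comic' },
  { pictureId: 2, tag: 'toto' },
];
const userRows = [{ userId: 10, name: 'jim' }];
const commentRows = [
  { pictureId: 1, userId: 10, txt: 'Nice shot', creationTimestamp: 1268150000 },
];

const pictureDoc = toPictureDocument(pictureRow, tagRows, commentRows, userRows);
console.log(JSON.stringify(pictureDoc, null, 2));
```

Reading a picture with its tags and comments is then a single lookup, which is the trade-off against the four-table layout.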

ConFoo - Migrating To Mongo Db
