SlideShare une entreprise Scribd logo
1  sur  24
Living with SQL and
NoSQL at Craigslist
      Jeremy Zawodny
          craigslist
There is no stack
     anymore...
-- Mårten Mickos during Wednesday’s Keynote
Data Storage at craigslist
• MySQL
• Memcached
• Redis
• MongoDB
• Sphinx
• Filesystem
Choosing the Right Tool
• Durability
• Performance
• Query API
• Features
• Complexity
• Support
Request Flow (reads)
Browser                       Load Balancer                       Caching Proxy
         Posting, Search, Browse                                  Perl+epoll      Memcached

                                                                        Proxy Cache


     Web Server                                                   Async Services
Apache      mod_perl     Memcached                                Perl+epoll      Memcached

         Posting Cache


                                                                        haproxy


  MongoDB                                   Sphinx                        MySQL
 Archived Postings                   Live and Archived Postings           Live Postings
Request Flow (reads)
Browser                Load Balancer                   Caching Proxy
      Image Requests                                   Perl+epoll    Memcached

                                                             Proxy Cache




                            Image Storage
                        Apache   mod_perl   xfs+JBOD
Data Repositories
   MongoDB                      MySQL                 Filesystem
OldPostings   Email Meta    Postings      Finance    Images      Logs


                             Users       Misc Meta

                             Abuse      WorkQueue

                             Stats      Monitoring



                                     Redis
 Memcached                  Counters         Lists        Sphinx
 Counters      Postings      Blobs      Monitoring   Postings   Internal

  Blobs        Objects     WorkQueue                 Forums     Archive
MySQL at craigslist
•   Vertical Partitioning: Clusters
    •   auth/users, abuse/spam, postings, finance
•   Sub-partitioning: Roles
    •   master, read, long read, dumper, thrash
•   Lots of SSD storage (mostly fusion-io)
    •   solved most of our performance problems
•   Few manual tasks
    •   re-cloning slaves, master swaps
MySQL at craigslist
• MySQL 5.5.x
 • hoping to move to 5.6.x
    • GTID + crash-safe slaves?!?!
• InnoDB almost everywhere
 • InnoDB compression where it works well
 • Large buffer pool (48GB common)
• haproxy sits between clients and servers
MySQL at Craigslist
      Postings Database Cluster




                                       long read

                                                   long read




                                                                        dumper
                                                               thrash
   write




                                read
           read

                  read

                         read




                           haproxy

                           client(s)
Why MySQL?
•   It’s the devil we know!
    •   Very reliable
    •   Lots of Admin and Dev skills
•   Durability
•   Replication
•   Support
    •   Seriously, look at this ecosystem
•   Data Model
Why memcached?
• Wickedly Fast
• Stable
• Virtually zero administration required
• Easily co-exists with CPU-intensive services
• Muti-core? Run more instances!
Memcached at craigslist
• Primary cache for rendered pages
  (compresed and full), serialized objects, and
  misc. other data
• Used for lots of transient data blobs (and
  occasional counters)
• Custom async client library
 • Some key encoding issues
• Durability via client-side mirroring (think
  RAID-1)
Redis at craigslist
• Primary repository of posting activity
  metadata used in analysis tasks
• Remote replication in 2nd data center
• 80+% of data in sorted sets (ZSETS)
• Sharded multi-node cluster
 • See: http://bit.ly/I4XUCj
Why Redis?
• Features
• Performance
• Flexible Persistence
• Excellent but simple API
• Project Vision
• Muti-core? Run more instances!
MongoDB at craigslist
•   Repository of 2.5+ billion archived postings
    •   growing and growing and growing
•   3 shards across 3 node replica sets
    •   duplicate config in 2nd data center
•   ~6TB of data, sized up to 12TB
•   Biggest challenge was data migration
•   Previous talks:
    •   http://bit.ly/HEYJ57 (before)
    •   http://bit.ly/Hr2qMf (after)
Why MongoDB?
• Schema free
• Active community
• Commercial support
• Perl client!
• Ease of scaling
  • Yay! for built-in sharding support
• Fewer single points of failure
  • Replica sets are awesome
Sphinx at craigslist
• Full-text indexing and search of
 • all live postings
 • all archived postings
 • all forums (in progress)
• 300+ million daily queries
Why Sphinx?
• Performance
• Friendly API
• Flexibility in deployment model
• Commercial support
Filesystem at craigslist
• All uploaded images are stored in XFS
• Multiple image sizes, resized upon upload
Why Filesystem?
• Reliable (and Simple)
 • We use XFS for images and databases
 • Proven technology
• Fast
 • Some other filesystems have had
    performance issues
• Easy to move data around
• No other metadata/indexes to worry about
So Many Data Stores...
• Can be hard for developers if you don’t have
  good APIs or abstractions in place!
  • We built an object layer for our MongoDB
    migration
  • It speaks MySQL, Sphinx, MongoDB,
    Memcached
• Relational vs. Non-Relational?
  • In practice, we often just don’t care
  • NoSQL is a stupid label
Craigslist Tech FAQs
• Self-hosted (no virtualization or “cloud”)
• Mix of hardware (2 main vendors)
 • Blades
 • Larger multi-U multi-disk RAID boxes
• Mostly local storage (SAN for backups)
• Virtually all open source infrastructure
  tools
• Famously small (but growing) tech team
Craigslist is Hiring!
• Developers
 • Back-end
 • Front-end
• Systems Administrators
• Network Engineers
• Email: z@craiglist.org plain text resume!

Contenu connexe

Tendances

Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftBDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftAmazon Web Services
 
Scaling Instagram
Scaling InstagramScaling Instagram
Scaling Instagramiammutex
 
Enterprise Ready: A Look at Neo4j in Production
Enterprise Ready: A Look at Neo4j in ProductionEnterprise Ready: A Look at Neo4j in Production
Enterprise Ready: A Look at Neo4j in ProductionNeo4j
 
MongoDB Fundamentals
MongoDB FundamentalsMongoDB Fundamentals
MongoDB FundamentalsMongoDB
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Introducing MongoDB Atlas
Introducing MongoDB AtlasIntroducing MongoDB Atlas
Introducing MongoDB AtlasMongoDB
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumTathastu.ai
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Christopher Gutknecht
 
Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0Guillaume LE GALIARD
 
Migrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMongoDB
 
Enterprise Master Data Architecture
Enterprise Master Data ArchitectureEnterprise Master Data Architecture
Enterprise Master Data ArchitectureBoris Otto
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineAmazon Web Services
 

Tendances (20)

Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon RedshiftBDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
BDA306 Building a Modern Data Warehouse: Deep Dive on Amazon Redshift
 
Scaling Instagram
Scaling InstagramScaling Instagram
Scaling Instagram
 
Enterprise Ready: A Look at Neo4j in Production
Enterprise Ready: A Look at Neo4j in ProductionEnterprise Ready: A Look at Neo4j in Production
Enterprise Ready: A Look at Neo4j in Production
 
MongoDB Fundamentals
MongoDB FundamentalsMongoDB Fundamentals
MongoDB Fundamentals
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Introducing MongoDB Atlas
Introducing MongoDB AtlasIntroducing MongoDB Atlas
Introducing MongoDB Atlas
 
NoSql
NoSqlNoSql
NoSql
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0Collibra - Forrester Presentation : Data Governance 2.0
Collibra - Forrester Presentation : Data Governance 2.0
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Migrating to MongoDB: Best Practices
Migrating to MongoDB: Best PracticesMigrating to MongoDB: Best Practices
Migrating to MongoDB: Best Practices
 
Enterprise Master Data Architecture
Enterprise Master Data ArchitectureEnterprise Master Data Architecture
Enterprise Master Data Architecture
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain PipelineThe Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 

En vedette

Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Jeremy Zawodny
 
Lessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at CraigslistLessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at CraigslistJeremy Zawodny
 
Why Your MongoDB Needs Redis
Why Your MongoDB Needs RedisWhy Your MongoDB Needs Redis
Why Your MongoDB Needs RedisItamar Haber
 
Webinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDBWebinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDBBoxed Ice
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Jeremy Zawodny
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At CraigslistJeremy Zawodny
 
Craigslist by the Numbers
Craigslist by the NumbersCraigslist by the Numbers
Craigslist by the NumbersDevin Foley
 
Fulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesFulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesAdrian Nuta
 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013Andrew Dunstan
 
Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.Dhaval Dalal
 
Shopping Cart Optimization for eCommerce Web Sites
Shopping Cart Optimization for eCommerce Web SitesShopping Cart Optimization for eCommerce Web Sites
Shopping Cart Optimization for eCommerce Web SitesCharles Wiedenhoft
 
Fusion-io and MySQL at Craigslist
Fusion-io and MySQL at CraigslistFusion-io and MySQL at Craigslist
Fusion-io and MySQL at CraigslistJeremy Zawodny
 
Real time fulltext search with sphinx
Real time fulltext search with sphinxReal time fulltext search with sphinx
Real time fulltext search with sphinxAdrian Nuta
 
Managing Big Data with MySQL
Managing Big Data with MySQLManaging Big Data with MySQL
Managing Big Data with MySQLmwasaha mwagambo
 
Social Media Trends - Content Curation
Social Media Trends - Content CurationSocial Media Trends - Content Curation
Social Media Trends - Content CurationChris Mikulin
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLNguyen Van Vuong
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comknowbigdata
 

En vedette (20)

Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)Realtime Search Infrastructure at Craigslist (OpenWest 2014)
Realtime Search Infrastructure at Craigslist (OpenWest 2014)
 
Lessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at CraigslistLessons Learned Migrating 2+ Billion Documents at Craigslist
Lessons Learned Migrating 2+ Billion Documents at Craigslist
 
Why Your MongoDB Needs Redis
Why Your MongoDB Needs RedisWhy Your MongoDB Needs Redis
Why Your MongoDB Needs Redis
 
Webinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDBWebinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDB
 
Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012Sphinx at Craigslist in 2012
Sphinx at Craigslist in 2012
 
MySQL And Search At Craigslist
MySQL And Search At CraigslistMySQL And Search At Craigslist
MySQL And Search At Craigslist
 
Craigslist by the Numbers
Craigslist by the NumbersCraigslist by the Numbers
Craigslist by the Numbers
 
Fulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesFulltext engine for non fulltext searches
Fulltext engine for non fulltext searches
 
PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013PostgreSQL and Redis - talk at pgcon 2013
PostgreSQL and Redis - talk at pgcon 2013
 
Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.
 
Red Box Commerce Shopping Cart
Red Box Commerce Shopping CartRed Box Commerce Shopping Cart
Red Box Commerce Shopping Cart
 
Shopping Cart Optimization for eCommerce Web Sites
Shopping Cart Optimization for eCommerce Web SitesShopping Cart Optimization for eCommerce Web Sites
Shopping Cart Optimization for eCommerce Web Sites
 
Tayra
TayraTayra
Tayra
 
Fusion-io and MySQL at Craigslist
Fusion-io and MySQL at CraigslistFusion-io and MySQL at Craigslist
Fusion-io and MySQL at Craigslist
 
SphinxSearch
SphinxSearchSphinxSearch
SphinxSearch
 
Real time fulltext search with sphinx
Real time fulltext search with sphinxReal time fulltext search with sphinx
Real time fulltext search with sphinx
 
Managing Big Data with MySQL
Managing Big Data with MySQLManaging Big Data with MySQL
Managing Big Data with MySQL
 
Social Media Trends - Content Curation
Social Media Trends - Content CurationSocial Media Trends - Content Curation
Social Media Trends - Content Curation
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQL
 
Apache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.comApache Spark Streaming - www.know bigdata.com
Apache Spark Streaming - www.know bigdata.com
 

Similaire à Living with SQL and NoSQL at craigslist, a Pragmatic Approach

High Performance Drupal Sites
High Performance Drupal SitesHigh Performance Drupal Sites
High Performance Drupal SitesAbayomi Ayoola
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Javasunnygleason
 
Redis e Memcached - Daniel Naves - Omnilogic
Redis e Memcached - Daniel Naves - OmnilogicRedis e Memcached - Daniel Naves - Omnilogic
Redis e Memcached - Daniel Naves - OmnilogicFelipe Guimarães
 
My Sql And Search At Craigslist
My Sql And Search At CraigslistMy Sql And Search At Craigslist
My Sql And Search At CraigslistMySQLConference
 
MySQL Options in OpenStack
MySQL Options in OpenStackMySQL Options in OpenStack
MySQL Options in OpenStackTesora
 
OpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackOpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackMatt Lord
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?DATAVERSITY
 
Getting started with Riak in the Cloud
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the CloudInes Sombra
 
High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011Tim Y
 
ActiveMQ 5.9.x new features
ActiveMQ 5.9.x new featuresActiveMQ 5.9.x new features
ActiveMQ 5.9.x new featuresChristian Posta
 
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More FlexibilityNOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More FlexibilityIvan Zoratti
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 

Similaire à Living with SQL and NoSQL at craigslist, a Pragmatic Approach (20)

High Performance Drupal Sites
High Performance Drupal SitesHigh Performance Drupal Sites
High Performance Drupal Sites
 
High-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and JavaHigh-Performance Storage Services with HailDB and Java
High-Performance Storage Services with HailDB and Java
 
Redis e Memcached - Daniel Naves - Omnilogic
Redis e Memcached - Daniel Naves - OmnilogicRedis e Memcached - Daniel Naves - Omnilogic
Redis e Memcached - Daniel Naves - Omnilogic
 
Drop acid
Drop acidDrop acid
Drop acid
 
My Sql And Search At Craigslist
My Sql And Search At CraigslistMy Sql And Search At Craigslist
My Sql And Search At Craigslist
 
MySQL Options in OpenStack
MySQL Options in OpenStackMySQL Options in OpenStack
MySQL Options in OpenStack
 
OpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackOpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStack
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?
 
Getting started with Riak in the Cloud
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the Cloud
 
High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011
 
ActiveMQ 5.9.x new features
ActiveMQ 5.9.x new featuresActiveMQ 5.9.x new features
ActiveMQ 5.9.x new features
 
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More FlexibilityNOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
NOSQL Meets Relational - The MySQL Ecosystem Gains More Flexibility
 
Why ruby and rails
Why ruby and railsWhy ruby and rails
Why ruby and rails
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 

Dernier

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Dernier (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Living with SQL and NoSQL at craigslist, a Pragmatic Approach

  • 1. Living with SQL and NoSQL at Craigslist Jeremy Zawodny craigslist
  • 2. There is no stack anymore... -- Mårten Mickos during Wednesday’s Keynote
  • 3. Data Storage at craigslist • MySQL • Memcached • Redis • MongoDB • Sphinx • Filesystem
  • 4. Choosing the Right Tool • Durability • Performance • Query API • Features • Complexity • Support
  • 5. Request Flow (reads) Browser Load Balancer Caching Proxy Posting, Search, Browse Perl+epoll Memcached Proxy Cache Web Server Async Services Apache mod_perl Memcached Perl+epoll Memcached Posting Cache haproxy MongoDB Sphinx MySQL Archived Postings Live and Archived Postings Live Postings
  • 6. Request Flow (reads) Browser Load Balancer Caching Proxy Image Requests Perl+epoll Memcached Proxy Cache Image Storage Apache mod_perl xfs+JBOD
  • 7. Data Repositories MongoDB MySQL Filesystem OldPostings Email Meta Postings Finance Images Logs Users Misc Meta Abuse WorkQueue Stats Monitoring Redis Memcached Counters Lists Sphinx Counters Postings Blobs Monitoring Postings Internal Blobs Objects WorkQueue Forums Archive
  • 8. MySQL at craigslist • Vertical Partitioning: Clusters • auth/users, abuse/spam, postings, finance • Sub-partitioning: Roles • master, read, long read, dumper, thrash • Lots of SSD storage (mostly fusion-io) • solved most of our performance problems • Few manual tasks • re-cloning slaves, master swaps
  • 9. MySQL at craigslist • MySQL 5.5.x • hoping to move to 5.6.x • GTID + crash-safe slaves?!?! • InnoDB almost everywhere • InnoDB compression where it works well • Large buffer pool (48GB common) • haproxy sits between clients and servers
  • 10. MySQL at Craigslist Postings Database Cluster long read long read dumper thrash write read read read read haproxy client(s)
  • 11. Why MySQL? • It’s the devil we know! • Very reliable • Lots of Admin and Dev skills • Durability • Replication • Support • Seriously, look at this ecosystem • Data Model
  • 12. Why memcached? • Wickedly Fast • Stable • Virtually zero administration required • Easily co-exists with CPU-intensive services • Muti-core? Run more instances!
  • 13. Memcached at craigslist • Primary cache for rendered pages (compresed and full), serialized objects, and misc. other data • Used for lots of transient data blobs (and occasional counters) • Custom async client library • Some key encoding issues • Durability via client-side mirroring (think RAID-1)
  • 14. Redis at craigslist • Primary repository of posting activity metadata used in analysis tasks • Remote replication in 2nd data center • 80+% of data in sorted sets (ZSETS) • Sharded multi-node cluster • See: http://bit.ly/I4XUCj
  • 15. Why Redis? • Features • Performance • Flexible Persistence • Excellent but simple API • Project Vision • Muti-core? Run more instances!
  • 16. MongoDB at craigslist • Repository of 2.5+ billion archived postings • growing and growing and growing • 3 shards across 3 node replica sets • duplicate config in 2nd data center • ~6TB of data, sized up to 12TB • Biggest challenge was data migration • Previous talks: • http://bit.ly/HEYJ57 (before) • http://bit.ly/Hr2qMf (after)
  • 17. Why MongoDB? • Schema free • Active community • Commercial support • Perl client! • Ease of scaling • Yay! for built-in sharding support • Fewer single points of failure • Replica sets are awesome
  • 18. Sphinx at craigslist • Full-text indexing and search of • all live postings • all archived postings • all forums (in progress) • 300+ million daily queries
  • 19. Why Sphinx? • Performance • Friendly API • Flexibility in deployment model • Commercial support
  • 20. Filesystem at craigslist • All uploaded images are stored in XFS • Multiple image sizes, resized upon upload
  • 21. Why Filesystem? • Reliable (and Simple) • We use XFS for images and databases • Proven technology • Fast • Some other filesystems have had performance issues • Easy to move data around • No other metadata/indexes to worry about
  • 22. So Many Data Stores... • Can be hard for developers if you don’t have good APIs or abstractions in place! • We built an object layer for our MongoDB migration • It speaks MySQL, Sphinx, MongoDB, Memcached • Relational vs. Non-Relational? • In practice, we often just don’t care • NoSQL is a stupid label
  • 23. Craigslist Tech FAQs • Self-hosted (no virtualization or “cloud”) • Mix of hardware (2 main vendors) • Blades • Larger multi-U multi-disk RAID boxes • Mostly local storage (SAN for backups) • Virtually all open source infrastructure tools • Famously small (but growing) tech team
  • 24. Craigslist is Hiring! • Developers • Back-end • Front-end • Systems Administrators • Network Engineers • Email: z@craiglist.org plain text resume!

Notes de l'éditeur

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n