SlideShare une entreprise Scribd logo
1  sur  16
Télécharger pour lire hors ligne
Best Practices for Scaling Websites
           Lessons from eBay


                        Randy Shoup
                 eBay Distinguished Architect




QCon Asia 2009
Challenges at Internet Scale


• eBay manages …
  – 86.3 million active users worldwide
  – 120 million items for sale in 50,000 categories
  – Over 2 billion page views per day

    – eBay users trade over $2000 in goods every
      second -- $60 billion per year
    – eBay site stores over 2 PB of data
    – eBay processes 50 TB of new, incremental data
      per day
    – eBay Data Warehouse analyzes 50 PB per day

       • In a dynamic environment
           – 300+ features per quarter
           – We roll 100,000+ lines of code every two weeks
             • In 39 countries, in 8 languages, 24x7x365
                        >48 Billion SQL executions/day!

                                                              © 2009 eBay Inc.
Architectural Forces at Internet Scale

• Scalability
   – Resource usage should increase linearly (or better!) with load
   – Design for 10x growth in data, traffic, users, etc.
• Availability
   – Resilience to failure (MTBF)
   – Rapid recoverability from failure (MTTR)
   – Graceful degradation
• Latency
   – User experience latency
   – Data latency
• Manageability
   – Simplicity
   – Maintainability
   – Diagnostics
• Cost
   – Development effort and complexity
   – Operational cost (TCO)
                                                                      © 2009 eBay Inc.
Best Practices for Scaling



1. Partition Everything
2. Asynchrony Everywhere
3. Automate Everything
4. Remember Everything Fails
5. Embrace Inconsistency




                               © 2009 eBay Inc.
Best Practice 1: Partition Everything


• Split every problem into manageable chunks
   – By data, load, and/or usage pattern
   – “If you can’t split it, you can’t scale it”


• Motivations
   –   Scalability: can scale horizontally and independently
   –   Availability: can isolate failures
   –   Manageability: can decouple different segments and functional areas
   –   Cost: can use less expensive hardware




                                                                             © 2009 eBay Inc.
Best Practice 1: Partition Everything


Pattern: Functional Segmentation                                  Selling             Search              View Item

   – Segment processing into pools, services, and stages
   – Segment data along usage boundaries                                    Bidding            Checkout                Feedback




Pattern: Horizontal Split
   – Load-balance processing
       •   Within a pool, all servers are created equal
   – Split (or “shard”) data along primary access path              User               Item         Transaction
       •   Partition by range, modulo of a key, lookup, etc.




                                                                            Product            Account                Feedback
Corollary: No Session State
   – User session flow moves through multiple application pools
   – Absolutely no session state in application tier




                                                                                                                 © 2009 eBay Inc.
Best Practice 2: Asynchrony Everywhere

• Prefer Asynchronous Processing
   – Move as much processing as possible to asynchronous flows
   – Where possible, integrate disparate components asynchronously


• Motivations
   – Scalability: can scale components independently
   – Availability
      • Can decouple availability state
      • Can retry operations
   – Latency
      • Can significantly improve user experience latency at cost of data/execution latency
      • Can allocate more time to processing than user would tolerate
   – Cost: can spread peak load over time




                                                                                              © 2009 eBay Inc.
Best Practice 2: Asynchrony Everywhere


Pattern: Event Queue
   – Primary use-case produces event
       •   Create event (ITEM.NEW, ITEM.SOLD) transactionally
           with primary insert/update
   – Consumers subscribe to event
       •   At least once delivery
       •   No guaranteed order
       •   Idempotency and readback



Pattern: Message Multicast
   – Search Feeder publishes item updates
       •   Reads item updates from primary database
       •   Publishes sequenced updates via SRM-inspired protocol
   – Nodes listen to assigned subset of messages
       •   Update in-memory index in real time
       •   Request recovery (NAK) when messages
           are missed




                                                                   © 2009 eBay Inc.
Best Practice 3: Automate Everything

• Prefer Adaptive / Automated Systems to Manual Systems


• Motivations
   – Scalability
      • Can scale with machines, not humans
   – Availability / Latency
      • Can adapt to changing environment more rapidly
   – Cost
      • Machines are far less expensive than humans
      • Can learn / improve / adjust over time without manual effort




                                                                       © 2009 eBay Inc.
Best Practice 3: Automate Everything


Pattern: Adaptive Configuration                                               SLA 15 seconds
                                                                                               Consumer A
   – Define SLA for a given logical consumer
       •   E.g., 99% of events processed in 15 seconds
                                                                              SLA 30 seconds
   – Dynamically adjust config to meet defined SLA                                             Consumer B



                                                                              SLA 5 minutes
                                                                                               Consumer C




Pattern: Machine Learning
   – Dynamically adapt search experience
       •   Determine best inventory and assemble optimal page for that user
           and context
   – Feedback loop enables system to learn and improve over time
       •   Collect user behavior
       •   Aggregate and analyze offline
       •   Deploy updated metadata
       •   Decide and serve appropriate experience
   – Perturbation and dampening




                                                                                                 © 2009 eBay Inc.
Best Practice 4: Remember Everything Fails

• Build all systems to be tolerant of failure
   –   Assume every operation will fail and every resource will be unavailable
   –   Detect failure as rapidly as possible
   –   Recover from failure as rapidly as possible
   –   Do as much as possible during failure


• Motivation
   – Availability




                                                                                 © 2009 eBay Inc.
Best Practice 4: Remember Everything Fails


Pattern: Failure Detection
   – Servers log all requests
       •   Log all application activity, database and service calls on
           multicast message bus
       •   Over 2TB of log messages per day
   – Listeners automate failure detection and notification


Pattern: Rollback
   – Absolutely no changes to the site which cannot be undone (!)
   – Every feature has on / off state driven by central configuration
       •   Feature can be immediately turned off for operational or business reasons
       •   Features can be deployed “wired-off” to unroll dependencies



Pattern: Graceful Degradation
   – Application “marks down” an unavailable or distressed resource
   – Non-critical functionality is removed or ignored
   – Critical functionality is retried or deferred


                                                                                       © 2009 eBay Inc.
Best Practice 5: Embrace Inconsistency

• Brewer’s CAP Theorem
  – Any shared-data system can have at most two of the following properties:
     • Consistency: All clients see the same data, even in the presence of updates
     • Availability: All clients will get a response, even in the presence of failures
     • Partition-tolerance: The system properties hold even when the network is partitioned


  – This trade-off is fundamental to all distributed systems




                                                                                              © 2009 eBay Inc.
Best Practice 5: Embrace Inconsistency


Choose Appropriate Consistency Guarantees
   –   To guarantee availability and partition-tolerance, we trade off immediate consistency
   –   Most real-world systems (even financial systems!) do not require immediate consistency
   –   Consistency is a spectrum
   –   Prefer eventual consistency to immediate consistency




           Immediate                              Eventual                                No
          Consistency                            Consistency                          Consistency
         Bids, Purchases                Search Engine, Billing System, etc.             Preferences



Avoid Distributed Transactions
   – eBay does absolutely no distributed transactions – no two-phase commit
   – Minimize inconsistency through state machines and careful ordering of operations
   – Eventual consistency through asynchronous event or reconciliation batch



                                                                                                      © 2009 eBay Inc.
Recap: Best Practices for Scaling



1. Partition Everything
2. Asynchrony Everywhere
3. Automate Everything
4. Remember Everything Fails
5. Embrace Inconsistency




                                    © 2009 eBay Inc.
Questions?




 About the Presenter
 Randy Shoup has been the primary architect for eBay's search
 infrastructure since 2004. Prior to eBay, Randy was Chief Architect and
 Technical Fellow at Tumbleweed Communications, and has also held a
 variety of software development and architecture roles at Oracle and
 Informatica.
 rshoup@ebay.com




                                                                       © 2009 eBay Inc.

Contenu connexe

Tendances

Bp Introduction Pack
Bp Introduction PackBp Introduction Pack
Bp Introduction Packgilmour.w
 
Patterns of enterprise application architecture
Patterns of enterprise application architecturePatterns of enterprise application architecture
Patterns of enterprise application architecturethlias
 
ISD 6.3 and IBM i june 2012
ISD 6.3 and IBM i june 2012ISD 6.3 and IBM i june 2012
ISD 6.3 and IBM i june 2012COMMON Europe
 
Going beyond the pale 10292012
Going beyond the pale 10292012Going beyond the pale 10292012
Going beyond the pale 10292012bamboolibrarian
 
Presentation common task differences between sdmc and hmc
Presentation   common task differences between sdmc and hmcPresentation   common task differences between sdmc and hmc
Presentation common task differences between sdmc and hmcxKinAnx
 
4515 Modernize your CICS applications for Mobile and Cloud
4515 Modernize your CICS applications for Mobile and Cloud4515 Modernize your CICS applications for Mobile and Cloud
4515 Modernize your CICS applications for Mobile and Cloudnick_garrod
 
Considering The Cloud? Thinking Beyond The Readme File
Considering The Cloud? Thinking Beyond The Readme FileConsidering The Cloud? Thinking Beyond The Readme File
Considering The Cloud? Thinking Beyond The Readme FileBill Malchisky Jr.
 
Memory Matters in 2011
Memory Matters in 2011Memory Matters in 2011
Memory Matters in 2011Martin Packer
 
Accelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsAccelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsIBM
 
Smb Sme Virtualization_Bundles - EMC - Accel
Smb Sme Virtualization_Bundles - EMC - AccelSmb Sme Virtualization_Bundles - EMC - Accel
Smb Sme Virtualization_Bundles - EMC - Accelaccelfb
 

Tendances (17)

IBM zAware
IBM zAwareIBM zAware
IBM zAware
 
OMEGAMON XE for Mainframe Networks v5.3 Long presentation
OMEGAMON XE for Mainframe Networks v5.3 Long presentationOMEGAMON XE for Mainframe Networks v5.3 Long presentation
OMEGAMON XE for Mainframe Networks v5.3 Long presentation
 
Bp Introduction Pack
Bp Introduction PackBp Introduction Pack
Bp Introduction Pack
 
IBM OMEGAMON Performance Management Suite - Long Presentation
IBM OMEGAMON Performance Management Suite - Long PresentationIBM OMEGAMON Performance Management Suite - Long Presentation
IBM OMEGAMON Performance Management Suite - Long Presentation
 
Patterns of enterprise application architecture
Patterns of enterprise application architecturePatterns of enterprise application architecture
Patterns of enterprise application architecture
 
ISD 6.3 and IBM i june 2012
ISD 6.3 and IBM i june 2012ISD 6.3 and IBM i june 2012
ISD 6.3 and IBM i june 2012
 
Going beyond the pale 10292012
Going beyond the pale 10292012Going beyond the pale 10292012
Going beyond the pale 10292012
 
Presentation common task differences between sdmc and hmc
Presentation   common task differences between sdmc and hmcPresentation   common task differences between sdmc and hmc
Presentation common task differences between sdmc and hmc
 
4515 Modernize your CICS applications for Mobile and Cloud
4515 Modernize your CICS applications for Mobile and Cloud4515 Modernize your CICS applications for Mobile and Cloud
4515 Modernize your CICS applications for Mobile and Cloud
 
Oracle Advanced Benefits Webinar Slides
Oracle Advanced Benefits Webinar SlidesOracle Advanced Benefits Webinar Slides
Oracle Advanced Benefits Webinar Slides
 
OMEGAMON XE for Storage V530 Long client presentation
OMEGAMON XE for Storage V530 Long client presentationOMEGAMON XE for Storage V530 Long client presentation
OMEGAMON XE for Storage V530 Long client presentation
 
Considering The Cloud? Thinking Beyond The Readme File
Considering The Cloud? Thinking Beyond The Readme FileConsidering The Cloud? Thinking Beyond The Readme File
Considering The Cloud? Thinking Beyond The Readme File
 
IBM Capacity Management Analytics
IBM Capacity Management AnalyticsIBM Capacity Management Analytics
IBM Capacity Management Analytics
 
Compare Blade Servers
Compare Blade ServersCompare Blade Servers
Compare Blade Servers
 
Memory Matters in 2011
Memory Matters in 2011Memory Matters in 2011
Memory Matters in 2011
 
Accelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUsAccelerating Machine Learning Applications on Spark Using GPUs
Accelerating Machine Learning Applications on Spark Using GPUs
 
Smb Sme Virtualization_Bundles - EMC - Accel
Smb Sme Virtualization_Bundles - EMC - AccelSmb Sme Virtualization_Bundles - EMC - Accel
Smb Sme Virtualization_Bundles - EMC - Accel
 

Similaire à Ebay架构原则

Qcon best practices for scaling websites
Qcon best practices for scaling websitesQcon best practices for scaling websites
Qcon best practices for scaling websitesyouzitang
 
eBay’s Challenges and Lessons
eBay’s Challenges and LessonseBay’s Challenges and Lessons
eBay’s Challenges and Lessonshutuworm
 
Best Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBayBest Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBayRandy Shoup
 
Best Practices for Large-Scale Web Sites
Best Practices for Large-Scale Web SitesBest Practices for Large-Scale Web Sites
Best Practices for Large-Scale Web SitesCraig Dickson
 
071310 sun d_0930_feldman_stephen
071310 sun d_0930_feldman_stephen071310 sun d_0930_feldman_stephen
071310 sun d_0930_feldman_stephenSteve Feldman
 
10 Tips for Your Journey to the Public Cloud
10 Tips for Your Journey to the Public Cloud10 Tips for Your Journey to the Public Cloud
10 Tips for Your Journey to the Public CloudIntuit Inc.
 
Azug - successfully breeding rabits
Azug - successfully breeding rabitsAzug - successfully breeding rabits
Azug - successfully breeding rabitsYves Goeleven
 
Goal driven performance optimization (Пётр Зайцев)
Goal driven performance optimization (Пётр Зайцев)Goal driven performance optimization (Пётр Зайцев)
Goal driven performance optimization (Пётр Зайцев)Ontico
 
S016579 data-optimization-spectrum-control-brazil-v2
S016579 data-optimization-spectrum-control-brazil-v2S016579 data-optimization-spectrum-control-brazil-v2
S016579 data-optimization-spectrum-control-brazil-v2Tony Pearson
 
Design principles & quality factors
Design principles & quality factorsDesign principles & quality factors
Design principles & quality factorsAalia Barbe
 
Backing up your virtual environment best practices
Backing up your virtual environment   best practicesBacking up your virtual environment   best practices
Backing up your virtual environment best practicesInterop
 
The Hard Problems of Continuous Deployment
The Hard Problems of Continuous DeploymentThe Hard Problems of Continuous Deployment
The Hard Problems of Continuous DeploymentTimothy Fitz
 
Integrating Private Cloud into Your Enterprise Session
Integrating Private Cloud into Your Enterprise SessionIntegrating Private Cloud into Your Enterprise Session
Integrating Private Cloud into Your Enterprise SessionMelissa Maheux
 
Cloud Economics for Java at Java2Days
Cloud Economics for Java at Java2DaysCloud Economics for Java at Java2Days
Cloud Economics for Java at Java2DaysSteve Poole
 
CIRCUIT 2015 - Akamai: Caching and Beyond
CIRCUIT 2015 - Akamai:  Caching and BeyondCIRCUIT 2015 - Akamai:  Caching and Beyond
CIRCUIT 2015 - Akamai: Caching and BeyondICF CIRCUIT
 
Petabytes and Nanoseconds
Petabytes and NanosecondsPetabytes and Nanoseconds
Petabytes and NanosecondsRobert Greiner
 
071510 sun b_1515_feldman_stephen_forpublic
071510 sun b_1515_feldman_stephen_forpublic071510 sun b_1515_feldman_stephen_forpublic
071510 sun b_1515_feldman_stephen_forpublicSteve Feldman
 
cse40822-CAP.pptx
cse40822-CAP.pptxcse40822-CAP.pptx
cse40822-CAP.pptxNedaaHamed1
 

Similaire à Ebay架构原则 (20)

Qcon best practices for scaling websites
Qcon best practices for scaling websitesQcon best practices for scaling websites
Qcon best practices for scaling websites
 
eBay’s Challenges and Lessons
eBay’s Challenges and LessonseBay’s Challenges and Lessons
eBay’s Challenges and Lessons
 
Best Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBayBest Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBay
 
Best Practices for Large-Scale Web Sites
Best Practices for Large-Scale Web SitesBest Practices for Large-Scale Web Sites
Best Practices for Large-Scale Web Sites
 
071310 sun d_0930_feldman_stephen
071310 sun d_0930_feldman_stephen071310 sun d_0930_feldman_stephen
071310 sun d_0930_feldman_stephen
 
10 Tips for Your Journey to the Public Cloud
10 Tips for Your Journey to the Public Cloud10 Tips for Your Journey to the Public Cloud
10 Tips for Your Journey to the Public Cloud
 
Azug - successfully breeding rabits
Azug - successfully breeding rabitsAzug - successfully breeding rabits
Azug - successfully breeding rabits
 
Goal driven performance optimization (Пётр Зайцев)
Goal driven performance optimization (Пётр Зайцев)Goal driven performance optimization (Пётр Зайцев)
Goal driven performance optimization (Пётр Зайцев)
 
S016579 data-optimization-spectrum-control-brazil-v2
S016579 data-optimization-spectrum-control-brazil-v2S016579 data-optimization-spectrum-control-brazil-v2
S016579 data-optimization-spectrum-control-brazil-v2
 
Master.pptx
Master.pptxMaster.pptx
Master.pptx
 
Design principles & quality factors
Design principles & quality factorsDesign principles & quality factors
Design principles & quality factors
 
Google Cloud Platform Certification Cloud Architect Exam Prep Review Virtual ...
Google Cloud Platform Certification Cloud Architect Exam Prep Review Virtual ...Google Cloud Platform Certification Cloud Architect Exam Prep Review Virtual ...
Google Cloud Platform Certification Cloud Architect Exam Prep Review Virtual ...
 
Backing up your virtual environment best practices
Backing up your virtual environment   best practicesBacking up your virtual environment   best practices
Backing up your virtual environment best practices
 
The Hard Problems of Continuous Deployment
The Hard Problems of Continuous DeploymentThe Hard Problems of Continuous Deployment
The Hard Problems of Continuous Deployment
 
Integrating Private Cloud into Your Enterprise Session
Integrating Private Cloud into Your Enterprise SessionIntegrating Private Cloud into Your Enterprise Session
Integrating Private Cloud into Your Enterprise Session
 
Cloud Economics for Java at Java2Days
Cloud Economics for Java at Java2DaysCloud Economics for Java at Java2Days
Cloud Economics for Java at Java2Days
 
CIRCUIT 2015 - Akamai: Caching and Beyond
CIRCUIT 2015 - Akamai:  Caching and BeyondCIRCUIT 2015 - Akamai:  Caching and Beyond
CIRCUIT 2015 - Akamai: Caching and Beyond
 
Petabytes and Nanoseconds
Petabytes and NanosecondsPetabytes and Nanoseconds
Petabytes and Nanoseconds
 
071510 sun b_1515_feldman_stephen_forpublic
071510 sun b_1515_feldman_stephen_forpublic071510 sun b_1515_feldman_stephen_forpublic
071510 sun b_1515_feldman_stephen_forpublic
 
cse40822-CAP.pptx
cse40822-CAP.pptxcse40822-CAP.pptx
cse40822-CAP.pptx
 

Plus de yiditushe

Spring入门纲要
Spring入门纲要Spring入门纲要
Spring入门纲要yiditushe
 
J Bpm4 1中文用户手册
J Bpm4 1中文用户手册J Bpm4 1中文用户手册
J Bpm4 1中文用户手册yiditushe
 
性能测试实践2
性能测试实践2性能测试实践2
性能测试实践2yiditushe
 
性能测试实践1
性能测试实践1性能测试实践1
性能测试实践1yiditushe
 
性能测试技术
性能测试技术性能测试技术
性能测试技术yiditushe
 
Load runner测试技术
Load runner测试技术Load runner测试技术
Load runner测试技术yiditushe
 
J2 ee性能测试
J2 ee性能测试J2 ee性能测试
J2 ee性能测试yiditushe
 
面向对象的Js培训
面向对象的Js培训面向对象的Js培训
面向对象的Js培训yiditushe
 
Flex3中文教程
Flex3中文教程Flex3中文教程
Flex3中文教程yiditushe
 
开放源代码的全文检索Lucene
开放源代码的全文检索Lucene开放源代码的全文检索Lucene
开放源代码的全文检索Luceneyiditushe
 
基于分词索引的全文检索技术介绍
基于分词索引的全文检索技术介绍基于分词索引的全文检索技术介绍
基于分词索引的全文检索技术介绍yiditushe
 
Lucene In Action
Lucene In ActionLucene In Action
Lucene In Actionyiditushe
 
Lucene2 4学习笔记1
Lucene2 4学习笔记1Lucene2 4学习笔记1
Lucene2 4学习笔记1yiditushe
 
Lucene2 4 Demo
Lucene2 4 DemoLucene2 4 Demo
Lucene2 4 Demoyiditushe
 
Lucene 全文检索实践
Lucene 全文检索实践Lucene 全文检索实践
Lucene 全文检索实践yiditushe
 
Lucene 3[1] 0 原理与代码分析
Lucene 3[1] 0 原理与代码分析Lucene 3[1] 0 原理与代码分析
Lucene 3[1] 0 原理与代码分析yiditushe
 
7 面向对象设计原则
7 面向对象设计原则7 面向对象设计原则
7 面向对象设计原则yiditushe
 
10 团队开发
10  团队开发10  团队开发
10 团队开发yiditushe
 
9 对象持久化与数据建模
9  对象持久化与数据建模9  对象持久化与数据建模
9 对象持久化与数据建模yiditushe
 
8 Uml构架建模
8  Uml构架建模8  Uml构架建模
8 Uml构架建模yiditushe
 

Plus de yiditushe (20)

Spring入门纲要
Spring入门纲要Spring入门纲要
Spring入门纲要
 
J Bpm4 1中文用户手册
J Bpm4 1中文用户手册J Bpm4 1中文用户手册
J Bpm4 1中文用户手册
 
性能测试实践2
性能测试实践2性能测试实践2
性能测试实践2
 
性能测试实践1
性能测试实践1性能测试实践1
性能测试实践1
 
性能测试技术
性能测试技术性能测试技术
性能测试技术
 
Load runner测试技术
Load runner测试技术Load runner测试技术
Load runner测试技术
 
J2 ee性能测试
J2 ee性能测试J2 ee性能测试
J2 ee性能测试
 
面向对象的Js培训
面向对象的Js培训面向对象的Js培训
面向对象的Js培训
 
Flex3中文教程
Flex3中文教程Flex3中文教程
Flex3中文教程
 
开放源代码的全文检索Lucene
开放源代码的全文检索Lucene开放源代码的全文检索Lucene
开放源代码的全文检索Lucene
 
基于分词索引的全文检索技术介绍
基于分词索引的全文检索技术介绍基于分词索引的全文检索技术介绍
基于分词索引的全文检索技术介绍
 
Lucene In Action
Lucene In ActionLucene In Action
Lucene In Action
 
Lucene2 4学习笔记1
Lucene2 4学习笔记1Lucene2 4学习笔记1
Lucene2 4学习笔记1
 
Lucene2 4 Demo
Lucene2 4 DemoLucene2 4 Demo
Lucene2 4 Demo
 
Lucene 全文检索实践
Lucene 全文检索实践Lucene 全文检索实践
Lucene 全文检索实践
 
Lucene 3[1] 0 原理与代码分析
Lucene 3[1] 0 原理与代码分析Lucene 3[1] 0 原理与代码分析
Lucene 3[1] 0 原理与代码分析
 
7 面向对象设计原则
7 面向对象设计原则7 面向对象设计原则
7 面向对象设计原则
 
10 团队开发
10  团队开发10  团队开发
10 团队开发
 
9 对象持久化与数据建模
9  对象持久化与数据建模9  对象持久化与数据建模
9 对象持久化与数据建模
 
8 Uml构架建模
8  Uml构架建模8  Uml构架建模
8 Uml构架建模
 

Ebay架构原则

  • 1. Best Practices for Scaling Websites Lessons from eBay Randy Shoup eBay Distinguished Architect QCon Asia 2009
  • 2. Challenges at Internet Scale • eBay manages … – 86.3 million active users worldwide – 120 million items for sale in 50,000 categories – Over 2 billion page views per day – eBay users trade over $2000 in goods every second -- $60 billion per year – eBay site stores over 2 PB of data – eBay processes 50 TB of new, incremental data per day – eBay Data Warehouse analyzes 50 PB per day • In a dynamic environment – 300+ features per quarter – We roll 100,000+ lines of code every two weeks • In 39 countries, in 8 languages, 24x7x365 >48 Billion SQL executions/day! © 2009 eBay Inc.
  • 3. Architectural Forces at Internet Scale • Scalability – Resource usage should increase linearly (or better!) with load – Design for 10x growth in data, traffic, users, etc. • Availability – Resilience to failure (MTBF) – Rapid recoverability from failure (MTTR) – Graceful degradation • Latency – User experience latency – Data latency • Manageability – Simplicity – Maintainability – Diagnostics • Cost – Development effort and complexity – Operational cost (TCO) © 2009 eBay Inc.
  • 4. Best Practices for Scaling 1. Partition Everything 2. Asynchrony Everywhere 3. Automate Everything 4. Remember Everything Fails 5. Embrace Inconsistency © 2009 eBay Inc.
  • 5. Best Practice 1: Partition Everything • Split every problem into manageable chunks – By data, load, and/or usage pattern – “If you can’t split it, you can’t scale it” • Motivations – Scalability: can scale horizontally and independently – Availability: can isolate failures – Manageability: can decouple different segments and functional areas – Cost: can use less expensive hardware © 2009 eBay Inc.
  • 6. Best Practice 1: Partition Everything Pattern: Functional Segmentation Selling Search View Item – Segment processing into pools, services, and stages – Segment data along usage boundaries Bidding Checkout Feedback Pattern: Horizontal Split – Load-balance processing • Within a pool, all servers are created equal – Split (or “shard”) data along primary access path User Item Transaction • Partition by range, modulo of a key, lookup, etc. Product Account Feedback Corollary: No Session State – User session flow moves through multiple application pools – Absolutely no session state in application tier © 2009 eBay Inc.
  • 7. Best Practice 2: Asynchrony Everywhere • Prefer Asynchronous Processing – Move as much processing as possible to asynchronous flows – Where possible, integrate disparate components asynchronously • Motivations – Scalability: can scale components independently – Availability • Can decouple availability state • Can retry operations – Latency • Can significantly improve user experience latency at cost of data/execution latency • Can allocate more time to processing than user would tolerate – Cost: can spread peak load over time © 2009 eBay Inc.
  • 8. Best Practice 2: Asynchrony Everywhere Pattern: Event Queue – Primary use-case produces event • Create event (ITEM.NEW, ITEM.SOLD) transactionally with primary insert/update – Consumers subscribe to event • At least once delivery • No guaranteed order • Idempotency and readback Pattern: Message Multicast – Search Feeder publishes item updates • Reads item updates from primary database • Publishes sequenced updates via SRM-inspired protocol – Nodes listen to assigned subset of messages • Update in-memory index in real time • Request recovery (NAK) when messages are missed © 2009 eBay Inc.
  • 9. Best Practice 3: Automate Everything • Prefer Adaptive / Automated Systems to Manual Systems • Motivations – Scalability • Can scale with machines, not humans – Availability / Latency • Can adapt to changing environment more rapidly – Cost • Machines are far less expensive than humans • Can learn / improve / adjust over time without manual effort © 2009 eBay Inc.
  • 10. Best Practice 3: Automate Everything Pattern: Adaptive Configuration SLA 15 seconds Consumer A – Define SLA for a given logical consumer • E.g., 99% of events processed in 15 seconds SLA 30 seconds – Dynamically adjust config to meet defined SLA Consumer B SLA 5 minutes Consumer C Pattern: Machine Learning – Dynamically adapt search experience • Determine best inventory and assemble optimal page for that user and context – Feedback loop enables system to learn and improve over time • Collect user behavior • Aggregate and analyze offline • Deploy updated metadata • Decide and serve appropriate experience – Perturbation and dampening © 2009 eBay Inc.
  • 11. Best Practice 4: Remember Everything Fails • Build all systems to be tolerant of failure – Assume every operation will fail and every resource will be unavailable – Detect failure as rapidly as possible – Recover from failure as rapidly as possible – Do as much as possible during failure • Motivation – Availability © 2009 eBay Inc.
  • 12. Best Practice 4: Remember Everything Fails Pattern: Failure Detection – Servers log all requests • Log all application activity, database and service calls on multicast message bus • Over 2TB of log messages per day – Listeners automate failure detection and notification Pattern: Rollback – Absolutely no changes to the site which cannot be undone (!) – Every feature has on / off state driven by central configuration • Feature can be immediately turned off for operational or business reasons • Features can be deployed “wired-off” to unroll dependencies Pattern: Graceful Degradation – Application “marks down” an unavailable or distressed resource – Non-critical functionality is removed or ignored – Critical functionality is retried or deferred © 2009 eBay Inc.
  • 13. Best Practice 5: Embrace Inconsistency • Brewer’s CAP Theorem – Any shared-data system can have at most two of the following properties: • Consistency: All clients see the same data, even in the presence of updates • Availability: All clients will get a response, even in the presence of failures • Partition-tolerance: The system properties hold even when the network is partitioned – This trade-off is fundamental to all distributed systems © 2009 eBay Inc.
  • 14. Best Practice 5: Embrace Inconsistency Choose Appropriate Consistency Guarantees – To guarantee availability and partition-tolerance, we trade off immediate consistency – Most real-world systems (even financial systems!) do not require immediate consistency – Consistency is a spectrum – Prefer eventual consistency to immediate consistency Immediate Eventual No Consistency Consistency Consistency Bids, Purchases Search Engine, Billing System, etc. Preferences Avoid Distributed Transactions – eBay does absolutely no distributed transactions – no two-phase commit – Minimize inconsistency through state machines and careful ordering of operations – Eventual consistency through asynchronous event or reconciliation batch © 2009 eBay Inc.
  • 15. Recap: Best Practices for Scaling 1. Partition Everything 2. Asynchrony Everywhere 3. Automate Everything 4. Remember Everything Fails 5. Embrace Inconsistency © 2009 eBay Inc.
  • 16. Questions? About the Presenter Randy Shoup has been the primary architect for eBay's search infrastructure since 2004. Prior to eBay, Randy was Chief Architect and Technical Fellow at Tumbleweed Communications, and has also held a variety of software development and architecture roles at Oracle and Informatica. rshoup@ebay.com © 2009 eBay Inc.