WorthPoint is planning to scale its website to handle 3 million page views per day by the end of 2008. This will require scaling several systems, including the Drupal content management system, databases, application and image servers, and search servers. WorthPoint will focus on optimizing Drupal modules, developing custom modules, sharding the database, load balancing servers, caching, and using content delivery networks. While additional analysis is needed, WorthPoint's current software should be able to scale to millions of daily page views through hardware additions. The remaining challenges are scaling cost-effectively and replicating systems across multiple data centers worldwide.
1. Scaling the WorthPoint web site
to 3,000,000 page views a day
(DRAFT)
2. Starting Point
• WorthPoint is built on the Drupal Content
Management System
• Drupal is
– Open source
– Built on LAMP (Linux, Apache, MySQL, PHP)
– Used by The Onion, MTV-UK, and LifetimeTV, among many others
• Professional support will be available from Acquia starting in the second half of 2008
3. Background
• Drupal is an open source product being positioned to compete with enterprise-class CMS products
• The Drupal community is working on multiple performance and scaling tasks
• WorthPoint is planning for 5M unique web pages and 3M page views a day (a 30/70 mix of dynamic/static) by the end of 2008
– This positions WorthPoint as one of the larger
Drupal web sites
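To put the daily target in per-second terms, here is a back-of-the-envelope sketch. It assumes traffic is spread evenly over 24 hours, which real traffic is not; `peak_factor` is a hypothetical peak-to-average ratio for exploring that, not a WorthPoint figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def traffic_profile(daily_views: int, dynamic_pct: int,
                    peak_factor: float = 1.0) -> dict:
    """Split a daily page-view target into dynamic/static and per-second rates."""
    dynamic_per_day = daily_views * dynamic_pct // 100
    static_per_day = daily_views - dynamic_per_day
    avg_per_sec = daily_views / SECONDS_PER_DAY
    return {
        "dynamic_per_day": dynamic_per_day,
        "static_per_day": static_per_day,
        "avg_views_per_sec": avg_per_sec,
        "avg_dynamic_per_sec": dynamic_per_day / SECONDS_PER_DAY,
        # peak_factor is an assumed peak-to-average ratio, not from the slides
        "peak_views_per_sec": avg_per_sec * peak_factor,
    }
```

With the slide's numbers, `traffic_profile(3_000_000, 30)` gives 900,000 dynamic and 2,100,000 static page views a day, or roughly 34.7 page views per second on average, about 10.4 of them dynamic.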
4. Solution Clusters
• Drupal core / module rewrites & updates
• WorthPoint-specific module development
• Database server scaling
• Application server scaling
• Image server scaling
• Search server scaling
• NOTE: These solutions assume a system based in a single data center
5. Drupal Core & Modules
• WorthPoint uses 35 core modules and 225 community-contributed modules
• WorthPoint currently uses Drupal v5.1
– v6.0 has been released, but the community has not yet updated many of the modules used by WorthPoint
– Acquia is working on “Carbon”, a fully tested and certified version of the 35 core modules
• Many Drupal v5.1 modules need tweaks to work at the
current level of WorthPoint content and traffic
6. WorthPoint Module Development
• The vast majority of WorthPoint content and page views are in the Worthopedia, Auctions, Classified, and Taxonomy areas
• Modules designed and developed from the ground up by WorthPoint for these four areas would significantly reduce the load on the WorthPoint servers
• Initial design work is in progress
– Database?
– Language: C, Java, or PHP?
7. Database Server Scaling
• MySQL reliably supports master-slave replication
– The master handles INSERT, UPDATE, and DELETE statements; the slaves are SELECT-only
– The current WorthPoint code allows one database slave to support 50,000 page views a day¹; the end-of-2008 goal is 100,000 page views a day per DB slave²
– At that goal, WorthPoint will need roughly 30 DB slaves (3,000,000 ÷ 100,000) to serve 3M page views a day with the current Drupal code
• WorthPoint is beginning to partition (shard) the database
1. At the current mix of 20% dynamic and 80% static page views
2. At an anticipated mix of 30% dynamic and 70% static page views
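The master-slave split and the slave-count estimate above can be illustrated with a minimal sketch. It assumes the application routes each SQL statement itself: SELECTs go round-robin to the slaves and everything else goes to the master. The slides do not say how WorthPoint shards its data, so `shard_for` (a CRC32-modulo choice over a hypothetical key such as an item ID) is purely illustrative, as are the server names.

```python
import itertools
import zlib


class ReplicaRouter:
    """Read/write splitting for a MySQL master-slave setup (sketch).

    SELECTs are spread round-robin over the slaves; every other statement
    (INSERT, UPDATE, DELETE, DDL, ...) goes to the master.
    """

    def __init__(self, master: str, slaves: list) -> None:
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._next_slave = itertools.cycle(slaves)

    def route(self, sql: str) -> str:
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        # Slaves are SELECT-only, so anything else must hit the master.
        return next(self._next_slave) if verb == "SELECT" else self.master


def slaves_needed(daily_views: int, views_per_slave: int) -> int:
    """Number of read slaves required, rounded up (integer ceiling division)."""
    return -(-daily_views // views_per_slave)


def shard_for(key, num_shards: int) -> int:
    """Hypothetical hash-based shard choice; crc32 is stable across processes."""
    return zlib.crc32(str(key).encode("utf-8")) % num_shards
```

`slaves_needed(3_000_000, 100_000)` returns 30, matching the slide's estimate. Because MySQL replication is asynchronous by default, a SELECT routed to a slave right after a write may not see that write yet. A plain modulo shard map also forces most keys to move when the shard count changes; consistent hashing is a common alternative.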
8. Application Server Scaling
• Load balance multiple application servers
• Move from Zend to Quercus
• Simple web objects cached with Squid
• Complex web objects cached with memcached
• Misc
– User sessions moved to memory
– Database connection pooling
– Use a content delivery network for CSS and JavaScript
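Caching complex objects in memcached usually follows the cache-aside pattern: look the object up by key, and only on a miss build it (for example, from several database queries) and store it with an expiry time. The sketch below uses a small in-process `TTLCache` as a stand-in for a memcached client so it runs without a server; it does not reproduce any memcached client API, and the key `item:42` is hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for memcached: values expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._entries[key] = (self._clock() + ttl, value)


def get_or_compute(cache: TTLCache, key: str, ttl: float,
                   compute: Callable[[], Any]) -> Any:
    """Cache-aside read: try the cache, else build the object and store it.

    A compute() result of None is never cached, since None signals a miss.
    """
    value = cache.get(key)
    if value is None:
        value = compute()
        if value is not None:
            cache.set(key, value, ttl)
    return value
```

The injectable `clock` makes expiry testable; with a real memcached client the expiry is enforced by the server instead.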
9. Image Server Scaling
• Images on a SAN
• Two load-balanced image servers
• Use a content delivery network
– Limelight (http://www.limelightnetworks.com/)
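Using a CDN such as Limelight typically means serving static files from a CDN hostname instead of the site's own servers. A minimal sketch of that URL rewrite follows; the hostname `cdn.example.com` and the list of static file extensions are assumptions, and the slides do not describe how WorthPoint actually configures Limelight.

```python
from urllib.parse import urlsplit, urlunsplit

# File types treated as static and served from the CDN (assumed list).
STATIC_EXTENSIONS = (".css", ".js", ".jpg", ".jpeg", ".png", ".gif")


def cdn_url(url: str, cdn_host: str) -> str:
    """Point static-asset URLs at the CDN host; leave dynamic URLs alone."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(STATIC_EXTENSIONS):
        return url
    # Keep path, query string, and fragment; swap in the CDN host.
    return urlunsplit((parts.scheme or "http", cdn_host,
                       parts.path, parts.query, parts.fragment))
```

Rewriting by file extension keeps dynamic pages (such as `/node/42`) on WorthPoint's own servers while images, CSS, and JavaScript are fetched from the CDN.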
11. Wildcards that may help with performance
• Cloud-based solutions
• Falcon Storage Engine for MySQL – may be ready for prime time before the end of the year
• A major computer industry player makes a commitment to Drupal and brings significant resources to bear
12. Summary
• While some additional analysis and work remain, WorthPoint’s current software will scale to millions of page views a day with the addition of hardware
• The current challenge is to scale cost-effectively
• The next challenge is to replicate and load balance the WorthPoint systems across multiple geographically distributed data centers (e.g., U.S. East Coast, U.S. West Coast, Europe, and the Far East)
13. Contact
Andy Forbes, CTO
andy.forbes@worthpoint.com
Marc Benton, Director of Product Development
marc.benton@worthpoint.com
Arman Anwar, Director of Systems Development
arman.anwar@worthpoint.com