Swift at Scale: The IBM SoftLayer Story

Brian Cline, Object Storage Development Lead
OpenStack Summit • Ocata series
2016.10.25 Barcelona, Spain
twitter/irc: @briancline

Our History with Public Object Storage
• 2012 — First three clusters go live (DAL, AMS, SNG)
• 2014 — Dedicated development team
• 2014 — Launch 11 clusters in new datacenters
• 2015 — Launch 5 clusters in new DCs
• 2015 — Product integrations with IBM Bluemix
• 2016 — Launch 3 clusters in new DCs (and expand an existing cluster into multiple DCs)

2012: When things were [mostly] simpler…
• 7-10 nodes in each cluster
• Two node types
  • Proxy
  • Data - account, container, object services
• Load balancer
• FreeBSD with ZFS ⚠ Do not attempt.
• No centralized logs
• No log analysis tools

2016: Adjusted for scale (blood, sweat, tears, dreams, starlight…)
• Up to hundreds of nodes per cluster
• Three node types
  • Proxy
  • Meta - account and container services
  • Data - object services
• Load balancer cluster
• Debian Linux
• Centralized and searchable logs
• Analytics via Spark and Hadoop

22 Swift Clusters
24 Datacenters
16 Countries

Tens of Billions of Objects
7 Million Containers
Hundreds of thousands of Swift Accounts

90 PB of Capacity
Thousands of Nodes
40,000+ Disks

Tens of thousands of requests per second
GET HEAD PUT DELETE
(with notable variability between clusters)

Hardware we like
• Supermicro 36-disk chassis
• 12-16 physical cores (24-32 HT cores)
• 128GB RAM for proxies
• 256GB RAM for data nodes
• 10Gbps NICs (separate API vs. storage/replication networks)
• 3 - 4 TB disks
• Controller card
• 2 disks for OS (RAID1)
• 1 disk for OS hotswap
• 4 disks for SSD caching
• 29 disks for data storage
• Usually expand by ½-row or a full row at a time
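
The disk counts on this slide add up to the 36-disk chassis (2 + 1 + 4 + 29 = 36). As a rough worked example, the sketch below computes raw data capacity per node from the slide's figures. The replica count of 3 is an assumption (a common Swift default), not something stated in the talk, and the function names are illustrative only.

```python
# Per-node capacity arithmetic using the figures from the hardware slide.
DATA_DISKS_PER_NODE = 29   # from the slide
DISK_SIZES_TB = (3, 4)     # from the slide: 3 - 4 TB disks
REPLICAS = 3               # ASSUMPTION: not stated in the talk


def raw_tb_per_node(disk_tb, disks=DATA_DISKS_PER_NODE):
    """Raw capacity of one data node, before replication overhead."""
    return disk_tb * disks


def usable_tb_per_node(disk_tb, replicas=REPLICAS):
    """Approximate usable capacity once every object is stored `replicas` times."""
    return raw_tb_per_node(disk_tb) / replicas


for size in DISK_SIZES_TB:
    print(f"{size} TB disks: raw={raw_tb_per_node(size)} TB, "
          f"usable~{usable_tb_per_node(size):.1f} TB at {REPLICAS} replicas")
```

This ignores filesystem overhead and the free-space margin an operator would keep, so treat the results as upper bounds.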

Our Stack — Software
• OS: Debian
• Base: Swift (duh), sometimes with backports
• Authentication:
  • Swauth, with some internal patches and enhancements
  • Keystone (APIv3), starting with Bluemix accounts
• Metadata Search: Elasticsearch
• Monitoring & Logging:
  • collectd, Nagios, Capacity Dashboard
  • Logstash, Kibana, Graphite, Grafana
  • slogging
• Automation: Chef, Jenkins, Fabric

Our Stack — Custom Middlewares
• CDN operations (purge, load, CNAMEs, TTL, compression, etc.)
• CDN origin pull
• Search indexer (on successful PUT/POST/DELETE)
• Search query operations
• Checkpoint (account enable/disable/etc. abilities for resellers)
• Internal management (sysmeta read/write, proxy-level recon)
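
The talk does not show any of these middlewares. As an illustration of the general shape, here is a minimal sketch of a plain WSGI middleware that reports successful PUT/POST/DELETE requests to a callback, roughly what a search indexer hook might look like. The class name `IndexOnWrite` and the `on_write` callback are hypothetical, and the sketch assumes the wrapped app calls `start_response` before returning its body.

```python
class IndexOnWrite:
    """WSGI middleware sketch: notify a callback after successful writes."""

    WRITE_METHODS = {"PUT", "POST", "DELETE"}

    def __init__(self, app, on_write):
        self.app = app
        # Hypothetical hook, e.g. "enqueue this path for the search index".
        self.on_write = on_write

    def __call__(self, environ, start_response):
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            # Remember the status line so it can be inspected afterwards.
            captured["status"] = status
            return start_response(status, headers, exc_info)

        body = self.app(environ, capturing_start_response)
        method = environ.get("REQUEST_METHOD", "")
        status = captured.get("status", "")
        # Only 2xx responses to write verbs count as successful writes.
        if method in self.WRITE_METHODS and status.startswith("2"):
            self.on_write(method, environ.get("PATH_INFO", ""))
        return body


def filter_factory(global_conf, **local_conf):
    """paste.deploy-style factory; here the hook simply prints the event."""
    def factory(app):
        return IndexOnWrite(app, lambda method, path: print(method, path))
    return factory
```

A real indexer would more likely queue the event than do work inline, so that a slow search backend cannot add latency to the request path.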

Lessons Learned: Automation
• Make automation a must-have, day-one deliverable
• Never launch something new without test/deploy automation
• Must work across all environments (dev, QA, UAT/staging, prod)
• Automation needs tests and metrics, too — it is code!
• Functional testing should be an automated part of every deploy
• Remember your orchestration (knowledge of Swift zones)
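
One way to read the last bullet: because Swift spreads replicas across zones, a rolling deploy that touches one zone at a time leaves the other replicas available. The sketch below groups nodes into per-zone batches. The `(hostname, zone)` input format and the function name are assumptions for illustration, not the team's actual tooling.

```python
from collections import defaultdict


def batches_by_zone(nodes):
    """Group (hostname, zone) pairs into deploy batches, one batch per zone.

    Batches come back in ascending zone order, hostnames sorted within each.
    """
    zones = defaultdict(list)
    for host, zone in nodes:
        zones[zone].append(host)
    return [sorted(zones[zone]) for zone in sorted(zones)]


# Hypothetical inventory: two nodes in zone 1, one node in zone 2.
inventory = [("data03", 2), ("data01", 1), ("data02", 1)]
for batch in batches_by_zone(inventory):
    print("deploy batch:", batch)
```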

Lessons Learned: Monitoring
• Scale test any monitoring/logging infrastructure you put into place
• Very obvious stuff:
  • Space and IOPS, errors from SMART/XFS/kernel/controller, etc.
  • HTTP response code aggregates, latency aggregates by verb, etc.
• Swift metrics:
  • If nothing else, async pendings
  • Replicator failures and partitions/sec rates
  • Replicator last completion timestamp vs. ring push timestamp
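
The last comparison can be sketched as a simple check: a node whose replicator has not completed a pass since the most recent ring push has not yet finished acting on the new ring. The sketch assumes both values are Unix timestamps that have already been collected (for example from recon data and from the deploy tooling). How they are gathered is not covered in the talk, and the function name is hypothetical.

```python
def stale_nodes(last_completed_by_node, ring_pushed_ts):
    """Return nodes whose last replicator pass finished before the ring push.

    last_completed_by_node: dict of node name -> Unix timestamp of the last
        completed replication pass (assumed to be collected elsewhere).
    ring_pushed_ts: Unix timestamp of the most recent ring push.
    """
    return sorted(
        node
        for node, completed_ts in last_completed_by_node.items()
        if completed_ts < ring_pushed_ts
    )


# Hypothetical sample data: data02 has not completed a pass since the push.
completions = {"data01": 1477400000, "data02": 1477300000}
print(stale_nodes(completions, ring_pushed_ts=1477350000))
```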

Lessons Learned: Monitoring (continued)
• Any middleware you create needs to emit ops metrics
• New features benefit from emitting usage metrics
• Don’t forget debug-level log messages
• Add automatic checks for the precipitating conditions that lead to failures (not just for the error log lines that result from them afterwards)
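
As a sketch of what emitting ops metrics from a middleware can look like, the example below formats and sends metrics in the widely used StatsD line format (`name:value|type`, with an optional `|@rate` suffix). The metric names, host, and port are placeholders, and nothing here is taken from the team's actual middlewares.

```python
import socket


def statsd_line(name, value, kind, sample_rate=1.0):
    """Format one StatsD metric line.

    kind is "c" (counter), "ms" (timer) or "g" (gauge).
    """
    line = f"{name}:{value}|{kind}"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are placeholder values."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names for a CDN purge middleware.
print(statsd_line("middleware.cdn.purge.requests", 1, "c"))
print(statsd_line("middleware.cdn.purge.timing", 12.5, "ms", sample_rate=0.5))
```

UDP keeps the metric path from blocking request handling, at the cost of occasionally dropped samples.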

Lessons Learned: Rebalancing
• Keep tabs on your rebalance times (and keep them small when possible)
• Coordinate rebalances around node/cluster maintenance
• Don’t let IOPS levels grow too high before expanding capacity
• Customer IOPS vs. Replicator & Auditor IOPS — know your limits
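
The last two bullets amount to a headroom budget: customer traffic plus background work (replicators, auditors, and any rebalance in flight) has to fit under what the disks can sustain. A minimal sketch of that arithmetic follows. Every number and the 30% threshold are hypothetical, chosen only to show the calculation.

```python
def iops_headroom(disk_limit, customer_iops, background_iops):
    """IOPS left over once customer and background load are both counted."""
    return disk_limit - (customer_iops + background_iops)


def should_expand(disk_limit, customer_iops, background_iops,
                  min_headroom_fraction=0.3):
    """True when headroom falls below the chosen fraction of the disk limit.

    min_headroom_fraction is a HYPOTHETICAL threshold, not from the talk.
    """
    headroom = iops_headroom(disk_limit, customer_iops, background_iops)
    return headroom < min_headroom_fraction * disk_limit


# Hypothetical per-disk figures.
print(should_expand(disk_limit=150, customer_iops=90, background_iops=30))
print(should_expand(disk_limit=150, customer_iops=40, background_iops=30))
```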

Lessons Learned: Swift
• Use 256-byte inode sizes (or the smallest you can get away with)
• Using swauth? Use an SSD storage policy for AUTH_.auth containers
• Namespace any custom API additions (and be consistent)
• When possible, ask the community for feedback on new middleware ideas
• Upstream is important! Stay involved and give back when possible
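
To illustrate the namespacing bullet, the sketch below builds header names under Swift's `X-<Type>-Sysmeta-` prefix with an extra per-feature namespace, so custom additions are grouped consistently and are less likely to collide. The `acme` namespace and the helper name are hypothetical. The talk does not say which naming scheme the team uses.

```python
SYSMETA_TYPES = ("account", "container", "object")


def namespaced_sysmeta_header(resource_type, namespace, key):
    """Build a header such as X-Container-Sysmeta-Acme-Cdn-Ttl.

    namespace is a HYPOTHETICAL vendor or feature prefix that keeps all
    custom keys together under one consistent name.
    """
    if resource_type not in SYSMETA_TYPES:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    return (f"X-{resource_type.title()}-Sysmeta-"
            f"{namespace.title()}-{key.title()}")


print(namespaced_sysmeta_header("container", "acme", "cdn-ttl"))
```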
