1. Ceph at Salesforce
Sameer Tiwari - Principal Architect, Storage Cloud
stiwari@salesforce.com
@techsameer
https://www.linkedin.com/in/sameer-tiwari-1961311/
3/17/2017 - Ceph Day at San Jose
2. Data Types
Structured Customer Data: Mostly transactional data on RDBMS
Unstructured Customer Data: Immutable blobs on a home-grown distributed storage system
SAN usage across multiple use cases
Backups: Both commercial solutions and internal systems
Caching: Immutable structured blobs
Events: On HDFS (plus other systems along the way)
Logs: On HDFS (plus other systems along the way)
3. Storage Technologies Used
File Storage
NoSQL: HBase
HDFS
SAN
SDS (Software-Defined Storage) on scale-out commodity hardware
4. Uses for Ceph
Block Store
Backend for RDBMSs (maybe with BookKeeper for the journal)
Mountable cloud disks of various sizes, up to much larger than a local disk (see the sketch after this slide)
Re-mountable storage for VMs
Replace some SAN scenarios
Blob Store
General-purpose blob store
Sharing of data across users
Examples: VM/container images, core dumps, large file transfer, customer data, IoT
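A minimal sketch of the mountable-disk flow above, using the stock rbd CLI. The pool and image names are hypothetical, and --size is given in MB as on Jewel-era clients:

  # Create a thin-provisioned image; it can be far larger than any local disk.
  rbd create vm42-data --pool cloud-disks --size 102400    # 100 GiB
  # Map it on the client host; the kernel exposes it as /dev/rbdN.
  rbd map cloud-disks/vm42-data
  # Format and mount like any local block device.
  mkfs.xfs /dev/rbd0
  mount /dev/rbd0 /mnt/vm42-data
  # Re-mountable: unmount and unmap here, then map the same image on another host.
  umount /mnt/vm42-data && rbd unmap /dev/rbd0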
6. Current Status
Experimenting with multiple small test clusters (~100 nodes)
Machines generally have lots of RAM, a few SSDs, and a bunch of HDDs
Currently on a single 10G network, moving to something much bigger
Machines are spread across many racks, but in a single room (very little over-provisioning)
Testing only RBD
Simple CRUSH map modifications for creating SSD-only pools and availability zones (see the sketch after this list)
Very high magnitude of scale: multiple clusters, across multiple DCs, each multi-tenant
Operationalizing for a very different and challenging set of requirements
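A sketch of the kind of CRUSH map modification mentioned above. Pre-Luminous Ceph has no device classes, so an SSD-only pool needs its own CRUSH root; the bucket, rule, pool, and OSD names here are hypothetical:

  # Build a separate CRUSH root that holds only SSD-backed OSDs.
  ceph osd crush add-bucket ssd-root root
  ceph osd crush add-bucket node01-ssd host
  ceph osd crush move node01-ssd root=ssd-root
  ceph osd crush set osd.24 0.436 host=node01-ssd   # weight roughly = size in TB
  # Rule that picks replicas only from the SSD root, one per host.
  ceph osd crush rule create-simple ssd-rule ssd-root host
  # Pool whose placement follows the SSD-only rule.
  ceph osd pool create ssd-pool 512 512 replicated ssd-rule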
7. Performance numbers (using fio to provide the test load)
SSD-only pool with 12 machines, each with two 12-core CPUs, 128 GB RAM, and two 480 GB SSDs
Random read/write of 8K blocks, 70/30 read/write ratio
8. Performance numbers (using fio to provide the test load)
SSD-only pool with 12 machines, each with two 12-core CPUs, 128 GB RAM, and two 480 GB SSDs
Sequential write of 128K blocks
9. Performance numbers (using fio to provide the test load)
SSD-only pool with 12 machines, each with two 12-core CPUs, 128 GB RAM, and two 480 GB SSDs
Random read of 8K blocks
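For reference, fio invocations that approximate the three workloads above. The device path, queue depth, job count, and runtime are illustrative; the deck does not state the exact job parameters:

  # Slide 7: 8K random read/write, 70/30 mix
  fio --name=randrw-8k --filename=/dev/rbd0 --direct=1 --ioengine=libaio \
      --rw=randrw --rwmixread=70 --bs=8k --iodepth=32 --numjobs=4 \
      --runtime=300 --time_based --group_reporting
  # Slide 8: 128K sequential write
  fio --name=seqwrite-128k --filename=/dev/rbd0 --direct=1 --ioengine=libaio \
      --rw=write --bs=128k --iodepth=32 --runtime=300 --time_based
  # Slide 9: 8K random read
  fio --name=randread-8k --filename=/dev/rbd0 --direct=1 --ioengine=libaio \
      --rw=randread --bs=8k --iodepth=32 --runtime=300 --time_based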
10. Experiments
Pre-work: hook up metrics, logs, and alerts to Salesforce infrastructure
fio performance on a client-side mounted block device with XFS
Testing lots and lots of failure scenarios (think Chaos Monkey; see the sketch after this list)
More focus on slow devices (network, host, disk)
CRUSH map settings for heterogeneous environments (will build a tool to generate these automatically)
Set up a CI/CD pipeline
Running Ceph in a Dockerized environment with Kubernetes
Ability to patch a deployed cluster (OS, Docker, Ceph)
Going over the code, line by line
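A sketch of the failure-injection style described above: hard failures through the service manager, slow devices through traffic shaping. The OSD id and NIC name are illustrative:

  # Hard failure: stop one OSD, watch the cluster degrade and recover.
  systemctl stop ceph-osd@12
  ceph -s                       # degraded PG counts, recovery progress
  ceph osd tree | grep down
  systemctl start ceph-osd@12
  # Slow device: add latency on the storage NIC instead of killing anything.
  tc qdisc add dev eth0 root netem delay 200ms
  ceph osd perf                 # per-OSD commit/apply latency exposes the slow path
  tc qdisc del dev eth0 root netem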
11. Future
Read from any replica (inconsistent reads should help with tail latency)
Can reads search the journal? (should also help with tail latency)
Need pluggability in RGW: either extend the RGWHandler class, or use the pre_exec() hook in the RGWOp class (rgw_op.cc)
12. Challenges of Storage Services at Salesforce
Scale brings problems all its own: more hardware to fail or act funny, regular capacity adds, hardware changes
Multiple dimensions of multi-tenancy
External Customers (isolation, auth/encryption, security, perf, availability, durability, etc.)
A service supporting many, many use cases and internal platforms
Running a large number of clusters across a large number of data centers