Designing for the Cloud: A Practical Guide

Designing for the Cloud:A Tutorial Stuart Charlton, CTO, Elastra

Tutorial Objectives What has cloud computing done to IT systems design & architecture? “The future is already here, it’s just not very evenly distributed” (Gibson) How should new systems be designed with the new constraints? Such as: parallelism, availability, on demand infra Where can I find are practical frameworks, tools, and techniques, and what are the tradeoffs? Hadoop, Cassandra, Parallel DBs, Actors, Caches, Containers, and Configuration Management

About Your Presenter Stuart Charlton ,[object Object],CTO, Elastra ,[object Object],In prior lives... ,[object Object],RESTafarian and Data geek Stu Says Stuffhttp://stucharlton.com/blog

Tutorial Agenda, in 4 Words Clouds Service Data Control 4

Agenda – Part 1 Clouds: Fear of a Fluffy Planet What has changed, and what remains the same? Designing applications in this world A Cloud Design Reference Architecture (aka. A cheat sheet to categorize thinking in the clouds) Service: Foundations for Systems Solving Big Problems vs. Little Problems Amdahl’s Law & The Universal Scalability Law Actor-Based Concurrency: Dr. Strangelanguage, (or How I Learned to Stop Worrying and Love Erlang)

Agenda – Part 2 Data: Management & Access Contrasting Philosophies Persistence vs. Management; Scale-Up vs. Scale-Out Shared Disk vs. Shared Nothing A survey of solutions (from clustered DBMS to K/V stores) Consistency, Availability, Partitioning (CAP) Tradeoffs Deep dig into what these really imply Control: Containers, Configuration & Modeling The Dev/Ops Tennis Match The Evolution of Automation From Scripts to Runbooks to FSMs to HTNs

Caveats Audience Assumption: IT Devs & Architects Some exposure to cloud, but not necessarily advanced The technology is a fast moving target Especially state of the specific tools & frameworks Theory vs. practice I try to balance the two; both are essential Time is limited Only scratching the surface of certain topics Missing topics are usually full tutorials in their own right Much of the subject matter is up for debate And, this is a tutorial, not a workshop…. 

Clouds Fear of a Fluffy Planet 8

(court (Courtesy of browsertoolkit.com)

The Freedom! On Demand Infrastructure via API calls Inside or outside my data centres (Private / Public Cloud) Pay-per-use pricing models Great for temporary growth needs Platform-as-a-Service Scalability without Skill, Availability without Avarice Large Scale, Always On New opportunities due to cheaper scale & availability

The Horror! Hype Overdrive Cloud Running Shoes! Cloud Chewing Gum! GOOG!Werner Vogels Action Figures! (well, not quite yet) Standards Support So many to choose from! OCCI, vCloud + OVF, EC2, WBEM, WS-Management Platform-as-a-Service What color would you like for your locked trunk’s interior? Crazy Talk No SQL! Eventual Consistency! Infrastructure as Code!

Will the Real Slim Cloudy Please Stand Up? “I, for one, welcome our new outsourced overlords” Finer-grained outsourcing Metered resource usage APIs & self-service UIs … but isn’t outsourcing often a shell game? See Distributed Computing Economics, Jim Gray (2003) “Scale without skill, availability without avarice” Insert constrained code [here] Magically scalable & available GAE, Azure (some day) … but aren’t you locked in?

Will the Real Slim Cloudy Please Stand Up? “I like Big *aaS and I cannot lie” “My name is… what? Slim Cloudy!” Private, Public, or Community Clouds Multiple stack levels “Real” SOA, not just web services … haven’t I heard this before? Reduced lead times to change Agile Operations / Lean IT Revolution in systems management … can we really change IT?

Designing Applications in this World Distributed & networked systems have triumphed The fallacies must be taken seriously now Network is unreliable, latency > 0, bandwidth is finite, topology might change, etc. Scale-out & fault tolerance: the new design center Versus productive business logic, data management, etc. What’s old is new Some challengers to mainstream ideas are old ideas being reapplied e.g. Erlang, Map/Reduce, distributed file systems, replication

Designing Applications in this World Autonomous services constitute most systems Full-stack services, not just bits of code Design for constant operations Interdependence + Distribution + Autonomy = Pain FCAPS (Fault, Configuration, Accounting, Performance & Security Management) Security & Privacy Multi-tenancy, data-in-transit vs. data-at-rest, etc.

Solving for one’s own problems Mainstream tools, platforms, and servers have not consistently caught up LOTS of software experimentation in: Web servers, containers, caches, databases, network configuration, systems management The danger is to view new solutions as the better way of doing things in general It’s possible; but stuff is changing quickly New territory always involves a level of reinvention The tech world has not rebooted due to cloud computing Beware Fanbois/Fangrrls, Pundits & The Press

A Cloud Design Reference Architecture Web – WebArch & REST Service, Data,& Control – this tutorial Resource –virtualization,management &infrastructure clouds WEB SERVICE DATA CONTROL RESOURCE

Service Organizing your computing domain for fault scale management WEB SERVICE DATA CONTROL RESOURCE

Data Storage, retrieval,integrity, recovery given Distributed systems Large scale High availability (possible) Multi-tenancy WEB SERVICE DATA CONTROL RESOURCE

Control Provision, configuration, governance, and optimization of infrastructure Resource brokerage Policy constraints Dependency management Software configuration Authorization & Auditability WEB SERVICE DATA CONTROL RESOURCE

Service Foundation for Systems

Designing a Service, circa 1998-2008 Multi-Tier Hybrid Architecture Some stateless, some stateful computing Session state is replicated Independent servers / applications Low-level redundancy (RAID, 2x NICs, etc.) “Put your eggs into a small number of baskets, and watch those baskets” General assumptions Failure at the service layer shouldn’t lead to downtime Failure at the data layer may be catastrophic

Designing a Service, circa 2008+ Autonomous services Divide system into areas of functional responsibility (tiers irrelevant) Interdependent servers / applications Software-level redundancy andfault handling “Many, many servers breaking big problems down or distributinglots of little problems around” New realities Partial failure is a regular, normal occurrence; no excuse for downtime from any service

Breaking or bridging a problem across resources Big Problems (Parallel) Theory:Amdahl’s lawShared memory or disk vs. Shared nothing New Practice:MapReduce (e.g. Hadoop), Spaces, Master/Worker Retro: Linda, MPI, OpenMP, IPC or Threads Little Problems (Concurrent) Theory: Actor-model & process calculi New Practice: Lightweight Messaging, Spaces, Erlang & Scala Actors Retro: IPC, Thread pools,Components (COM+/EJB),Big Messaging (MQ, TIB, JMS)

Case Study in “Big Problem” Solving:MapReduce & Apache Hadoop Input Read your data from files as a K/V map Distribute Mapping Function Input one (k,v) pair returns new K/V list Partition & Sort Handled by framework (eg. Hadoop) Provide a comparator Distribute Reduce Function Input one (k, list of values) pair Return a list of output values Output Save the list as a file

….But how fast can I get?Theory Interlude: Amdahl’s Law How fast can I speed up a sequential process? Time = Serial part + Parallel part Thus, the speed up is Where P is the % of the program that can be parallel N is the number of processors What happens when P is 95%? -- Maximum of 20x How about 99.99%?

Gunther’s Universal Scalability Law It gets worse… Most scale-outexperiencesretrogradebehavior at peak loads Capacity(N) = N 1 + α (N − 1) + β N (N − 1) α is the contention β is the coherency delay http://www.perfdynamics.com/Manifesto/gcaprules.html

Case study in solving “little problems”Actors: The Basic Idea Programmable entities are concurrent, share nothing, communicate through messages Actors can Send messages Create other actors Specify how it responds to messages Very lightweight (actors = objects) Usually no ordering guarantees At the language level

ErlangSupervisors: Assuming failure will occur Failures require cleanup & restart Supervisor relationships canensure the systemtolerates faults Hot-swap patches Fundamentally inthe language libraries

What kinds of failures? A Simplification. Exceptional Conditions Conditions that a programmer did not or should not handle Tolerated through replication, fast failure, and/or restart(s) Examples Hardware failures, network outages, “Heisenbugs”, rare software conditions Conditions that the programmer can handle Handled through cleanup or “catch” code Examples File not found, type conversion, bad arithmetic (divide by zero),malformed input Error Conditions

Evolving the Database: Two Philosophies Data Persistence Systemsand Frameworks Database Management Systems(DBMS) Goal: Store & retrieve data quickly, reliable, with minimal hassle to the programmer Often uses application tools & languages to manage & access data Focused set of features Goal: Manage the access, integrity, security, and reliability of data, independently of applications Hard separation of tools & languages (e.g. SQL, DBA tools) Broad set of features

Scaling the Database: Two Philosophies Scale-Up Scale-Out Concurrent processing & parallelism through hardware SMP, NUMA, MPP RAID Arrays (SAN & NAS) Shared disk or memory Benefit: It worked in the 90s. Drawback: Expensive, often bespoke, forklift upgrades Concurrent processing & parallelism through software Commodity hardware Software provides the engine Shared nothing Benefit: Linear scale, easy to standardize, easy to replicate / upgrade Drawback: Traditionally, the software sucked. 33

… What happens when database clustering software stops sucking? (i.e. now) A flurry of programmer-oriented approaches Persistence engines rule the bleeding edge in 2009 Key/Value Stores, JSON Document stores, etc. Declarative/Imperative impedance mismatch(the “Vietnam” of the software tools industry) gets conflated with distributed data Lots of practical confusion ,[object Object]

Too many choices, with idiosyncratic design histories

When should I share components? Shared Disk Shared Nothing Partition compute across nodes Storage is shared through NAS or SAN Good for: Mixed workload Small random access reads Worst case: Inter-node network chatter caps scalability Disk pings to propagate writes (e.g. Oracle pre-RAC) Partition data across nodes Each node owns its data Good for: Read-mostly Parallel reads of huge data volumes Consistent writes go to one partition Worst case: Repartitioning Hotspot records don’t scale Writes that span partitions

Modern Data Persistence Systems Object Persistence “Navigational databases in Java, Smalltalk, C++” GemStone, Versant, Objectivity Distributed Key-Value Stores “Structured data with lesser need for complex queries” Consistent: BigTable, HBase, Voldemort Eventually Consistent: Dynamo, Cassandra Document and/or Blob Stores “Indexed structured data + binaries/fulltext” CouchDB, BerkeleyDB, MongoDB

Clustered DBMS for Transactions Oracle Real Application Clusters (RAC) Shared disk, Replicated Memory (“Cache Fusion”) Limited by mesh interconnect to disk (partitioning possible) IBM DB2 Data Partitioning Feature Shared nothing database cluster, high number of nodes IBM DB2 pureScale New (Oct 2009) technology that ports IBM DB2 mainframe shared-disk clustering to the DB2 for open systems Microsoft SQL Server 2008 “Federated” Shared Nothing Database a longtime feature

Clustered DBMS for Parallel Queries Teradata The old standard data warehouse, hardware + software Netezza Data warehousing appliance (hw + software) Vertica Column-oriented, shared nothing clustered database Mike Stonebraker’s new company Greenplum Column-oriented, shared nothing clustered database Based on PostgreSQL with MapReduce engine

Scaling to Internet-Scale Single Control Domain One Database Site Consistency is built-in Scalable with tradeoffs among different workloads Scale to the limits of network bandwidth & manageability Main Example: Clustered DBMS Multiple Control Domains Many Database Sites Consistency requires agreement protocol Scalable only if consistency is relaxed Nearly limitless (global) scale Main Examples: DNS The Web 39

How do I make consistency tradeoffs?Theory interlude: The CAP theorem Consistency (A+C in ACID) There’s a total orderingon all operations on the data;i.e. like a sequence Availability Every request onnon-failed servers must havea response Tolerance to Network Partitions All messages might be lost between server nodes Choose at most two of these (as a spectrum).

CAP Tradeoffs: Consistency & Availability ,[object Object]

Fault tolerance through replicas & fast fail + fast recovery ,[object Object]

network outage between servers might halt the system

generally requires a single domainof control

Coherence, Gigaspaces & Terracotta,[object Object]

Designing for the Cloud: A Practical Guide

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (14)

Similaire à Designing for the Cloud: A Practical Guide

Similaire à Designing for the Cloud: A Practical Guide (20)

Plus de Stuart Charlton

Plus de Stuart Charlton (14)

Dernier

Dernier (20)

Designing for the Cloud: A Practical Guide