Distributed Erlang Systems In Operation

•

7 j'aime•2,315 vues

Andy Gross

Architectural Goals
• Decentralized (no masters).
• Distributed (asynchronous, nodes use only
local data).
• Homogeneous (all nodes can do anything).
• Fault tolerant (emergent goal).
• Observable

Anti-Goals
• Global state:
• pg2/hot data in mnesia
• globally registered names
• Distributed transactions
• Reliance on physical time

Compromise your
Goals
• Decentralized (no masters).
• Distributed (nodes use only local data).
• Homogeneous (all nodes can do anything).
• No distributed transactions/global state.
• No reliance on physical time.

Systems Design

• Cluster Membership
• Load balancing/naming/resource allocation
• Liveness checking
• Soft Global State

Cluster Membership
• Option 1: Use a conﬁguration ﬁle:
• Requires out-of-band sync of
conﬁguration ﬁle across machines.
• Not “elastic” enough for some use-cases.
• Option II: Contact a seed node to join and
use gossip protocol to propagate state.

Load Balancing and
Resource Allocation
• Static assignment
• Round-robin/Random
• Static hashing: Nodes[hash(Item) mod
length(Nodes)]
• Dynamo/Riak/Cassandra/Voldemort:
Consistent Hashing

Liveness Checking
• nodes() and net_adm:ping() operations can
be too low-level.
• Sometimes you’d like to divert trafﬁc from
a node at the application level while
keeping distributed Erlang up.
• Use net_kernel:monitor_nodes() and an
app-level mechanism for liveness.

Soft State/Gossip
Protocols
• An eventually-consistent alternative to
global state.
• Nodes make changes, gossip to another
node.
• Nodes receive changes, merge with local
state, gossip to another node.
• Requires up-front thought about data
structures, dealing with slightly-stale data.

Running Your System

• Shipping code
• Upgrading code
• Debugging your own systems
• Living with other people’s systems

Shipping Code

• Don’t rely on working Erlang on end-user
machines (many Linux distros are broken
or out of date).
• Ship code with an embedded runtime and
libraries.
• Put version/build info in code.

Upgrading Code
• Hot code loading for small, emergency
ﬁxes.
• For new releases, reboot the node.
• Why not .appups?
• Systems I’ve worked on have changed/
evolved too fast.
• A reboot is a good test of resiliency.

Debugging Running
Systems
• Remote Erlang shells are awesome, except
when distributed Erlang dies (it happens).
• run_erl (or even screen(1)) give you a
backdoor for when -remsh fails.
• rebar (http://hg.basho.com/rebar) makes
this easy.
• What if you don’t have access to the box?

OPS - Other People’s
Systems
• Your Erlang, Enterprise ﬁrewalls.
• Erlang shell is powerful, but scary.
• Provide a debugging module.
• Get data out via HTTP/SMTP/SNMP
• Use disk_log/report_browser.

Questions?
“You know you have [a distributed system]
when the crash of a computer you’ve never
heard of stops you from getting any work
done”

-Leslie Lamport

Resources
• unsplit: http://github.com/uwiger/unsplit

• gen_leader: http://github.com/KirinDave/gen_leader_revival

• Dynamo: http://www.allthingsdistributed.com/2007/10/
amazons_dynamo.html

• Hans Svensson: Distributed Erlang Application Pitfalls and
Recipes: http://www.erlang.org/workshop/2007/proceedings/
06svenss.ppt

• Consistent Hashing and Random Trees: Distributed Caching
Protocols for relieving Hot Spots on the World Wide Web:
http://bit.ly/LewinConsistentHashing

Recommandé

Building Distributed Systems With Riak and Riak CoreAndy Gross

James Turner (Caplin) - Enterprise HTML5 Patternsakqaanoraks

Getting started with Riak in the CloudInes Sombra

Mobius: C# Language Binding For SparkSpark Summit

Using Apache Spark in the Cloud—A Devops Perspective with Telmo OliveiraSpark Summit

Riak at shareaholicfreerobby

SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftChester Chen

Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Spark Summit

Recommandé

Building Distributed Systems With Riak and Riak CoreAndy Gross

James Turner (Caplin) - Enterprise HTML5 Patternsakqaanoraks

Getting started with Riak in the CloudInes Sombra

Mobius: C# Language Binding For SparkSpark Summit

Using Apache Spark in the Cloud—A Devops Perspective with Telmo OliveiraSpark Summit

Riak at shareaholicfreerobby

SF Big Analytics_20190612: Scaling Apache Spark on Kubernetes at LyftChester Chen

Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Spark Summit

A Collaborative Data Science Development WorkflowDatabricks

Whirlpools in the Stream with Jayesh LalwaniDatabricks

ELK at LinkedIn - Kafka, scaling, lessons learnedTin Le

Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, UberHostedbyConfluent

Creating Reusable Geospatial PipelinesDatabricks

Streaming SQLJungtaek Lim

Spark Compute as a Service at Paypal with Prabhu KasinathanDatabricks

Gocd – Kubernetes/Nomad Continuous DeploymentLeandro Totino Pereira

Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent

Cooperative Data Exploration with iPython NotebookDataWorks Summit/Hadoop Summit

OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade PlatformNETWAYS

Migrating to Apache Spark at NetflixDatabricks

InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxData

Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Brandon O'Brien

Standalone Spark Deployment for Stability and PerformanceRomi Kuntsman

Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLucidworks

Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA

#MesosCon 2014: Spark on MesosPaco Nathan

Real-time Data Streaming from Oracle to Apache Kafka confluent

Introducing Kubernetes VikRam S

東京Node学園#8 Let It Crash!?koichik

Concurrency in Elixir with OTPJustin Reese

Contenu connexe

Tendances

A Collaborative Data Science Development WorkflowDatabricks

Whirlpools in the Stream with Jayesh LalwaniDatabricks

ELK at LinkedIn - Kafka, scaling, lessons learnedTin Le

Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, UberHostedbyConfluent

Creating Reusable Geospatial PipelinesDatabricks

Streaming SQLJungtaek Lim

Spark Compute as a Service at Paypal with Prabhu KasinathanDatabricks

Gocd – Kubernetes/Nomad Continuous DeploymentLeandro Totino Pereira

Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent

Cooperative Data Exploration with iPython NotebookDataWorks Summit/Hadoop Summit

OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade PlatformNETWAYS

Migrating to Apache Spark at NetflixDatabricks

InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxData

Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Brandon O'Brien

Standalone Spark Deployment for Stability and PerformanceRomi Kuntsman

Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, EtsyLucidworks

Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA

#MesosCon 2014: Spark on MesosPaco Nathan

Real-time Data Streaming from Oracle to Apache Kafka confluent

Introducing Kubernetes VikRam S

Tendances (20)

A Collaborative Data Science Development Workflow

Whirlpools in the Stream with Jayesh Lalwani

ELK at LinkedIn - Kafka, scaling, lessons learned

Kafka Tiered Storage | Satish Duggana and Sriharsha Chintalapani, Uber

Creating Reusable Geospatial Pipelines

Streaming SQL

Spark Compute as a Service at Paypal with Prabhu Kasinathan

Gocd – Kubernetes/Nomad Continuous Deployment

Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...

Cooperative Data Exploration with iPython Notebook

OSMC 2021 | Use OpenSource monitoring for an Enterprise Grade Platform

Migrating to Apache Spark at Netflix

InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard

Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...

Standalone Spark Deployment for Stability and Performance

Lessons From Sharding Solr At Etsy: Presented by Gregg Donovan, Etsy

Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...

#MesosCon 2014: Spark on Mesos

Real-time Data Streaming from Oracle to Apache Kafka

Introducing Kubernetes

En vedette

東京Node学園#8 Let It Crash!?koichik

Concurrency in Elixir with OTPJustin Reese

Erlang vs. JavaArtan Cami

Intro to ErlangKen Pratt

1 hour dive into Erlang/OTPJordi Llonch

Node.jsエンジニア Erlangに入門するの巻Recruit Technologies

Erlang OTPZvi Avraham

Intro To Erlangasceth

リアルタイムサーバー〜Erlang/OTPで作るPubSubサーバー〜 Yugo Shimizu

Imprementation of realtime_networkgameSatoshi Yamafuji

ニコニコ生放送の配信基盤改善takahiro_yachi

NoSQL Databases: Why, what and whenLorenzo Alberton

En vedette (12)

東京Node学園#8 Let It Crash!?

Concurrency in Elixir with OTP

Erlang vs. Java

Intro to Erlang

1 hour dive into Erlang/OTP

Node.jsエンジニア Erlangに入門するの巻

Erlang OTP

Intro To Erlang

リアルタイムサーバー〜Erlang/OTPで作るPubSubサーバー〜

Imprementation of realtime_networkgame

ニコニコ生放送の配信基盤改善

NoSQL Databases: Why, what and when

Similaire à Distributed Erlang Systems In Operation

Deployment of WebObjects applications on CentOS LinuxWO Community

Introduction to multicore .pptRajagopal Nagarajan

Distributed systems and scalability rulesOleg Tsal-Tsalko

Big Data for QAsAhmed Misbah

Sanger, upcoming Openstack for Bio-informaticiansPeter Clapham

Flexible computePeter Clapham

Operating systems (For CBSE School Students)Gaurav Aggarwal

Deployment Strategies (Mongo Austin)MongoDB

How to Build a Compute ClusterRamsay Key

Repeating History...On Purpose...with ElixirBarry Jones

ThreadSyed Zaid Irshad

Systems Design Experiences or Just Some War Stories…Persistent Systems Ltd.

Spil Storage Platform (Erlang) @ EUG-NLThijs Terlouw

Erlang factory SF 2011 "Erlang and the big switch in social games"Paolo Negri

Erlang, the big switch in social gamesWooga

Hadoop Operations: Keeping the Elephant Running SmoothlyMichael Arnold

Considerations when implementing_ha_in_dmfhik_lhz

Metasploit & Windows Kernel ExploitationzeroSteiner

Practical Windows Kernel ExploitationzeroSteiner

Computer system organizationSyed Zaid Irshad

Similaire à Distributed Erlang Systems In Operation (20)

Deployment of WebObjects applications on CentOS Linux

Introduction to multicore .ppt

Distributed systems and scalability rules

Big Data for QAs

Sanger, upcoming Openstack for Bio-informaticians

Flexible compute

Operating systems (For CBSE School Students)

Deployment Strategies (Mongo Austin)

How to Build a Compute Cluster

Repeating History...On Purpose...with Elixir

Thread

Systems Design Experiences or Just Some War Stories…

Spil Storage Platform (Erlang) @ EUG-NL

Erlang factory SF 2011 "Erlang and the big switch in social games"

Erlang, the big switch in social games

Hadoop Operations: Keeping the Elephant Running Smoothly

Considerations when implementing_ha_in_dmf

Metasploit & Windows Kernel Exploitation

Practical Windows Kernel Exploitation

Computer system organization

Distributed Erlang Systems In Operation

1. Distributed Erlang Systems In Operation Andy Gross <andy@basho.com>, @argv0 VP, Engineering Basho Technologies Erlang Factory SF 2010

2. Architectural Goals • Decentralized (no masters). • Distributed (asynchronous, nodes use only local data). • Homogeneous (all nodes can do anything). • Fault tolerant (emergent goal). • Observable

3. Anti-Goals • Global state: • pg2/hot data in mnesia • globally registered names • Distributed transactions • Reliance on physical time

4. Compromise your Goals • Decentralized (no masters). • Distributed (nodes use only local data). • Homogeneous (all nodes can do anything). • No distributed transactions/global state. • No reliance on physical time.

5. Systems Design • Cluster Membership • Load balancing/naming/resource allocation • Liveness checking • Soft Global State

6. Cluster Membership • Option 1: Use a configuration file: • Requires out-of-band sync of configuration file across machines. • Not “elastic” enough for some use-cases. • Option II: Contact a seed node to join and use gossip protocol to propagate state.

7. Load Balancing and Resource Allocation • Static assignment • Round-robin/Random • Static hashing: Nodes[hash(Item) mod length(Nodes)] • Dynamo/Riak/Cassandra/Voldemort: Consistent Hashing

8. Liveness Checking • nodes() and net_adm:ping() operations can be too low-level. • Sometimes you’d like to divert trafﬁc from a node at the application level while keeping distributed Erlang up. • Use net_kernel:monitor_nodes() and an app-level mechanism for liveness.

9. Soft State/Gossip Protocols • An eventually-consistent alternative to global state. • Nodes make changes, gossip to another node. • Nodes receive changes, merge with local state, gossip to another node. • Requires up-front thought about data structures, dealing with slightly-stale data.

10. Running Your System • Shipping code • Upgrading code • Debugging your own systems • Living with other people’s systems

11. Shipping Code • Don’t rely on working Erlang on end-user machines (many Linux distros are broken or out of date). • Ship code with an embedded runtime and libraries. • Put version/build info in code.

12. Upgrading Code • Hot code loading for small, emergency ﬁxes. • For new releases, reboot the node. • Why not .appups? • Systems I’ve worked on have changed/ evolved too fast. • A reboot is a good test of resiliency.

13. Debugging Running Systems • Remote Erlang shells are awesome, except when distributed Erlang dies (it happens). • run_erl (or even screen(1)) give you a backdoor for when -remsh fails. • rebar (http://hg.basho.com/rebar) makes this easy. • What if you don’t have access to the box?

14. OPS - Other People’s Systems • Your Erlang, Enterprise ﬁrewalls. • Erlang shell is powerful, but scary. • Provide a debugging module. • Get data out via HTTP/SMTP/SNMP • Use disk_log/report_browser.

15. Questions? “You know you have [a distributed system] when the crash of a computer you’ve never heard of stops you from getting any work done” -Leslie Lamport

16. Resources • unsplit: http://github.com/uwiger/unsplit • gen_leader: http://github.com/KirinDave/gen_leader_revival • Dynamo: http://www.allthingsdistributed.com/2007/10/ amazons_dynamo.html • Hans Svensson: Distributed Erlang Application Pitfalls and Recipes: http://www.erlang.org/workshop/2007/proceedings/ 06svenss.ppt • Consistent Hashing and Random Trees: Distributed Caching Protocols for relieving Hot Spots on the World Wide Web: http://bit.ly/LewinConsistentHashing