The document describes the SEDA (Staged Event-Driven Architecture) framework. SEDA divides applications into stages connected by queues. Each stage has a thread pool and dynamic resource controllers that adjust thread allocation to meet performance targets. This allows applications built with SEDA to remain responsive and robust while handling large swings in load. SEDA was implemented in a system called Sandstorm, which provides APIs for building networked services using asynchronous I/O and event-driven programming. Example services implemented with SEDA demonstrated improved throughput and response times under heavy load compared to traditional concurrency approaches.
1. SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
by Welsh, Culler, and Brewer
UC San Diego
CSE 294
Winter Quarter 2007
Barry Demchak
2. The Objective
Services that support millions of users
Responsive
Robust
Highly available
Support large swings in load over time (100x)
General purpose mechanisms
3. A Web Application Example: DNS
Verisign’s root servers
Two 10-high stacked 1U IBM eServers running Solaris and Red Hat Linux
26 billion requests/day (normal traffic), i.e., 300K-500K requests per second (26 billion / 86,400 s ≈ 300K/s average)
In 2000, there were 1 billion requests/day
In 2010, predicting 200 billion requests/day
-- InfoWorld, February 16, 2007, “DNS Attack Puts Web Security in Perspective,” Roger Grimes
4. SEDA
Problem Background
Survey of existing alternatives
SEDA design
Implementation results
Applicability to ESBs (e.g., Mule)
Applicability to THE picture
5. The Environment
Static content becoming dynamic content
Rapidly changing service logic
Hosting on general purpose platforms
7. Solution
SEDA = Staged Event-Driven Architecture
Applications decompose into stages
Stages have incoming event queues
Dynamic resource throttling
8. Definitions
Well-conditioned
Behaves like a simple pipeline: output rate scales with input rate
Excessive demand does not degrade pipeline throughput
Graceful degradation
Under load, response time degrades linearly with queue length
Degraded response time is constant for all clients (subject to service policies)
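A quick gloss on “degrades linearly” (mine, not the paper’s): if a stage drains its queue at a roughly constant service rate μ, a request that arrives to find N events already queued waits about

    W ≈ N / μ

so response time grows in proportion to queue length, which is exactly the graceful-degradation property above.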
12. Bounded Thread Pools
Initial thread pool subject to expansion
Fixed thread pools cause unfairness to clients because of long waits when threads are blocked
Policies that reorder or prioritize threads based on the expense of a request are difficult
16. SEDA Goals
Support massive concurrency
Simplify construction of well-conditioned services
Enable introspection
Support self-tuning resource management
17. SEDA Terms
Stage = fundamental processing unit
Event handler
Incoming event queue
Thread pool
Controller = scheduler for a stage
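To make these terms concrete, here is a minimal sketch of a stage in Java (the language Sandstorm is implemented in). The names EventHandler, Stage, and the batchingFactor field are illustrative, not Sandstorm’s actual API:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Illustrative event handler: receives a batch of events and may
    // emit new events onto a downstream stage's queue.
    interface EventHandler {
        void handleEvents(List<Object> batch, Stage downstream) throws InterruptedException;
    }

    // A stage = event handler + incoming event queue + private thread pool.
    class Stage {
        final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        final EventHandler handler;
        volatile int batchingFactor = 8; // tuned at runtime by the batching controller

        Stage(EventHandler handler) { this.handler = handler; }

        void enqueue(Object event) throws InterruptedException { queue.put(event); }

        // Body of each thread in the stage's pool: block for one event,
        // drain up to a batch, invoke the handler, repeat.
        void threadLoop(Stage downstream) throws InterruptedException {
            while (true) {
                List<Object> batch = new ArrayList<>();
                batch.add(queue.take());
                queue.drainTo(batch, batchingFactor - 1);
                handler.handleEvents(batch, downstream);
            }
        }
    }

The controller (discussed in the notes below) decides how many threads run threadLoop at any moment.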
29. SEDA Summary
Establishes principles toward Internet-style operating environments
Stages ease concurrency complexity and encourage modularity
Dynamic controllers enable novel scheduling and resource-management strategies
Challenges: detecting the cause of overload conditions, and a control strategy to cure overload
30. Mule
From “Implementing an ESB using Mule” by Ross Mason for JavaZone 2005
http://mule.mulesource.org/wiki/download/attachments/223/javazone-2005-mule-real-world-old.ppt?version=1
ESB
Loosely coupled components
Event driven
Highly distributed
Intelligent Routing
Data transformation
Multiprotocol message bus …
Internet hits translate to several I/O and network requests -> enormous load on underlying resources
As of 2001: Yahoo gets 1.2 billion hits per day, AOL gets over 10 billion page views per day
SEDA = Staged Event-Driven Architecture
Static -> dynamic: Extensive computation and I/O
Changing service logic: increases engineering and deployment complexity
General purpose platforms: Not specially engineered platforms
Replication – cannot scale across orders-of-magnitude load swings
Traditional OS and concurrency – brittle under large loads
Traditional OS focuses on providing transparency by virtualizing resources. Thread switching has high overhead, and threads have large memory overhead.
Internet applications need massive concurrency and better control over resource usage … better control makes a big difference at the margin of excessive load.
Stages are robust building blocks subject to thresholding and filtering according to load
Allows informed scheduling and resource-management decisions, including request reordering, filtering, and aggregation.
Dynamic resource throttling allows control over resource allocation and scheduling of components
Demonstration: high-performance HTTP server (… remember CSE222a?? … multiple threads vs. a select() loop?)
Well-conditioned: output latency is determined by the length of the queue/pipeline
This property holds regardless of the number of stages in the pipeline, subject to enqueue/dequeue times
In a non-pipelined design, clients wait for entire operation to complete before moving on to next one.
Overheads: cache and TLB misses, scheduling overhead, and lock contention
Threads = multiprogramming … virtualization hides global resource management
SPIN, Exokernel, Nemesis = examples of OS attempts to solve this
Apache, IIS, Netscape Enterprise Server, BEA Weblogic, IBM WebSphere
Bad case: cached static pages (cheap) vs large pages not in cache (expensive)
Flash, thttpd, Zeus, JAWS
Complex scheduler … hard to maintain … complex FSM maintenance, too … modularity difficult to achieve
Needs helper threads to perform blocking I/O
Gives well-conditioned behavior and graceful degradation
Sets of event queues … promote modularity
Seems to be a generic class of solutions
Support massive concurrency = event-driven execution where possible
Simplify construction of well-conditioned services = shields the application from details of scheduling and resource management … supports modular construction, support for debugging and profiling
Enable introspection = applications analyze the stream to adapt behavior to load … prioritize and filter services to support degraded service under load
Support self-tuning resource management = tune resource parameters to load … e.g., allocate threads to a stage based on load instead of hard-coding it a priori
Threads pull events from the queue, schedule events on downstream queues, and wait for more
Controller adjusts resource allocation and scheduler dynamically
Note that threads are stage resources, not task resources. Thread count can be dynamically allocated based on load.
Stages can run in serial or parallel, or both.
Event handler can implement its own scheduling policy irrespective of how the system filled the queue
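As a toy example of a handler-private scheduling policy (hypothetical; it reuses the Stage/EventHandler types sketched earlier), a handler could reorder each batch to serve cheap requests first:

    import java.util.Comparator;
    import java.util.List;

    class CheapestFirstHandler implements EventHandler {
        public void handleEvents(List<Object> batch, Stage downstream) throws InterruptedException {
            // Reorder the batch before processing, regardless of arrival order.
            batch.sort(Comparator.comparingInt(CheapestFirstHandler::estimatedCost));
            for (Object event : batch) {
                downstream.enqueue(process(event)); // results feed the next stage
            }
        }
        // Stand-in cost metric; a real server might use request size or type.
        static int estimatedCost(Object event) { return event.toString().length(); }
        private Object process(Object event) { return event; } // stage-specific work (stub)
    }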
Set of stages separated by queues
Private thread pool per stage
Each stage can be independently managed
Stages can be run in serial or parallel
Each stage can be independently load conditioned … more threads for heavy loads (thresholding)
Important point: queues can be finite, which means a stage can fail to enqueue an event … a sign that the next stage is very busy. The upstream stage can decide to block (backpressure) or drop the event (load shedding) and take some remedial action
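A sketch of that finite-queue decision (my names; a bounded java.util.concurrent queue stands in for Sandstorm’s queues):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.TimeUnit;

    class BoundedStageQueue {
        final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(1024); // finite capacity

        // Backpressure: block the producer until the busy stage drains.
        void enqueueBlocking(Object event) throws InterruptedException {
            queue.put(event);
        }

        // Load shedding: wait briefly, then report failure so the caller
        // can drop the event and take remedial action (e.g., an error page).
        boolean enqueueOrShed(Object event) throws InterruptedException {
            return queue.offer(event, 10, TimeUnit.MILLISECONDS);
        }
    }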
Question: should two modules communicate via method call or queue? A queue system promotes modularity, isolation, and load management … cost: latency
Important point: debugging, billing, memory usage, and queue profiling can occur by attaching processors between stages and queues
Thread pool controller
Adjusts the number of threads executing within each stage
Periodically examines queue and adds threads if queue exceeds some threshold … or removes them if they’re idle for some amount of time
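A sketch of that loop, using the parameters reported for Haboob later in these notes (2-second sampling, 100-entry threshold, 20-thread cap); the class and field names are mine, and it reuses the Stage sketch above:

    // Periodically samples a stage's queue and grows its thread pool.
    class ThreadPoolController implements Runnable {
        static final int SAMPLE_MS = 2000, QUEUE_THRESHOLD = 100, MAX_THREADS = 20;
        final Stage stage, downstream;
        int threadCount = 0;

        ThreadPoolController(Stage stage, Stage downstream) {
            this.stage = stage;
            this.downstream = downstream;
        }

        public void run() {
            while (true) {
                try { Thread.sleep(SAMPLE_MS); } catch (InterruptedException e) { return; }
                if (stage.queue.size() > QUEUE_THRESHOLD && threadCount < MAX_THREADS) {
                    threadCount++; // queue is backing up: add a worker
                    new Thread(() -> {
                        try { stage.threadLoop(downstream); }
                        catch (InterruptedException e) { /* worker retired */ }
                    }).start();
                }
                // Thread removal (idle > 5 s) is omitted for brevity.
            }
        }
    }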
Batching controller
Adjusts the number of events processed by each invocation of the event handler (i.e., a batching factor)
This increases throughput due to cache locality
Tries to strike a balance … large batching factors can degrade *overall* performance … so try to select a small batching factor … kind of like a PLL
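A sketch of that hunt (my names and step sizes; the notes describe the policy only qualitatively): shrink the batching factor while the measured output rate holds, back off when it drops:

    // Adjusts a stage's batching factor from observed throughput samples.
    class BatchingController {
        double lastThroughput = 0;

        void adjust(Stage stage, double throughput) {
            if (throughput >= lastThroughput) {
                // Output rate holding or rising: prefer a smaller batch (floor of 1).
                stage.batchingFactor = Math.max(1, stage.batchingFactor - 1);
            } else {
                // Throughput dropped: back off toward larger batches for cache locality.
                stage.batchingFactor += 2;
            }
            lastThroughput = throughput;
        }
    }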
*** Controllers are a great way of enforcing performance policy ***
This is for the Haboob web server
Thread pool adjusted based on length of corresponding queue
Queue length sampled every 2 seconds, thread added to pool if queue exceeded 100 entries … max 20 threads
Threads idle for more than 5 seconds are removed from the pool
AsyncFile used a threshold of 10 queue entries to exaggerate the behavior
Shows single stage generating events at an oscillating rate
When the output rate increases, the controller decreases the batching factor … and vice versa
*** The point: controllers allow application to adapt to changing conditions regardless of the particular algorithms used by the operating system … or the application ***
They’re eating their own dog food.
Completed I/O goes back into the next stage’s queue
Clients issue bursts of 8KB packets
Server responds with ACK for every 1000 packets
Assume Gigabit Ethernet and all-Linux boxes
Slight degradation for SEDA is due to non-scalability in Linux network stack
Threaded implementation stops at 512 connections due to Linux thread limitations
SEDA implementation had 120 threads to handle socket writes.
Apache and Flash often perform faster, but the tails are fierce. Haboob gives consistent response time under load