SlideShare une entreprise Scribd logo
1  sur  189
Télécharger pour lire hors ligne
Resiliency,
Failures vs Errors,
Isolation, Delegation
and Replication
in Reactive Systems
Jonas Bonér
CTO TypEsafe
@jboner
“But it ain’t how hard you’re hit;
it’s about how hard you can get
hit, and keep moving forward.
How much you can take, and
keep moving forward. That’s
how winning is done.”
- Rocky Balboa
“But it ain’t how hard you’re hit;
it’s about how hard you can get
hit, and keep moving forward.
How much you can take, and
keep moving forward. That’s
how winning is done.”
- Rocky Balboa
This is Fault Tolerance
Resilience
is Beyond
Fault Tolerance
Resilience
“The ability of a substance or
object to spring back into shape.
The capacity to recover quickly
from difficulties.”
-Merriam Webster
Without Resilience
Nothing Else Matters
Without Resilience
Nothing Else Matters
“We can model and understand in isolation. 

But, when released into competitive nominally
regulated societies, their connections proliferate, 

their interactions and interdependencies multiply, 

their complexities mushroom. 

And we are caught short.”
- Sidney Dekker
Drift into Failure - Sidney Dekker
We Need to Study
Resilience in
Complex Systems
Complicated System
Complicated System
Complex System
Complex System
Complex System
Complicated ≠Complex
“Counterintuitive. That’s [Jay] Forrester’s word
to describe complex systems. Leverage points
are not intuitive. Or if they are, we intuitively
use them backward, systematically worsening
whatever problems we are trying to solve.”
- Donella Meadows
Leverage Points: Places to Intervene in a System - Donella Meadows
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Operating Point
Accident
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
FAILURE
Accident
Boundary
Operating at the Edge of Failure
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Management Pressure
Towards Economic Efficiency
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Management Pressure
Towards Economic Efficiency
Gradient Towards
Least Effort
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Management Pressure
Towards Economic Efficiency
Gradient Towards
Least Effort
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Management Pressure
Towards Economic Efficiency
Gradient Towards
Least Effort
Counter Gradient
For More Resilience
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Error Margin
Marginal
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Error Margin
Marginal
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Error Margin
Marginal
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Accident
Boundary
Marginal
Boundary
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Marginal
Boundary
?
Operating at the Edge of Failure
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Accident
Boundary
Marginal
Boundary
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Accident
Boundary
Marginal
Boundary
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Accident
Boundary
Marginal
Boundary
‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Accident
Boundary
Marginal
Boundary
Embrace Failure
Resilience in
Social Systems
Simple Critical Infrastructure Map
Understanding vital services, and how they keep you safe
6ways to die
3sets of essential services
7layers of PROTECTION
Dealing in Security - Mike Bennet, Vinay Gupta
7 Principles for Building
Resilience in Social Systems
1. Maintain diversity & Redundancy
2. Manage connectivity
3. Manage slow variables & feedback
4. Foster complex adaptive systems thinking
5. Encourage learning
6. Broaden participation
7. Promote polycentric governance
Principles for Building Resilience: Sustaining Ecosystem Services in Social-Ecological Systems - Reinette Biggs et. al.
Resilience in
Biological Systems
MeerkatsPuppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk
What We Can Learn
From Biological Systems
1. Feature Diversity and redundancy
2. Inter-Connected network structure
3. Wide distribution across all scales
4. Capacity to self-adapt & self-organize
Toward Resilient Architectures 1: Biology Lessons - Michael Mehaffy, Nikos A. Salingaros
“Animals show extraordinary social complexity,
and this allows them to adapt and 

respond to changes in their environment.
In three words, in the animal kingdom,
simplicity leads to complexity 

which leads to resilience.”
- Nicolas Perony
Puppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk
Resilience in
Computer Systems
“Complex systems run in degraded mode.”
“Complex systems run as broken systems.”
- richard Cook
How Complex Systems Fail - Richard Cook
Resilience
is by
Design
Photo courtesy of FEMA/Joselyne Augustino
We Need to
Manage Failure
“Post-accident attribution to a 

‘root cause’ is fundamentally wrong: 

Because overt failure requires multiple faults,
there is no isolated ‘cause’ of an accident.”
- richard Cook
How Complex Systems Fail - Richard Cook
There is No
Root Cause
Crash Only
Software
Crash-Only Software - George Candea, Armando Fox
Stop = Crash Safely
Start = Recover Fast
“To make a system of interconnected components
crash-only, it must be designed so that components
can tolerate the crashes and temporary unavailability
of their peers. This means we require: [1] strong
modularity with relatively impermeable component
boundaries, [2] timeout-based communication and
lease-based resource allocation, and [3] self-
describing requests that carry a time-to-live and
information on whether they are idempotent.”
- George Candea, Armando Fox
Crash-Only Software - George Candea, Armando Fox
Recursive Restartability
Turning the Crash-Only Sledgehammer into a Scalpel
Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox
Services need to accept
NO for an answer
"Software components should be designed such
that they can deny service for any request or call.
Then, if an underlying component can say No,
apps must be designed to take No for an answer
and decide how to proceed: give up, wait and
retry, reduce fidelity, etc.”
- George Candea, Armando Fox
Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox
Learn to take
NO for an answer
“The explosive growth of software has
added greatly to systems’ interactive
complexity. With software, the possible
states that a system can end up in become
mind-boggling.”
- Sidney Dekker
Drift into Failure - Sidney Dekker
Classification
of State
• Static Data
• Scratch Data
• Dynamic Data
• Recomputable
• not recomputable
Classification
of State
• Static Data
• Scratch Data
• Dynamic Data
• Recomputable
• not recomputable
Critical
We Need a
Way Out of the
State Tar Pit
Out of the Tar Pit - Ben Moseley , Peter Marks
Essential
State
We Need a
Way Out of the
State Tar Pit
Out of the Tar Pit - Ben Moseley , Peter Marks
Essential
State
We Need a
Way Out of the
State Tar Pit
Out of the Tar Pit - Ben Moseley , Peter Marks
Essential
Logic
Essential
State
We Need a
Way Out of the
State Tar Pit
Out of the Tar Pit - Ben Moseley , Peter Marks
Essential
Logic
Accidental
State and
Control
Traditional
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Traditional
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Traditional
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Traditional
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Synchronous dispatch Thread boundary
Traditional
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Synchronous dispatch Thread boundary
Traditional
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Synchronous dispatch Thread boundary
?
Traditional
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Synchronous dispatch Thread boundary
?
Utterly broken
Requirements for a
Sane Failure Mode
1. Contained
2. Reified—as messages
3. Signalled—Asynchronously
4. Observed—by 1-N
5. Managed
Failures need to be
Akka Actors
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
Define the message(s) the Actor
should be able to respond to
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
Define the message(s) the Actor
should be able to respond to
Define the Actor class
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
Define the message(s) the Actor
should be able to respond to
Define the Actor class
Define the Actor’s behavior
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Create an Actor system
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Create an Actor system
Actor configuration
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Give it a name
Create an Actor system
Actor configuration
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Give it a nameCreate the Actor
Create an Actor system
Actor configuration
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Give it a nameCreate the ActorYou get an ActorRef back
Create an Actor system
Actor configuration
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
greeter ! Greeting("Charlie Parker")
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
greeter ! Greeting("Charlie Parker")
Send the message asynchronously
Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
greeter ! Greeting("Charlie Parker")
Bulkhead
Pattern
Bulkhead
Pattern
Bulkhead
Pattern
Enter Supervision
Enter Supervision
The
Vending
Machine
Pattern
Think Vending Machine
Coffee
Machine
Programmer
Think Vending Machine
Coffee
Machine
Programmer
Inserts coins
Think Vending Machine
Coffee
Machine
Programmer
Inserts coins
Add more coins
Think Vending Machine
Coffee
Machine
Programmer
Inserts coins
Gets coffee
Add more coins
Think Vending Machine
Programmer
Coffee
Machine
Think Vending Machine
Programmer
Inserts coins
Coffee
Machine
Think Vending Machine
Programmer
Inserts coins
Out of coffee beans error Coffee
Machine
Think Vending Machine
Programmer
Inserts coins
Out of coffee beans error
WRONG
Coffee
Machine
Think Vending Machine
Programmer
Inserts coins
Coffee
Machine
Think Vending Machine
Programmer
Inserts coins
Out of
coffee beans
failure
Coffee
Machine
Think Vending Machine
Programmer
Service
Guy
Inserts coins
Out of
coffee beans
failure
Coffee
Machine
Think Vending Machine
Programmer
Service
Guy
Inserts coins
Out of
coffee beans
failure
Adds
more
beans
Coffee
Machine
Think Vending Machine
Programmer
Service
Guy
Inserts coins
Gets coffee
Out of
coffee beans
failure
Adds
more
beans
Coffee
Machine
Think Vending Machine
ServiceClient
Think Vending Machine
ServiceClient
Request
Think Vending Machine
ServiceClient
Request
Response
Think Vending Machine
ServiceClient
Request
Response
Validation Error
Think Vending Machine
ServiceClient
Request
Response
Validation Error
Application
Failure
Think Vending Machine
ServiceClient
Supervisor
Request
Response
Validation Error
Application
Failure
Think Vending Machine
ServiceClient
Supervisor
Request
Response
Validation Error
Application
Failure
Manages
Failure
“Accidents come from relationships 
not broken parts.”
- Sidney dekker
Drift into Failure - Sidney Dekker
Error Kernel
Pattern
Onion-layered state & Failure management
Making reliable distributed systems in the presence of software errors - Joe Armstrong
On Erlang, State and Crashes - Jesper Louis Andersen
Onion Layered
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Onion Layered
State Management
Object
Critical state
that needs protection
Client
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Onion Layered
State Management
Error Kernel
Object
Critical state
that needs protection
Client
Supervision
Supervision
Thread boundary
Supervision in Akka
Every actor has a default supervisor strategy.
Which can, and often should, be overridden.
class Supervisor extends Actor {
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: Exception => Escalate
}
val worker = context.actorOf(Props[Worker], name = "worker")
def receive = {
case number: Int => worker.forward(number)
}
}
Supervision in Akka
Every actor has a default supervisor strategy.
Which can, and often should, be overridden.
class Supervisor extends Actor {
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: Exception => Escalate
}
val worker = context.actorOf(Props[Worker], name = "worker")
def receive = {
case number: Int => worker.forward(number)
}
}
Parent actor
Supervision in Akka
Every actor has a default supervisor strategy.
Which can, and often should, be overridden.
class Supervisor extends Actor {
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: Exception => Escalate
}
val worker = context.actorOf(Props[Worker], name = "worker")
def receive = {
case number: Int => worker.forward(number)
}
}
Parent actor
All its children have their life-cycle managed
through this declarative supervision strategy
Supervision in Akka
Every actor has a default supervisor strategy.
Which can, and often should, be overridden.
class Supervisor extends Actor {
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: Exception => Escalate
}
val worker = context.actorOf(Props[Worker], name = "worker")
def receive = {
case number: Int => worker.forward(number)
}
}
Create a supervised child actor
Parent actor
All its children have their life-cycle managed
through this declarative supervision strategy
Monitor through
Death Watch
class Watcher extends Actor {
val child = context.actorOf(Props.empty, "child")
context.watch(child)
def receive = {
case Terminated(`child`) => … // handle child termination
}
}
Monitor through
Death Watch
class Watcher extends Actor {
val child = context.actorOf(Props.empty, "child")
context.watch(child)
def receive = {
case Terminated(`child`) => … // handle child termination
}
}
Create a child actor
Monitor through
Death Watch
class Watcher extends Actor {
val child = context.actorOf(Props.empty, "child")
context.watch(child)
def receive = {
case Terminated(`child`) => … // handle child termination
}
}
Create a child actor
Watch it
Monitor through
Death Watch
class Watcher extends Actor {
val child = context.actorOf(Props.empty, "child")
context.watch(child)
def receive = {
case Terminated(`child`) => … // handle child termination
}
}
Create a child actor
Watch it
Receive termination signal
Maintain Diversity
and Redundancy
Maintain Diversity
and Redundancy
Akka Routing
akka {
actor {
deployment {
/service/router {
router = round-robin-pool
resizer {
lower-bound = 12
upper-bound = 15
}
}
}
}
}
Akka Cluster
akka {
actor {
deployment {
/service/router {
router = round-robin-pool
resizer {
lower-bound = 12
upper-bound = 15
}
}
}
provider = "akka.cluster.ClusterActorRefProvider"
}
  
cluster {
seed-nodes = [
“akka.tcp://ClusterSystem@127.0.0.1:2551",
“akka.tcp://ClusterSystem@127.0.0.1:2552"
]
  }
}
The Network
is Reliable
The Network
is Reliable
NAT
We are living in the
Looming
Shadowof
Impossibility
Theorems
We are living in the
Looming
Shadowof
Impossibility
Theorems
CAP: Consistency is impossible
We are living in the
Looming
Shadowof
Impossibility
Theorems
CAP: Consistency is impossible
FLP: Consensus is impossible
Strong
Consistency
Is the wrong default
We need
Decoupling IN
Time and Space
Resilient Protocols
Depend on
Asynchronous Communication
Eventual Consistency
Resilient Protocols
• are tolerant to
• Message loss
• Message reordering
• Message duplication
Depend on
Asynchronous Communication
Eventual Consistency
Resilient Protocols
• are tolerant to
• Message loss
• Message reordering
• Message duplication
• Embrace ACID 2.0
• Associative
• Commutative
• Idempotent
• Distributed
Depend on
Asynchronous Communication
Eventual Consistency
Akka Distributed Data
class DataBot extends Actor with ActorLogging {
val replicator = DistributedData(context.system).replicator
implicit val node = Cluster(context.system)
val tickTask = context.system.scheduler.schedule(
5.seconds, 5.seconds, self, Add)
val DataKey = ORSetKey[String]("key")
replicator ! Subscribe(DataKey, self)
Akka Distributed Data
class DataBot extends Actor with ActorLogging {
val replicator = DistributedData(context.system).replicator
implicit val node = Cluster(context.system)
val tickTask = context.system.scheduler.schedule(
5.seconds, 5.seconds, self, Add)
val DataKey = ORSetKey[String]("key")
replicator ! Subscribe(DataKey, self)
def receive = {
case Tick =>
val s = ThreadLocalRandom.current().nextInt(97, 123).toChar.toString
if (ThreadLocalRandom.current().nextBoolean())
replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ + s)
else
replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ - s)
case c @ Changed(DataKey) =>
val data = c.get(DataKey)
case _: UpdateResponse[_] => // ignore
}
}
“An escalator can never break: it can only
become stairs. You should never see an
Escalator Temporarily Out Of Order sign, just
Escalator Temporarily Stairs. Sorry for the
convenience.”
- Mitch Hedberg
Graceful
Degradation
Always Rely on
Asynchronous
Communication
Always Rely on
Asynchronous
Communication
…And If not pOssiblE
Always use
Timeouts
Circuit Breaker
Circuit Breaker in Akka
val breaker =
new CircuitBreaker(
context.system.scheduler,
maxFailures = 5,
callTimeout = 10.seconds,
resetTimeout = 1.minute
).onOpen(…)
.onHalfOpen(…)
val result = breaker.withCircuitBreaker(Future(dangerousCall()))
Little’s Law
W: Response Time
L: Queue Length
Little’s Law
Queue Length = Arrival Rate * Response Time
W: Response Time
L: Queue Length
Little’s Law
Response Time = Queue Length / Arrival Rate
W: Response Time
L: Queue Length
Flow Control
Flow Control
Always Apply Back Pressure
Akka Streams
Akka Streams
Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
Create the
Source and Sink
Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
Create the
Source and Sink
Create the fan out
and fan in stages
Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
Create the
Source and Sink
Create the fan out
and fan in stages
Create a set of
transformations
Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
Create the
Source and Sink
Create the fan out
and fan in stages
Create a set of
transformations
Define the graph
processing blueprint
Testing
What can we learn from Arnold?
What can we learn from Arnold?
What can we learn from Arnold?
Blow things up
Shoot
Your App
Down
Pull the Plug
…and see what happens
Executive
Summary
“Complex systems run in degraded mode.”
“Complex systems run as broken systems.”
- richard Cook
How Complex Systems Fail - Richard Cook
Resilience
is by
Design
Photo courtesy of FEMA/Joselyne Augustino
ReferencesAntifragile: Things That Gain from Disorder - http://www.amazon.com/Antifragile-Things-that-Gain-Disorder-ebook/dp/B009K6DKTS 

Drift into Failure - http://www.amazon.com/Drift-into-Failure-Components-Understanding-ebook/dp/B009KOKXKY

How Complex Systems Fail - http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf

Leverage Points: Places to Intervene in a System - http://www.donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/
Going Solid: A Model of System Dynamics and Consequences for Patient Safety - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1743994/

Resilience in Complex Adaptive Systems: Operating at the Edge of Failure - https://www.youtube.com/watch?v=PGLYEDpNu60

Dealing in Security - http://resiliencemaps.org/files/Dealing_in_Security.July2010.en.pdf

Principles for Building Resilience: Sustaining Ecosystem Services in Social-Ecological Systems - http://www.amazon.com/Principles-
Building-Resilience-Sustaining-Social-Ecological/dp/110708265X 

Puppies! Now that I’ve got your attention, Complexity Theory - https://www.ted.com/talks/
nicolas_perony_puppies_now_that_i_ve_got_your_attention_complexity_theory

How Bacteria Becomes Resistant - http://www.abc.net.au/science/slab/antibiotics/resistance.htm

Towards Resilient Architectures: Biology Lessons - http://www.metropolismag.com/Point-of-View/March-2013/Toward-Resilient-
Architectures-1-Biology-Lessons/

Crash-Only Software - https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf

Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - http://roc.cs.berkeley.edu/papers/recursive_restartability.pdf

Out of the Tar Pit - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.8928

Bulkhead Pattern - http://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html

Making Reliable Distributed Systems in the Presence of Software Errors - http://www.erlang.org/download/armstrong_thesis_2003.pdf

On Erlang, State and Crashes - http://jlouisramblings.blogspot.be/2010/11/on-erlang-state-and-crashes.html

Akka Supervision - http://doc.akka.io/docs/akka/snapshot/general/supervision.html

Release It!: Design and Deploy Production-Ready Software - https://pragprog.com/book/mnee/release-it

Hystrix - https://github.com/Netflix/Hystrix

Akka Circuit Breaker - http://doc.akka.io/docs/akka/snapshot/common/circuitbreaker.html 

Reactive Streams - http://reactive-streams.org

Akka Streams - http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0/scala/stream-introduction.html

RxJava - https://github.com/ReactiveX/RxJava

Feedback Control for Computer Systems - http://www.amazon.com/Feedback-Control-Computer-Systems-Philipp/dp/1449361692

Simian Army - https://github.com/Netflix/SimianArmy

Gatling - http://gatling.io

Akka MultiNode Testing - http://doc.akka.io/docs/akka/snapshot/dev/multi-node-testing.html
EXPERT TRAINING
Delivered On-site For Spark, Scala, Akka And Play
Help is just a click away. Get in touch with
Typesafe about our training courses.
• Intro Workshop to Apache Spark
• Fast Track & Advanced Scala
• Fast Track to Akka with Java or Scala
• Fast Track to Play with Java or Scala
• Advanced Akka with Java or Scala
Ask us about local trainings available by 24
Typesafe partners in 14 countries around
the world.
CONTACT US Learn more about on-site training
Thank
You

Contenu connexe

Similaire à Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

Resilience of Critical Infrastructures to Climate Change
Resilience of Critical Infrastructures to Climate ChangeResilience of Critical Infrastructures to Climate Change
Resilience of Critical Infrastructures to Climate Changeeu-circle
 
Resilience of Critical Infrastructures to Climate Change (old)
Resilience of Critical Infrastructures to Climate Change (old)Resilience of Critical Infrastructures to Climate Change (old)
Resilience of Critical Infrastructures to Climate Change (old)eu-circle
 
Software Availability by Resiliency
Software Availability by ResiliencySoftware Availability by Resiliency
Software Availability by ResiliencyReza Samei
 
Better by Measure: Two Tales of Disruption (Class 3, SVA Products of Design 2...
Better by Measure: Two Tales of Disruption (Class 3, SVA Products of Design 2...Better by Measure: Two Tales of Disruption (Class 3, SVA Products of Design 2...
Better by Measure: Two Tales of Disruption (Class 3, SVA Products of Design 2...Rebecca Gard Silver
 

Similaire à Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems (6)

Black Sky Thinking: Wide-Area Power Failure
Black Sky Thinking: Wide-Area Power FailureBlack Sky Thinking: Wide-Area Power Failure
Black Sky Thinking: Wide-Area Power Failure
 
Resilience of Critical Infrastructures to Climate Change
Resilience of Critical Infrastructures to Climate ChangeResilience of Critical Infrastructures to Climate Change
Resilience of Critical Infrastructures to Climate Change
 
Resilience of Critical Infrastructures to Climate Change (old)
Resilience of Critical Infrastructures to Climate Change (old)Resilience of Critical Infrastructures to Climate Change (old)
Resilience of Critical Infrastructures to Climate Change (old)
 
Software Availability by Resiliency
Software Availability by ResiliencySoftware Availability by Resiliency
Software Availability by Resiliency
 
Better by Measure: Two Tales of Disruption (Class 3, SVA Products of Design 2...
Better by Measure: Two Tales of Disruption (Class 3, SVA Products of Design 2...Better by Measure: Two Tales of Disruption (Class 3, SVA Products of Design 2...
Better by Measure: Two Tales of Disruption (Class 3, SVA Products of Design 2...
 
Accountability 2010
Accountability 2010Accountability 2010
Accountability 2010
 

Plus de Legacy Typesafe (now Lightbend)

The How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkThe How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkLegacy Typesafe (now Lightbend)
 
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and moreTypesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and moreLegacy Typesafe (now Lightbend)
 
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformLegacy Typesafe (now Lightbend)
 
Akka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive PlatformAkka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive PlatformLegacy Typesafe (now Lightbend)
 
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...Legacy Typesafe (now Lightbend)
 
Microservices 101: Exploiting Reality's Constraints with Technology
Microservices 101: Exploiting Reality's Constraints with TechnologyMicroservices 101: Exploiting Reality's Constraints with Technology
Microservices 101: Exploiting Reality's Constraints with TechnologyLegacy Typesafe (now Lightbend)
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksLegacy Typesafe (now Lightbend)
 
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0Legacy Typesafe (now Lightbend)
 
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...Legacy Typesafe (now Lightbend)
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Legacy Typesafe (now Lightbend)
 
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big DataLegacy Typesafe (now Lightbend)
 

Plus de Legacy Typesafe (now Lightbend) (16)

The How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache SparkThe How and Why of Fast Data Analytics with Apache Spark
The How and Why of Fast Data Analytics with Apache Spark
 
Reactive Design Patterns
Reactive Design PatternsReactive Design Patterns
Reactive Design Patterns
 
Revitalizing Aging Architectures with Microservices
Revitalizing Aging Architectures with MicroservicesRevitalizing Aging Architectures with Microservices
Revitalizing Aging Architectures with Microservices
 
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and moreTypesafe Reactive Platform: Monitoring 1.0, Commercial features and more
Typesafe Reactive Platform: Monitoring 1.0, Commercial features and more
 
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive PlatformAkka 2.4 plus new commercial features in Typesafe Reactive Platform
Akka 2.4 plus new commercial features in Typesafe Reactive Platform
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Akka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive PlatformAkka 2.4 plus commercial features in Typesafe Reactive Platform
Akka 2.4 plus commercial features in Typesafe Reactive Platform
 
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
Reactive Revealed Part 2: Scalability, Elasticity and Location Transparency i...
 
Microservices 101: Exploiting Reality's Constraints with Technology
Microservices 101: Exploiting Reality's Constraints with TechnologyMicroservices 101: Exploiting Reality's Constraints with Technology
Microservices 101: Exploiting Reality's Constraints with Technology
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
A Deeper Look Into Reactive Streams with Akka Streams 1.0 and Slick 3.0
 
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
Modernizing Your Aging Architecture: What Enterprise Architects Need To Know ...
 
Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)Reactive Streams 1.0.0 and Why You Should Care (webinar)
Reactive Streams 1.0.0 and Why You Should Care (webinar)
 
Going Reactive in Java with Typesafe Reactive Platform
Going Reactive in Java with Typesafe Reactive PlatformGoing Reactive in Java with Typesafe Reactive Platform
Going Reactive in Java with Typesafe Reactive Platform
 
Why Play Framework is fast
Why Play Framework is fastWhy Play Framework is fast
Why Play Framework is fast
 
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
[Sneak Preview] Apache Spark: Preparing for the next wave of Reactive Big Data
 

Dernier

Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 

Dernier (20)

Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 

Reactive Revealed Part 3 of 3: Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems

  • 1. Resiliency, Failures vs Errors, Isolation, Delegation and Replication in Reactive Systems Jonas Bonér CTO TypEsafe @jboner
  • 2. “But it ain’t how hard you’re hit; it’s about how hard you can get hit, and keep moving forward. How much you can take, and keep moving forward. That’s how winning is done.” - Rocky Balboa
  • 3. “But it ain’t how hard you’re hit; it’s about how hard you can get hit, and keep moving forward. How much you can take, and keep moving forward. That’s how winning is done.” - Rocky Balboa This is Fault Tolerance
  • 5. Resilience “The ability of a substance or object to spring back into shape. The capacity to recover quickly from difficulties.” -Merriam Webster
  • 8. “We can model and understand in isolation. 
 But, when released into competitive nominally regulated societies, their connections proliferate, 
 their interactions and interdependencies multiply, 
 their complexities mushroom. 
 And we are caught short.” - Sidney Dekker Drift into Failure - Sidney Dekker
  • 9. We Need to Study Resilience in Complex Systems
  • 16. “Counterintuitive. That’s [Jay] Forrester’s word to describe complex systems. Leverage points are not intuitive. Or if they are, we intuitively use them backward, systematically worsening whatever problems we are trying to solve.” - Donella Meadows Leverage Points: Places to Intervene in a System - Donella Meadows
  • 17. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure
  • 18. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Operating at the Edge of Failure
  • 19. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Operating at the Edge of Failure
  • 20. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Operating at the Edge of Failure
  • 21. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Operating Point Accident Boundary Operating at the Edge of Failure
  • 22. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Operating at the Edge of Failure
  • 23. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary FAILURE Accident Boundary Operating at the Edge of Failure
  • 24. Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure
  • 25. Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Management Pressure Towards Economic Efficiency ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure
  • 26. Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Management Pressure Towards Economic Efficiency Gradient Towards Least Effort ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure
  • 27. Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Management Pressure Towards Economic Efficiency Gradient Towards Least Effort ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure
  • 28. Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Management Pressure Towards Economic Efficiency Gradient Towards Least Effort Counter Gradient For More Resilience ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure
  • 29. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Operating at the Edge of Failure
  • 30. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Error Margin Marginal Boundary Operating at the Edge of Failure
  • 31. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Error Margin Marginal Boundary Operating at the Edge of Failure
  • 32. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Economic Failure Boundary Unacceptable Workload Boundary Accident Boundary Error Margin Marginal Boundary Operating at the Edge of Failure
  • 33. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Accident Boundary Marginal Boundary Operating at the Edge of Failure
  • 34. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Marginal Boundary ? Operating at the Edge of Failure
  • 35. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure Accident Boundary Marginal Boundary
  • 36. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure Accident Boundary Marginal Boundary
  • 37. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure Accident Boundary Marginal Boundary
  • 38. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013 Operating at the Edge of Failure Accident Boundary Marginal Boundary
  • 41. Simple Critical Infrastructure Map Understanding vital services, and how they keep you safe 6ways to die 3sets of essential services 7layers of PROTECTION Dealing in Security - Mike Bennet, Vinay Gupta
  • 42. 7 Principles for Building Resilience in Social Systems 1. Maintain diversity & Redundancy 2. Manage connectivity 3. Manage slow variables & feedback 4. Foster complex adaptive systems thinking 5. Encourage learning 6. Broaden participation 7. Promote polycentric governance Principles for Building Resilience: Sustaining Ecosystem Services in Social-Ecological Systems - Reinette Biggs et. al.
  • 44. MeerkatsPuppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk
  • 45. What We Can Learn From Biological Systems 1. Feature Diversity and redundancy 2. Inter-Connected network structure 3. Wide distribution across all scales 4. Capacity to self-adapt & self-organize Toward Resilient Architectures 1: Biology Lessons - Michael Mehaffy, Nikos A. Salingaros
  • 46. “Animals show extraordinary social complexity, and this allows them to adapt and 
 respond to changes in their environment. In three words, in the animal kingdom, simplicity leads to complexity 
 which leads to resilience.” - Nicolas Perony Puppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk
  • 48.
  • 49. “Complex systems run in degraded mode.” “Complex systems run as broken systems.” - richard Cook How Complex Systems Fail - Richard Cook
  • 50. Resilience is by Design Photo courtesy of FEMA/Joselyne Augustino
  • 51. We Need to Manage Failure
  • 52. “Post-accident attribution to a 
 ‘root cause’ is fundamentally wrong: 
 Because overt failure requires multiple faults, there is no isolated ‘cause’ of an accident.” - richard Cook How Complex Systems Fail - Richard Cook
  • 54. Crash Only Software Crash-Only Software - George Candea, Armando Fox Stop = Crash Safely Start = Recover Fast
  • 55. “To make a system of interconnected components crash-only, it must be designed so that components can tolerate the crashes and temporary unavailability of their peers. This means we require: [1] strong modularity with relatively impermeable component boundaries, [2] timeout-based communication and lease-based resource allocation, and [3] self- describing requests that carry a time-to-live and information on whether they are idempotent.” - George Candea, Armando Fox Crash-Only Software - George Candea, Armando Fox
  • 56. Recursive Restartability Turning the Crash-Only Sledgehammer into a Scalpel Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox
  • 57. Services need to accept NO for an answer
  • 58. "Software components should be designed such that they can deny service for any request or call. Then, if an underlying component can say No, apps must be designed to take No for an answer and decide how to proceed: give up, wait and retry, reduce fidelity, etc.” - George Candea, Armando Fox Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox Learn to take NO for an answer
  • 59. “The explosive growth of software has added greatly to systems’ interactive complexity. With software, the possible states that a system can end up in become mind-boggling.” - Sidney Dekker Drift into Failure - Sidney Dekker
  • 60. Classification of State • Static Data • Scratch Data • Dynamic Data • Recomputable • not recomputable
  • 61. Classification of State • Static Data • Scratch Data • Dynamic Data • Recomputable • not recomputable Critical
  • 62. We Need a Way Out of the State Tar Pit Out of the Tar Pit - Ben Moseley , Peter Marks
  • 63. Essential State We Need a Way Out of the State Tar Pit Out of the Tar Pit - Ben Moseley , Peter Marks
  • 64. Essential State We Need a Way Out of the State Tar Pit Out of the Tar Pit - Ben Moseley , Peter Marks Essential Logic
  • 65. Essential State We Need a Way Out of the State Tar Pit Out of the Tar Pit - Ben Moseley , Peter Marks Essential Logic Accidental State and Control
  • 66. Traditional State Management Object Critical state that needs protection Client Thread boundary
  • 67. Traditional State Management Object Critical state that needs protection Client Thread boundary
  • 68. Traditional State Management Object Critical state that needs protection Client Thread boundary
  • 69. Traditional State Management Object Critical state that needs protection Client Thread boundary Synchronous dispatch Thread boundary
  • 70. Traditional State Management Object Critical state that needs protection Client Thread boundary Synchronous dispatch Thread boundary
  • 71. Traditional State Management Object Critical state that needs protection Client Thread boundary Synchronous dispatch Thread boundary ?
  • 72. Traditional State Management Object Critical state that needs protection Client Thread boundary Synchronous dispatch Thread boundary ? Utterly broken
  • 73. Requirements for a Sane Failure Mode 1. Contained 2. Reified—as messages 3. Signalled—Asynchronously 4. Observed—by 1-N 5. Managed Failures need to be
  • 75. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }
  • 76. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } Define the message(s) the Actor should be able to respond to
  • 77. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } Define the message(s) the Actor should be able to respond to Define the Actor class
  • 78. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } Define the message(s) the Actor should be able to respond to Define the Actor class Define the Actor’s behavior
  • 79. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } }
  • 80. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")
  • 81. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter") Create an Actor system
  • 82. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter") Create an Actor system Actor configuration
  • 83. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter") Give it a name Create an Actor system Actor configuration
  • 84. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter") Give it a nameCreate the Actor Create an Actor system Actor configuration
  • 85. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter") Give it a nameCreate the ActorYou get an ActorRef back Create an Actor system Actor configuration
  • 86. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter")
  • 87. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter") greeter ! Greeting("Charlie Parker")
  • 88. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter") greeter ! Greeting("Charlie Parker") Send the message asynchronously
  • 89. Akka Actors case class Greeting(who: String) class GreetingActor extends Actor with ActorLogging { def receive = { case Greeting(who) => log.info(s”Hello ${who}") } } val system = ActorSystem("MySystem") val greeter = system.actorOf(Props[GreetingActor], "greeter") greeter ! Greeting("Charlie Parker")
  • 102. Think Vending Machine Programmer Inserts coins Out of coffee beans error Coffee Machine
  • 103. Think Vending Machine Programmer Inserts coins Out of coffee beans error WRONG Coffee Machine
  • 105. Think Vending Machine Programmer Inserts coins Out of coffee beans failure Coffee Machine
  • 106. Think Vending Machine Programmer Service Guy Inserts coins Out of coffee beans failure Coffee Machine
  • 107. Think Vending Machine Programmer Service Guy Inserts coins Out of coffee beans failure Adds more beans Coffee Machine
  • 108. Think Vending Machine Programmer Service Guy Inserts coins Gets coffee Out of coffee beans failure Adds more beans Coffee Machine
  • 117.
  • 118. Error Kernel Pattern Onion-layered state & Failure management Making reliable distributed systems in the presence of software errors - Joe Armstrong On Erlang, State and Crashes - Jesper Louis Andersen
  • 119. Onion Layered State Management Object Critical state that needs protection Client Thread boundary
  • 120. Onion Layered State Management Object Critical state that needs protection Client Thread boundary
  • 121. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Thread boundary
  • 122. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Thread boundary
  • 123. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Thread boundary
  • 124. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary
  • 125. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary
  • 126. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary
  • 127. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary
  • 128. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary
  • 129. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary
  • 130. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary
  • 131. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary
  • 132. Onion Layered State Management Error Kernel Object Critical state that needs protection Client Supervision Supervision Thread boundary
  • 133. Supervision in Akka Every actor has a default supervisor strategy. Which can, and often should, be overridden. class Supervisor extends Actor { override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) { case _: ArithmeticException => Resume case _: NullPointerException => Restart case _: Exception => Escalate } val worker = context.actorOf(Props[Worker], name = "worker") def receive = { case number: Int => worker.forward(number) } }
  • 134. Supervision in Akka Every actor has a default supervisor strategy. Which can, and often should, be overridden. class Supervisor extends Actor { override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) { case _: ArithmeticException => Resume case _: NullPointerException => Restart case _: Exception => Escalate } val worker = context.actorOf(Props[Worker], name = "worker") def receive = { case number: Int => worker.forward(number) } } Parent actor
  • 135. Supervision in Akka Every actor has a default supervisor strategy. Which can, and often should, be overridden. class Supervisor extends Actor { override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) { case _: ArithmeticException => Resume case _: NullPointerException => Restart case _: Exception => Escalate } val worker = context.actorOf(Props[Worker], name = "worker") def receive = { case number: Int => worker.forward(number) } } Parent actor All its children have their life-cycle managed through this declarative supervision strategy
  • 136. Supervision in Akka Every actor has a default supervisor strategy. Which can, and often should, be overridden. class Supervisor extends Actor { override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) { case _: ArithmeticException => Resume case _: NullPointerException => Restart case _: Exception => Escalate } val worker = context.actorOf(Props[Worker], name = "worker") def receive = { case number: Int => worker.forward(number) } } Create a supervised child actor Parent actor All its children have their life-cycle managed through this declarative supervision strategy
  • 137. Monitor through Death Watch class Watcher extends Actor { val child = context.actorOf(Props.empty, "child") context.watch(child) def receive = { case Terminated(`child`) => … // handle child termination } }
  • 138. Monitor through Death Watch class Watcher extends Actor { val child = context.actorOf(Props.empty, "child") context.watch(child) def receive = { case Terminated(`child`) => … // handle child termination } } Create a child actor
  • 139. Monitor through Death Watch class Watcher extends Actor { val child = context.actorOf(Props.empty, "child") context.watch(child) def receive = { case Terminated(`child`) => … // handle child termination } } Create a child actor Watch it
  • 140. Monitor through Death Watch class Watcher extends Actor { val child = context.actorOf(Props.empty, "child") context.watch(child) def receive = { case Terminated(`child`) => … // handle child termination } } Create a child actor Watch it Receive termination signal
  • 143. Akka Routing akka { actor { deployment { /service/router { router = round-robin-pool resizer { lower-bound = 12 upper-bound = 15 } } } } }
  • 144. Akka Cluster akka { actor { deployment { /service/router { router = round-robin-pool resizer { lower-bound = 12 upper-bound = 15 } } } provider = "akka.cluster.ClusterActorRefProvider" }    cluster { seed-nodes = [ “akka.tcp://ClusterSystem@127.0.0.1:2551", “akka.tcp://ClusterSystem@127.0.0.1:2552" ]   } }
  • 147. We are living in the Looming Shadowof Impossibility Theorems
  • 148. We are living in the Looming Shadowof Impossibility Theorems CAP: Consistency is impossible
  • 149. We are living in the Looming Shadowof Impossibility Theorems CAP: Consistency is impossible FLP: Consensus is impossible
  • 152. Resilient Protocols Depend on Asynchronous Communication Eventual Consistency
  • 153. Resilient Protocols • are tolerant to • Message loss • Message reordering • Message duplication Depend on Asynchronous Communication Eventual Consistency
  • 154. Resilient Protocols • are tolerant to • Message loss • Message reordering • Message duplication • Embrace ACID 2.0 • Associative • Commutative • Idempotent • Distributed Depend on Asynchronous Communication Eventual Consistency
  • 155. Akka Distributed Data class DataBot extends Actor with ActorLogging { val replicator = DistributedData(context.system).replicator implicit val node = Cluster(context.system) val tickTask = context.system.scheduler.schedule( 5.seconds, 5.seconds, self, Add) val DataKey = ORSetKey[String]("key") replicator ! Subscribe(DataKey, self)
  • 156. Akka Distributed Data class DataBot extends Actor with ActorLogging { val replicator = DistributedData(context.system).replicator implicit val node = Cluster(context.system) val tickTask = context.system.scheduler.schedule( 5.seconds, 5.seconds, self, Add) val DataKey = ORSetKey[String]("key") replicator ! Subscribe(DataKey, self) def receive = { case Tick => val s = ThreadLocalRandom.current().nextInt(97, 123).toChar.toString if (ThreadLocalRandom.current().nextBoolean()) replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ + s) else replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ - s) case c @ Changed(DataKey) => val data = c.get(DataKey) case _: UpdateResponse[_] => // ignore } }
  • 157. “An escalator can never break: it can only become stairs. You should never see an Escalator Temporarily Out Of Order sign, just Escalator Temporarily Stairs. Sorry for the convenience.” - Mitch Hedberg
  • 160. Always Rely on Asynchronous Communication …And If not pOssiblE Always use Timeouts
  • 162. Circuit Breaker in Akka val breaker = new CircuitBreaker( context.system.scheduler, maxFailures = 5, callTimeout = 10.seconds, resetTimeout = 1.minute ).onOpen(…) .onHalfOpen(…) val result = breaker.withCircuitBreaker(Future(dangerousCall()))
  • 163. Little’s Law W: Response Time L: Queue Length
  • 164. Little’s Law Queue Length = Arrival Rate * Response Time W: Response Time L: Queue Length
  • 165. Little’s Law Response Time = Queue Length / Arrival Rate W: Response Time L: Queue Length
  • 167. Flow Control Always Apply Back Pressure
  • 170. Akka Streams in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge
  • 171. Akka Streams in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._ val in = Source(1 to 10) val out = Sink.ignore val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2)) val f1, f2, f3, f4 = Flow[Int].map(_ + 10) }
  • 172. Akka Streams in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._ val in = Source(1 to 10) val out = Sink.ignore val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2)) val f1, f2, f3, f4 = Flow[Int].map(_ + 10) } Set up the context
  • 173. Akka Streams in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._ val in = Source(1 to 10) val out = Sink.ignore val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2)) val f1, f2, f3, f4 = Flow[Int].map(_ + 10) } Set up the context Create the Source and Sink
  • 174. Akka Streams in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._ val in = Source(1 to 10) val out = Sink.ignore val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2)) val f1, f2, f3, f4 = Flow[Int].map(_ + 10) } Set up the context Create the Source and Sink Create the fan out and fan in stages
  • 175. Akka Streams in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._ val in = Source(1 to 10) val out = Sink.ignore val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2)) val f1, f2, f3, f4 = Flow[Int].map(_ + 10) } Set up the context Create the Source and Sink Create the fan out and fan in stages Create a set of transformations
  • 176. Akka Streams in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out bcast ~> f4 ~> merge val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder => import FlowGraph.Implicits._ val in = Source(1 to 10) val out = Sink.ignore val bcast = builder.add(Broadcast[Int](2)) val merge = builder.add(Merge[Int](2)) val f1, f2, f3, f4 = Flow[Int].map(_ + 10) } Set up the context Create the Source and Sink Create the fan out and fan in stages Create a set of transformations Define the graph processing blueprint
  • 178. What can we learn from Arnold?
  • 179. What can we learn from Arnold?
  • 180. What can we learn from Arnold? Blow things up
  • 182. Pull the Plug …and see what happens
  • 183.
  • 185. “Complex systems run in degraded mode.” “Complex systems run as broken systems.” - richard Cook How Complex Systems Fail - Richard Cook
  • 186. Resilience is by Design Photo courtesy of FEMA/Joselyne Augustino
  • 187. ReferencesAntifragile: Things That Gain from Disorder - http://www.amazon.com/Antifragile-Things-that-Gain-Disorder-ebook/dp/B009K6DKTS Drift into Failure - http://www.amazon.com/Drift-into-Failure-Components-Understanding-ebook/dp/B009KOKXKY How Complex Systems Fail - http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf Leverage Points: Places to Intervene in a System - http://www.donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/ Going Solid: A Model of System Dynamics and Consequences for Patient Safety - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1743994/ Resilience in Complex Adaptive Systems: Operating at the Edge of Failure - https://www.youtube.com/watch?v=PGLYEDpNu60 Dealing in Security - http://resiliencemaps.org/files/Dealing_in_Security.July2010.en.pdf Principles for Building Resilience: Sustaining Ecosystem Services in Social-Ecological Systems - http://www.amazon.com/Principles- Building-Resilience-Sustaining-Social-Ecological/dp/110708265X Puppies! Now that I’ve got your attention, Complexity Theory - https://www.ted.com/talks/ nicolas_perony_puppies_now_that_i_ve_got_your_attention_complexity_theory How Bacteria Becomes Resistant - http://www.abc.net.au/science/slab/antibiotics/resistance.htm Towards Resilient Architectures: Biology Lessons - http://www.metropolismag.com/Point-of-View/March-2013/Toward-Resilient- Architectures-1-Biology-Lessons/ Crash-Only Software - https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - http://roc.cs.berkeley.edu/papers/recursive_restartability.pdf Out of the Tar Pit - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.8928 Bulkhead Pattern - http://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html Making Reliable Distributed Systems in the Presence of Software Errors - http://www.erlang.org/download/armstrong_thesis_2003.pdf On Erlang, State and Crashes - http://jlouisramblings.blogspot.be/2010/11/on-erlang-state-and-crashes.html Akka Supervision - http://doc.akka.io/docs/akka/snapshot/general/supervision.html Release It!: Design and Deploy Production-Ready Software - https://pragprog.com/book/mnee/release-it Hystrix - https://github.com/Netflix/Hystrix Akka Circuit Breaker - http://doc.akka.io/docs/akka/snapshot/common/circuitbreaker.html Reactive Streams - http://reactive-streams.org Akka Streams - http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0/scala/stream-introduction.html RxJava - https://github.com/ReactiveX/RxJava Feedback Control for Computer Systems - http://www.amazon.com/Feedback-Control-Computer-Systems-Philipp/dp/1449361692 Simian Army - https://github.com/Netflix/SimianArmy Gatling - http://gatling.io Akka MultiNode Testing - http://doc.akka.io/docs/akka/snapshot/dev/multi-node-testing.html
  • 188. EXPERT TRAINING Delivered On-site For Spark, Scala, Akka And Play Help is just a click away. Get in touch with Typesafe about our training courses. • Intro Workshop to Apache Spark • Fast Track & Advanced Scala • Fast Track to Akka with Java or Scala • Fast Track to Play with Java or Scala • Advanced Akka with Java or Scala Ask us about local trainings available by 24 Typesafe partners in 14 countries around the world. CONTACT US Learn more about on-site training