Part 3: What you should know about Resiliency, Errors vs Failures, Isolation (and Containment), Delegation and Replication in Reactive systems
In the final webinar with live Q/A in the Reactive Revealed series, we explore the way that Reactive systems maintain resiliency with an infrastructural approach designed to welcome failure often and recover gracefully. Presented by Reactive Manifesto co-author, Akka creator and CTO at Typesafe, Inc., Jonas Bonér explores what you should know about:
What you should know about maintaining resiliency with monolithic systems compared to distributed systems
How Reactive systems handle errors and prevents catastrophic failures with isolation and containment, delegation and replication
How isolation (and containment) of error state and behavior works to block the ripple effect of cascading failures
How delegation of failure management and replication lets Reactive systems continue running in the face of failures using a different error handling context, on a different thread or thread pool, in a different process, or on a different network node or computing center
Previous
Part 1 - Asynchronous I/O, Back-pressure and the Message-driven vs. Event-driven approach in Reactive systems | presented by Konrad Malawski
Part 2 - Elasticity, Scalability and Location Transparency in Reactive Systems | presented by Viktor Klang
2. “But it ain’t how hard you’re hit;
it’s about how hard you can get
hit, and keep moving forward.
How much you can take, and
keep moving forward. That’s
how winning is done.”
- Rocky Balboa
3. “But it ain’t how hard you’re hit;
it’s about how hard you can get
hit, and keep moving forward.
How much you can take, and
keep moving forward. That’s
how winning is done.”
- Rocky Balboa
This is Fault Tolerance
8. “We can model and understand in isolation.
But, when released into competitive nominally
regulated societies, their connections proliferate,
their interactions and interdependencies multiply,
their complexities mushroom.
And we are caught short.”
- Sidney Dekker
Drift into Failure - Sidney Dekker
16. “Counterintuitive. That’s [Jay] Forrester’s word
to describe complex systems. Leverage points
are not intuitive. Or if they are, we intuitively
use them backward, systematically worsening
whatever problems we are trying to solve.”
- Donella Meadows
Leverage Points: Places to Intervene in a System - Donella Meadows
17. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
18. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Operating at the Edge of Failure
19. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Operating at the Edge of Failure
20. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Operating at the Edge of Failure
21. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Operating Point
Accident
Boundary
Operating at the Edge of Failure
22. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Operating at the Edge of Failure
23. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
FAILURE
Accident
Boundary
Operating at the Edge of Failure
29. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Operating at the Edge of Failure
30. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Error Margin
Marginal
Boundary
Operating at the Edge of Failure
31. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Error Margin
Marginal
Boundary
Operating at the Edge of Failure
32. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Economic
Failure
Boundary
Unacceptable
Workload
Boundary
Accident
Boundary
Error Margin
Marginal
Boundary
Operating at the Edge of Failure
33. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Accident
Boundary
Marginal
Boundary
Operating at the Edge of Failure
34. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Marginal
Boundary
?
Operating at the Edge of Failure
35. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Accident
Boundary
Marginal
Boundary
36. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Accident
Boundary
Marginal
Boundary
37. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Accident
Boundary
Marginal
Boundary
38. ‘‘Going solid’’: a model of system dynamics and consequences for patient safety - R Cook, J Rasmussen
Resilience in complex adaptive systems: Operating at the Edge of Failure - Richard Cook - Talk at Velocity NY 2013
Operating at the Edge of Failure
Accident
Boundary
Marginal
Boundary
41. Simple Critical Infrastructure Map
Understanding vital services, and how they keep you safe
6ways to die
3sets of essential services
7layers of PROTECTION
Dealing in Security - Mike Bennet, Vinay Gupta
42. 7 Principles for Building
Resilience in Social Systems
1. Maintain diversity & Redundancy
2. Manage connectivity
3. Manage slow variables & feedback
4. Foster complex adaptive systems thinking
5. Encourage learning
6. Broaden participation
7. Promote polycentric governance
Principles for Building Resilience: Sustaining Ecosystem Services in Social-Ecological Systems - Reinette Biggs et. al.
45. What We Can Learn
From Biological Systems
1. Feature Diversity and redundancy
2. Inter-Connected network structure
3. Wide distribution across all scales
4. Capacity to self-adapt & self-organize
Toward Resilient Architectures 1: Biology Lessons - Michael Mehaffy, Nikos A. Salingaros
46. “Animals show extraordinary social complexity,
and this allows them to adapt and
respond to changes in their environment.
In three words, in the animal kingdom,
simplicity leads to complexity
which leads to resilience.”
- Nicolas Perony
Puppies! Now that I’ve got your attention, complexity theory - Nicolas Perony, TED talk
52. “Post-accident attribution to a
‘root cause’ is fundamentally wrong:
Because overt failure requires multiple faults,
there is no isolated ‘cause’ of an accident.”
- richard Cook
How Complex Systems Fail - Richard Cook
55. “To make a system of interconnected components
crash-only, it must be designed so that components
can tolerate the crashes and temporary unavailability
of their peers. This means we require: [1] strong
modularity with relatively impermeable component
boundaries, [2] timeout-based communication and
lease-based resource allocation, and [3] self-
describing requests that carry a time-to-live and
information on whether they are idempotent.”
- George Candea, Armando Fox
Crash-Only Software - George Candea, Armando Fox
56. Recursive Restartability
Turning the Crash-Only Sledgehammer into a Scalpel
Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox
58. "Software components should be designed such
that they can deny service for any request or call.
Then, if an underlying component can say No,
apps must be designed to take No for an answer
and decide how to proceed: give up, wait and
retry, reduce fidelity, etc.”
- George Candea, Armando Fox
Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - George Candea, Armando Fox
Learn to take
NO for an answer
59. “The explosive growth of software has
added greatly to systems’ interactive
complexity. With software, the possible
states that a system can end up in become
mind-boggling.”
- Sidney Dekker
Drift into Failure - Sidney Dekker
73. Requirements for a
Sane Failure Mode
1. Contained
2. Reified—as messages
3. Signalled—Asynchronously
4. Observed—by 1-N
5. Managed
Failures need to be
75. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
76. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
Define the message(s) the Actor
should be able to respond to
77. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
Define the message(s) the Actor
should be able to respond to
Define the Actor class
78. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
Define the message(s) the Actor
should be able to respond to
Define the Actor class
Define the Actor’s behavior
79. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
80. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
81. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Create an Actor system
82. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Create an Actor system
Actor configuration
83. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Give it a name
Create an Actor system
Actor configuration
84. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Give it a nameCreate the Actor
Create an Actor system
Actor configuration
85. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
Give it a nameCreate the ActorYou get an ActorRef back
Create an Actor system
Actor configuration
86. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
87. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
greeter ! Greeting("Charlie Parker")
88. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
greeter ! Greeting("Charlie Parker")
Send the message asynchronously
89. Akka Actors
case class Greeting(who: String)
class GreetingActor extends Actor with ActorLogging {
def receive = {
case Greeting(who) => log.info(s”Hello ${who}")
}
}
val system = ActorSystem("MySystem")
val greeter = system.actorOf(Props[GreetingActor], "greeter")
greeter ! Greeting("Charlie Parker")
118. Error Kernel
Pattern
Onion-layered state & Failure management
Making reliable distributed systems in the presence of software errors - Joe Armstrong
On Erlang, State and Crashes - Jesper Louis Andersen
133. Supervision in Akka
Every actor has a default supervisor strategy.
Which can, and often should, be overridden.
class Supervisor extends Actor {
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: Exception => Escalate
}
val worker = context.actorOf(Props[Worker], name = "worker")
def receive = {
case number: Int => worker.forward(number)
}
}
134. Supervision in Akka
Every actor has a default supervisor strategy.
Which can, and often should, be overridden.
class Supervisor extends Actor {
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: Exception => Escalate
}
val worker = context.actorOf(Props[Worker], name = "worker")
def receive = {
case number: Int => worker.forward(number)
}
}
Parent actor
135. Supervision in Akka
Every actor has a default supervisor strategy.
Which can, and often should, be overridden.
class Supervisor extends Actor {
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: Exception => Escalate
}
val worker = context.actorOf(Props[Worker], name = "worker")
def receive = {
case number: Int => worker.forward(number)
}
}
Parent actor
All its children have their life-cycle managed
through this declarative supervision strategy
136. Supervision in Akka
Every actor has a default supervisor strategy.
Which can, and often should, be overridden.
class Supervisor extends Actor {
override val supervisorStrategy =
OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1 minute) {
case _: ArithmeticException => Resume
case _: NullPointerException => Restart
case _: Exception => Escalate
}
val worker = context.actorOf(Props[Worker], name = "worker")
def receive = {
case number: Int => worker.forward(number)
}
}
Create a supervised child actor
Parent actor
All its children have their life-cycle managed
through this declarative supervision strategy
137. Monitor through
Death Watch
class Watcher extends Actor {
val child = context.actorOf(Props.empty, "child")
context.watch(child)
def receive = {
case Terminated(`child`) => … // handle child termination
}
}
138. Monitor through
Death Watch
class Watcher extends Actor {
val child = context.actorOf(Props.empty, "child")
context.watch(child)
def receive = {
case Terminated(`child`) => … // handle child termination
}
}
Create a child actor
139. Monitor through
Death Watch
class Watcher extends Actor {
val child = context.actorOf(Props.empty, "child")
context.watch(child)
def receive = {
case Terminated(`child`) => … // handle child termination
}
}
Create a child actor
Watch it
140. Monitor through
Death Watch
class Watcher extends Actor {
val child = context.actorOf(Props.empty, "child")
context.watch(child)
def receive = {
case Terminated(`child`) => … // handle child termination
}
}
Create a child actor
Watch it
Receive termination signal
153. Resilient Protocols
• are tolerant to
• Message loss
• Message reordering
• Message duplication
Depend on
Asynchronous Communication
Eventual Consistency
154. Resilient Protocols
• are tolerant to
• Message loss
• Message reordering
• Message duplication
• Embrace ACID 2.0
• Associative
• Commutative
• Idempotent
• Distributed
Depend on
Asynchronous Communication
Eventual Consistency
155. Akka Distributed Data
class DataBot extends Actor with ActorLogging {
val replicator = DistributedData(context.system).replicator
implicit val node = Cluster(context.system)
val tickTask = context.system.scheduler.schedule(
5.seconds, 5.seconds, self, Add)
val DataKey = ORSetKey[String]("key")
replicator ! Subscribe(DataKey, self)
156. Akka Distributed Data
class DataBot extends Actor with ActorLogging {
val replicator = DistributedData(context.system).replicator
implicit val node = Cluster(context.system)
val tickTask = context.system.scheduler.schedule(
5.seconds, 5.seconds, self, Add)
val DataKey = ORSetKey[String]("key")
replicator ! Subscribe(DataKey, self)
def receive = {
case Tick =>
val s = ThreadLocalRandom.current().nextInt(97, 123).toChar.toString
if (ThreadLocalRandom.current().nextBoolean())
replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ + s)
else
replicator ! Update(DataKey, ORSet.empty[String], WriteLocal)(_ - s)
case c @ Changed(DataKey) =>
val data = c.get(DataKey)
case _: UpdateResponse[_] => // ignore
}
}
157. “An escalator can never break: it can only
become stairs. You should never see an
Escalator Temporarily Out Of Order sign, just
Escalator Temporarily Stairs. Sorry for the
convenience.”
- Mitch Hedberg
170. Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
171. Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
172. Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
173. Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
Create the
Source and Sink
174. Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
Create the
Source and Sink
Create the fan out
and fan in stages
175. Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
Create the
Source and Sink
Create the fan out
and fan in stages
Create a set of
transformations
176. Akka Streams
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
val g = FlowGraph.closed() {
implicit builder: FlowGraph.Builder =>
import FlowGraph.Implicits._
val in = Source(1 to 10)
val out = Sink.ignore
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val f1, f2, f3, f4 = Flow[Int].map(_ + 10)
}
Set up the context
Create the
Source and Sink
Create the fan out
and fan in stages
Create a set of
transformations
Define the graph
processing blueprint
187. ReferencesAntifragile: Things That Gain from Disorder - http://www.amazon.com/Antifragile-Things-that-Gain-Disorder-ebook/dp/B009K6DKTS
Drift into Failure - http://www.amazon.com/Drift-into-Failure-Components-Understanding-ebook/dp/B009KOKXKY
How Complex Systems Fail - http://web.mit.edu/2.75/resources/random/How%20Complex%20Systems%20Fail.pdf
Leverage Points: Places to Intervene in a System - http://www.donellameadows.org/archives/leverage-points-places-to-intervene-in-a-system/
Going Solid: A Model of System Dynamics and Consequences for Patient Safety - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1743994/
Resilience in Complex Adaptive Systems: Operating at the Edge of Failure - https://www.youtube.com/watch?v=PGLYEDpNu60
Dealing in Security - http://resiliencemaps.org/files/Dealing_in_Security.July2010.en.pdf
Principles for Building Resilience: Sustaining Ecosystem Services in Social-Ecological Systems - http://www.amazon.com/Principles-
Building-Resilience-Sustaining-Social-Ecological/dp/110708265X
Puppies! Now that I’ve got your attention, Complexity Theory - https://www.ted.com/talks/
nicolas_perony_puppies_now_that_i_ve_got_your_attention_complexity_theory
How Bacteria Becomes Resistant - http://www.abc.net.au/science/slab/antibiotics/resistance.htm
Towards Resilient Architectures: Biology Lessons - http://www.metropolismag.com/Point-of-View/March-2013/Toward-Resilient-
Architectures-1-Biology-Lessons/
Crash-Only Software - https://www.usenix.org/legacy/events/hotos03/tech/full_papers/candea/candea.pdf
Recursive Restartability: Turning the Reboot Sledgehammer into a Scalpel - http://roc.cs.berkeley.edu/papers/recursive_restartability.pdf
Out of the Tar Pit - http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.93.8928
Bulkhead Pattern - http://skife.org/architecture/fault-tolerance/2009/12/31/bulkheads.html
Making Reliable Distributed Systems in the Presence of Software Errors - http://www.erlang.org/download/armstrong_thesis_2003.pdf
On Erlang, State and Crashes - http://jlouisramblings.blogspot.be/2010/11/on-erlang-state-and-crashes.html
Akka Supervision - http://doc.akka.io/docs/akka/snapshot/general/supervision.html
Release It!: Design and Deploy Production-Ready Software - https://pragprog.com/book/mnee/release-it
Hystrix - https://github.com/Netflix/Hystrix
Akka Circuit Breaker - http://doc.akka.io/docs/akka/snapshot/common/circuitbreaker.html
Reactive Streams - http://reactive-streams.org
Akka Streams - http://doc.akka.io/docs/akka-stream-and-http-experimental/1.0/scala/stream-introduction.html
RxJava - https://github.com/ReactiveX/RxJava
Feedback Control for Computer Systems - http://www.amazon.com/Feedback-Control-Computer-Systems-Philipp/dp/1449361692
Simian Army - https://github.com/Netflix/SimianArmy
Gatling - http://gatling.io
Akka MultiNode Testing - http://doc.akka.io/docs/akka/snapshot/dev/multi-node-testing.html
188. EXPERT TRAINING
Delivered On-site For Spark, Scala, Akka And Play
Help is just a click away. Get in touch with
Typesafe about our training courses.
• Intro Workshop to Apache Spark
• Fast Track & Advanced Scala
• Fast Track to Akka with Java or Scala
• Fast Track to Play with Java or Scala
• Advanced Akka with Java or Scala
Ask us about local trainings available by 24
Typesafe partners in 14 countries around
the world.
CONTACT US Learn more about on-site training