SlideShare une entreprise Scribd logo
1  sur  68
Télécharger pour lire hors ligne
Resilient Functional Service Design
The usually forgotten parts of resilient software design

Uwe Friedrichsen – codecentric AG – 2015-2017
@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
What’s that “resilience” thing?
Business
Production
Availability
(Almost) every system is a distributed system

Chas Emerick






http://www.infoq.com/presentations/problems-distributed-systems
A distributed system is one in which the failure
of a computer you didn't even know existed
can render your own computer unusable.

Leslie Lamport
Failures in todays complex, distributed and
interconnected systems are not the exception.

•  They are the normal case

•  They are not predictable
… and it’s getting “worse”


•  Cloud-based systems
•  Microservices
•  Zero Downtime
•  Mobile & IoT
•  Social Web

à Ever-increasing complexity and connectivity
Do not try to avoid failures. Embrace them.
resilience (IT)

the ability of a system to handle unexpected situations
-  without the user noticing it (best case)
-  with a graceful degradation of service (worst case)
Beware of the “100% available” trap!
Designing for resilience
The pitfall
First, you learn about resilience …
Complement
Core
Detect
Prevent
Recover
Mitigate
Treat
Core
Detect
 Treat
Prevent
Recover
Mitigate
 Complement
Supporting
patterns
Redundancy
Stateless
Idempotency
Escalation
Zero downtime
deployment
Location
transparency
Relaxed
temporal
constraints
Fallback
Shed load
Share load
Marked data
Queue for
resources
Bounded queue
Finish work
in progress
Fresh work
before stale
Deferrable work
Communication
paradigm
Isolation
Bulkhead
System level
Monitor
Watchdog
Heartbeat
Acknowledgement
Either level
Voting
Synthetic
transaction
Leaky
bucket
 Routine

checks
Health
check
Fail fast
Let sleeping

dogs lie
Small

releases
Hot

deployments
Routine
maintenance
Backup

request
Anti-fragility
Diversity
 Jitter
Error
injection
Spread
the news
Anti-entropy
Backpressure
Retry
Limit retries
Rollback
 Roll-forward
Checkpoint
 Safe point
Failover
Read repair
Error
handler
Reset
Restart
Reconnect
Fail silently
Default value
Node level
Timeout
Circuit
breaker
Complete
parameter
checking
Checksum
Statically
Dynamically
Confinement
... then, you digest the stuff just learned
Confinement
Core
Detect
 Treat
Prevent
Recover
Mitigate
 Complement
Supporting
patterns
Redundancy
Stateless
Idempotency
Escalation
Zero downtime
deployment
Location
transparency
Relaxed
temporal
constraints
Fallback
Shed load
Share load
Marked data
Queue for
resources
Bounded queue
Finish work
in progress
Fresh work
before stale
Deferrable work
Communication
paradigm
Isolation
Bulkhead
System level
Monitor
Watchdog
Heartbeat
Acknowledgement
Either level
Voting
Synthetic
transaction
Leaky
bucket
 Routine

checks
Health
check
Fail fast
Let sleeping

dogs lie
Small

releases
Hot

deployments
Routine
maintenance
Backup

request
Anti-fragility
Diversity
 Jitter
Error
injection
Spread
the news
Anti-entropy
Backpressure
Retry
Limit retries
Rollback
 Roll-forward
Checkpoint
 Safe point
Failover
Read repair
Error
handler
Reset
Restart
Reconnect
Fail silently
Default value
Node level
Timeout
Circuit
breaker
Complete
parameter
checking
Checksum
Statically
Dynamically
Oh, my!

Theoretical blah!
Uncool!


Know that anyway
for eons. So, let’s
move on to the cool
parts …
Ah, now we’re talkin’! Here’s the cool stuff!

That‘s practical, applicable. Don‘t you have more code examples?
Or even better: Can‘t we turn that all into a live hacking session?
Offline
activities?


Hmm, let‘s
focus on the
other stuff.
Uh, sounds like
one-off, tough
stuff …


Better start with the
easier stuff, best
with library support
Yeah, more cool stuff!

Aren‘t there more libs like Hystrix
that we can drag into our projects
with a line of configuration?
Well, neat …

I’ll come back to
that stuff whenever
I really need it
Core
Detect
Recover
Mitigate
Treat
Prevent
Complement
Developer
priority
Relevance for
application robustness
Ye be warned!

If you don’t get this part
right, nothing else matters
Here be dragons!

This is extremely hard
and poorly understood
Let’s recap …
The core parts are



•  extremely important
•  poorly understood
•  massively underestimated
Houston, we have a problem!
Let’s have a closer look at the core parts
Complement
Core
Detect
Prevent
Recover
Mitigate
Treat
Core
Detect
 Treat
Prevent
Recover
Mitigate
 Complement
Isolation
Isolation
•  System must not fail as a whole
•  Split system in parts and isolate parts against each other
•  Avoid cascading failures
•  Foundation of resilient software design
Core
Detect
 Treat
Prevent
Recover
Mitigate
 Complement
Isolation
Bulkhead
Bulkheads are not about thread pools!
Bulkheads

•  Core isolation pattern (a.k.a. “failure units” or “units of mitigation”)
•  Diverse implementation choices available, e.g., µservice, actor, scs, ...
•  Implementation choice impacts system and resilience design a lot
•  Shaping good bulkheads is extremely hard (pure design issue)
Sounds easy. Where is the problem?
Service A
 Service B
Request
Due to functional design, Service A
always needs backing from Service B
to be able to answer a client request,

i.e. the isolation is broken by design
How do we avoid this …
Service
Request
Due to functional design we need
to call a lot of services to be able
to answer a client request,

i.e. availability is broken by design
... and this ...
Service
Service
Service
 Service
Service
Service
Service
Service
Service
Service
Service
Service
Mothership Service

(a.k.a. Monolith)
Request
By trying to avoid the aforementioned
issues we ended up with cramming all
required functionality in one big service

i.e. the isolation is broken by design
... without ending up with this?
Let’s use the well-known best practices

•  Divide & conquer a.k.a. functional decomposition
•  DRY (Don’t Repeat Yourself)
•  Design for reusability
•  Layered architecture
•  …
Unfortunately, …
Service A
 Service B
Request
Due to functional design, Service A
always needs backing from Service B
to be able to answer a client request,

i.e. the isolation is broken by design
... this usually leads to this ...
Service
Request
Due to functional design we need
to call a lot of services to be able
to answer a client request,

i.e. availability is broken by design
... and this ...
Service
Service
Service
 Service
Service
Service
Service
Service
Service
Service
Service
Service
Mothership Service

(a.k.a. Monolith)
Request
By trying to avoid the aforementioned
issues we ended up with cramming all
required functionality in one big service

i.e. the isolation is broken by design
... and in the end also often to this.
Welcome to distributed hell!
Caches to the rescue!
Service A
 Service B
Request
Due to functional design, Service A
always needs backing from Service B
to be able to answer a client request,

i.e. the isolation is broken by design
CacheofB
 Break tight service coupling
by caching data/responses
of downstream service
Caches to the rescue?
Do you really think

that copying stale data all over your system
is a suitable measure

to fix an inherently broken design?
We have to re-learn design
for distributed systems!
A works-out-of-the-box-in-all-contexts,
just-add-water-and-stir,
three-bullet-point panacea
for designing perfect bulkheads
You need lots of those …
... maybe some of those
Then it is a lot of hard work …
... and there is no silver bullet
Yet, a few guiding thoughts

about bulkhead design …
Foundations of design

•  High cohesion, low coupling
•  Separation of concerns
•  Crucial across process boundaries
•  Still poorly understood issue
•  Start with
•  Understanding organizational boundaries
•  Understanding use cases and flows
•  Identifying functional domains (à DDD)
•  Finding areas that change independently
•  Do not start with a data model!
Short activation paths


•  Long activation paths affect availability
•  Increase latency and likelihood of failures
•  Minimize remote calls per request
•  Need to balance opposing forces
•  Avoid monolith à clear separation of concerns
•  Minimize requests à cluster functionality & data
•  Caches sometimes help, but stale data as trade-off
Dismiss reusability


•  Reusability increases coupling
•  Reusability leads to bad service design
•  Reusability compromises availability
•  Reusability rarely pays
•  Do not strive for reuse
•  Strive for replaceability instead
Broadening the options ...
Core
Detect
 Treat
Prevent
Recover
Mitigate
 Complement
Isolation
Communication
paradigm
Bulkhead
Communication paradigm

•  Request-response <-> messaging <-> events <-> …
•  Heavily influences resilience patterns to be used
•  Also heavily influences functional bulkhead design
•  Very fundamental decision which is often underestimated
µS
Request/Response : Horizontal slicing
Flow / Process
µS
 µS
µS
 µS
 µS
µS
Event-driven : Vertical slicing
µS
 µS
µS
µS
 µS
Flow / Process
Synchronous R/R vs. asynchronous events

•  Decomposition
•  Vertically divide-and-conquer vs. horizontally go-with-the-flow
•  Coordination
•  Coordination logic/services and orchestration vs. event chains and choreography
•  Transactions
•  Built-in transaction handling vs. external supervision
•  Error handling
•  Built into service vs. escalation/supervision strategy
•  Separation of concerns
•  Multiple responsibilities service vs. single responsibility services
•  Encapsulation
•  Domain logic distributed across services vs. domain logic in one place
•  Reusability vs. Replaceability
•  Complexity
•  A draw …
The communication paradigm influences

the functional service design a lot

and also the resilience patterns to be used
Example: order fulfillment

•  Simple order, credit card, non-digital items
•  Add coupons incl. validation
•  Add promotions incl. usage notification
•  Add bonus card incl. purchase notification
•  Customer accounts as payment type
•  PayPal as payment type
•  Integrate digital music library
•  Integrate digital video library
•  Integrate e-book library
Design exercise – Part 1


Create a bulkhead design for the case study

•  Use one communication paradigm
•  Synchronous request/response (e.g., REST)
•  Asynchronous messaging (e.g., Akka)
•  Asynchronous events (e.g., Pub/Sub)
•  Assume incremental requirements
•  How many services do you need to touch
•  What about the functional isolation of the services
•  How big/maintainable are the resulting services
•  Take a few notes
Online Shop Checkout
Credit Card
Provider
Warehouse
System
Coupon
Management
Campaign
Management
PayPal
Loyalty
Management
Accounts
Receivables
Music Library
E-Book
Library
Video Library
E-Mail Server
Customer
pressed
“Buy now”
?
Order Fulfillment

Service
Online Shop
Payment

Service
Credit Card
Provider
Shipment

Service
Warehouse
System
<Foreign Service>
<Own Service>
Coupon
Management
Promotion
Campaign
Management
Loyalty
Account
Service
Payment
Provider
PayPal
Loyalty
Management
Accounts
Receivables
Music Library
E-Book
Library
Video Library
E-Mail Server
Coupon
Credit Card
Coordinate
Warehouse
Coordinate
Assets
Notify Cust.
PayPal
Coordinate
Order confirmed
Online Shop
Credit Card
Provider
Warehouse
System
<Foreign Service>
<Own Service>
Coupon
Management
Campaign
Management
Account
service
Credit Card
Service
Loyalty
Management
Accounts
Receivables
Music Library
E-Book
Library
Video Library
 E-Mail Server
PayPal
PayPal
Service
Warehouse
Service
Promotion
Service
Bonus Card
Service
Coupon
Service
Music Library
Service
Video Library
Service
E-Book Library
Service
Notification
Service
Payment authorized
 Digital asset provisioned
Payment failed
<Event>
Order fulfillment
supervisor
Track flow of events
Reschedule events
in case of failure
Services are responsible to eventually succeed
or fail for good, usually incorporating a
supervision/escalation hierarchy for that
Do not limit your design options upfront
without an important reason
Wrap-up

•  Today’s systems are distributed
•  Failures are not avoidable, nor predictable
•  Resilient software design needed
•  Bulkhead design is
•  crucial for application robustness
•  poorly understood
•  massively underrated
•  different from traditional design best practices
•  Communication paradigms broaden

your bulkhead design options
We have to re-learn design
for distributed systems
@ufried
Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
Resilient Functional Service Design

Contenu connexe

Tendances

Application Resilience Patterns
Application Resilience PatternsApplication Resilience Patterns
Application Resilience PatternsKiran Sama
 
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...Edureka!
 
Cloud Adoption Framework Phase one-moving to the cloud
Cloud Adoption Framework Phase one-moving to the cloudCloud Adoption Framework Phase one-moving to the cloud
Cloud Adoption Framework Phase one-moving to the cloudAnthony Clendenen
 
Azure fundamentals-170910113238
Azure fundamentals-170910113238Azure fundamentals-170910113238
Azure fundamentals-170910113238ScottSmith574468
 
Microservices Technology Stack
Microservices Technology StackMicroservices Technology Stack
Microservices Technology StackEberhard Wolff
 
Microservice vs. Monolithic Architecture
Microservice vs. Monolithic ArchitectureMicroservice vs. Monolithic Architecture
Microservice vs. Monolithic ArchitecturePaul Mooney
 
Cloud computing security
Cloud computing security Cloud computing security
Cloud computing security Akhila Param
 
DNS DDoS mitigation using Amazon Route 53 and AWS Shield
DNS DDoS mitigation using Amazon Route 53 and AWS ShieldDNS DDoS mitigation using Amazon Route 53 and AWS Shield
DNS DDoS mitigation using Amazon Route 53 and AWS ShieldAmazon Web Services
 
Cloud Security Architecture.pptx
Cloud Security Architecture.pptxCloud Security Architecture.pptx
Cloud Security Architecture.pptxMoshe Ferber
 
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar Timothy McAliley
 
Understanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring BootUnderstanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring BootKashif Ali Siddiqui
 
Microservices, Containers, Kubernetes, Kafka, Kanban
Microservices, Containers, Kubernetes, Kafka, KanbanMicroservices, Containers, Kubernetes, Kafka, Kanban
Microservices, Containers, Kubernetes, Kafka, KanbanAraf Karsh Hamid
 
Azure Security Overview
Azure Security OverviewAzure Security Overview
Azure Security OverviewAllen Brokken
 

Tendances (20)

MULTI-CLOUD ARCHITECTURE
MULTI-CLOUD ARCHITECTUREMULTI-CLOUD ARCHITECTURE
MULTI-CLOUD ARCHITECTURE
 
Application Resilience Patterns
Application Resilience PatternsApplication Resilience Patterns
Application Resilience Patterns
 
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
 
Cloud Adoption Framework Phase one-moving to the cloud
Cloud Adoption Framework Phase one-moving to the cloudCloud Adoption Framework Phase one-moving to the cloud
Cloud Adoption Framework Phase one-moving to the cloud
 
Azure Security Overview
Azure Security OverviewAzure Security Overview
Azure Security Overview
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
Hashicorp Vault ppt
Hashicorp Vault pptHashicorp Vault ppt
Hashicorp Vault ppt
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
Azure fundamentals-170910113238
Azure fundamentals-170910113238Azure fundamentals-170910113238
Azure fundamentals-170910113238
 
Microservices Decomposition Patterns
Microservices Decomposition PatternsMicroservices Decomposition Patterns
Microservices Decomposition Patterns
 
Cloud Native In-Depth
Cloud Native In-DepthCloud Native In-Depth
Cloud Native In-Depth
 
Microservices Technology Stack
Microservices Technology StackMicroservices Technology Stack
Microservices Technology Stack
 
Microservice vs. Monolithic Architecture
Microservice vs. Monolithic ArchitectureMicroservice vs. Monolithic Architecture
Microservice vs. Monolithic Architecture
 
Cloud computing security
Cloud computing security Cloud computing security
Cloud computing security
 
DNS DDoS mitigation using Amazon Route 53 and AWS Shield
DNS DDoS mitigation using Amazon Route 53 and AWS ShieldDNS DDoS mitigation using Amazon Route 53 and AWS Shield
DNS DDoS mitigation using Amazon Route 53 and AWS Shield
 
Cloud Security Architecture.pptx
Cloud Security Architecture.pptxCloud Security Architecture.pptx
Cloud Security Architecture.pptx
 
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
Azure Cloud Adoption Framework + Governance - Sana Khan and Jay Kumar
 
Understanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring BootUnderstanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring Boot
 
Microservices, Containers, Kubernetes, Kafka, Kanban
Microservices, Containers, Kubernetes, Kafka, KanbanMicroservices, Containers, Kubernetes, Kafka, Kanban
Microservices, Containers, Kubernetes, Kafka, Kanban
 
Azure Security Overview
Azure Security OverviewAzure Security Overview
Azure Security Overview
 

En vedette

The promises and perils of microservices
The promises and perils of microservicesThe promises and perils of microservices
The promises and perils of microservicesUwe Friedrichsen
 
DevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader contextDevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader contextUwe Friedrichsen
 
Towards complex adaptive architectures
Towards complex adaptive architecturesTowards complex adaptive architectures
Towards complex adaptive architecturesUwe Friedrichsen
 
Microservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack riskMicroservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack riskUwe Friedrichsen
 
Conway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective ITConway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective ITUwe Friedrichsen
 
Modern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of ITModern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of ITUwe Friedrichsen
 
The Next Generation (of) IT
The Next Generation (of) ITThe Next Generation (of) IT
The Next Generation (of) ITUwe Friedrichsen
 
Why resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudesWhy resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudesUwe Friedrichsen
 
Microservices vs. The First Law of Distributed Objects - GOTO Nights Chicago ...
Microservices vs. The First Law of Distributed Objects - GOTO Nights Chicago ...Microservices vs. The First Law of Distributed Objects - GOTO Nights Chicago ...
Microservices vs. The First Law of Distributed Objects - GOTO Nights Chicago ...Phil Calçado
 
XTM; from AEM\Drupal\Sitecore direct connectors to translation vendor subcont...
XTM; from AEM\Drupal\Sitecore direct connectors to translation vendor subcont...XTM; from AEM\Drupal\Sitecore direct connectors to translation vendor subcont...
XTM; from AEM\Drupal\Sitecore direct connectors to translation vendor subcont...TAUS - The Language Data Network
 
Der Business Case für Architektur
Der Business Case für ArchitekturDer Business Case für Architektur
Der Business Case für ArchitekturUwe Friedrichsen
 

En vedette (20)

The promises and perils of microservices
The promises and perils of microservicesThe promises and perils of microservices
The promises and perils of microservices
 
Watch your communication
Watch your communicationWatch your communication
Watch your communication
 
Patterns of resilience
Patterns of resiliencePatterns of resilience
Patterns of resilience
 
Life, IT and everything
Life, IT and everythingLife, IT and everything
Life, IT and everything
 
DevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader contextDevOps is not enough - Embedding DevOps in a broader context
DevOps is not enough - Embedding DevOps in a broader context
 
Towards complex adaptive architectures
Towards complex adaptive architecturesTowards complex adaptive architectures
Towards complex adaptive architectures
 
Production-ready Software
Production-ready SoftwareProduction-ready Software
Production-ready Software
 
Resilience with Hystrix
Resilience with HystrixResilience with Hystrix
Resilience with Hystrix
 
Microservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack riskMicroservices - stress-free and without increased heart attack risk
Microservices - stress-free and without increased heart attack risk
 
Conway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective ITConway's law revisited - Architectures for an effective IT
Conway's law revisited - Architectures for an effective IT
 
No stress with state
No stress with stateNo stress with state
No stress with state
 
Fantastic Elastic
Fantastic ElasticFantastic Elastic
Fantastic Elastic
 
Self healing data
Self healing dataSelf healing data
Self healing data
 
Modern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of ITModern times - architectures for a Next Generation of IT
Modern times - architectures for a Next Generation of IT
 
The Next Generation (of) IT
The Next Generation (of) ITThe Next Generation (of) IT
The Next Generation (of) IT
 
Devops for Developers
Devops for DevelopersDevops for Developers
Devops for Developers
 
Why resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudesWhy resilience - A primer at varying flight altitudes
Why resilience - A primer at varying flight altitudes
 
Microservices vs. The First Law of Distributed Objects - GOTO Nights Chicago ...
Microservices vs. The First Law of Distributed Objects - GOTO Nights Chicago ...Microservices vs. The First Law of Distributed Objects - GOTO Nights Chicago ...
Microservices vs. The First Law of Distributed Objects - GOTO Nights Chicago ...
 
XTM; from AEM\Drupal\Sitecore direct connectors to translation vendor subcont...
XTM; from AEM\Drupal\Sitecore direct connectors to translation vendor subcont...XTM; from AEM\Drupal\Sitecore direct connectors to translation vendor subcont...
XTM; from AEM\Drupal\Sitecore direct connectors to translation vendor subcont...
 
Der Business Case für Architektur
Der Business Case für ArchitekturDer Business Case für Architektur
Der Business Case für Architektur
 

Similaire à Resilient Functional Service Design

Onion Architecture / Clean Architecture
Onion Architecture / Clean ArchitectureOnion Architecture / Clean Architecture
Onion Architecture / Clean ArchitectureAttila Bertók
 
Timeless design in a cloud-native world
Timeless design in a cloud-native worldTimeless design in a cloud-native world
Timeless design in a cloud-native worldUwe Friedrichsen
 
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...distributed matters
 
Microservices for Mortals by Bert Ertman at Codemotion Dubai
 Microservices for Mortals by Bert Ertman at Codemotion Dubai Microservices for Mortals by Bert Ertman at Codemotion Dubai
Microservices for Mortals by Bert Ertman at Codemotion DubaiCodemotion Dubai
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsUwe Friedrichsen
 
Microservices Gone Wrong!
Microservices Gone Wrong!Microservices Gone Wrong!
Microservices Gone Wrong!Bert Ertman
 
Microservices for Mortals
Microservices for MortalsMicroservices for Mortals
Microservices for MortalsBert Ertman
 
Deep Dive into the Idea of Software Architecture
Deep Dive into the Idea of Software ArchitectureDeep Dive into the Idea of Software Architecture
Deep Dive into the Idea of Software ArchitectureMatthew Clarke
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedTyler Treat
 
Software Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableComsysto Reply GmbH
 
The "Why", "What" and "How" of Microservices
The "Why", "What" and "How" of Microservices The "Why", "What" and "How" of Microservices
The "Why", "What" and "How" of Microservices INPAY
 
Software Architecture Anti-Patterns
Software Architecture Anti-PatternsSoftware Architecture Anti-Patterns
Software Architecture Anti-PatternsEduards Sizovs
 
Digital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesDigital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesLightbend
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyComsysto Reply GmbH
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyComsysto Reply GmbH
 
Arch factory - Agile Design: Best Practices
Arch factory - Agile Design: Best PracticesArch factory - Agile Design: Best Practices
Arch factory - Agile Design: Best PracticesIgor Moochnick
 
[WSO2Con EU 2017] Resilience Patterns with Ballerina
[WSO2Con EU 2017] Resilience Patterns with Ballerina[WSO2Con EU 2017] Resilience Patterns with Ballerina
[WSO2Con EU 2017] Resilience Patterns with BallerinaWSO2
 
Smaller is Better - Exploiting Microservice Architectures on AWS - Technical 201
Smaller is Better - Exploiting Microservice Architectures on AWS - Technical 201Smaller is Better - Exploiting Microservice Architectures on AWS - Technical 201
Smaller is Better - Exploiting Microservice Architectures on AWS - Technical 201Amazon Web Services
 
Cloud Native Future
Cloud Native FutureCloud Native Future
Cloud Native FutureJulie Coonce
 

Similaire à Resilient Functional Service Design (20)

Onion Architecture / Clean Architecture
Onion Architecture / Clean ArchitectureOnion Architecture / Clean Architecture
Onion Architecture / Clean Architecture
 
Timeless design in a cloud-native world
Timeless design in a cloud-native worldTimeless design in a cloud-native world
Timeless design in a cloud-native world
 
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
Microservices - stress-free and without increased heart-attack risk - Uwe Fri...
 
Microservices for Mortals by Bert Ertman at Codemotion Dubai
 Microservices for Mortals by Bert Ertman at Codemotion Dubai Microservices for Mortals by Bert Ertman at Codemotion Dubai
Microservices for Mortals by Bert Ertman at Codemotion Dubai
 
Excavating the knowledge of our ancestors
Excavating the knowledge of our ancestorsExcavating the knowledge of our ancestors
Excavating the knowledge of our ancestors
 
Microservices Gone Wrong!
Microservices Gone Wrong!Microservices Gone Wrong!
Microservices Gone Wrong!
 
Microservices for Mortals
Microservices for MortalsMicroservices for Mortals
Microservices for Mortals
 
No silver bullet
No silver bulletNo silver bullet
No silver bullet
 
Deep Dive into the Idea of Software Architecture
Deep Dive into the Idea of Software ArchitectureDeep Dive into the Idea of Software Architecture
Deep Dive into the Idea of Software Architecture
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going Distributed
 
Software Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuable
 
The "Why", "What" and "How" of Microservices
The "Why", "What" and "How" of Microservices The "Why", "What" and "How" of Microservices
The "Why", "What" and "How" of Microservices
 
Software Architecture Anti-Patterns
Software Architecture Anti-PatternsSoftware Architecture Anti-Patterns
Software Architecture Anti-Patterns
 
Digital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and MicroservicesDigital Transformation with Kubernetes, Containers, and Microservices
Digital Transformation with Kubernetes, Containers, and Microservices
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
 
Arch factory - Agile Design: Best Practices
Arch factory - Agile Design: Best PracticesArch factory - Agile Design: Best Practices
Arch factory - Agile Design: Best Practices
 
[WSO2Con EU 2017] Resilience Patterns with Ballerina
[WSO2Con EU 2017] Resilience Patterns with Ballerina[WSO2Con EU 2017] Resilience Patterns with Ballerina
[WSO2Con EU 2017] Resilience Patterns with Ballerina
 
Smaller is Better - Exploiting Microservice Architectures on AWS - Technical 201
Smaller is Better - Exploiting Microservice Architectures on AWS - Technical 201Smaller is Better - Exploiting Microservice Architectures on AWS - Technical 201
Smaller is Better - Exploiting Microservice Architectures on AWS - Technical 201
 
Cloud Native Future
Cloud Native FutureCloud Native Future
Cloud Native Future
 

Plus de Uwe Friedrichsen

The hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developerThe hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developerUwe Friedrichsen
 
Digitization solutions - A new breed of software
Digitization solutions - A new breed of softwareDigitization solutions - A new breed of software
Digitization solutions - A new breed of softwareUwe Friedrichsen
 
Real-world consistency explained
Real-world consistency explainedReal-world consistency explained
Real-world consistency explainedUwe Friedrichsen
 
The truth about "You build it, you run it!"
The truth about "You build it, you run it!"The truth about "You build it, you run it!"
The truth about "You build it, you run it!"Uwe Friedrichsen
 
Dr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismDr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismUwe Friedrichsen
 
How to survive in a BASE world
How to survive in a BASE worldHow to survive in a BASE world
How to survive in a BASE worldUwe Friedrichsen
 

Plus de Uwe Friedrichsen (9)

Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
 
Life after microservices
Life after microservicesLife after microservices
Life after microservices
 
The hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developerThe hitchhiker's guide for the confused developer
The hitchhiker's guide for the confused developer
 
Digitization solutions - A new breed of software
Digitization solutions - A new breed of softwareDigitization solutions - A new breed of software
Digitization solutions - A new breed of software
 
Real-world consistency explained
Real-world consistency explainedReal-world consistency explained
Real-world consistency explained
 
The truth about "You build it, you run it!"
The truth about "You build it, you run it!"The truth about "You build it, you run it!"
The truth about "You build it, you run it!"
 
Dr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinismDr. Hectic and Mr. Hype - surviving the economic darwinism
Dr. Hectic and Mr. Hype - surviving the economic darwinism
 
How to survive in a BASE world
How to survive in a BASE worldHow to survive in a BASE world
How to survive in a BASE world
 
Fault tolerance made easy
Fault tolerance made easyFault tolerance made easy
Fault tolerance made easy
 

Dernier

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Dernier (20)

My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

Resilient Functional Service Design

  • 1. Resilient Functional Service Design The usually forgotten parts of resilient software design Uwe Friedrichsen – codecentric AG – 2015-2017
  • 2. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com
  • 5. (Almost) every system is a distributed system Chas Emerick http://www.infoq.com/presentations/problems-distributed-systems
  • 6. A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable. Leslie Lamport
  • 7. Failures in todays complex, distributed and interconnected systems are not the exception. •  They are the normal case •  They are not predictable
  • 8. … and it’s getting “worse” •  Cloud-based systems •  Microservices •  Zero Downtime •  Mobile & IoT •  Social Web à Ever-increasing complexity and connectivity
  • 9. Do not try to avoid failures. Embrace them.
  • 10. resilience (IT) the ability of a system to handle unexpected situations -  without the user noticing it (best case) -  with a graceful degradation of service (worst case)
  • 11. Beware of the “100% available” trap!
  • 13. First, you learn about resilience …
  • 15. Core Detect Treat Prevent Recover Mitigate Complement Supporting patterns Redundancy Stateless Idempotency Escalation Zero downtime deployment Location transparency Relaxed temporal constraints Fallback Shed load Share load Marked data Queue for resources Bounded queue Finish work in progress Fresh work before stale Deferrable work Communication paradigm Isolation Bulkhead System level Monitor Watchdog Heartbeat Acknowledgement Either level Voting Synthetic transaction Leaky bucket Routine
 checks Health check Fail fast Let sleeping
 dogs lie Small
 releases Hot
 deployments Routine maintenance Backup
 request Anti-fragility Diversity Jitter Error injection Spread the news Anti-entropy Backpressure Retry Limit retries Rollback Roll-forward Checkpoint Safe point Failover Read repair Error handler Reset Restart Reconnect Fail silently Default value Node level Timeout Circuit breaker Complete parameter checking Checksum Statically Dynamically Confinement
  • 16. ... then, you digest the stuff just learned
  • 17. Confinement Core Detect Treat Prevent Recover Mitigate Complement Supporting patterns Redundancy Stateless Idempotency Escalation Zero downtime deployment Location transparency Relaxed temporal constraints Fallback Shed load Share load Marked data Queue for resources Bounded queue Finish work in progress Fresh work before stale Deferrable work Communication paradigm Isolation Bulkhead System level Monitor Watchdog Heartbeat Acknowledgement Either level Voting Synthetic transaction Leaky bucket Routine
 checks Health check Fail fast Let sleeping
 dogs lie Small
 releases Hot
 deployments Routine maintenance Backup
 request Anti-fragility Diversity Jitter Error injection Spread the news Anti-entropy Backpressure Retry Limit retries Rollback Roll-forward Checkpoint Safe point Failover Read repair Error handler Reset Restart Reconnect Fail silently Default value Node level Timeout Circuit breaker Complete parameter checking Checksum Statically Dynamically Oh, my!
 Theoretical blah! Uncool!
 Know that anyway for eons. So, let’s move on to the cool parts … Ah, now we’re talkin’! Here’s the cool stuff! That‘s practical, applicable. Don‘t you have more code examples? Or even better: Can‘t we turn that all into a live hacking session? Offline activities?
 Hmm, let‘s focus on the other stuff. Uh, sounds like one-off, tough stuff …
 Better start with the easier stuff, best with library support Yeah, more cool stuff! Aren‘t there more libs like Hystrix that we can drag into our projects with a line of configuration? Well, neat … I’ll come back to that stuff whenever I really need it
  • 18. Core Detect Recover Mitigate Treat Prevent Complement Developer priority Relevance for application robustness Ye be warned! If you don’t get this part right, nothing else matters Here be dragons! This is extremely hard and poorly understood
  • 20. The core parts are •  extremely important •  poorly understood •  massively underestimated
  • 21. Houston, we have a problem!
  • 22. Let’s have a closer look at the core parts
  • 25. Isolation •  System must not fail as a whole •  Split system in parts and isolate parts against each other •  Avoid cascading failures •  Foundation of resilient software design
  • 27. Bulkheads are not about thread pools!
  • 28. Bulkheads •  Core isolation pattern (a.k.a. “failure units” or “units of mitigation”) •  Diverse implementation choices available, e.g., µservice, actor, scs, ... •  Implementation choice impacts system and resilience design a lot •  Shaping good bulkheads is extremely hard (pure design issue)
  • 29. Sounds easy. Where is the problem?
  • 30. Service A Service B Request Due to functional design, Service A always needs backing from Service B to be able to answer a client request, i.e. the isolation is broken by design How do we avoid this …
  • 31. Service Request Due to functional design we need to call a lot of services to be able to answer a client request, i.e. availability is broken by design ... and this ... Service Service Service Service Service Service Service Service Service Service Service Service
  • 32. Mothership Service (a.k.a. Monolith) Request By trying to avoid the aforementioned issues we ended up with cramming all required functionality in one big service i.e. the isolation is broken by design ... without ending up with this?
  • 33. Let’s use the well-known best practices •  Divide & conquer a.k.a. functional decomposition •  DRY (Don’t Repeat Yourself) •  Design for reusability •  Layered architecture •  …
  • 35. Service A Service B Request Due to functional design, Service A always needs backing from Service B to be able to answer a client request, i.e. the isolation is broken by design ... this usually leads to this ...
  • 36. Service Request Due to functional design we need to call a lot of services to be able to answer a client request, i.e. availability is broken by design ... and this ... Service Service Service Service Service Service Service Service Service Service Service Service
  • 37. Mothership Service (a.k.a. Monolith) Request By trying to avoid the aforementioned issues we ended up with cramming all required functionality in one big service i.e. the isolation is broken by design ... and in the end also often to this.
  • 39. Caches to the rescue!
  • 40. Service A Service B Request Due to functional design, Service A always needs backing from Service B to be able to answer a client request, i.e. the isolation is broken by design CacheofB Break tight service coupling by caching data/responses of downstream service
  • 41. Caches to the rescue?
  • 42. Do you really think
 that copying stale data all over your system is a suitable measure
 to fix an inherently broken design?
  • 43. We have to re-learn design for distributed systems!
  • 45. You need lots of those …
  • 46. ... maybe some of those
  • 47. Then it is a lot of hard work …
  • 48. ... and there is no silver bullet
  • 49. Yet, a few guiding thoughts
 about bulkhead design …
  • 50. Foundations of design •  High cohesion, low coupling •  Separation of concerns •  Crucial across process boundaries •  Still poorly understood issue •  Start with •  Understanding organizational boundaries •  Understanding use cases and flows •  Identifying functional domains (à DDD) •  Finding areas that change independently •  Do not start with a data model!
  • 51. Short activation paths •  Long activation paths affect availability •  Increase latency and likelihood of failures •  Minimize remote calls per request •  Need to balance opposing forces •  Avoid monolith à clear separation of concerns •  Minimize requests à cluster functionality & data •  Caches sometimes help, but stale data as trade-off
  • 52. Dismiss reusability •  Reusability increases coupling •  Reusability leads to bad service design •  Reusability compromises availability •  Reusability rarely pays •  Do not strive for reuse •  Strive for replaceability instead
  • 55. Communication paradigm •  Request-response <-> messaging <-> events <-> … •  Heavily influences resilience patterns to be used •  Also heavily influences functional bulkhead design •  Very fundamental decision which is often underestimated
  • 56. µS Request/Response : Horizontal slicing Flow / Process µS µS µS µS µS µS Event-driven : Vertical slicing µS µS µS µS µS Flow / Process
  • 57. Synchronous R/R vs. asynchronous events •  Decomposition •  Vertically divide-and-conquer vs. horizontally go-with-the-flow •  Coordination •  Coordination logic/services and orchestration vs. event chains and choreography •  Transactions •  Built-in transaction handling vs. external supervision •  Error handling •  Built into service vs. escalation/supervision strategy •  Separation of concerns •  Multiple responsibilities service vs. single responsibility services •  Encapsulation •  Domain logic distributed across services vs. domain logic in one place •  Reusability vs. Replaceability •  Complexity •  A draw …
  • 58. The communication paradigm influences
 the functional service design a lot and also the resilience patterns to be used
  • 59. Example: order fulfillment •  Simple order, credit card, non-digital items •  Add coupons incl. validation •  Add promotions incl. usage notification •  Add bonus card incl. purchase notification •  Customer accounts as payment type •  PayPal as payment type •  Integrate digital music library •  Integrate digital video library •  Integrate e-book library
  • 60. Design exercise – Part 1 Create a bulkhead design for the case study •  Use one communication paradigm •  Synchronous request/response (e.g., REST) •  Asynchronous messaging (e.g., Akka) •  Asynchronous events (e.g., Pub/Sub) •  Assume incremental requirements •  How many services do you need to touch •  What about the functional isolation of the services •  How big/maintainable are the resulting services •  Take a few notes
  • 61. Online Shop Checkout Credit Card Provider Warehouse System Coupon Management Campaign Management PayPal Loyalty Management Accounts Receivables Music Library E-Book Library Video Library E-Mail Server Customer pressed “Buy now” ?
  • 62. Order Fulfillment
 Service Online Shop Payment
 Service Credit Card Provider Shipment
 Service Warehouse System <Foreign Service> <Own Service> Coupon Management Promotion Campaign Management Loyalty Account Service Payment Provider PayPal Loyalty Management Accounts Receivables Music Library E-Book Library Video Library E-Mail Server Coupon Credit Card Coordinate Warehouse Coordinate Assets Notify Cust. PayPal Coordinate
  • 63. Order confirmed Online Shop Credit Card Provider Warehouse System <Foreign Service> <Own Service> Coupon Management Campaign Management Account service Credit Card Service Loyalty Management Accounts Receivables Music Library E-Book Library Video Library E-Mail Server PayPal PayPal Service Warehouse Service Promotion Service Bonus Card Service Coupon Service Music Library Service Video Library Service E-Book Library Service Notification Service Payment authorized Digital asset provisioned Payment failed <Event> Order fulfillment supervisor Track flow of events Reschedule events in case of failure Services are responsible to eventually succeed or fail for good, usually incorporating a supervision/escalation hierarchy for that
  • 64. Do not limit your design options upfront without an important reason
  • 65. Wrap-up •  Today’s systems are distributed •  Failures are not avoidable, nor predictable •  Resilient software design needed •  Bulkhead design is •  crucial for application robustness •  poorly understood •  massively underrated •  different from traditional design best practices •  Communication paradigms broaden
 your bulkhead design options
  • 66. We have to re-learn design for distributed systems
  • 67. @ufried Uwe Friedrichsen | uwe.friedrichsen@codecentric.de | http://slideshare.net/ufried | http://ufried.tumblr.com