The Resilience Patterns
your Microservices Teams
Should Know
Victor Rentea | https://victorrentea.ro | @victorrentea
👉 victorrentea.ro/training-offer
VictorRentea.ro
👋 I'm Victor Rentea 🇷🇴 PhD(CS), Java Champion
18 years of code
10 years training bright developers at 120+ EU companies:
❤ Clean Code, Testing, Architecture
🛠 Java, Spring, Hibernate, Reactive
⚡ Java Performance, Secure Coding
Educative Talks on YouTube.com/vrentea
European Software Crafters Community (6K devs)
👉 Join for free monthly events at victorrentea.ro/community
Life += 👪 + 🐈 + 🌷garden
Benefits of Microservices
✓ Faster Time-to-Market: 😁 Business
✓ Lower Cognitive Load: 😁 Developers
✓ Technology Upgrade/Change
✓ Scalability for the 🔥 hot parts that require it
✓ Availability, tolerance to partial failures
But we're safe.
We're using HTTP between our services.
😎
Fallacies of Distributed Computing
The network is reliable
Latency is zero
Bandwidth is infinite
Transport cost is zero
The network is secure
Fixed topology
Has one administrator
The network is homogeneous
A distributed system is one in which
the failure of a computer
you didn't even know existed
can render your own computer unusable.
Leslie Lamport
production
deploy
$\text{Availability} = \dfrac{\text{uptime}}{\text{total\_time}}$
$\text{Availability} = \dfrac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$
MTTF = Mean Time To Failure (crash)
MTTR = Mean Time To Recovery (downtime)
⬆ MTTF: write more tests: unit, integration, smoke, end-to-end
  Also: load, spike, resilience (see Toxiproxy by Shopify)
⬇ MTTR: faster recovery
⬆ Availability
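To make the MTTF/MTTR formula concrete, here is a worked example with made-up numbers (the 720 h and 1 h figures are illustrative assumptions, not measurements from the talk). It suggests that halving the recovery time buys the same availability as doubling the time between failures:

```latex
\begin{align*}
\text{Baseline (MTTF} = 720\,\text{h},\ \text{MTTR} = 1\,\text{h):}\quad
  & \frac{720}{720 + 1} \approx 0.99861 \quad (99.86\%) \\
\text{Halve MTTR (MTTF} = 720\,\text{h},\ \text{MTTR} = 0.5\,\text{h):}\quad
  & \frac{720}{720 + 0.5} \approx 0.99931 \quad (99.93\%) \\
\text{Double MTTF (MTTF} = 1440\,\text{h},\ \text{MTTR} = 1\,\text{h):}\quad
  & \frac{1440}{1440 + 1} \approx 0.99931 \quad (99.93\%)
\end{align*}
```

The last two rows are equal because the ratio MTTF/MTTR is the same (1440) in both.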
Resilience Patterns
Resilience
The ability of a system to handle unexpected situations
- without users noticing it (best case), or
- with a graceful degradation of service
Graceful Degradation
👉 A Query reading data can fall back to:
- Return an older value, e.g. cached 5 minutes ago
- Return a lower-quality response, e.g. Netflix: per-country titles, not client-tailored ones
- Call a slower system, e.g. search in the SQL DB when ES is down
- Send results later: "We'll email you the results when done"
👉 A Command changing data can:
- Use the Outbox table pattern: insert it in the DB + schedule ⏱ a persistent retry
- Be sent as a message instead of HTTP
- Log an error raising an alarm 🔔 calling for manual intervention
- Send an error event to a supervisor for automatic recovery ("saga" pattern)
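The first query fallback (return an older value) could be sketched roughly as follows. This is a minimal in-memory illustration, not the talk's implementation: `StaleCacheFallback`, its `remoteCall` function and the staleness limit are hypothetical names chosen for the example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Wraps a remote lookup and serves the last good value when the call fails. */
class StaleCacheFallback<K, V> {
    private static final class Entry<V> {
        final V value;
        final Instant storedAt;
        Entry(V value, Instant storedAt) { this.value = value; this.storedAt = storedAt; }
    }

    private final Map<K, Entry<V>> lastGood = new ConcurrentHashMap<>();
    private final Function<K, V> remoteCall;  // hypothetical client call; may throw
    private final Duration maxStaleness;      // e.g. 5 minutes, as on the slide

    StaleCacheFallback(Function<K, V> remoteCall, Duration maxStaleness) {
        this.remoteCall = remoteCall;
        this.maxStaleness = maxStaleness;
    }

    /** Returns the fresh value, or a recent cached one, or empty if neither is available. */
    Optional<V> get(K key) {
        try {
            V fresh = remoteCall.apply(key);
            lastGood.put(key, new Entry<>(fresh, Instant.now()));
            return Optional.ofNullable(fresh);
        } catch (RuntimeException e) {
            // Degrade gracefully: serve the older value if it is not too old.
            Entry<V> cached = lastGood.get(key);
            if (cached != null
                    && Duration.between(cached.storedAt, Instant.now()).compareTo(maxStaleness) <= 0) {
                return Optional.ofNullable(cached.value);
            }
            return Optional.empty();
        }
    }
}
```

A caller would treat an empty result as "no data available right now" and pick the next degradation step, for instance a generic response.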
ISOLATION
LOOSE COUPLING
LATENCY CONTROL
SUPERVISION
Catastrophic Failure
The entire system becomes unavailable
↓
Split the system into separate pieces and
isolate those pieces from each other.
☠ Isolation Anti-Patterns ☠
• Long chains of REST calls: A → B → C → D → E
• A request causes the instance to restart, and the client keeps retrying
• A "poison pill" message is retried infinitely, blocking the listener
• Unbounded queues kept in memory cause OutOfMemoryError
• Concurrent massive imports/exports overload the DB
Bulkhead
Isolated Failure
The ship does not sink! 🎉
Bulkhead (aka Failure Units)
• Core isolation pattern
• Pure design issue: what should be independent of what?
• Used as units of redundancy
• Used as units of scale
Source: "Microservices Resilience Patterns" by jRebel
Bulkhead Examples
• Key Features: catalog, search, checkout
• Markets / Regions
• Tenants ☁
Isolate them using separate:
• Connection / Thread Pools
• Application Instances
• Databases, Queues
Throttling
• Limit the load on a server to:
- 💥 Prevent a crash: a few 503s are better than an OutOfMemoryError
- ⏱ Preserve response time: an error is better than an OK after 60 seconds
- ⚠ Protect critical endpoints: place order over export invoices
- ⚖ Ensure fairness: return 429 to greedy clients/tenants
- 💲 Limit auto-scaling to fit the budget
• What to throttle:
- Request rate: max 300 requests/second, e.g. via @RateLimiter
- Concurrency: max 3 exports in parallel, e.g. via @Bulkhead
- Traffic: max 1 GB/minute, or cost: max 300 credits/minute
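Concurrency throttling ("max 3 exports in parallel") can be illustrated with a plain semaphore. This sketch assumes only `java.util.concurrent` rather than the annotations named on the slide; `ConcurrencyLimiter` and `run` are hypothetical names.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Allows at most maxConcurrent tasks to run at the same time; rejects the rest immediately. */
class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /**
     * Runs the task if a permit is free. Otherwise throws without waiting,
     * so the caller can answer with 429 or 503 instead of queueing forever.
     */
    <T> T run(Supplier<T> task) {
        if (!permits.tryAcquire()) {
            throw new RejectedExecutionException("Too many concurrent calls");
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always give the permit back, even on failure
        }
    }
}
```

Rejecting instead of blocking is a deliberate choice here: it keeps the number of waiting threads bounded, which is the point of the pattern.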
Throttling Features
- C: degraded
- B: throttled (stopped)
- A: untouched, as it's critical
Usage Pattern
Spike
Performance Response Curve
[Figure: throughput (# requests completed / sec by one machine) vs. load (# concurrent requests); the "Sweet Spot" marks the range of best performance]
Enqueuing excess load can improve overall performance
response_time = queue_waiting + execution_time
Queues must be monitored and bounded, or else 💥
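A bounded, monitorable queue can be sketched with `ArrayBlockingQueue`, whose `offer` returns `false` instead of growing without limit. `BoundedWorkQueue` and its method names are hypothetical, chosen for this illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** A work queue that refuses new jobs once it is full, instead of eating all the memory. */
class BoundedWorkQueue {
    private final BlockingQueue<Runnable> queue;

    BoundedWorkQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns false when the queue is full; the caller should reject the request (e.g. 503). */
    boolean submit(Runnable job) {
        return queue.offer(job);
    }

    /** Current depth, meant to be exported as a metric and alarmed on. */
    int depth() {
        return queue.size();
    }

    /** Takes the next job for a worker, or null if the queue is empty. */
    Runnable poll() {
        return queue.poll();
    }
}
```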
Full Parameter Check
• Obvious, yet often neglected
• Validate data when you see it: client requests & API responses
• But don't go too far: validate only what you care about 👉
Be conservative in what you do,
but liberal in what you accept from others.
-- Robustness Principle
(aka Postel's Law)
ISOLATION
LOOSE COUPLING
LATENCY CONTROL
SUPERVISION
Patterns: Throttling, Bulkhead, Complete Parameter Checking, Bounded Queues
Latency Control
Timeout
• Set a timeout every time you block 👍
• If too large:
- Impacts my response time
- ⚠ Mind the defaults: RestTemplate=1 minute, WebClient=unbounded 😱
• If too short:
- False errors: the operation might succeed later on the server
- Keep it above the API's measured/SLA response time (max or 99th percentile)
- Tailor per endpoint, e.g. longer for /export
• Monitoring + Alarms 🔔
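One way to put a time limit on any blocking call is `CompletableFuture.orTimeout` (Java 9+). The sketch below assumes that approach; it is not how RestTemplate or WebClient timeouts are configured, and `TimeLimitedCall` is a hypothetical helper name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class TimeLimitedCall {
    /**
     * Runs blockingCall on another thread and waits at most timeoutMillis for it.
     * On timeout the fallback is returned. Note that the underlying call is NOT
     * cancelled: it may still complete later on the server (a "false error").
     */
    static <T> T callWithTimeout(Supplier<T> blockingCall, long timeoutMillis, T fallback) {
        try {
            return CompletableFuture.supplyAsync(blockingCall)
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                    .join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return fallback; // timed out: degrade instead of blocking the caller
            }
            throw e; // any other failure is passed on unchanged
        }
    }
}
```

In a real service the timeout value would be tailored per endpoint and the timeout count exported as a metric.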
A call failed or timed out → I want to retry it.
Key questions to ask 🤔
- Error cause: 400, 500, timeout, {retryable:false} in response?
- Worth retrying?
- Max attempts?
- What backoff?
- How to monitor?
- Is the operation idempotent?
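Several of these questions map directly onto parameters of a retry helper. The following is a minimal sketch with exponential backoff, assuming unchecked exceptions only; `RetrySketch` and its parameter names are hypothetical, and production code would likely add jitter and metrics.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

class RetrySketch {
    /**
     * Calls the supplier up to maxAttempts times.
     * Retries only when retryable.test(error) is true (e.g. timeout or 5xx, but not 400),
     * doubling the wait between attempts, starting from initialBackoffMillis.
     */
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long initialBackoffMillis,
                               Predicate<RuntimeException> retryable) {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e; // give up: out of attempts, or not worth retrying
                }
                sleep(backoff);
                backoff *= 2; // exponential backoff
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while backing off", ie);
        }
    }
}
```

The helper is only safe to use around operations that are idempotent, which is the last question on the list.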
Idempotency
An operation is idempotent if repeating it doesn't change the state any further
• Get product by id via GET?
- ✅ YES: no state changes on the server
• Cancel Payment by id via HTTP DELETE or message
- ✅ YES: canceling it again won't change anything
• Update Product price by id via PUT or message
- ✅ YES: the price stays unchanged on the server
• Place Order { items: [..] } via POST or message
- ❌ NO: the retry could create a second order
- ✅ YES, if you detect duplicates, e.g. via pastHourOrders = Map<customerId, List<hash(order)>>
• Place Order { id: UUID, items: [..] } (client-generated id) via PUT or message
- ✅ YES: a duplicate would cause a PK/UK violation
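The client-generated id variant can be sketched in a few lines. Here an in-memory map stands in for a table with a primary key or unique constraint on the order id; that substitution, and the names `IdempotentOrderService` and `placeOrder`, are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class IdempotentOrderService {
    // Stands in for a DB table with a PK/UK on the order id (in-memory assumption).
    private final Map<String, String> ordersById = new ConcurrentHashMap<>();

    /**
     * Stores the order only if this id was never seen before.
     * Returns true if created, false if it is a duplicate (e.g. a client retry).
     */
    boolean placeOrder(String clientGeneratedId, String items) {
        // putIfAbsent is atomic, like an INSERT guarded by a unique constraint.
        return ordersById.putIfAbsent(clientGeneratedId, items) == null;
    }

    int orderCount() {
        return ordersById.size();
    }
}
```

Because the duplicate is detected on the server, the client can retry the same request after a timeout without risking a second order.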
Happy Retrying...
Over the last 5 minutes,
99% of requests to a system
failed or timed out.
What are the chances a new call will succeed?
If you know you're going to fail,
at least fail fast.
Circuit Breaker
🛑 OPEN
✅ CLOSED
🛑 OPEN = flow is stopped
Introduced by Michael Nygard in his book "Release It!": https://www.amazon.com/Release-Production-Ready-Software-Pragmatic-Programmers/dp/0978739213
Circuit Breaker
START → ✅ CLOSED: allow all calls, counting OK, errors, timed-out
✅ CLOSED → 🛑 OPEN: when failed % > threshold (e.g. >99%) over the last 100 calls/seconds
🛑 OPEN: stop all calls for a while
🛑 OPEN → 🟡 HALF-OPEN: after a ⏱ delay
🟡 HALF-OPEN: allow only a few calls (e.g. 2), to test if the server is back
🟡 HALF-OPEN → 🛑 OPEN: any request failed
🟡 HALF-OPEN → ✅ CLOSED: all requests = OK
Fail fast to save client resources.
Let the server recover.
Hit the server gently after recovery.
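The three states and their transitions can be sketched as a small state machine. This is a simplified, count-based illustration and not a replacement for a library: `CircuitBreakerSketch` and its parameters are hypothetical, the clock is injected to make the delay testable, and concurrent trial calls in HALF_OPEN are not capped.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;          // how many recent calls are counted
    private final double failureThreshold; // e.g. 0.5 = open when more than 50% failed
    private final long openMillis;         // how long to stay OPEN before probing
    private final int trialCalls;          // successes needed in HALF_OPEN to close again
    private final LongSupplier clock;      // current time in millis (injected)

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;
    private int trialSuccesses;

    CircuitBreakerSketch(int windowSize, double failureThreshold, long openMillis,
                         int trialCalls, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.trialCalls = trialCalls;
        this.clock = clock;
    }

    /** Current state; moves OPEN to HALF_OPEN once the delay has passed. */
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
            trialSuccesses = 0;
        }
        return state;
    }

    <T> T call(Supplier<T> protectedCall) {
        if (state() == State.OPEN) {
            throw new IllegalStateException("Circuit open: failing fast");
        }
        try {
            T result = protectedCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private synchronized void onSuccess() {
        if (state == State.HALF_OPEN) {
            if (++trialSuccesses >= trialCalls) { // all trial calls OK: close again
                state = State.CLOSED;
                window.clear();
            }
        } else {
            record(false);
        }
    }

    private synchronized void onFailure() {
        if (state == State.HALF_OPEN) { // any trial failure re-opens the circuit
            trip();
            return;
        }
        record(true);
        long failures = window.stream().filter(failed -> failed).count();
        if (window.size() == windowSize && (double) failures / windowSize > failureThreshold) {
            trip();
        }
    }

    private void record(boolean failed) {
        window.addLast(failed);
        if (window.size() > windowSize) window.removeFirst();
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```

While the circuit is OPEN the protected call is never invoked, which is what saves client resources and lets the server recover.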
Where to Implement These?
Timeout, Retry, Circuit Breaker, Throttling, Fallback
• When calling fragile/slow systems: SAP, Siebel, external APIs
• Between services in different bulkheads
• At API Gateway level (entry point to the ecosystem)
ISOLATION
LOOSE COUPLING
LATENCY CONTROL
SUPERVISION
Patterns: Bounded Queues, Fan out & quickest reply, Retry, Circuit Breaker, Timeouts, Fail Fast, Throttling, Bulkhead, Complete Parameter Checking, Idempotency, SLA
Loose Coupling
Stateless Service
• Keeping state inside a service can impact:
- Consistency: all nodes must sync it
- Availability: failover nodes must replicate it
- Scalability: new instances must copy it
• Move state out 👍
- To databases
- To clients (browser, mobile) if related to UI flow
- Via request tokens if user metadata (email, roles..)
Location Transparency
Asynchronous Communication (MQ)
• Guaranteed delivery, even if the receiver is unavailable now
• Prevents cascading failures between bulkheads
• Breaks the call stack paradigm, thus apparently hard
• With resilience concerns, REST also becomes scary
• The sender should NOT need a response or status: a design change
• DON'T expect your changes to be applied instantaneously
⇒ Eventual Consistency ⇐
CAP Theorem
You can either be Consistent or Available when Partitioned
- Consistent: all data is in perfect sync
- Available: you get an answer
- Partitioned: using multiple machines
The Real World is not Consistent
• Examples:
- One of the items in stock is accidentally broken during handling
- The product page displays stock: 1, but by the time the customer clicks "add to cart", someone else already bought it!
⇒ Relax, and embrace Eventual Consistency
- After you understand the busine$$ impact
Event-Sourcing
• State moves via events → less REST 🙂
• Lossless capture of history
• Ability to time-travel by replaying events
• Allows scaling up the read flows separately (CQRS)
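"Time-travel by replaying events" can be shown with a toy event log. The stock domain, the `StockEventLog` class and the `StockChanged` event are hypothetical, and the log is kept in memory only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class StockEventLog {
    /** An immutable fact: the stock of a product changed by delta. */
    static final class StockChanged {
        final String productId;
        final int delta;
        StockChanged(String productId, int delta) {
            this.productId = productId;
            this.delta = delta;
        }
    }

    // Append-only history; current state is never stored, only derived.
    private final List<StockChanged> events = new ArrayList<>();

    void append(String productId, int delta) {
        events.add(new StockChanged(productId, delta));
    }

    /** Rebuilds the stock as it was after the first eventCount events (time travel). */
    int stockAt(String productId, int eventCount) {
        int stock = 0;
        int limit = Math.min(eventCount, events.size());
        for (StockChanged event : events.subList(0, limit)) {
            if (event.productId.equals(productId)) {
                stock += event.delta;
            }
        }
        return stock;
    }

    /** Current stock = replay of the whole history. */
    int currentStock(String productId) {
        return stockAt(productId, events.size());
    }
}
```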
Messaging Pitfalls
• Poison pill message ⇒ send to dead-letter or retry queue
• Duplicated messages ⇒ idempotent consumer
• Out-of-order messages ⇒ aggregator, time-based or smart
• Data privacy ⇒ Claim Check pattern
• Event versioning ⇒ backward-forward compatibility
• Misconfiguration of infrastructure
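The first pitfall, a poison pill blocking the listener, is commonly handled by giving up after a few attempts and parking the message. The sketch below assumes an in-memory list in place of a real dead-letter queue; `DeadLetteringConsumer` is a hypothetical name and no broker API is implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class DeadLetteringConsumer {
    private final Consumer<String> handler; // business logic; may throw
    private final int maxAttempts;
    // Stands in for a real dead-letter queue (in-memory assumption).
    private final List<String> deadLetters = new ArrayList<>();

    DeadLetteringConsumer(Consumer<String> handler, int maxAttempts) {
        this.handler = handler;
        this.maxAttempts = maxAttempts;
    }

    /** Tries the handler a bounded number of times, then parks the message. */
    void onMessage(String message) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return; // processed successfully
            } catch (RuntimeException e) {
                // a real consumer would log the error and back off here
            }
        }
        deadLetters.add(message); // give up so that later messages are not blocked
    }

    List<String> deadLetters() {
        return deadLetters;
    }
}
```

Messages parked this way still need an alarm and a manual or automated follow-up, otherwise they are silently lost.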
ISOLATION
LOOSE COUPLING
LATENCY CONTROL
SUPERVISION
Patterns: Bounded Queues, Fan out & quickest reply, Retry, Circuit Breaker, Timeouts, Fail Fast, Throttling, Bulkhead, Complete Parameter Checking, Error Handler, Asynchronous Communication, Location Transparency, Idempotency, Event-Driven, Stateless, Eventual Consistency, Monitor, Escalation, SLA, Health Check
Also see: "Patterns of resilience" talk by Uwe Friedrichsen
Clients cannot / should not
handle downstream errors
People
No matter what a problem seems to be at first,
it's always a people problem.
- First Law of Consulting
To build highly-resilient systems,
you need DevOps teams
Dev Ops
Own Your Production
- See metrics & set alarms: Prometheus + Grafana...
- Distributed request tracing: OpenTelemetry
- Log aggregation: ELK...
- Time span samples: Zipkin...
SHOW STOPPER if lacking these
Key Points
• Identify bulkheads (failure units)
• Sync calls += timeout, retry, throttling, circuit breaker, failover
• Slow responses can cause cascading failures
• Slow down to move faster
• If you fail, at least fail fast
• Embrace eventual consistency and favor async messages
Thank you!
Join my community:

Contenu connexe

Tendances

Massive service basic
Massive service basicMassive service basic
Massive service basicDaeMyung Kang
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3Heungsub Lee
 
강성훈, 실버바인 대기열 서버 설계 리뷰, NDC2019
강성훈, 실버바인 대기열 서버 설계 리뷰, NDC2019강성훈, 실버바인 대기열 서버 설계 리뷰, NDC2019
강성훈, 실버바인 대기열 서버 설계 리뷰, NDC2019devCAT Studio, NEXON
 
jemalloc 세미나
jemalloc 세미나jemalloc 세미나
jemalloc 세미나Jang Hoon
 
게임 분산 서버 구조
게임 분산 서버 구조게임 분산 서버 구조
게임 분산 서버 구조Hyunjik Bae
 
Blockchain Intro to Hyperledger Fabric
Blockchain Intro to Hyperledger Fabric Blockchain Intro to Hyperledger Fabric
Blockchain Intro to Hyperledger Fabric Araf Karsh Hamid
 
Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in KafkaJayesh Thakrar
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...DoKC
 
Python 게임서버 안녕하십니까 : RPC framework 편
Python 게임서버 안녕하십니까 : RPC framework 편Python 게임서버 안녕하십니까 : RPC framework 편
Python 게임서버 안녕하십니까 : RPC framework 편준철 박
 
Ndc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABCNdc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABCHo Gyu Lee
 
파이썬 생존 안내서 (자막)
파이썬 생존 안내서 (자막)파이썬 생존 안내서 (자막)
파이썬 생존 안내서 (자막)Heungsub Lee
 
Kafka Summit 2021 - Apache Kafka meets workflow engines
Kafka Summit 2021 - Apache Kafka meets workflow enginesKafka Summit 2021 - Apache Kafka meets workflow engines
Kafka Summit 2021 - Apache Kafka meets workflow enginesBernd Ruecker
 
Using Camunda on Kubernetes through Operators
Using Camunda on Kubernetes through OperatorsUsing Camunda on Kubernetes through Operators
Using Camunda on Kubernetes through Operatorscamunda services GmbH
 
[야생의 땅: 듀랑고] 서버 아키텍처 - SPOF 없는 분산 MMORPG 서버
[야생의 땅: 듀랑고] 서버 아키텍처 - SPOF 없는 분산 MMORPG 서버[야생의 땅: 듀랑고] 서버 아키텍처 - SPOF 없는 분산 MMORPG 서버
[야생의 땅: 듀랑고] 서버 아키텍처 - SPOF 없는 분산 MMORPG 서버Heungsub Lee
 
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)Heungsub Lee
 
The Dual write problem
The Dual write problemThe Dual write problem
The Dual write problemJeppe Cramon
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Henning Jacobs
 
Grafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for LogsGrafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for LogsMarco Pracucci
 

Tendances (20)

Massive service basic
Massive service basicMassive service basic
Massive service basic
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loading
 
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
〈야생의 땅: 듀랑고〉 서버 아키텍처 Vol. 3
 
강성훈, 실버바인 대기열 서버 설계 리뷰, NDC2019
강성훈, 실버바인 대기열 서버 설계 리뷰, NDC2019강성훈, 실버바인 대기열 서버 설계 리뷰, NDC2019
강성훈, 실버바인 대기열 서버 설계 리뷰, NDC2019
 
jemalloc 세미나
jemalloc 세미나jemalloc 세미나
jemalloc 세미나
 
게임 분산 서버 구조
게임 분산 서버 구조게임 분산 서버 구조
게임 분산 서버 구조
 
Blockchain Intro to Hyperledger Fabric
Blockchain Intro to Hyperledger Fabric Blockchain Intro to Hyperledger Fabric
Blockchain Intro to Hyperledger Fabric
 
Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in Kafka
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
 
Python 게임서버 안녕하십니까 : RPC framework 편
Python 게임서버 안녕하십니까 : RPC framework 편Python 게임서버 안녕하십니까 : RPC framework 편
Python 게임서버 안녕하십니까 : RPC framework 편
 
Ndc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABCNdc14 분산 서버 구축의 ABC
Ndc14 분산 서버 구축의 ABC
 
파이썬 생존 안내서 (자막)
파이썬 생존 안내서 (자막)파이썬 생존 안내서 (자막)
파이썬 생존 안내서 (자막)
 
Kafka Summit 2021 - Apache Kafka meets workflow engines
Kafka Summit 2021 - Apache Kafka meets workflow enginesKafka Summit 2021 - Apache Kafka meets workflow engines
Kafka Summit 2021 - Apache Kafka meets workflow engines
 
Using Camunda on Kubernetes through Operators
Using Camunda on Kubernetes through OperatorsUsing Camunda on Kubernetes through Operators
Using Camunda on Kubernetes through Operators
 
[야생의 땅: 듀랑고] 서버 아키텍처 - SPOF 없는 분산 MMORPG 서버
[야생의 땅: 듀랑고] 서버 아키텍처 - SPOF 없는 분산 MMORPG 서버[야생의 땅: 듀랑고] 서버 아키텍처 - SPOF 없는 분산 MMORPG 서버
[야생의 땅: 듀랑고] 서버 아키텍처 - SPOF 없는 분산 MMORPG 서버
 
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
[야생의 땅: 듀랑고] 서버 아키텍처 Vol. 2 (자막)
 
The Dual write problem
The Dual write problemThe Dual write problem
The Dual write problem
 
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
 
Grafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for LogsGrafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for Logs
 

Similaire à Microservice Resilience Patterns @VoxxedCern'24

Testing Microservices @DevoxxBE 23.pdf
Testing Microservices @DevoxxBE 23.pdfTesting Microservices @DevoxxBE 23.pdf
Testing Microservices @DevoxxBE 23.pdfVictor Rentea
 
The Real World - Plugging the Enterprise Into It (nodejs)
The Real World - Plugging  the Enterprise Into It (nodejs)The Real World - Plugging  the Enterprise Into It (nodejs)
The Real World - Plugging the Enterprise Into It (nodejs)Aman Kohli
 
Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for...
Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for...Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for...
Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for...Severalnines
 
Velocity 2016 - Operational Excellence with Hystrix
Velocity 2016 - Operational Excellence with HystrixVelocity 2016 - Operational Excellence with Hystrix
Velocity 2016 - Operational Excellence with HystrixBilly Yuen
 
3DConsulting_Presentation
3DConsulting_Presentation3DConsulting_Presentation
3DConsulting_PresentationJoseph Baca
 
Effective Akka v2
Effective Akka v2Effective Akka v2
Effective Akka v2shinolajla
 
Distributed Consistency.pdf
Distributed Consistency.pdfDistributed Consistency.pdf
Distributed Consistency.pdfVictor Rentea
 
Azure Cloud Patterns
Azure Cloud PatternsAzure Cloud Patterns
Azure Cloud PatternsTamir Dresher
 
How to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & PushHow to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & PushMu Chun Wang
 
Evolveum: IDM story for a growing company
Evolveum: IDM story for a growing companyEvolveum: IDM story for a growing company
Evolveum: IDM story for a growing companyEvolveum
 
The Art of Unit Testing - Towards a Testable Design
The Art of Unit Testing - Towards a Testable DesignThe Art of Unit Testing - Towards a Testable Design
The Art of Unit Testing - Towards a Testable DesignVictor Rentea
 
Approaches for application request throttling - Cloud Developer Days Poland
Approaches for application request throttling - Cloud Developer Days PolandApproaches for application request throttling - Cloud Developer Days Poland
Approaches for application request throttling - Cloud Developer Days PolandMaarten Balliauw
 
Idempotency of commands in distributed systems
Idempotency of commands in distributed systemsIdempotency of commands in distributed systems
Idempotency of commands in distributed systemsMax Małecki
 
New Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceNew Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceAnil Nair
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...Chris Fregly
 
What's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at CiscoWhat's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at CiscoAdrian Cockcroft
 
Continuation_alan_20220503.pdf
Continuation_alan_20220503.pdfContinuation_alan_20220503.pdf
Continuation_alan_20220503.pdfShen yifeng
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesAlexander Penev
 

Similaire à Microservice Resilience Patterns @VoxxedCern'24 (20)

Testing Microservices @DevoxxBE 23.pdf
Testing Microservices @DevoxxBE 23.pdfTesting Microservices @DevoxxBE 23.pdf
Testing Microservices @DevoxxBE 23.pdf
 
The Real World - Plugging the Enterprise Into It (nodejs)
The Real World - Plugging  the Enterprise Into It (nodejs)The Real World - Plugging  the Enterprise Into It (nodejs)
The Real World - Plugging the Enterprise Into It (nodejs)
 
Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for...
Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for...Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for...
Webinar slides: 9 DevOps Tips for Going in Production with Galera Cluster for...
 
Velocity 2016 - Operational Excellence with Hystrix
Velocity 2016 - Operational Excellence with HystrixVelocity 2016 - Operational Excellence with Hystrix
Velocity 2016 - Operational Excellence with Hystrix
 
3DConsulting_Presentation
3DConsulting_Presentation3DConsulting_Presentation
3DConsulting_Presentation
 
Devoxx2017
Devoxx2017Devoxx2017
Devoxx2017
 
Effective Akka v2
Effective Akka v2Effective Akka v2
Effective Akka v2
 
Distributed Consistency.pdf
Distributed Consistency.pdfDistributed Consistency.pdf
Distributed Consistency.pdf
 
Azure Cloud Patterns
Azure Cloud PatternsAzure Cloud Patterns
Azure Cloud Patterns
 
How to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & PushHow to build a scalable SNS via Polling & Push
How to build a scalable SNS via Polling & Push
 
Evolveum: IDM story for a growing company
Evolveum: IDM story for a growing companyEvolveum: IDM story for a growing company
Evolveum: IDM story for a growing company
 
Scaling GraphQL Subscriptions
Scaling GraphQL SubscriptionsScaling GraphQL Subscriptions
Scaling GraphQL Subscriptions
 
The Art of Unit Testing - Towards a Testable Design
The Art of Unit Testing - Towards a Testable DesignThe Art of Unit Testing - Towards a Testable Design
The Art of Unit Testing - Towards a Testable Design
 
Approaches for application request throttling - Cloud Developer Days Poland
Approaches for application request throttling - Cloud Developer Days PolandApproaches for application request throttling - Cloud Developer Days Poland
Approaches for application request throttling - Cloud Developer Days Poland
 
Idempotency of commands in distributed systems
Idempotency of commands in distributed systemsIdempotency of commands in distributed systems
Idempotency of commands in distributed systems
 
New Generation Oracle RAC Performance
New Generation Oracle RAC PerformanceNew Generation Oracle RAC Performance
New Generation Oracle RAC Performance
 
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
High Performance TensorFlow in Production - Big Data Spain - Madrid - Nov 15 ...
 
What's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at CiscoWhat's Missing? Microservices Meetup at Cisco
What's Missing? Microservices Meetup at Cisco
 
Continuation_alan_20220503.pdf
Continuation_alan_20220503.pdfContinuation_alan_20220503.pdf
Continuation_alan_20220503.pdf
 
Zero Downtime JEE Architectures
Zero Downtime JEE ArchitecturesZero Downtime JEE Architectures
Zero Downtime JEE Architectures
 

Plus de Victor Rentea

From Web to Flux @DevoxxBE 2023.pptx
From Web to Flux @DevoxxBE 2023.pptxFrom Web to Flux @DevoxxBE 2023.pptx
From Web to Flux @DevoxxBE 2023.pptxVictor Rentea
 
The tests are trying to tell you something@VoxxedBucharest.pptx
The tests are trying to tell you something@VoxxedBucharest.pptxThe tests are trying to tell you something@VoxxedBucharest.pptx
The tests are trying to tell you something@VoxxedBucharest.pptxVictor Rentea
 
Vertical Slicing Architectures
Vertical Slicing ArchitecturesVertical Slicing Architectures
Vertical Slicing ArchitecturesVictor Rentea
 
Software Craftsmanship @Code Camp Festival 2022.pdf
Software Craftsmanship @Code Camp Festival 2022.pdfSoftware Craftsmanship @Code Camp Festival 2022.pdf
Software Craftsmanship @Code Camp Festival 2022.pdfVictor Rentea
 
Unit testing - 9 design hints
Unit testing - 9 design hintsUnit testing - 9 design hints
Unit testing - 9 design hintsVictor Rentea
 
Clean pragmatic architecture @ devflix
Clean pragmatic architecture @ devflixClean pragmatic architecture @ devflix
Clean pragmatic architecture @ devflixVictor Rentea
 
Extreme Professionalism - Software Craftsmanship
Extreme Professionalism - Software CraftsmanshipExtreme Professionalism - Software Craftsmanship
Extreme Professionalism - Software CraftsmanshipVictor Rentea
 
Clean architecture - Protecting the Domain
Clean architecture - Protecting the DomainClean architecture - Protecting the Domain
Clean architecture - Protecting the DomainVictor Rentea
 
Refactoring blockers and code smells @jNation 2021
Refactoring   blockers and code smells @jNation 2021Refactoring   blockers and code smells @jNation 2021
Refactoring blockers and code smells @jNation 2021Victor Rentea
 
Hibernate and Spring - Unleash the Magic
Hibernate and Spring - Unleash the MagicHibernate and Spring - Unleash the Magic
Hibernate and Spring - Unleash the MagicVictor Rentea
 
Integration testing with spring @JAX Mainz
Integration testing with spring @JAX MainzIntegration testing with spring @JAX Mainz
Integration testing with spring @JAX MainzVictor Rentea
 
The Proxy Fairy and the Magic of Spring @JAX Mainz 2021
The Proxy Fairy and the Magic of Spring @JAX Mainz 2021The Proxy Fairy and the Magic of Spring @JAX Mainz 2021
The Proxy Fairy and the Magic of Spring @JAX Mainz 2021Victor Rentea
 
Integration testing with spring @snow one
Integration testing with spring @snow oneIntegration testing with spring @snow one
Integration testing with spring @snow oneVictor Rentea
 
Pure functions and immutable objects @dev nexus 2021
Pure functions and immutable objects @dev nexus 2021Pure functions and immutable objects @dev nexus 2021
Pure functions and immutable objects @dev nexus 2021Victor Rentea
 
Definitive Guide to Working With Exceptions in Java - takj at Java Champions ...
Definitive Guide to Working With Exceptions in Java - takj at Java Champions ...Definitive Guide to Working With Exceptions in Java - takj at Java Champions ...
Definitive Guide to Working With Exceptions in Java - takj at Java Champions ...Victor Rentea
 
Don't Be Mocked by your Mocks - Best Practices using Mocks
Don't Be Mocked by your Mocks - Best Practices using MocksDon't Be Mocked by your Mocks - Best Practices using Mocks
Don't Be Mocked by your Mocks - Best Practices using MocksVictor Rentea
 
Pure Functions and Immutable Objects
Pure Functions and Immutable ObjectsPure Functions and Immutable Objects
Pure Functions and Immutable ObjectsVictor Rentea
 
Definitive Guide to Working With Exceptions in Java
Definitive Guide to Working With Exceptions in JavaDefinitive Guide to Working With Exceptions in Java
Definitive Guide to Working With Exceptions in JavaVictor Rentea
 

Plus de Victor Rentea (20)

From Web to Flux @DevoxxBE 2023.pptx
From Web to Flux @DevoxxBE 2023.pptxFrom Web to Flux @DevoxxBE 2023.pptx
From Web to Flux @DevoxxBE 2023.pptx
 
OAuth in the Wild
OAuth in the WildOAuth in the Wild
OAuth in the Wild
 
The tests are trying to tell you something@VoxxedBucharest.pptx
The tests are trying to tell you something@VoxxedBucharest.pptxThe tests are trying to tell you something@VoxxedBucharest.pptx
The tests are trying to tell you something@VoxxedBucharest.pptx
 
Vertical Slicing Architectures
Vertical Slicing ArchitecturesVertical Slicing Architectures
Vertical Slicing Architectures
 
Software Craftsmanship @Code Camp Festival 2022.pdf
Software Craftsmanship @Code Camp Festival 2022.pdfSoftware Craftsmanship @Code Camp Festival 2022.pdf
Software Craftsmanship @Code Camp Festival 2022.pdf
 
Unit testing - 9 design hints
Unit testing - 9 design hintsUnit testing - 9 design hints
Unit testing - 9 design hints
 
Clean pragmatic architecture @ devflix
Clean pragmatic architecture @ devflixClean pragmatic architecture @ devflix
Clean pragmatic architecture @ devflix
 
Extreme Professionalism - Software Craftsmanship
Extreme Professionalism - Software CraftsmanshipExtreme Professionalism - Software Craftsmanship
Extreme Professionalism - Software Craftsmanship
 
Clean architecture - Protecting the Domain
Clean architecture - Protecting the DomainClean architecture - Protecting the Domain
Clean architecture - Protecting the Domain
 
Refactoring blockers and code smells @jNation 2021
Refactoring   blockers and code smells @jNation 2021Refactoring   blockers and code smells @jNation 2021
Refactoring blockers and code smells @jNation 2021
 
Hibernate and Spring - Unleash the Magic
Hibernate and Spring - Unleash the MagicHibernate and Spring - Unleash the Magic
Hibernate and Spring - Unleash the Magic
 
Integration testing with spring @JAX Mainz
Integration testing with spring @JAX MainzIntegration testing with spring @JAX Mainz
Integration testing with spring @JAX Mainz
 
The Proxy Fairy and the Magic of Spring @JAX Mainz 2021
The Proxy Fairy and the Magic of Spring @JAX Mainz 2021The Proxy Fairy and the Magic of Spring @JAX Mainz 2021
The Proxy Fairy and the Magic of Spring @JAX Mainz 2021
 
Integration testing with spring @snow one
Integration testing with spring @snow oneIntegration testing with spring @snow one
Integration testing with spring @snow one
 
Pure functions and immutable objects @dev nexus 2021
Pure functions and immutable objects @dev nexus 2021Pure functions and immutable objects @dev nexus 2021
Pure functions and immutable objects @dev nexus 2021
 
TDD Mantra
TDD MantraTDD Mantra
TDD Mantra
 
Definitive Guide to Working With Exceptions in Java - takj at Java Champions ...
Definitive Guide to Working With Exceptions in Java - takj at Java Champions ...Definitive Guide to Working With Exceptions in Java - takj at Java Champions ...
Definitive Guide to Working With Exceptions in Java - takj at Java Champions ...
 
Don't Be Mocked by your Mocks - Best Practices using Mocks
Don't Be Mocked by your Mocks - Best Practices using MocksDon't Be Mocked by your Mocks - Best Practices using Mocks
Don't Be Mocked by your Mocks - Best Practices using Mocks
 
Pure Functions and Immutable Objects
Pure Functions and Immutable ObjectsPure Functions and Immutable Objects
Pure Functions and Immutable Objects
 
Definitive Guide to Working With Exceptions in Java
Definitive Guide to Working With Exceptions in JavaDefinitive Guide to Working With Exceptions in Java
Definitive Guide to Working With Exceptions in Java
 



Microservice Resilience Patterns @VoxxedCern'24

  • 1. The Resilience Patterns your Microservices Teams Should Know Victor Rentea | https://victorrentea.ro | @victorrentea
  • 2. 👉 victorrentea.ro/training-offer VictorRentea.ro 👋 I'm Victor Rentea 🇷🇴 PhD(CS), Java Champion. 18 years of code. 10 years training bright developers at 120+ EU companies: ❤ Clean Code, Testing, Architecture 🛠 Java, Spring, Hibernate, Reactive ⚡ Java Performance, Secure Coding. Educative Talks on YouTube.com/vrentea. European Software Crafters Community (6K devs) 👉 Join for free monthly events at victorrentea.ro/community. Life += 👪 + 🐈 + 🌷garden
  • 3. Benefits of Microservices: ✓ Faster Time-to-Market: 😁 Business ✓ Lower Cognitive Load: 😁 Developers ✓ Technology Upgrade/Change ✓ Scalability for the 🔥hot parts that require it ✓ Availability, tolerance to partial failures
  • 4. But we're safe. We're using HTTP between our services. 😎
  • 5. Fallacies of Distributed Computing: The network is reliable. Latency is zero. Bandwidth is infinite. Transport cost is zero. The network is secure. Fixed topology. Has one administrator. The network is homogeneous.
  • 6. "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport)
  • 9. Availability = MTTF / (MTTF + MTTR), where MTTF = Mean Time To Failure (crash) and MTTR = Mean Time To Recovery (downtime). ⬆ MTTF: write more tests: unit-, integration-, smoke-, end-to-end-. Also: load-, spike-, resilience- (see Toxiproxy by Shopify). ⬇ MTTR: Faster Recovery. Both ⬆ Availability.
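To make the availability ratio concrete, here is a minimal sketch that computes it from MTTF and MTTR. The class and method names are illustrative assumptions and do not come from the talk.

```java
// Illustrative helper (names are assumptions, not from the slides).
final class Availability {
    // Availability = MTTF / (MTTF + MTTR); both arguments must use the same time unit.
    static double of(double mttf, double mttr) {
        return mttf / (mttf + mttr);
    }
}
```

For example, with one failure every 720 hours and 1 hour of downtime per failure, Availability.of(720, 1) is roughly 0.9986. Halving the recovery time, Availability.of(720, 0.5), gives the same result as doubling the time between failures, Availability.of(1440, 1): roughly 0.9993 in both cases.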
  • 11. Resilience: The ability of a system to handle unexpected situations - without users noticing it (best case), or - with a graceful degradation of service
  • 12. Graceful Degradation 👉 A Query reading data can fail over to: - Return an older value, e.g. cached 5 minutes ago - Return a lower quality response: e.g. NETFLIX: per-country, not client-tailored titles - Call a slower system: e.g. search in SQL DB when ES is down - Send results later: "We'll email you the results when done" 👉 A Command changing data: - Outbox table pattern: insert it in DB + schedule⏱ a persistent retry - Send it as a message instead of HTTP - Log an error raising an alarm🔔 calling for manual intervention - Send an error event to a supervisor for automatic recovery ("saga" pattern)
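The "return an older value" fallback for queries could be sketched as below. This is a simplified illustration under stated assumptions: the CachedFallback class is hypothetical, it keeps the last known value forever instead of expiring it after 5 minutes, and it assumes the remote call never returns null.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical wrapper: serves the last known value when the remote call fails.
final class CachedFallback<K, V> {
    private final Map<K, V> lastKnown = new ConcurrentHashMap<>();
    private final Function<K, V> remoteCall;

    CachedFallback(Function<K, V> remoteCall) {
        this.remoteCall = remoteCall;
    }

    V get(K key) {
        V fresh;
        try {
            fresh = remoteCall.apply(key);
        } catch (RuntimeException e) {
            V stale = lastKnown.get(key);
            if (stale == null) {
                throw e; // nothing cached yet: no way to degrade gracefully
            }
            return stale; // degraded answer: possibly outdated, but better than an error
        }
        lastKnown.put(key, fresh); // remember the value for future failures
        return fresh;
    }
}
```

A production version would also bound the cache size and record a metric each time a stale value is served, so that degradation stays visible.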
  • 14. Catastrophic Failure: The entire system becomes unavailable ↓ Split the system in separate pieces and isolate those pieces against each other.
  • 15. ☠ Isolation Anti-Patterns ☠ ▪ Long chains of REST calls A → B → C → D → E ▪ A request causes the instance to restart & client keeps retrying ▪ A "poison pill" message is retried infinitely, blocking the listener ▪ Unbounded queues kept in memory cause OutOfMemoryError ▪ Concurrent massive import/export overloads the DB
  • 19. Bulkhead (aka Failure Units) ▪ Core isolation pattern ▪ Pure design issue: What should be independent of what? ▪ Used as units of redundancy ▪ Used as units of scale (Microservices Resilience Patterns by jRebel)
  • 20. Bulkhead Examples ▪ Key Features: catalog, search, checkout ▪ Markets / Regions ▪ Tenants ☁ Isolate them using separate: ▪ Connection-/Thread-Pools ▪ Application Instances ▪ Databases, Queues
  • 22. Throttling ▪ Limit the load on a server to - 💥 Prevent a crash: a few 503 are better than OutOfMemoryError - ⏱ Preserve response time: error is better than OK after 60 seconds - ⚠ Protect critical endpoints: place order over export invoices - ⚖ Ensure fairness: return 429 to greedy clients/tenants - 💲 Limit auto-scaling to fit budget ▪ What to throttle: - Request Rate: max 300 requests/second, e.g. via @RateLimiter - Concurrency: max 3 exports in parallel, e.g. via @Bulkhead - Traffic: max 1 GB/minute, or cost: max 300 credits/minute
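The concurrency limit ("max 3 exports in parallel") can be illustrated with a plain JDK semaphore. This is a minimal sketch, not how the @Bulkhead annotation is implemented; the ConcurrencyLimiter class and the exception it throws are assumptions made for the example.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical limiter: at most maxConcurrent tasks run at once, the rest are rejected.
final class ConcurrencyLimiter {
    private final Semaphore permits;

    ConcurrencyLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the task if a slot is free; otherwise rejects immediately instead of waiting.
    <T> T call(Supplier<T> task) {
        if (!permits.tryAcquire()) {
            // In an HTTP service this would typically be mapped to a 429 or 503 response.
            throw new IllegalStateException("Too many concurrent calls");
        }
        try {
            return task.get();
        } finally {
            permits.release(); // always free the slot, even if the task failed
        }
    }
}
```

Rejecting at once rather than blocking is a design choice: it keeps the response time of the throttled endpoint predictable, at the cost of pushing the retry decision to the caller.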
  • 23. Throttling Features: A untouched as it's critical; B throttled (stopped); C degraded
  • 25. Performance Response Curve [chart: throughput (# requests completed / sec by one machine) vs load (# concurrent requests); Sweet Spot = best performance; 💥 past it]. Enqueuing excess load can improve overall performance. response_time = queue_waiting + execution_time. Queue: Monitored. Bounded.
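A bounded queue in front of a fixed pool of workers can be sketched with the JDK's ThreadPoolExecutor. The BoundedExecutors factory name is an assumption; the executor classes and the AbortPolicy rejection handler are standard java.util.concurrent types.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for an executor whose waiting queue cannot grow without limit.
final class BoundedExecutors {
    static ThreadPoolExecutor bounded(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,                          // fixed number of workers
                0L, TimeUnit.MILLISECONDS,                 // no extra threads to keep alive
                new ArrayBlockingQueue<>(queueCapacity),   // excess load waits here, up to a cap
                new ThreadPoolExecutor.AbortPolicy());     // beyond the cap: RejectedExecutionException
    }
}
```

With bounded(1, 1), one task runs, one waits in the queue, and a third submission is rejected with RejectedExecutionException instead of piling up in memory. The pool also exposes getQueue().size(), which can feed the monitoring the slide asks for.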
  • 26. Full Parameter Check ▪ Obvious, yet often neglected ▪ Validate data when you see it: client requests & API responses ▪ But don't go too far: validate only what you care about👉
  • 27. "Be conservative in what you do, but liberal in what you accept from others." -- Robustness Principle (aka Postel's Law)
  • 28. Pattern categories: ISOLATION, LOOSE COUPLING, LATENCY CONTROL, SUPERVISION. Patterns: Throttling, Bulkhead, Complete Parameter Checking, Bounded Queues
  • 30. Timeout ▪ Set a timeout every time you block👍 ▪ If too large - Impacts my response time - ⚠ Mind the defaults: RestTemplate=1 minute, WebClient=unbounded😱 ▪ If too short - False errors: the operation might succeed later on the server - Keep above the API measured/SLA response time (max or 99th percentile) - Tailor per endpoint, e.g. longer for /export ▪ Monitoring + Alarms🔔
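As an illustration of setting explicit timeouts, the sketch below uses the JDK's built-in java.net.http client (Java 11+) rather than RestTemplate or WebClient, so that it stays self-contained. The TimeoutExample class, the 1-second connect timeout and the URL in the usage note are assumptions chosen for the example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical helper showing where the two timeouts are configured.
final class TimeoutExample {
    // Connect timeout: how long to wait for the TCP connection to be established.
    static HttpClient client() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(1))
                .build();
    }

    // Request timeout: how long to wait for the response; tailored per endpoint by the caller.
    static HttpRequest request(String url, Duration timeout) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout)
                .GET()
                .build();
    }
}
```

A slow endpoint could then get a longer budget, for instance TimeoutExample.request("http://inventory.example/export", Duration.ofSeconds(30)). When the request timeout elapses, HttpClient.send throws java.net.http.HttpTimeoutException, which is the event worth counting and alarming on.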
  • 31. A call failed or timed out > I want to retry it. Key questions to ask 🤔: Error cause: 400, 500, timeout, {retryable:false} in response?.. Worth retrying? Max attempts? What backoff? How to monitor? Is the operation idempotent?
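Several of these questions (worth retrying? max attempts? what backoff?) map directly onto parameters of a retry helper. The following is a minimal sketch with exponential backoff; the Retry class and its signature are assumptions, and it leaves out jitter and monitoring.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical retry helper: bounded attempts, exponential backoff, caller decides what is retryable.
final class Retry {
    static <T> T call(Supplier<T> op, int maxAttempts, long initialBackoffMillis,
                      Predicate<RuntimeException> retryable) {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // Give up on the last attempt, or on errors not worth retrying (e.g. a 400).
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw e; // stop retrying if the thread is being shut down
                }
                backoff *= 2; // wait twice as long before the next attempt
            }
        }
    }
}
```

The helper is only safe to apply to idempotent operations, since a timed-out call may still have succeeded on the server.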
  • 32. Idempotency: An operation is idempotent if repeating it doesn't change any state ▪ Get product by id via GET ? - ✅ YES: no state changes on server ▪ Cancel Payment by id via HTTP DELETE or message - ✅ YES: canceling it again won't change anything ▪ Update Product price by id via PUT or message - ✅ YES: the price stays unchanged on server ▪ Place Order { items: [..] } via POST or message - ❌ NO: the retry could create a second order - ✅ YES, detect duplicates via pastHourOrders = Map<customerId, List<hash(order)>> ▪ Place Order { id: UUID, items: [..] } (client generated id) via PUT or message - ✅ YES: a duplicate would cause a PK/UK violation
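The last variant, a client-generated id that makes duplicates detectable, can be sketched with an in-memory map standing in for the primary-key constraint. The OrderService class and its methods are hypothetical; a real service would rely on the database's unique key instead.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory order store; the map key plays the role of the PK/UK constraint.
final class OrderService {
    private final Map<UUID, String> ordersById = new ConcurrentHashMap<>();

    // Returns true if the order was created, false if this id was already processed.
    boolean placeOrder(UUID clientGeneratedId, String items) {
        // putIfAbsent is atomic, so a concurrent retry cannot create a second order.
        return ordersById.putIfAbsent(clientGeneratedId, items) == null;
    }

    int orderCount() {
        return ordersById.size();
    }
}
```

Calling placeOrder twice with the same UUID creates a single order, which makes the operation safe to retry after a timeout.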
  • 34. Over the last 5 minutes, 99% of requests to a system failed or timed out. What are the chances a new call will succeed?
  • 35. If you know you're going to fail, at least fail fast.
  • 36. Circuit Breaker: ✅ CLOSED, 🛑 OPEN (🛑 OPEN = flow is stopped). Introduced by Michael Nygard in his book Release It https://www.amazon.com/Release-Production-Ready-Software-Pragmatic-Programmers/dp/0978739213
  • 37. Circuit Breaker states: ✅ CLOSED: allow all calls, counting OK, errors, timed-out. 🛑 OPEN: stop all calls for a while. 🟡 HALF-OPEN: allow only a few calls (e.g. 2), to test if server is back. Transitions: START → CLOSED; CLOSED → OPEN when failed % > threshold (e.g. >99%) over last 100 calls/seconds; OPEN → HALF-OPEN after ⏱ delay; HALF-OPEN → CLOSED when all requests=OK; HALF-OPEN → OPEN when any request failed. Fail fast to save client resources. Let server recover. Hit server gently after recovery.
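The three states and their transitions can be sketched as a small state machine. This is a deliberately simplified illustration, not a library implementation: the SimpleCircuitBreaker class is hypothetical, it opens after a number of consecutive failures instead of a failure percentage over a sliding window, and a single successful probe in HALF-OPEN closes the circuit.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker (consecutive failures instead of a failure rate).
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures that open the circuit
    private final long openMillis;      // how long to stay OPEN before probing the server
    private final LongSupplier clock;   // injectable time source, e.g. System::currentTimeMillis
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    // OPEN turns into HALF_OPEN once the delay has elapsed.
    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T call(Supplier<T> op) {
        if (state() == State.OPEN) {
            throw new IllegalStateException("Circuit open: failing fast"); // the server is not called
        }
        try {
            T result = op.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // Any failure while probing, or too many failures in a row, (re)opens the circuit.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

A typical construction would be new SimpleCircuitBreaker(5, 10_000, System::currentTimeMillis). Unlike the slide's HALF-OPEN state, this sketch does not cap the number of concurrent probe calls.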
  • 38. Where to Implement These? (Timeout, Retry, Circuit Breaker, Throttling, Fallback) ▪ When calling fragile/slow systems: SAP, Siebel, external API ▪ Between services in different bulkheads ▪ At API Gateway level (entry point to ecosystem)
  • 39. Pattern categories: ISOLATION, LOOSE COUPLING, LATENCY CONTROL, SUPERVISION. Patterns: Bounded Queues, Fan out & quickest reply, Retry, Circuit Breaker, Timeouts, Fail Fast, Throttling, Bulkhead, Complete Parameter Checking, Idempotency, SLA
  • 41. Stateless Service ▪ Keeping state inside a service can impact: - Consistency: all nodes must sync it - Availability: failover nodes must replicate it - Scalability: new instances must copy it ▪ Move state out 👍 - To databases - To clients (browser, mobile) if related to UI flow - Via request tokens if user metadata (email, roles..)
  • 43. Asynchronous Communication (MQ) ▪ Guaranteed delivery, even if the receiver is unavailable now ▪ Prevent cascading failures between bulkheads ▪ Breaks the call stack paradigm -- thus, apparently hard ▪ With resilience concerns, REST also becomes scary ▪ Sender should NOT need a response or status: design change ▪ DON'T expect that your changes are applied instantaneously → Eventual Consistency ←
  • 44. CAP Theorem: You can either be Consistent or Available when Partitioned. Consistent = All data is in perfect sync. Available = You get an answer. Partitioned = Using multiple machines.
  • 45. The Real World is not Consistent ▪ Examples: - One of the items in stock is accidentally broken during handling - The product page displays stock: 1, but by the time the customer clicks "add to cart", someone else already bought it! → Relax, and embrace Eventual Consistency - After you understand the busine$$ impact
  • 46. Event-Sourcing ▪ State moves via events → less REST 🙂 ▪ Lossless capture of history ▪ Ability to time-travel by replaying events ▪ Allows scaling up the read flows separately (CQRS)
  • 48. Messaging Pitfalls ▪ Poison pill message → send to dead-letter or retry queue ▪ Duplicated messages → idempotent consumer ▪ Out-of-order messages → aggregator, time-based- on smart- ▪ Data privacy → Claim Check pattern ▪ Event versioning → Backward-forward compatibility ▪ Misconfiguration of infrastructure
  • 49. Pattern categories: ISOLATION, LOOSE COUPLING, LATENCY CONTROL, SUPERVISION. Patterns: Bounded Queues, Fan out & quickest reply, Retry, Circuit Breaker, Timeouts, Fail Fast, Throttling, Bulkhead, Complete Parameter Checking, Error Handler, Asynchronous Communication, Location Transparency, Idempotency, Event-Driven, Stateless, Eventual Consistency, Monitor, Escalation, SLA, Health Check. Also see: "Patterns of resilience" talk by Uwe Friedrichsen
  • 50. Clients cannot / should not handle downstream errors
  • 51. People: "No matter what a problem seems to be at first, it's always a people problem." - First Law of Consulting
  • 52. To build highly-resilient systems, you need DevOps teams
  • 55. Own Your Production: See metrics & set alarms: Prometheus+Grafana.. Distributed request tracing: OpenTelemetry. Log aggregation: ELK... Time span samples: Zipkin... SHOW STOPPER if lacking these
  • 56. Key Points ▪ Identify bulkheads (failure units) ▪ Sync calls += timeout, retry, throttling, circuit breaker, failover ▪ Slow responses can cause cascading failures ▪ Slow down to move faster ▪ If you fail, at least fail fast ▪ Embrace eventual consistency and favor async messages
  • 57. The Resilience Patterns your Microservices Teams Should Know Victor Rentea | https://victorrentea.ro | @victorrentea Thank you! Join my community: