Real-World Pulsar Architectural Patterns
Every pattern shown here has been developed and implemented with my
team at Overstock
Email: dbost@overstock.com
Twitter: DevinBost
LinkedIn: https://www.linkedin.com/in/devinbost/
By Devin Bost, Senior Data Engineer at Overstock
Distributed Caching + Distributed Tracing
Missing Stephen Bourke in Ireland and Sam Lowen, our new product manager.
Disclaimer: I’m assuming you already know that Apache Pulsar is the future of real-time messaging.
Tech Legend (icons used in the diagrams):
Pulsar topic
Kafka topic
Apache BookKeeper
Starting simple with Pub/Sub
[Diagram: Producer → /ingest → passthrough function → /feeds → Consumer → alerts to end users (e.g. email, SMS, Twilio call, etc.). Diagram courtesy of Thor Sigurjonsson.]
Higher availability:
[Diagram: the same flow with multiple producers and multiple consumers.]
Is this a safe approach to caching?
[Diagram: the same flow, with a cache added.]
What happens if Redis goes down?
[Diagram: the same flow, with the Redis cache crossed out.]
It’s much safer to use a distributed cache technology like Apache Ignite:
Smart persistence
Faster than Redis
Supports tables with a backing cache
Supports transactions
[Diagram: the same flow, with a distributed cache.]
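To make the single-Redis-versus-distributed-cache point concrete, here is a toy Python sketch of how a distributed cache can place each key on a primary node plus backup nodes, so losing one node does not lose the entry. It uses rendezvous (highest-random-weight) hashing, the same idea behind Ignite’s `RendezvousAffinityFunction`; the class, method, and node names are hypothetical and this is not Ignite’s API.

```python
import hashlib


def _weight(node: str, key: str) -> int:
    # Deterministic pseudo-random score for a (node, key) pair.
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class ToyDistributedCache:
    """Hypothetical in-memory model: every key lives on 1 primary + `backups` nodes."""

    def __init__(self, nodes, backups=1):
        self.backups = backups
        self.stores = {node: {} for node in nodes}  # node name -> local key/value map

    def owners(self, key):
        # Rendezvous hashing: rank the live nodes by score, take the top (1 + backups).
        ranked = sorted(self.stores, key=lambda n: _weight(n, key), reverse=True)
        return ranked[: 1 + self.backups]

    def put(self, key, value):
        for node in self.owners(key):
            self.stores[node][key] = value

    def get(self, key):
        # Read from the first owner that still holds the key.
        for node in self.owners(key):
            if key in self.stores[node]:
                return self.stores[node][key]
        return None

    def fail(self, node):
        # Simulate a node crash: its local data is gone.
        del self.stores[node]
```

Removing a node does not change the relative ranking of the remaining nodes, so a key whose primary crashed is now owned first by its former backup, which still holds the value. With `backups=0`, the same crash drops every key that lived only on that node, which is the single-Redis situation from the previous slide.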
What if you have a business-critical service that can’t lose messages?
“But doesn’t Pulsar already guarantee that messages won’t be lost?”
There’s a difference between Pulsar losing messages and your application losing messages!
So, we introduce a backfill path.
Notice I avoided putting a SQL database on the list.
[Diagram: producers → /ingest → passthrough function → /feeds → consumers → alerts to end users (e.g. email, SMS, Twilio call, etc.). The data is also replicated to persistent storage. A batch job (or similar) reads from that storage into a backfill topic, whose consumers also send alerts. Both paths check “Message delivered yet?” before alerting.]
Kappa + Lambda architectures
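Both the live path and the backfill path ask “Message delivered yet?” before alerting. A minimal Python sketch of that check, assuming each message carries a unique ID and that the delivery log is a store shared by both paths (a plain dict stands in for it here; in practice it could be the distributed cache). The function name is hypothetical.

```python
from typing import Callable, Dict


def deliver_once(message_id: str, payload: str,
                 delivered: Dict[str, bool],
                 send_alert: Callable[[str], None]) -> bool:
    """Send the alert only if no path has delivered this message yet.

    `delivered` stands in for a shared store. With a real store, the check and the
    mark should be a single atomic put-if-absent, or two paths could both send.
    """
    if delivered.get(message_id):
        return False  # already handled by the live path or an earlier backfill
    send_alert(payload)
    # Marking after sending favors at-least-once: a crash between the two lines
    # means a possible duplicate alert on backfill, never a lost one.
    delivered[message_id] = True
    return True
```

Calling it once from the live consumer and again from a backfill replay with the same ID sends the alert a single time.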
Can we simplify it?
[Diagram: the same architecture, with the consumers on both paths replaced by Pulsar Functions that check “Message delivered yet?” and send the alerts.]
You could add another passthrough function and topic if you want more isolation.
How about just ingesting data into a cache with a backfill?
Option 1:
[Diagram: web service → /ingest → passthrough function → /feeds → cache sink. The data is also replicated to persistent append-only storage, from which a batch engine (e.g. Spark, NiFi, etc.) reads all data and either loads it into the existing topic or starts a job.]
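When the batch engine reloads history into the same topic the live cache sink reads, an old record replayed after a newer live update could overwrite the fresher value. A small Python sketch of a cache sink that applies last-write-wins on an event timestamp guards against that, assuming each record carries such a timestamp; the `Record` shape, `CacheSink`, and `backfill` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    event_ts: int  # event time from the upstream data contract (assumed present)


class CacheSink:
    """Hypothetical sink: keeps the newest value per key by event timestamp."""

    def __init__(self) -> None:
        self.cache: Dict[str, Tuple[int, str]] = {}

    def write(self, rec: Record) -> bool:
        current = self.cache.get(rec.key)
        if current is not None and current[0] >= rec.event_ts:
            return False  # stale or duplicate record, e.g. from a backfill replay
        self.cache[rec.key] = (rec.event_ts, rec.value)
        return True


def backfill(sink: CacheSink, history: Iterable[Record]) -> int:
    """Replay append-only history; return how many records changed the cache."""
    return sum(sink.write(rec) for rec in history)
```

A backfill then fills in keys the cache is missing without rolling back keys that live traffic has already updated.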
Option 2: achieves separation of concerns and prevents QoS problems with live traffic when running a backfill.
[Diagram: the same as Option 1, except the batch engine loads into a separate backfill topic (or starts a job), which feeds its own cache sink.]
Option 3:
[Diagram: passthrough function → a topic with retention, offloaded to tiered storage in S3 or Google Cloud. A cache sink function serves live traffic. A separate backfill cache sink function, on an exclusive-mode subscription, stays stopped until a backfill is needed; its subscription position is stored in BookKeeper automatically. Starting the function starts the backfill.]
Note: you need to ensure the BookKeeper cluster is fast enough to keep up with the brokers, or your brokers’ memory will fill up.
Also, this approach will only give you a single backfill run unless you have additional replication.
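A toy Python model of why Option 3 works and why it yields one backfill run: the backfill sink’s subscription starts at the earliest retained message and its cursor does not move while the function is stopped, so starting it replays everything retained; after it consumes and acknowledges that history, the cursor sits at the end. This is an in-memory illustration with hypothetical names, not the Pulsar client API.

```python
from typing import Dict, List


class RetainedTopic:
    """Toy topic: an append-only log plus one cursor (next index to read) per subscription."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.cursors: Dict[str, int] = {}

    def publish(self, msg: str) -> None:
        self.log.append(msg)

    def subscribe(self, name: str, earliest: bool) -> None:
        # The cursor is durable state (Pulsar keeps it in BookKeeper); here it is a
        # dict entry. Re-subscribing keeps the existing position.
        self.cursors.setdefault(name, 0 if earliest else len(self.log))

    def drain(self, name: str) -> List[str]:
        """Read and acknowledge everything after the subscription's cursor."""
        start = self.cursors[name]
        batch = self.log[start:]
        self.cursors[name] = len(self.log)
        return batch
```

If two messages are published, then a live subscription (latest) and a backfill subscription (earliest) are created, and a third message arrives, the live sink drains only the third message while the stopped backfill sink, once started, drains all three. Draining the backfill subscription a second time returns nothing.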
What about adding caching to a legacy application?
[Diagram: a website backed by a legacy SQL DB, and a web application emitting specific events. The website’s raw clicks pass through stages labeled “Filter to desired clicks”, “Extract relevant data”, “Store in cache”, and “Enrich from cached data”. (Passthrough details omitted for simplicity.)]
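The click-stream branch of this diagram is a chain of small, single-purpose steps. A Python sketch of that chain, with each step as a pure function so it could become its own Pulsar Function; the click fields, the `product_cache` dict, and the filter rule are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Optional


def filter_clicks(raw_clicks: Iterable[dict]) -> Iterator[dict]:
    # Keep only the clicks we care about (hypothetical rule: product-page views).
    return (c for c in raw_clicks if c.get("page") == "product")


def extract(click: dict) -> dict:
    # Pull out just the fields downstream steps need.
    return {"user_id": click["user_id"], "product_id": click["product_id"]}


def enrich(event: dict, product_cache: Dict[str, dict]) -> Optional[dict]:
    # Look up product details in the cache instead of querying the legacy SQL DB.
    product = product_cache.get(event["product_id"])
    if product is None:
        return None  # cache miss: the caller decides whether to retry or drop
    return {**event, **product}


def pipeline(raw_clicks: Iterable[dict], product_cache: Dict[str, dict]) -> Iterator[dict]:
    for click in filter_clicks(raw_clicks):
        enriched = enrich(extract(click), product_cache)
        if enriched is not None:
            yield enriched
```

Keeping each step separate mirrors the separation of concerns in the diagram: the filter, extraction, and enrichment can each be tested and scaled on their own.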
You can also emit directly to Pulsar as a producer. It’s simpler if you have the ability to touch the website code.
However, the raw clicks flow still gets messy.
[Diagram: the web application emits specific events (Event A, Event B, Event C), each on its own topic; the flow still stores in and enriches from the cache, with the legacy SQL DB behind the website.]
Cleaner, with better separation of concerns, to have purposeful topics. Easier to debug and maintain.
What if you’re using a graph engine for more complex query logic but need that data in real time?
If you don’t make it synchronous, you will get race conditions when updating and querying the graph!
[Diagram: a web application emits specific deltas (e.g. state change, increment, etc.) to a Synchronous Update Function, which (1) writes the change and waits for success, then (2) returns on completion; the full record is retrieved with a complex graph query.]
What if you need a more robust verification that deltas are in order and aren’t duplicates (e.g. financially impacting increment/decrement values or state variables)?
Usually, it’s best to separate concerns into separate functions, like this:
[Diagram: the same flow with a Gate Keeper Filter in front of the Synchronous Update Function. The filter checks whether the message has been seen already. Duplicate or late? Yes: drop the message. No: pass it on.]
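A minimal Python sketch of the Gate Keeper Filter’s decision, assuming each delta carries a unique `delta_id`, an entity `key`, and an upstream timestamp `ts` (all hypothetical field names). State lives in process memory here, which is exactly the assumption the next slide questions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class Delta:
    delta_id: str  # unique per delta (hypothetical field)
    key: str       # entity the delta applies to, e.g. an account
    ts: int        # upstream timestamp from the data contract


@dataclass
class GateKeeper:
    """In-memory gate keeper: drops duplicates and deltas older than the newest seen per key."""
    seen_ids: Set[str] = field(default_factory=set)
    latest_ts: Dict[str, int] = field(default_factory=dict)

    def admit(self, d: Delta) -> bool:
        """Return True to forward the delta, False to drop it."""
        if d.delta_id in self.seen_ids:
            return False  # duplicate
        if d.key in self.latest_ts and d.ts < self.latest_ts[d.key]:
            return False  # late: a newer delta for this key already went through
        self.seen_ids.add(d.delta_id)
        self.latest_ts[d.key] = d.ts
        return True
```

A production filter would also need to bound or expire `seen_ids`, which grows without limit in this sketch.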
But is that the right approach here?
What happens when you turn up parallelism on the functions?
[Diagram: the same flow with six Gate Keeper Filter instances and six Synchronous Update Function instances.]
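This self-contained Python sketch shows what goes wrong with an in-memory filter once parallelism is turned up: each instance only sees the messages routed to it, so a redelivered duplicate that lands on a different instance passes both filters. The round-robin routing and the names are assumptions for illustration.

```python
from itertools import cycle
from typing import List, Set


class LocalDedupFilter:
    """Per-instance filter: remembers only the IDs this instance has seen."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()

    def admit(self, msg_id: str) -> bool:
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        return True


def run(messages: List[str], parallelism: int) -> List[str]:
    """Route messages round-robin across instances; return the IDs that got through."""
    instances = [LocalDedupFilter() for _ in range(parallelism)]
    passed = []
    for msg_id, instance in zip(messages, cycle(instances)):
        if instance.admit(msg_id):
            passed.append(msg_id)
    return passed
```

With one instance, a duplicated `"m1"` is dropped; with two instances, the copies land on different filters and both get through.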
In this case, the right approach is to consolidate your logic to leverage the transactional guarantees of your database.
[Diagram: the Gate Keeper Filter is removed. The Synchronous Update Function (1) checks whether the delta is a duplicate or outdated and, if not, writes the change and waits for success, then (2) returns on completion.]
Leverage transactional guarantees of your database! (Your function will need to retry if a transaction fails.)
[Diagram: the consolidated flow with four Synchronous Update Function instances.]
Always be mindful of how the behavior might change when function parallelism is turned up!
If making state changes, be sure that you get timestamps on your upstream data contract so you can verify if the messages are in order!
Now, what happens when you need to debug a large or complex function flow?
[Diagram: many functions chained between several web services and web applications.]
What happens if some messages seem not to be reaching their destination?
What happens if a message isn’t getting transformed correctly at some point, or null values are appearing?
What if we can’t modify the function code (since it’s a multi-tenant application)?
What if we can’t modify the data contracts either, for the same reason?
[Diagram: a chain of four functions; a trace covers the whole flow, with one span per function (Span 1 to Span 4).]
Jaeger is based on the OpenTracing standard
Check out the book, “Mastering Distributed Tracing” (2019) by Yuri Shkuro
[Diagram: a trace is made up of many spans; each span carries tags (key/value) and logs (key/value + timestamp).]
[Diagram: count by key, e.g. over a session-based window.]
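Counting by key over a session-based window means grouping each key’s events into sessions that close after a gap of inactivity, then counting per session. A plain-Python sketch of that windowing over a finite batch, with the gap length and the `(key, timestamp)` event shape as assumptions; stream processors such as Flink offer session windows as a built-in feature.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def session_counts(events: Iterable[Tuple[str, int]],
                   gap: int) -> Dict[str, List[Tuple[int, int, int]]]:
    """Return {key: [(session_start, session_end, count), ...]} for (key, ts) events."""
    by_key: Dict[str, List[int]] = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)
    result = {}
    for key, stamps in by_key.items():
        stamps.sort()
        sessions = []
        start = end = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - end > gap:  # inactivity longer than the gap closes the session
                sessions.append((start, end, count))
                start, count = ts, 0
            end = ts
            count += 1
        sessions.append((start, end, count))
        result[key] = sessions
    return result
```

For example, events for `p1` at times 1, 2, 3, 20, 21 with a gap of 5 form two sessions, counted as 3 and 2.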
Metadata parameters for the TapFunction’s envelope are specified in its config.
[Diagram: messages (Message1, Message2, Message3) flow through a chain of functions, becoming Message1’, Message1’’, and so on. A TapFunction on each topic wraps every message, and a JoinerFunction groups the wrapped messages by correlation ID (e.g. CorrelationId=productId-784 groups Message1, Message1’, and Message1’’; other groups include productId-142 and productId-923). A SamplerFunction then passes a sample to a Jaeger sink, which records a span per message with a StartTimestamp and an EndTimestamp.]
Taps wrap each message with a header containing: CorrelationId, Tenant, Namespace, Name, timestamp, etc.
The CorrelationId is derived and put into the envelope produced by the TapFunction:
{ "correlationKey": "productId",
  "correlationValue": "20603199",
  "correlationId": "productId-20603199",
  . . .
}
The TapFunction defines the correlationKey in its Pulsar config.
You can tap ANY topic!
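A Python sketch of what a TapFunction does to each message it reads: derive the correlation ID from the configured `correlationKey` and wrap the original payload in an envelope with the source topic’s tenant, namespace, and name plus a timestamp. Apart from `correlationKey`, `correlationValue`, and `correlationId`, the envelope field names and the function signature are assumptions; a real tap would run as a Pulsar Function and read these settings from its config.

```python
import json
import time
from typing import Optional


def tap(raw: str, correlation_key: str, tenant: str, namespace: str, name: str,
        now_ms: Optional[int] = None) -> str:
    """Wrap a JSON object message in a tracing envelope keyed by `correlation_key`."""
    payload = json.loads(raw)
    value = str(payload.get(correlation_key, ""))  # empty if the field is missing
    envelope = {
        "correlationKey": correlation_key,
        "correlationValue": value,
        "correlationId": f"{correlation_key}-{value}",
        "tenant": tenant,
        "namespace": namespace,
        "name": name,
        "timestamp": now_ms if now_ms is not None else int(time.time() * 1000),
        "payload": payload,  # the original message, untouched
    }
    return json.dumps(envelope)
```

Because the tap only reads the topic and writes a new envelope, it needs no changes to the tapped function’s code or data contract.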
[Diagram: the same tap, join, and sample flow.]
The JoinerFunction uses Flink’s stateful join capability. It all happens in a keyed stream!
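The JoinerFunction groups tapped envelopes by correlation ID and turns each hop into a span. The slides do this with Flink’s keyed, stateful join; the plain-Python sketch below only shows the grouping and span-building logic over a finite batch. The `Span` type is hypothetical, and so is the assumption that a hop’s span runs from a message’s tap timestamp on one topic to its tap timestamp on the next, with a root span covering the whole trace.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Span:
    trace_id: str  # the correlation ID doubles as the trace ID here
    operation: str
    start_ms: int
    end_ms: int
    tags: Dict[str, str] = field(default_factory=dict)


def join_to_spans(envelopes: Iterable[dict]) -> Dict[str, List[Span]]:
    """Group envelopes by correlationId; emit a root span plus one span per hop."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for env in envelopes:
        groups[env["correlationId"]].append(env)
    traces = {}
    for cid, envs in groups.items():
        envs.sort(key=lambda e: e["timestamp"])
        spans = [Span(cid, cid, envs[0]["timestamp"], envs[-1]["timestamp"])]
        for prev, nxt in zip(envs, envs[1:]):
            spans.append(Span(cid, f'{prev["name"]} -> {nxt["name"]}',
                              prev["timestamp"], nxt["timestamp"],
                              {"tenant": nxt["tenant"], "namespace": nxt["namespace"]}))
        traces[cid] = spans
    return traces
```

In a streaming version, the per-key lists would live in keyed state and be flushed when a trace is considered complete, for example at the end of a session window.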
SamplerFunction: allows us to specify a sampling rate that limits how many messages are sampled. For dev, this is set to 100% to allow all. This is just a simple Pulsar filter function.
These spans are emitted to Jaeger through the Jaeger sink and can be stored in a Cassandra or Elasticsearch backend for production.
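The SamplerFunction is described as a simple filter with a configurable rate (100% in dev). A Python sketch of that decision, assuming the rate is a fraction between 0 and 1. Hashing the correlation ID instead of calling `random()` is our own choice here: every instance then makes the same keep/drop decision for a given trace.

```python
import hashlib


def should_sample(correlation_id: str, rate: float) -> bool:
    """Keep roughly `rate` of traces; the same correlation ID always gets the same answer."""
    if rate >= 1.0:
        return True  # e.g. dev: keep everything
    if rate <= 0.0:
        return False
    digest = hashlib.sha256(correlation_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < rate
```

Inside a filter function, returning the message when `should_sample` is true and nothing otherwise gives the pass/drop behavior.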
If fields are omitted from the tap’s config, we capture all that we can.
Another trick is to provide alternative representations of a value to make search/analytics easier downstream.
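As one example of alternative representations, a tap could store a timestamp both as epoch milliseconds (easy to range-query) and as an ISO-8601 UTC string (easy to read and to search by prefix), plus the calendar date for grouping. The field names and the helper are hypothetical.

```python
from datetime import datetime, timezone


def with_alternatives(epoch_ms: int) -> dict:
    """Return several representations of one timestamp for downstream search/analytics."""
    dt = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return {
        "timestampMs": epoch_ms,
        "timestampIso": dt.isoformat(timespec="milliseconds"),
        "date": dt.date().isoformat(),  # e.g. for grouping by day
    }
```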
Questions?

Contenu connexe

Tendances

MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
MongoDB .local London 2019: Streaming Data on the Shoulders of GiantsMongoDB .local London 2019: Streaming Data on the Shoulders of Giants
MongoDB .local London 2019: Streaming Data on the Shoulders of GiantsLisa Roth, PMP
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Kai Wähner
 
From data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondFrom data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondVasia Kalavri
 
apidays LIVE India - REST the Events - REST APIs for Event-Driven Architectur...
apidays LIVE India - REST the Events - REST APIs for Event-Driven Architectur...apidays LIVE India - REST the Events - REST APIs for Event-Driven Architectur...
apidays LIVE India - REST the Events - REST APIs for Event-Driven Architectur...apidays
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...HostedbyConfluent
 
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlow
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlowIoT Sensor Analytics with Kafka, ksqlDB and TensorFlow
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlowKai Wähner
 
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsCloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsNeil Avery
 
Building a Web Application with Kafka as your Database
Building a Web Application with Kafka as your DatabaseBuilding a Web Application with Kafka as your Database
Building a Web Application with Kafka as your Databaseconfluent
 
How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...HostedbyConfluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6Kai Wähner
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
Kakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appKakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appNeil Avery
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Kai Wähner
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaKai Wähner
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?Kai Wähner
 
GCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and ProcessingGCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and Processingconfluent
 
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...HostedbyConfluent
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...confluent
 
Benefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use CasesBenefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use Casesconfluent
 

Tendances (20)

MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
MongoDB .local London 2019: Streaming Data on the Shoulders of GiantsMongoDB .local London 2019: Streaming Data on the Shoulders of Giants
MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
From data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyondFrom data stream management to distributed dataflows and beyond
From data stream management to distributed dataflows and beyond
 
apidays LIVE India - REST the Events - REST APIs for Event-Driven Architectur...
apidays LIVE India - REST the Events - REST APIs for Event-Driven Architectur...apidays LIVE India - REST the Events - REST APIs for Event-Driven Architectur...
apidays LIVE India - REST the Events - REST APIs for Event-Driven Architectur...
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
 
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlow
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlowIoT Sensor Analytics with Kafka, ksqlDB and TensorFlow
IoT Sensor Analytics with Kafka, ksqlDB and TensorFlow
 
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsCloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-events
 
Building a Web Application with Kafka as your Database
Building a Web Application with Kafka as your DatabaseBuilding a Web Application with Kafka as your Database
Building a Web Application with Kafka as your Database
 
How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...How did we move the mountain? - Migrating 1 trillion+ messages per day across...
How did we move the mountain? - Migrating 1 trillion+ messages per day across...
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6New Features in Confluent Platform 6.0 / Apache Kafka 2.6
New Features in Confluent Platform 6.0 / Apache Kafka 2.6
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Kakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming appKakfa summit london 2019 - the art of the event-streaming app
Kakfa summit london 2019 - the art of the event-streaming app
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?
 
GCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and ProcessingGCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and Processing
 
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
 
Benefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use CasesBenefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use Cases
 

Similaire à Real-World Pulsar Architectural Patterns

No More Hops Towards A Linearly Scalable Application Infrastructure
No More Hops Towards A Linearly Scalable Application InfrastructureNo More Hops Towards A Linearly Scalable Application Infrastructure
No More Hops Towards A Linearly Scalable Application InfrastructureConSanFrancisco123
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019 Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019 confluent
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureData Science Milan
 
The Art of Message Queues - TEKX
The Art of Message Queues - TEKXThe Art of Message Queues - TEKX
The Art of Message Queues - TEKXMike Willbanks
 
## Introducing a reactive Scala-Akka based system in a Java centric company
## Introducing a reactive Scala-Akka based system in a Java centric company## Introducing a reactive Scala-Akka based system in a Java centric company
## Introducing a reactive Scala-Akka based system in a Java centric companyMilan Aleksić
 
Introduction to Magento Optimization
Introduction to Magento OptimizationIntroduction to Magento Optimization
Introduction to Magento OptimizationFabio Daniele
 
Advanced Container Management and Scheduling
Advanced Container Management and SchedulingAdvanced Container Management and Scheduling
Advanced Container Management and SchedulingAmazon Web Services
 
Windows Server AppFabric Caching - What it is & when you should use it?
Windows Server AppFabric Caching - What it is & when you should use it?Windows Server AppFabric Caching - What it is & when you should use it?
Windows Server AppFabric Caching - What it is & when you should use it?Robert MacLean
 
Ch-ch-ch-ch-changes....Stitch Triggers - Andrew Morgan
Ch-ch-ch-ch-changes....Stitch Triggers - Andrew MorganCh-ch-ch-ch-changes....Stitch Triggers - Andrew Morgan
Ch-ch-ch-ch-changes....Stitch Triggers - Andrew MorganMongoDB
 
Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Building Continuous Application with Structured Streaming and Real-Time Data ...
Building Continuous Application with Structured Streaming and Real-Time Data ...Databricks
 
10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven Microservices10 Principals for Effective Event Driven Microservices
10 Principals for Effective Event Driven MicroservicesBen Stopford
 
Continuous Deployment: The Dirty Details
Continuous Deployment: The Dirty DetailsContinuous Deployment: The Dirty Details
Continuous Deployment: The Dirty DetailsMike Brittain
 
SQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesSQL Server 2008 Integration Services
SQL Server 2008 Integration ServicesEduardo Castro
 
How to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowHow to build unified Batch & Streaming Pipelines with Apache Beam and Dataflow
How to build unified Batch & Streaming Pipelines with Apache Beam and DataflowDaniel Zivkovic
 
Intro To Spring Python
Intro To Spring PythonIntro To Spring Python
Intro To Spring Pythongturnquist
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystemconfluent
 
10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache Kafka10 Principals for Effective Event-Driven Microservices with Apache Kafka
10 Principals for Effective Event-Driven Microservices with Apache KafkaBen Stopford
 
App engine devfest_mexico_10
App engine devfest_mexico_10App engine devfest_mexico_10
App engine devfest_mexico_10Chris Schalk
 

Similaire à Real-World Pulsar Architectural Patterns (20)

No More Hops Towards A Linearly Scalable Application Infrastructure
No More Hops Towards A Linearly Scalable Application InfrastructureNo More Hops Towards A Linearly Scalable Application Infrastructure
No More Hops Towards A Linearly Scalable Application Infrastructure
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019 Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
Shattering The Monolith(s) (Martin Kess, Namely) Kafka Summit SF 2019
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure

Real-World Pulsar Architectural Patterns

  • 1. Real-World Pulsar Architectural Patterns: Distributed Caching + Distributed Tracing. By Devin Bost, Senior Data Engineer at Overstock. Every pattern shown here has been developed and implemented with my team at Overstock. Email: dbost@overstock.com; Twitter: DevinBost; LinkedIn: https://www.linkedin.com/in/devinbost/
  • 2. Missing Stephen Bourke in Ireland and Sam Lowen, our new product manager.
  • 3. Disclaimer: I’m assuming you already know that Apache Pulsar is the future of real-time messaging
  • 4. Tech legend (icons used in the diagrams): Pulsar topic, Kafka topic, Apache BookKeeper.
  • 9. Diagram: Producer → /ingest → Passthrough function → /feeds → Consumer → Alerts to end users (e.g., email, SMS, Twilio call).
  • 10. Diagram courtesy of Thor Sigurjonsson
  • 11. Higher availability (diagram): multiple Producers → /ingest → Passthrough function → /feeds → multiple Consumers → Alerts to end users (e.g., email, SMS, Twilio call).
  • 12. Diagram: the same flow (Producers → /ingest → Passthrough function → /feeds → Consumers) with a cache added. Is this a safe approach to caching?
  • 13. Same diagram, with the Redis cache crossed out: What happens if Redis goes down?
  • 14-18. Same diagram, with the cache replaced: it’s much safer to use a distributed cache technology like Ignite, which offers smart persistence, is faster than Redis, supports tables with a backing cache, and supports transactions.
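Slides 13 to 18 hinge on one idea: a single cache node is a single point of failure, while a distributed cache keeps extra copies of each entry on other nodes. The toy model below is a minimal sketch of that idea under stated assumptions. `ReplicatedCache`, `owners`, and `fail` are hypothetical names and this is not the Apache Ignite API. The `backups` parameter plays a role similar to the number of backups you configure for an Ignite cache.

```python
import hashlib


class ReplicatedCache:
    """Hypothetical toy partitioned cache: each key lives on a primary node plus `backups` others.

    Illustration only; this is not the Apache Ignite API.
    """

    def __init__(self, node_names, backups=1):
        self.order = list(node_names)
        self.nodes = {name: {} for name in self.order}
        self.backups = backups
        self.down = set()

    def owners(self, key):
        # Pick a primary from a hash of the key, then the following nodes as backups.
        start = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % len(self.order)
        count = min(self.backups + 1, len(self.order))
        return [self.order[(start + i) % len(self.order)] for i in range(count)]

    def put(self, key, value):
        for name in self.owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        # Read from the first live owner, so losing one node does not lose the key.
        for name in self.owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        return None

    def fail(self, name):
        self.down.add(name)
        self.nodes[name].clear()  # simulate the node's in-memory data being lost


cache = ReplicatedCache(["node-a", "node-b", "node-c"], backups=1)
cache.put("sku-1", {"price": 10})
cache.fail(cache.owners("sku-1")[0])  # take down the primary for this key
print(cache.get("sku-1"))  # {'price': 10}
```

With `backups=0` and a single node, which is the single-Redis setup from slide 13, the same failure returns `None`.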
  • 19. What if you have a business-critical service that can’t lose messages?
  • 20. “But, doesn’t Pulsar already guarantee that messages won’t be lost?”
  • 21. There’s a difference between Pulsar losing messages and your application losing messages!
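The point of slide 21 is that durable storage in the broker does not help if the application acknowledges a message before its side effect succeeds. The sketch below uses an in-memory queue as a stand-in for a subscription. Putting a message back on the queue mimics a negative acknowledgement and redelivery. `run` and `flaky_sender` are hypothetical names, and the gateway failure is simulated.

```python
from collections import deque


def run(messages, send_alert, ack_after_success):
    """Drain an in-memory 'subscription' and return the messages actually delivered.

    Hypothetical sketch: re-appending to `pending` stands in for a negative ack / redelivery.
    """
    pending = deque(messages)
    delivered = []
    while pending:
        msg = pending.popleft()
        if not ack_after_success:
            # Anti-pattern: the message is already acknowledged, so a failure here loses it,
            # even though the broker stored it durably.
            try:
                send_alert(msg)
                delivered.append(msg)
            except ConnectionError:
                pass
        else:
            try:
                send_alert(msg)
                delivered.append(msg)  # acknowledge only after the side effect succeeded
            except ConnectionError:
                pending.append(msg)  # negative-ack: the message will be redelivered later
    return delivered


def flaky_sender():
    """Hypothetical alert sender that fails on the first attempt for each message."""
    attempts = {}

    def send(msg):
        attempts[msg] = attempts.get(msg, 0) + 1
        if attempts[msg] == 1:
            raise ConnectionError(f"SMS gateway timeout for {msg}")

    return send


print(run(["a", "b"], flaky_sender(), ack_after_success=False))  # []
print(run(["a", "b"], flaky_sender(), ack_after_success=True))   # ['a', 'b']
```

A real consumer would also cap redeliveries so that a permanently failing message cannot loop forever.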
  • 22. So, we introduce a backfill path.
  • 23. Kappa + Lambda architectures (diagram): Producers → /ingest → Passthrough function → /feeds → Consumers → Alerts to end users (e.g., email, SMS, Twilio call). Messages are also replicated to persistent storage. A batch job (or similar) checks “Message delivered yet?” and republishes undelivered messages to a Backfill Topic, whose own Consumers also send alerts to end users. Notice I avoided putting a SQL database on the list of persistent-storage options.
  • 25. Same pattern with Pulsar Functions as the consumers: Producers → /ingest → Passthrough function → /feeds → Pulsar Function → Alerts to end users. Persistent storage (via replication) feeds a batch job (or similar) that checks “Message delivered yet?” and republishes to a Backfill Topic, which a second Pulsar Function consumes to send the alerts. You could add another passthrough function and topic if you want more isolation.
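The “Message delivered yet?” checks on slides 23 and 25 can be sketched as two small steps. First, a batch job compares the replicated copy of every message against a record of deliveries and republishes the gaps to the backfill topic. Second, the backfill consumer checks again before sending, because the live path may have caught up in the meantime. Everything here is an assumption for illustration: `Alert`, `backfill`, `backfill_consumer`, the in-memory `delivered_ids` set, and the `publish` callback that stands in for a Pulsar producer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record as it would sit in persistent storage."""
    message_id: str
    recipient: str
    body: str


def find_undelivered(persisted, delivered_ids):
    """Batch step: every persisted alert without a delivery receipt."""
    return [a for a in persisted if a.message_id not in delivered_ids]


def backfill(persisted, delivered_ids, publish):
    """Republish undelivered alerts to the backfill topic via a hypothetical `publish(topic, alert)`."""
    missing = find_undelivered(persisted, delivered_ids)
    for alert in missing:
        publish("backfill", alert)
    return len(missing)


def backfill_consumer(alert, delivered_ids, send):
    """Backfill-path consumer: check again, since the live path may have delivered it since."""
    if alert.message_id in delivered_ids:
        return False
    send(alert)
    delivered_ids.add(alert.message_id)
    return True


persisted = [Alert("m1", "alice", "Order shipped"),
             Alert("m2", "bob", "Order shipped"),
             Alert("m3", "carol", "Order shipped")]
delivered = {"m1", "m3"}
sent = []
print(backfill(persisted, delivered, lambda topic, a: sent.append((topic, a.message_id))))  # 1
print(sent)  # [('backfill', 'm2')]
```

In production, `delivered_ids` would live in a durable store rather than in process memory.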
  • 26. How about just for ingesting data into a cache with a backfill?
  • 27. Option 1 (diagram): Web Service → /ingest → Passthrough Function → /feeds → Cache Sink. /feeds is replicated to persistent append-only storage. For a backfill, a batch engine (e.g., Spark, NiFi) reads all data and loads it into the existing topic (or starts a job, etc.).
  • 28. Option 2 achieves separation of concerns and prevents QoS problems with live traffic when running a backfill (diagram): Web Service → /ingest → Passthrough Function → /feeds → Cache Sink, with /feeds replicated to persistent append-only storage. For a backfill, a batch engine (e.g., Spark, NiFi) reads all data and loads it into a separate backfill topic (or starts a job, etc.), which has its own Cache Sink.
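The QoS argument behind Options 1 and 2 can be made concrete with a toy FIFO model. If the backfill is loaded into the live topic, a new live event waits behind the whole backfill. If it goes to its own topic, the live cache sink reaches the event immediately. This is a deliberately simplified sketch: `live_event_wait` is a hypothetical helper, and the model ignores partitions, parallel consumers, and throughput.

```python
from collections import deque


def live_event_wait(option, backfill_size):
    """Messages the live cache sink must process before reaching a new live event.

    Option 1: backfill is loaded into the existing live topic.
    Option 2: backfill goes to its own topic, drained by a separate cache sink.
    """
    live_topic = deque()
    backfill_topic = deque()
    target = live_topic if option == 1 else backfill_topic
    target.extend(f"backfill-{i}" for i in range(backfill_size))
    live_topic.append("live-event")  # arrives just after the backfill load starts

    processed = 0
    while live_topic:
        msg = live_topic.popleft()
        processed += 1
        if msg == "live-event":
            return processed
    raise RuntimeError("live event never arrived")


print(live_event_wait(option=1, backfill_size=1000))  # 1001
print(live_event_wait(option=2, backfill_size=1000))  # 1
```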
  • 29. Option 3 (diagram): Passthrough Function → /feeds, a topic with retention that is offloaded to tiered storage in S3 or Google Cloud. A Cache Sink function consumes /feeds for live traffic. A Backfill Cache Sink function stays stopped until you need to backfill. It uses an Exclusive-mode subscription (the subscription’s position is stored in BookKeeper automatically), and you start the function to run the backfill. Note: you need to ensure the BookKeeper cluster is fast enough to keep up with the brokers, or your brokers’ memory will fill up. Also, this approach will give you only a single backfill run unless you have additional replication.
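Option 3 relies on each subscription keeping its own cursor. A stopped backfill function's subscription simply accumulates backlog, and starting the function replays everything from its cursor. The toy model below is a sketch of those cursor semantics only. `RetainedTopic`, `subscribe`, and `drain` are hypothetical names, not the Pulsar client API, and the model does not include cursor resets or tiered storage. It shows why a second run of the same backfill subscription finds nothing new.

```python
class RetainedTopic:
    """Hypothetical toy topic with retention and named subscriptions.

    Each subscription has its own cursor (in Pulsar, cursors are persisted in BookKeeper).
    """

    def __init__(self):
        self.log = []
        self.cursors = {}

    def publish(self, msg):
        self.log.append(msg)

    def subscribe(self, name, from_earliest):
        # Creating a subscription only sets its starting position; an existing cursor is kept.
        self.cursors.setdefault(name, 0 if from_earliest else len(self.log))

    def drain(self, name):
        start = self.cursors[name]
        batch = self.log[start:]
        self.cursors[name] = len(self.log)  # consuming advances this subscription's cursor
        return batch


topic = RetainedTopic()
topic.subscribe("cache-sink", from_earliest=True)
topic.subscribe("backfill-cache-sink", from_earliest=True)  # created up front; function stays stopped
for i in range(3):
    topic.publish(f"m{i}")
print(topic.drain("cache-sink"))           # ['m0', 'm1', 'm2']
print(topic.drain("backfill-cache-sink"))  # ['m0', 'm1', 'm2']  (the backfill run)
print(topic.drain("backfill-cache-sink"))  # []  (a second run has nothing to replay)
```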
  • 30. What about adding caching to a legacy application?
  • 31. Diagram components (passthrough details omitted for simplicity): Legacy SQL DB, Website, Raw Clicks, Filter to desired clicks, Extract relevant data, Enrich from cached data, Store in cache, Web application emitting specific events.
  • 32-33. Same diagram. You can also emit directly to Pulsar as a producer; it’s simpler if you have the ability to touch the website code. However, the raw clicks flow still gets messy.
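The raw-clicks path on slides 31 to 33 can be read as a filter step, then an extract step, then an enrichment step that looks up reference data copied from the legacy SQL DB into the cache. The sketch below follows that reading, which is an assumption since the slides show only the boxes. `process_click`, the `add_to_cart` action, and the field names are hypothetical. A plain dict stands in for the cache.

```python
def process_click(raw_click, reference_cache):
    """Hypothetical click pipeline: filter, extract, then enrich from cached legacy data.

    Returns None when the click should be dropped.
    """
    # 1. Filter to desired clicks (assumption: only add-to-cart clicks matter).
    if raw_click.get("action") != "add_to_cart":
        return None
    # 2. Extract relevant data and discard the rest of the raw payload.
    extracted = {"productId": raw_click["productId"], "userId": raw_click["userId"]}
    # 3. Enrich from data previously copied out of the legacy SQL DB into the cache.
    product = reference_cache.get(extracted["productId"], {})
    return {**extracted, "category": product.get("category")}


cache = {"p1": {"category": "rugs"}}
print(process_click({"action": "add_to_cart", "productId": "p1", "userId": "u9", "ua": "..."}, cache))
# {'productId': 'p1', 'userId': 'u9', 'category': 'rugs'}
print(process_click({"action": "view", "productId": "p1", "userId": "u9"}, cache))  # None
```

Each numbered step is a candidate for its own function. That separation is what slide 34 argues is cleaner to debug.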
  • 34. Diagram: the web application emits specific events (Event A, Event B, Event C), each on its own topic, alongside the Legacy SQL DB and Website; data is enriched from cached data and stored in the cache. Purposeful topics give cleaner, better separation of concerns and are easier to debug and maintain.
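On the producer side, “purposeful topics” can be as simple as the web application choosing a topic per event type, instead of writing everything to one raw-clicks topic. The sketch assumes hypothetical event type names and topic names. It uses Pulsar's `persistent://tenant/namespace/topic` naming with the default `public/default` namespace, and a `publish` callback stands in for a Pulsar producer.

```python
TOPIC_BY_EVENT = {  # hypothetical topic names, one purposeful topic per event type
    "EventA": "persistent://public/default/event-a",
    "EventB": "persistent://public/default/event-b",
    "EventC": "persistent://public/default/event-c",
}


def route(event, publish):
    """Send an event to its purposeful topic via a hypothetical `publish(topic, event)` callback."""
    topic = TOPIC_BY_EVENT.get(event["type"])
    if topic is None:
        # Failing loudly keeps unexpected event types from silently landing on the wrong topic.
        raise ValueError(f"no purposeful topic for event type {event['type']!r}")
    publish(topic, event)
    return topic


published = []
print(route({"type": "EventB", "productId": "p1"}, lambda t, e: published.append(t)))
# persistent://public/default/event-b
```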
  • 35. What if you’re using a graph engine for more complex query logic but need that data in real-time?
  • 36. If you don’t make it synchronous, you will get race conditions when updating and querying the graph! Diagram: Web application emitting specific deltas (e.g., state change, increment) → Synchronous Update Function, which (1) writes the change and waits for success, then (2) returns on completion. The full record is retrieved with a complex graph query.
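The race on slide 36 appears when the function writes a delta and immediately queries the graph before the write is visible. The toy store below queues writes until `flush` is called, which is a stand-in for an asynchronous write path. A fire-and-forget update then reads stale state, while the synchronous version waits for the write before querying. `EventuallyAppliedStore` and both update functions are hypothetical names.

```python
class EventuallyAppliedStore:
    """Hypothetical toy store whose writes are queued and only applied on flush()."""

    def __init__(self):
        self.state = {}
        self.queue = []

    def write_async(self, key, delta):
        self.queue.append((key, delta))  # returns immediately; not yet visible to queries

    def flush(self):
        for key, delta in self.queue:
            self.state[key] = self.state.get(key, 0) + delta
        self.queue.clear()

    def write_and_wait(self, key, delta):
        # Synchronous write: do not return until the change has been applied.
        self.write_async(key, delta)
        self.flush()

    def query(self, key):
        return self.state.get(key, 0)


def update_fire_and_forget(store, key, delta):
    store.write_async(key, delta)
    return store.query(key)  # may read stale state: the race the slide warns about


def update_synchronously(store, key, delta):
    store.write_and_wait(key, delta)  # (1) write change, wait for success
    return store.query(key)           # (2) return on completion with the full record


print(update_fire_and_forget(EventuallyAppliedStore(), "item-1", 5))  # 0
print(update_synchronously(EventuallyAppliedStore(), "item-1", 5))    # 5
```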
  • 37. What if you need a more robust verification that deltas are in order and aren’t duplicates? (e.g. financially impacting increment/decrement values or state variables)
  • 38. Usually, it’s best to separate concerns into separate functions, like this: Web application emitting specific deltas → Gate Keeper Filter, which checks whether the message has been seen already. Duplicate or late? Yes: drop the message. No: pass it to the Synchronous Update Function, which (1) writes the change and waits for success, then (2) returns on completion; the full record comes from a complex graph query. But is that the right approach here?
  • 39. What happens when you turn up parallelism on the functions?
  • 40. Same flow with parallelism turned up: six Gate Keeper Filter instances feeding six Synchronous Update Function instances (web application emitting deltas; (1) write change, wait for success; (2) return on completion; full record via complex graph query).
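The sketch below shows why the separate gatekeeper on slides 38 to 40 breaks down with parallelism. Each instance's “seen” set is local state, so a duplicate that lands on a different instance looks new. The round-robin assignment of messages to instances is an assumption made for illustration, and `GateKeeperFilter` and `apply_all` are hypothetical names.

```python
class GateKeeperFilter:
    """Hypothetical gatekeeper: drops messages already seen by *this instance* only."""

    def __init__(self):
        self.seen = set()  # local, per-instance state

    def accept(self, msg):
        if msg["id"] in self.seen:
            return False
        self.seen.add(msg["id"])
        return True


def apply_all(messages, parallelism):
    """Round-robin messages across instances (no key affinity) and sum the applied deltas."""
    gates = [GateKeeperFilter() for _ in range(parallelism)]
    balance = 0
    for i, msg in enumerate(messages):
        if gates[i % parallelism].accept(msg):
            balance += msg["amount"]
    return balance


duplicated = [{"id": "tx1", "amount": 100}, {"id": "tx1", "amount": 100}]
print(apply_all(duplicated, parallelism=1))  # 100
print(apply_all(duplicated, parallelism=2))  # 200  (the duplicate slipped through)
```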
  • 41. In this case, the right approach is to consolidate your logic to leverage the transactional guarantees of your database. Diagram: Web application → a single Synchronous Update Function that (1) checks whether the delta is a duplicate or outdated and, if not, writes the change and waits for success, then (2) returns on completion; the full record comes from a complex graph query.
  • 42-44. Leverage the transactional guarantees of your database! (Your function will need to retry if a transaction fails.) Diagram: Web application → four parallel Synchronous Update Function instances; full record via complex graph query. Always be mindful of how the behavior might change when function parallelism is turned up! If you are making state changes, be sure to include timestamps in your upstream data contract so you can verify whether messages are in order!
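Slides 41 to 44 move the duplicate and ordering checks into the same database transaction as the write, and retry when the transaction fails. The deck uses a graph database. In this sketch, Python's built-in `sqlite3` stands in for any store with transactions, so it is an illustration of the technique and not the author's implementation. The `account` and `applied_event` tables, the `last_ts` column, and `apply_delta` are hypothetical. The retry loop handles lock contention from other writers, such as other function instances sharing a database file.

```python
import sqlite3
import time


def open_store():
    # isolation_level=None: no implicit transactions; we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL, last_ts INTEGER NOT NULL);
        CREATE TABLE applied_event (event_id TEXT PRIMARY KEY);
        """
    )
    return conn


def apply_delta(conn, event, max_retries=5):
    """Check-and-write in one transaction; returns False for duplicate or outdated events."""
    for attempt in range(max_retries):
        try:
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            try:
                if conn.execute("SELECT 1 FROM applied_event WHERE event_id = ?", (event["id"],)).fetchone():
                    conn.execute("ROLLBACK")
                    return False  # duplicate
                row = conn.execute("SELECT last_ts FROM account WHERE id = ?", (event["account"],)).fetchone()
                if row is not None and event["ts"] < row[0]:
                    conn.execute("ROLLBACK")
                    return False  # late / out of order, per the upstream timestamp
                conn.execute(
                    "INSERT INTO account (id, balance, last_ts) VALUES (?, ?, ?) "
                    "ON CONFLICT(id) DO UPDATE SET balance = balance + excluded.balance, "
                    "last_ts = excluded.last_ts",
                    (event["account"], event["amount"], event["ts"]),
                )
                conn.execute("INSERT INTO applied_event (event_id) VALUES (?)", (event["id"],))
                conn.execute("COMMIT")
                return True
            except Exception:
                if conn.in_transaction:
                    conn.execute("ROLLBACK")
                raise
        except sqlite3.OperationalError:
            time.sleep(0.01 * (attempt + 1))  # e.g. "database is locked": back off and retry
    raise RuntimeError(f"gave up on event {event['id']} after {max_retries} attempts")


conn = open_store()
print(apply_delta(conn, {"id": "tx1", "account": "a1", "amount": 100, "ts": 1}))  # True
print(apply_delta(conn, {"id": "tx1", "account": "a1", "amount": 100, "ts": 1}))  # False (duplicate)
print(apply_delta(conn, {"id": "tx0", "account": "a1", "amount": 50, "ts": 0}))   # False (late)
```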
  • 45. Now, what happens when you need to debug a large or complex function flow?
  • 46-50. Diagram: a large, complex function flow, with three web applications and two web services feeding a mesh of many functions. What happens if some messages seem not to be reaching their destination? What happens if a message isn’t getting transformed correctly at some point, or null values are appearing? What if we can’t modify the function code (since it’s a multi-tenant application)? What if we can’t modify the data contracts either, for the same reason?
  • 51-53. Diagram: a chain of four functions. Each hop is a span (Span 1 to Span 4), and together the spans form one trace.
  • 54. Jaeger is based on the OpenTracing standard. Check out the book “Mastering Distributed Tracing” (2019) by Yuri Shkuro. Diagram: a Trace is made up of many Spans; each Span carries Tags (key/value) and Logs (key/value + timestamp).
  • 57. Diagram: count by key (e.g., over a session-based window), producing per-key counts such as 4, 5, 7, 3.
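Slide 57 counts by key over a session-based window. The sketch below shows the windowing rule in plain, batch-style Python: a key's session closes when the gap to its next event exceeds `gap`. A streaming engine would compute the same thing incrementally. `session_counts` is a hypothetical helper, and the example keys reuse the correlation ids from the slides.

```python
def session_counts(events, gap):
    """Count events per key within session windows.

    `events` is an iterable of (key, timestamp) pairs. A key's session closes when its next
    event arrives more than `gap` time units after the previous one.
    Returns a list of (key, session_start, session_end, count).
    """
    by_key = {}
    for key, ts in sorted(events, key=lambda e: (e[0], e[1])):
        by_key.setdefault(key, []).append(ts)

    sessions = []
    for key, stamps in by_key.items():
        start = prev = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - prev > gap:  # inactivity gap exceeded: close the current session
                sessions.append((key, start, prev, count))
                start, count = ts, 0
            count += 1
            prev = ts
        sessions.append((key, start, prev, count))
    return sessions


events = [("productId-784", 1), ("productId-784", 2), ("productId-142", 3), ("productId-784", 20)]
print(session_counts(events, gap=5))
# [('productId-142', 3, 3, 1), ('productId-784', 1, 2, 2), ('productId-784', 20, 20, 1)]
```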
  • 59. Architecture: messages flow through functions (Message1 → Function1 → Message1’ → Function1 → Message1’’ …, and likewise for Message2 and Message3). TapFunctions wrap each message with a header containing CorrelationId, Tenant, Namespace, Name, timestamp, etc.; metadata parameters for the TapFunction’s envelope are specified in its config. A JoinerFunction groups the tapped messages by CorrelationId (e.g., Message1, Message1’, and Message1’’ share CorrelationId=productId-784; other groups are keyed productId-142 and productId-923). A SamplerFunction passes a sample of the groups to a Jaeger Sink, which emits a span for the CorrelationId plus one span per message, each with a StartTimestamp and EndTimestamp.
  • 60. CorrelationId is derived and put into the envelope produced by the TapFunction: { "correlationKey": "productId", "correlationValue": "20603199", "correlationId": "productId-20603199", . . . }. The TapFunction defines the correlationKey in its Pulsar config. You can tap ANY topic! Taps wrap each message (Message1, Message1’, Message1’’, …) with a header containing CorrelationId, Tenant, Namespace, Name, timestamp, etc.
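The envelope on slide 60 can be reproduced with a small pure function. It reads the correlation key from the tap's config, takes that field's value from the message, and builds `correlationId` as `key-value` alongside the tenant, namespace, name, and timestamp. The function name and argument names are hypothetical, and so is storing the original message under `payload`. Inside a real Python Pulsar Function, the config value would come from the context object (for example `context.get_user_config_value`).

```python
import time


def tap(message, config, tenant, namespace, name, now=None):
    """Hypothetical TapFunction core: wrap a message in a tracing envelope.

    `config["correlationKey"]` names the message field used for correlation.
    `now` (seconds) can be injected for deterministic tests.
    """
    key = config["correlationKey"]
    value = str(message[key])
    seconds = now if now is not None else time.time()
    return {
        "correlationKey": key,
        "correlationValue": value,
        "correlationId": f"{key}-{value}",
        "tenant": tenant,
        "namespace": namespace,
        "name": name,
        "timestamp": int(seconds * 1000),  # epoch milliseconds
        "payload": message,  # the original message, untouched
    }


env = tap({"productId": 20603199, "qty": 1}, {"correlationKey": "productId"},
          "public", "default", "tap-1", now=0)
print(env["correlationId"])  # productId-20603199
```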
  • 61. JoinerFunction: uses Flink’s stateful join capability. It all happens in a keyed stream! Messages (Message1/1’/1’’, Message2/2’/2’’, Message3/3’/3’’) are grouped by CorrelationId (productId-784, productId-142, productId-923).
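The joiner on slide 61 runs in Flink as a keyed, stateful stream. The plain-Python stand-in below shows only the keyed-state idea: envelopes accumulate per `correlationId`, and the group is released once it looks complete. The completeness rule (`expected_hops`) is a hypothetical assumption because the slides don't say how completion is detected. A real deployment would also need a timeout for traces that never complete.

```python
from collections import defaultdict


class Joiner:
    """Hypothetical stand-in for keyed state: collect envelopes per correlationId."""

    def __init__(self, expected_hops):
        self.expected_hops = expected_hops  # assumption: a trace is complete after N taps
        self.state = defaultdict(list)

    def on_envelope(self, envelope):
        cid = envelope["correlationId"]
        self.state[cid].append(envelope)
        if len(self.state[cid]) == self.expected_hops:
            group = sorted(self.state.pop(cid), key=lambda e: e["timestamp"])
            return cid, group  # emit the joined, time-ordered group
        return None


j = Joiner(expected_hops=3)
arrivals = [{"correlationId": "productId-784", "timestamp": 30},
            {"correlationId": "productId-142", "timestamp": 5},
            {"correlationId": "productId-784", "timestamp": 10},
            {"correlationId": "productId-784", "timestamp": 20}]
for env in arrivals:
    joined = j.on_envelope(env)
    if joined:
        print(joined[0], [e["timestamp"] for e in joined[1]])  # productId-784 [10, 20, 30]
```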
  • 62. SamplerFunction: allows us to specify a rate to limit how many messages are sampled. For dev, this is set to 100% to allow all. This is just a simple Pulsar filter function. (Diagram: the CorrelationId=productId-784 group, with Message1, Message1’, and Message1’’, passing through.)
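A sampling decision for slide 62 could look like the function below. Hashing the `correlationId` instead of drawing a random number is an assumption, since the slides only say the rate is configurable. It has a useful property: every hop of the same trace gets the same decision, so traces are never half-sampled. A filter function built on this would pass a message through only when `should_sample` returns `True`.

```python
import hashlib


def should_sample(correlation_id, rate):
    """Deterministic sampling decision for a correlationId (hypothetical helper).

    rate=1.0 keeps everything (the dev setting on the slide); rate=0.0 drops everything.
    """
    if rate >= 1.0:
        return True
    if rate <= 0.0:
        return False
    digest = hashlib.sha256(correlation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < rate


print(should_sample("productId-784", 1.0))  # True
kept = sum(should_sample(f"productId-{i}", 0.25) for i in range(10_000))
print(kept / 10_000)  # close to 0.25
```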
  • 63. Jaeger Sink: for CorrelationId=productId-784, it emits one span for the correlation as a whole and one span each for Message1, Message1’, and Message1’’, each with a StartTimestamp and EndTimestamp. These spans are emitted to Jaeger and can be stored in a Cassandra or Elasticsearch backend for production.
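Slides 54 and 63 describe the span shape: a root span for the correlation, child spans per message, start and end timestamps, and key/value tags and timestamped logs. The sketch turns a joined group of tap envelopes into that shape. The timing rule is an assumption: each hop's span runs from its tap timestamp to the next tap's, and the last hop is zero-length. `Span` and `build_trace` are hypothetical names. Actually reporting the spans to Jaeger would go through a tracing client library, which is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Minimal OpenTracing-style span: operation name, timing, tags, logs, children."""
    operation: str
    start_ms: int
    end_ms: int
    tags: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)  # (timestamp, {key: value}) pairs
    children: list = field(default_factory=list)


def build_trace(correlation_id, envelopes):
    """Root span for the correlation plus one child span per tapped hop (hypothetical helper)."""
    ordered = sorted(envelopes, key=lambda e: e["timestamp"])
    root = Span(correlation_id, ordered[0]["timestamp"], ordered[-1]["timestamp"],
                tags={"correlationId": correlation_id})
    for current, nxt in zip(ordered, ordered[1:] + [None]):
        end = nxt["timestamp"] if nxt else current["timestamp"]  # assumption: span ends at next tap
        root.children.append(Span(
            operation=current["name"],
            start_ms=current["timestamp"],
            end_ms=end,
            tags={"tenant": current["tenant"], "namespace": current["namespace"]},
        ))
    return root


hops = [{"name": "tap-1", "timestamp": 10, "tenant": "public", "namespace": "default"},
        {"name": "tap-3", "timestamp": 35, "tenant": "public", "namespace": "default"},
        {"name": "tap-2", "timestamp": 20, "tenant": "public", "namespace": "default"}]
trace = build_trace("productId-784", hops)
print([(s.operation, s.start_ms, s.end_ms) for s in trace.children])
# [('tap-1', 10, 20), ('tap-2', 20, 35), ('tap-3', 35, 35)]
```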
  • 65. If fields are omitted from the tap’s config, we capture all that we can.
  • 66. Another trick is to provide alternative representations of a value to make search/analytics easier downstream.
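Slides 65 and 66 can be combined into one small tag builder. If the tap's config lists no fields, it captures every top-level field. It also adds alternative representations so downstream search can match either form. The specific choices are hypothetical illustrations: an ISO-8601 string for epoch-millisecond fields whose names end in `Ms`, and a string form for numbers.

```python
from datetime import datetime, timezone


def span_tags(payload, fields=None):
    """Build span tags from a message payload (hypothetical helper).

    fields=None (omitted from the tap's config) means: capture every top-level field.
    """
    selected = payload if fields is None else {k: payload[k] for k in fields if k in payload}
    tags = {}
    for key, value in selected.items():
        tags[key] = value
        if key.endswith("Ms") and isinstance(value, int):
            # Alternative representation: human-readable UTC timestamp next to epoch millis.
            tags[key + "Iso"] = datetime.fromtimestamp(value / 1000, tz=timezone.utc).isoformat()
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Alternative representation: numbers as strings, for exact-match search.
            tags[key + "Str"] = str(value)
    return tags


print(span_tags({"productId": 20603199, "createdMs": 0}))
# {'productId': 20603199, 'productIdStr': '20603199', 'createdMs': 0, 'createdMsIso': '1970-01-01T00:00:00+00:00'}
```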