@allenxwang
Multi-cluster, Multi-tenant and Hierarchical Kafka Messaging Service
Allen Wang
Growing Pains for A Kafka Cluster
● A few brokers, a handful of topics, tens of partitions
○ Wonderful!
● Tens of brokers, tens of topics, hundreds of partitions
○ Life is good!
● A hundred brokers, a hundred topics, thousands of partitions
○ … OK
● Hundreds of brokers, hundreds of topics, one hundred thousand partitions
○ ???
Why a Huge Kafka Cluster Does Not Work
● Significant time increase on operations
○ Rolling binary update
■ Three minutes per broker × 500 brokers ≈ 1 whole day
○ Rolling AMI (image) update with data copying
■ One hour per broker × 500 brokers ≈ 20 days
● Increased latency due to the number of partitions
○ https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
● Vulnerability to ZK/Controller failures
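The operation-time estimates above can be checked with a short sketch. The class and method names are illustrative only; the per-broker times and the broker count come from the slide, and the sketch assumes brokers are updated strictly one at a time.

```java
import java.time.Duration;

public class RollingUpdateEstimate {
    // Total wall-clock time when brokers are updated one after another.
    static Duration total(Duration perBroker, int brokers) {
        return perBroker.multipliedBy(brokers);
    }

    public static void main(String[] args) {
        Duration binary = total(Duration.ofMinutes(3), 500);
        Duration image = total(Duration.ofHours(1), 500);
        // 1500 minutes = 25 hours, roughly one whole day
        System.out.println("Binary update: " + binary.toHours() + " hours");
        // 500 hours = 20 full days (plus 20 hours)
        System.out.println("Image update: " + image.toDays() + " days");
    }
}
```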
Scaling and Data Balancing Challenge
● The problem with partition reassignment
○ Time consuming
○ Replication traffic taking bandwidth
○ Complexity of bin packing for data balancing
The Consumer Fan-out Problem
BytesOut = (numberOfConsumers + replicationFactor - 1) ✕ BytesIn
● A single cluster may easily handle the bytes in, but not necessarily the bytes out
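To make the fan-out formula concrete, here is a minimal sketch that evaluates it for a few consumer counts. The inbound rate of 100 (say, MB/s) and the replication factor of 3 are assumed values chosen for illustration, not figures from the talk.

```java
public class FanOut {
    // BytesOut = (numberOfConsumers + replicationFactor - 1) x BytesIn
    static long bytesOut(long bytesIn, int numberOfConsumers, int replicationFactor) {
        return (numberOfConsumers + replicationFactor - 1L) * bytesIn;
    }

    public static void main(String[] args) {
        long bytesIn = 100L; // hypothetical inbound rate, e.g. MB/s
        int replicationFactor = 3; // assumed for illustration
        // Outbound traffic grows linearly with the number of consumers.
        for (int consumers : new int[] {1, 5, 20}) {
            System.out.println(consumers + " consumers -> "
                    + bytesOut(bytesIn, consumers, replicationFactor));
        }
    }
}
```

With these assumed numbers, 20 consumers produce 22 times the inbound traffic on the way out, which is why a cluster sized for bytes in can still run short on bytes out.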
Solve Consumer Fan-out with Hierarchies
Inevitability of Multi-cluster
The Idea
● Create many small and mostly “immutable” clusters
● Organize them in a topology, with a routing service connecting the clusters
Multi-Cluster Kafka Service At Netflix
[Architecture diagram. Labeled components: Event Producer, HTTP Proxy, Fronting Kafka, Router (w/ simple ETL), Consumer Kafka, Consumers, Management]
Multi-cluster Producers
● Support producing to multiple clusters at the same time
● High-level producer API implemented by multiple embedded Kafka producers
public interface KsProducer<V> {
    // ...
    <T extends V> CompletableFuture<SendResult> send(T obj);
}
● Dynamic topic-to-cluster mapping
○ Enabled by NetflixOSS/Archaius
"t1, t2" : {
    "where" : [{
        "sink" : "fronting-kafka-1"
    }]
},
"t3" : {
    "where" : [{
        "sink" : "fronting-kafka-2"
    }]
},
"__default__" : {
    "where" : [{
        "sink" : "fronting-kafka-2"
    }]
}
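One way to read the mapping above is as a lookup table with a fallback entry. The sketch below is an assumption about how such a lookup could work, not the Netflix implementation: it treats a key like "t1, t2" as a comma-separated topic list, flattens each "where" clause to a single sink name, and falls back to "__default__" for unlisted topics. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TopicRouting {
    static final String DEFAULT_KEY = "__default__";

    // Expand keys such as "t1, t2" into one entry per topic (assumed semantics).
    static Map<String, String> expand(Map<String, String> raw) {
        Map<String, String> byTopic = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            for (String topic : e.getKey().split(",")) {
                byTopic.put(topic.trim(), e.getValue());
            }
        }
        return byTopic;
    }

    // Return the sink for a topic, falling back to the default entry.
    static String sinkFor(Map<String, String> byTopic, String topic) {
        return byTopic.getOrDefault(topic, byTopic.get(DEFAULT_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> raw = new LinkedHashMap<>();
        raw.put("t1, t2", "fronting-kafka-1");
        raw.put("t3", "fronting-kafka-2");
        raw.put(DEFAULT_KEY, "fronting-kafka-2");

        Map<String, String> routes = expand(raw);
        System.out.println(sinkFor(routes, "t2"));      // fronting-kafka-1
        System.out.println(sinkFor(routes, "unknown")); // fronting-kafka-2
    }
}
```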
@Stream("foo") // send to topic "foo"
public class Foo {
    // ...
}
@Stream("bar") // send to topic "bar"
public class Bar {
    // ...
}
KsProducer<Object> producer = // …
producer.send(new Foo()); // Send to the Kafka cluster that has the "foo" topic
producer.send(new Bar()); // Send to the Kafka cluster that has the "bar" topic
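The slide does not show how the producer turns an annotated object into a topic name. A plausible sketch, assuming a runtime-retained annotation read by reflection, is given below. The `Stream` annotation defined here is a hypothetical stand-in for the one on the slide, and `topicOf` is an illustrative helper, not part of any published API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class StreamLookup {
    // Hypothetical stand-in for the @Stream annotation shown on the slide.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface Stream {
        String value();
    }

    @Stream("foo")
    static class Foo { }

    @Stream("bar")
    static class Bar { }

    // Resolve the topic name from the event's class-level annotation.
    static String topicOf(Object event) {
        Stream stream = event.getClass().getAnnotation(Stream.class);
        if (stream == null) {
            throw new IllegalArgumentException(
                    "No @Stream annotation on " + event.getClass().getName());
        }
        return stream.value();
    }

    public static void main(String[] args) {
        System.out.println(topicOf(new Foo())); // foo
        System.out.println(topicOf(new Bar())); // bar
    }
}
```

Once the topic is known, a topic-to-cluster mapping can select which embedded Kafka producer receives the record.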
Fronting Kafka
● For data collection and buffering
● Optimized for producers
○ The only consumers are the routers
Scaling of Fronting Kafka
● Creating / destroying Kafka clusters
○ E.g., create new topics on new clusters and update the topic-to-cluster mapping
● No partition reassignment
Data Balancing
● Assign the same number of partitions of any topic to every broker
○ E.g., for clusters of 12 brokers, create topics with 12, 24, or 36 partitions
○ Guaranteed even distribution of data (aside from occasional leader imbalance)
● Balance data among clusters by moving topics
○ Must dynamically update the topic-to-cluster mapping
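The reason for choosing partition counts that are multiples of the broker count can be seen in a small sketch. It assumes a simplified round-robin placement (partition p on broker p % brokers) and ignores replicas; Kafka's real assignment logic is more involved, so this only illustrates the counting argument. The count of 30 is added as a counterexample and is not from the slide.

```java
public class PartitionBalance {
    // Count partitions per broker under a simplified round-robin placement
    // (partition p goes to broker p % brokers); replicas are ignored.
    static int[] perBroker(int partitions, int brokers) {
        int[] counts = new int[brokers];
        for (int p = 0; p < partitions; p++) {
            counts[p % brokers]++;
        }
        return counts;
    }

    // True when every broker holds the same number of partitions.
    static boolean isEven(int[] counts) {
        for (int c : counts) {
            if (c != counts[0]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 12, 24 and 36 divide evenly across 12 brokers; 30 does not.
        for (int partitions : new int[] {12, 24, 30, 36}) {
            System.out.println(partitions + " partitions on 12 brokers, even: "
                    + isEven(perBroker(partitions, 12)));
        }
    }
}
```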
Topic Move
[Diagram. Labeled components: Event Producer, Fronting Kafka, Router, Consumer Kafka, Consumer. Step shown: create topic “foo”.]
Consumer Kafka
● Scaling
○ Add brokers and partitions to small clusters for non-keyed topics
○ Create the same topics on a new cluster and move consumers
Future Plan
● Cross-cluster topic
○ Load sharing beyond a single cluster
○ Auto-scale
○ Consumer/producer support needed
Multi-Cluster Consumer (Ongoing work)
● Same Kafka consumer interface
● Consume from multiple clusters with dynamic topic-to-cluster mapping
○ Keep subscription state
○ Receive mapping updates
○ Create and delegate to an underlying Kafka consumer for each associated cluster on the fly
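The three responsibilities listed above can be sketched without any Kafka dependency. In the hypothetical class below, a set of topic names stands in for each underlying Kafka consumer; the class, its methods, and the rebuild-on-every-change strategy are assumptions made for illustration, since the talk describes this as ongoing work and shows no implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MultiClusterSubscriptions {
    private final Set<String> subscribed = new HashSet<>();      // subscription state
    private Map<String, List<String>> mapping = new HashMap<>(); // topic -> clusters
    // cluster -> topics; each entry stands in for one underlying Kafka consumer
    private final Map<String, Set<String>> delegates = new HashMap<>();

    void subscribe(String topic) {
        subscribed.add(topic);
        rebuild();
    }

    void onMappingUpdate(Map<String, List<String>> newMapping) {
        mapping = new HashMap<>(newMapping);
        rebuild();
    }

    // Recompute which clusters need a delegate and which topics each one reads.
    private void rebuild() {
        delegates.clear();
        for (String topic : subscribed) {
            List<String> clusters = mapping.getOrDefault(topic, Collections.emptyList());
            for (String cluster : clusters) {
                delegates.computeIfAbsent(cluster, c -> new TreeSet<>()).add(topic);
            }
        }
    }

    Set<String> activeClusters() {
        return new TreeSet<>(delegates.keySet());
    }

    public static void main(String[] args) {
        MultiClusterSubscriptions consumer = new MultiClusterSubscriptions();
        Map<String, List<String>> m = new HashMap<>();
        m.put("foo", Arrays.asList("cluster1", "cluster2"));
        m.put("bar", Arrays.asList("cluster2"));
        consumer.onMappingUpdate(m);
        consumer.subscribe("foo");
        System.out.println(consumer.activeClusters()); // [cluster1, cluster2]

        // A mapping update arrives: "foo" now lives only on cluster2.
        m.put("foo", Arrays.asList("cluster2"));
        consumer.onMappingUpdate(m);
        System.out.println(consumer.activeClusters()); // [cluster2]
    }
}
```

A real implementation would also have to close consumers for clusters that drop out of the mapping and merge the records returned by each delegate's poll.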
Multi-Cluster Consumer Topic to Cluster Mapping and Code Example
{
    "foo": [
        {"vip": "cluster1"},
        {"vip": "cluster2"}
    ],
    "bar": [
        {"vip": "cluster2"}
    ]
}
// Create a multi-cluster consumer
Consumer<String, String> multiClusterConsumer = ...
// Subscribe as usual; the consumer keeps the subscription state
multiClusterConsumer.subscribe(Collections.singletonList("foo"));
while (...) {
    // Fetch from both clusters for topic "foo" and
    // return the aggregated records
    ConsumerRecords<String, String> records =
        multiClusterConsumer.poll(2000);
    process(records);
}
Topic move for Multi-cluster Consumers
[Diagram: topic “foo” moves from cluster1 to cluster2. Producer mapping shown: “foo”: “cluster1”, then “foo”: “cluster2”. Multi-cluster consumer mapping shown: “foo”: [“cluster1”], then “foo”: [“cluster1”, “cluster2”], then “foo”: [“cluster2”].]
Our Vision
[Diagram. Labeled components: Producers, Fronting Kafka w/ Cross-cluster Topics, Router, Consumer Kafka, Multi-cluster Consumer, Advanced Consumer. Topics shown: “foo”, “bar”.]
What About Keyed Messages
● Few topics require keyed messages at Netflix
● A word of caution for keyed messages
○ Inflexible/skewed load balancing
○ Difficult to scale
● Handling of keyed messages
○ Currently only produced by routers to consumer Kafka
○ Hard to guarantee message ordering in a multi-cluster setting
○ Key-consumer affinity is guaranteed
Think Differently on Scaling Kafka
                The “broker” way                                  The “cluster” way
Scale up        Add brokers                                       Add clusters
Data balance    Move partitions to different brokers              Move/expand topics to different clusters
Producer        Produce to different brokers at the same time     Produce to different clusters at the same time
Consumer        Consume from different brokers at the same time   Consume from different clusters at the same time
Thank You
https://medium.com/netflix-techblog
https://jobs.netflix.com/
