SlideShare une entreprise Scribd logo
1  sur  42
Télécharger pour lire hors ligne
Kafka
Introduction
Rahul Shukla
2© 2015 Cloudera, Inc. All rights reserved.
Agenda
• Introduction
• Concepts
• Use Cases
• Development
• Administration
3© 2015 Cloudera and/or its affiliates. All rights reserved.
Concepts
Basic Kafka Concepts
4© 2015 Cloudera, Inc. All rights reserved.
Why Kafka
4
ClientClient SourceSource
Data Pipelines Start like this.
5© 2015 Cloudera, Inc. All rights reserved.
Why Kafka
5
ClientClient SourceSource
ClientClient
ClientClient
ClientClient
Then we reuse them
6© 2015 Cloudera, Inc. All rights reserved.
Why Kafka
6
ClientClient BackendBackend
ClientClient
ClientClient
ClientClient
Then we add additional
endpoints to the existing
sources
Another
Backend
Another
Backend
7© 2015 Cloudera, Inc. All rights reserved.
Why Kafka
7
ClientClient BackendBackend
ClientClient
ClientClient
ClientClient
Then it starts to look like this
Another
Backend
Another
Backend
Another
Backend
Another
Backend
Another
Backend
Another
Backend
8© 2015 Cloudera, Inc. All rights reserved.
Why Kafka
8
ClientClient BackendBackend
ClientClient
ClientClient
ClientClient
With maybe some of this
Another
Backend
Another
Backend
Another
Backend
Another
Backend
Another
Backend
Another
Backend
As distributed systems and services increasingly
become part of a modern architecture, this makes
for a fragile system
9© 2015 Cloudera, Inc. All rights reserved.
Kafka decouples Data Pipelines
Why Kafka
9
Source
System
Source
System
Source
System
Source
System
Source
System
Source
System
Source
System
Source
System
HadoopHadoop Security
Systems
Security
Systems
Real-time
monitoring
Real-time
monitoring
Data
Warehouse
Data
Warehouse
KafkaKafka
Producers
Brokers
Consumers
10© 2015 Cloudera, Inc. All rights reserved.
Key terminology
• Kafka is run as a cluster comprised of one or more servers each of
which is called a broker.
• Kafka maintains feeds of messages in categories called topics.
• Kafka maintains each topic in 1 or more partitions.
• Processes that publish messages to a Kafka topic are called
producers.
• Processes that subscribe to topics and process the feed of published
messages are called consumers.
• Communication between all components is done via a high
performance simple binary API over TCP protocol
11© 2015 Cloudera, Inc. All rights reserved.
Efficiency
• Kafka achieves it’s high throughput and low latency primarily from
two key concepts
• 1) Batching of individual messages to amortize network overhead
and append/consume chunks together
• 2) Zero copy I/O using sendfile (Java’s NIO FileChannel transferTo
method).
• Implements linux sendfile() system call which skips unnecessary copies
• Heavily relies on Linux PageCache
• The I/O scheduler will batch together consecutive small writes into bigger physical writes
which improves throughput.
• The I/O scheduler will attempt to re-sequence writes to minimize movement of the disk
head which improves throughput.
• It automatically uses all the free memory on the machine
12© 2015 Cloudera, Inc. All rights reserved.
Efficiency - Implication
• In a system where consumers are roughly caught up with
producers, you are essentially reading data from cache.
• This is what allows end-to-end latency to be so low
13© 2015 Cloudera, Inc. All rights reserved.
Architecture
13
ProducerProducer
ConsumerConsumer ConsumerConsumer
Producers
Kafka
Cluster
Consumers
BrokerBroker BrokerBroker BrokerBroker BrokerBroker
ProducerProducer
ZookeeperZookeeper
Offsets
14© 2015 Cloudera, Inc. All rights reserved.
Topics - Partitions
• Topics are broken up into ordered commit logs called
partitions.
• Each message in a partition is assigned a sequential id
called an offset.
• Data is retained for a configurable period of time*
00 11 22 33 44 55 66 77 88 99 1
0
1
0
1
1
1
1
1
2
1
2
1
3
1
3
00 11 22 33 44 55 66 77 88 99 1
0
1
0
1
1
1
1
00 11 22 33 44 55 66 77 88 99 1
0
1
0
1
1
1
1
1
2
1
2
1
3
1
3
Partition
1
Partition
2
Partition
3
Writes
Old Ne
w
15© 2015 Cloudera, Inc. All rights reserved.
Message Ordering
• Ordering is only guaranteed within a partition for a topic
• To ensure ordering:
• Group messages in a partition by key (producer)
• Configure exactly one consumer instance per partition within a
consumer group
16© 2015 Cloudera, Inc. All rights reserved.
Guarantees
• Messages sent by a producer to a particular topic partition
will be appended in the order they are sent
• A consumer instance sees messages in the order they are
stored in the log
• For a topic with replication factor N, Kafka can tolerate up to
N-1 server failures without “losing” any messages
committed to the log
17© 2015 Cloudera, Inc. All rights reserved.
Topics - Replication
• Topics can (and should) be replicated.
• The unit of replication is the partition
• Each partition in a topic has 1 leader and 0 or more replicas.
• A replica is deemed to be “in-sync” if
• The replica can communicate with Zookeeper
• The replica is not “too far” behind the leader (configurable)
• The group of in-sync replicas for a partition is called the ISR
(In-Sync Replicas)
• The Replication factor cannot be lowered
18© 2015 Cloudera, Inc. All rights reserved.
Topics - Replication
• Durability can be configured with the producer configuration
request.required.acks
• 0 The producer never waits for an ack
• 1 The producer gets an ack after the leader replica has received the
data
• -1 The producer gets an ack after all ISRs receive the data
• Minimum available ISR can also be configured such that an
error is returned if enough replicas are not available to
replicate data
19© 2015 Cloudera, Inc. All rights reserved.
• Producers can choose to trade throughput for durability of writes:
• Throughput can also be raised with more brokers… (so do this instead)!
• A sane configuration:
Durable Writes
Durability Behaviour Per Event
Latency
Required
Acknowledgements
(request.required.acks)
Highest ACK all ISRs have received Highest -1
Medium ACK once the leader has received Medium 1
Lowest No ACKs required Lowest 0
Property Value
replication 3
min.insync.replicas 2
request.required.acks -1
20© 2015 Cloudera, Inc. All rights reserved.
Producer
• Producers publish to a topic of their choosing (push)
• Load can be distributed
• Typically by “round-robin”
• Can also do “semantic partitioning” based on a key in the message
• Brokers load balance by partition
• Can support async (less durable) sending
• All nodes can answer metadata requests about:
• Which servers are alive
• Where leaders are for the partitions of a topic
21© 2015 Cloudera, Inc. All rights reserved.
Producer – Load Balancing and ISRs
00
11
22
00
11
22
00
11
22
ProducerProducer
Broker
100
Broker
101
Broker
102
Topic:
Partitions:
Replicas:
my_topic
3
3
Partition:
Leader:
ISR:
1
101
100,102
Partition:
Leader:
ISR:
2
102
101,100
Partition:
Leader:
ISR:
0
100
101,102
22© 2015 Cloudera, Inc. All rights reserved.
Consumer
• Multiple Consumers can read from the same topic
• Each Consumer is responsible for managing it’s own offset
• Messages stay on Kafka…they are not removed after they are
consumed
12345671234567
12345681234568
12345691234569
12345701234570
12345711234571
12345721234572
12345731234573
12345741234574
12345751234575
12345761234576
12345771234577
ConsumerConsumer
ProducerProducer ConsumerConsumer
ConsumerConsumer
12345771234577
Send
Write
Fetch
Fetch
Fetch
23© 2015 Cloudera, Inc. All rights reserved.
Consumer
• Consumers can go away
12345671234567
12345681234568
12345691234569
12345701234570
12345711234571
12345721234572
12345731234573
12345741234574
12345751234575
12345761234576
12345771234577
ConsumerConsumer
ProducerProducer
ConsumerConsumer
12345771234577
Send
Write
Fetch
Fetch
24© 2015 Cloudera, Inc. All rights reserved.
Consumer
• And then come back
12345671234567
12345681234568
12345691234569
12345701234570
12345711234571
12345721234572
12345731234573
12345741234574
12345751234575
12345761234576
12345771234577
ConsumerConsumer
ProducerProducer ConsumerConsumer
ConsumerConsumer
12345771234577
Send
Write
Fetch
Fetch
Fetch
25© 2015 Cloudera, Inc. All rights reserved.
Consumer - Groups
• Consumers can be organized into Consumer Groups
Common Patterns:
• 1) All consumer instances in one group
• Acts like a traditional queue with load balancing
• 2) All consumer instances in different groups
• All messages are broadcast to all consumer instances
• 3) “Logical Subscriber” – Many consumer instances in a group
• Consumers are added for scalability and fault tolerance
• Each consumer instance reads from one or more partitions for a topic
• There cannot be more consumer instances than partitions
26© 2015 Cloudera, Inc. All rights reserved.
Consumer - Groups
P0P0 P3P3 P1P1 P2P2
C1C1 C2C2 C3C3 C4C4 C5C5 C6C6
Kafka
ClusterBroker 1 Broker 2
Consumer Group
A
Consumer Group B
Consumer
Groups provide
isolation to
topics and
partitions
27© 2015 Cloudera, Inc. All rights reserved.
Consumer - Groups
P0P0 P3P3 P1P1 P2P2
C1C1 C2C2 C3C3 C4C4 C5C5 C6C6
Kafka
ClusterBroker 1 Broker 2
Consumer Group
A
Consumer Group B
Can rebalance
themselves X
28© 2015 Cloudera, Inc. All rights reserved.
Delivery Semantics
• At least once
• Messages are never lost but may be redelivered
• At most once
• Messages are lost but never redelivered
• Exactly once
• Messages are delivered once and only once
Default
29© 2015 Cloudera, Inc. All rights reserved.
Delivery Semantics
• At least once
• Messages are never lost but may be redelivered
• At most once
• Messages are lost but never redelivered
• Exactly once
• Messages are delivered once and only once
Much Harder
(Impossible??)
30© 2015 Cloudera, Inc. All rights reserved.
Getting Exactly Once Semantics
• Must consider two components
• Durability guarantees when publishing a message
• Durability guarantees when consuming a message
• Producer
• What happens when a produce request was sent but a network error
returned before an ack?
• Use a single writer per partition and check the latest committed
value after network errors
• Consumer
• Include a unique ID (e.g. UUID) and de-duplicate.
• Consider storing offsets with data
31© 2015 Cloudera, Inc. All rights reserved.
Use Cases
• Real-Time Stream Processing (combined with Spark Streaming)
• General purpose Message Bus
• Collecting User Activity Data
• Collecting Operational Metrics from applications, servers or devices
• Log Aggregation
• Change Data Capture
• Commit Log for distributed systems
32© 2015 Cloudera, Inc. All rights reserved.
Positioning
Should I use Kafka ?
• For really large file transfers?
• Probably not, it’s designed for “messages” not really for files. If you need to ship
large files, consider good-ole-file transfer, or breaking up the files and reading per
line to move to Kafka.
• As a replacement for MQ/Rabbit/Tibco
• Probably. Performance Numbers are drastically superior. Also gives the ability for
transient consumers. Handles failures pretty well.
• If security on the broker and across the wire is important?
• Kafka-0.9.0.0 onwards kafka provides ssl security and authentication(experimental)
• To do transformations of data?
• Kafka-0.9.0.0 onwards kafka introduce kafka-connect for this.
33© 2015 Cloudera and/or its affiliates. All rights reserved.
Developing with
Kafka
API and Clients
34© 2015 Cloudera, Inc. All rights reserved.
Kafka Clients
• Remember Kafka implements a binary TCP protocol.
• All clients except the JVM-based clients are maintained external to the
code base.
35© 2015 Cloudera, Inc. All rights reserved.
Producer
public static void main(String[] args) throws InterruptedException {
String topic = "kafka";
String brokers = "localhost:9092";
String StringSerializer = "org.apache.kafka.common.serialization.StringSerializer";
Random rnd = new Random();
Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer);
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer);
KafkaProducer producer = new KafkaProducer<String, String>(config);
while (true) {
LocalDateTime runtime = LocalDateTime.now();
String ip = "192.168.2." + rnd.nextInt(255);
String msg = runtime + " www.example.com " + ip;
System.out.println(msg);
ProducerRecord record = new ProducerRecord<String, String>(topic, ip, msg);
producer.send(record);
Thread.sleep(100);
}
}
36© 2015 Cloudera, Inc. All rights reserved.
Consumer
public static void main(String[] args) throws InterruptedException {
String topic = "kafka";
String brokers = "localhost:9092";
String stringSerializer = "org.apache.kafka.common.serialization.StringDeserializer";
Map<String, Object> config = new HashMap<>();
config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, stringSerializer);
config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, stringSerializer);
config.put(ConsumerConfig.GROUP_ID_CONFIG, "test");
KafkaConsumer consumer = new KafkaConsumer<String, String>(config);
HashSet<String> topics = new HashSet<>();
topics.add(topic);
consumer.subscribe(topics);
while (true) {
ConsumerRecords<String, String> records = consumer.poll(200);
if (!records.isEmpty()) {
for (ConsumerRecord<String, String> record : records) {
System.out.println(record);
}
}
}
}
37© 2015 Cloudera, Inc. All rights reserved.
Kafka Clients
• Full list on the wiki, some examples…
38© 2015 Cloudera and/or its affiliates. All rights reserved.
Administration
Basic
39© 2015 Cloudera, Inc. All rights reserved.
Installation / Configuration
1. Download kafka from
– > wget
http://apache.mirror.serversaustralia.com.au/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
– > tar -xvzf kafka_2.11-0.10.0.0.tgz
– > cd kafka_2.11-0.10.0.0
2. Start Zookeeper server
– > bin/zookeeper-server-start.sh config/zookeeper.properties
3. Start Kafka sever
– > bin/kafka-server-start.sh config/server.properties
40© 2015 Cloudera, Inc. All rights reserved.
• Create a topic*:
• List topics:
• Write data:
• Read data:
Usage
bin/kafka­topics.sh ­­zookeeper zkhost:2181 ­­create ­­topic foo ­­replication­factor 1 ­­partitions 1
bin/kafka­topics.sh ­­zookeeper zkhost:2181 ­­list
cat data | bin/kafka­console­producer.sh ­­broker­list brokerhost:9092 ­­topic test
bin/kafka­console­consumer.sh ­­zookeeper zkhost:2181 ­­topic test ­­from­beginning
41© 2015 Cloudera, Inc. All rights reserved.
Kafka Topics - Admin
• Commands to administer topics are via shell script:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic
test
bin/kafka-topics.sh --list --zookeeper localhost:2181
Topics can be created dynamically by producers…don’t do
this
Thank you.

Contenu connexe

Tendances

Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Gwen (Chen) Shapira
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetricconfluent
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseWill Gardella
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormEdureka!
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emittersEdgar Domingues
 
IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudIBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudAndrew Schofield
 
Secure Kafka at Salesforce.com
Secure Kafka at Salesforce.comSecure Kafka at Salesforce.com
Secure Kafka at Salesforce.comRajasekar Elango
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...AWS Summits
 
Decisions behind hypervisor selection in CloudStack 4.3
Decisions behind hypervisor selection in CloudStack 4.3Decisions behind hypervisor selection in CloudStack 4.3
Decisions behind hypervisor selection in CloudStack 4.3Tim Mackey
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips confluent
 
Deploying Apache CloudStack from API to UI
Deploying Apache CloudStack from API to UIDeploying Apache CloudStack from API to UI
Deploying Apache CloudStack from API to UIJoe Brockmeier
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak PerformanceTodd Palino
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaJoe Stein
 
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarUnifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarKarthik Ramasamy
 

Tendances (20)

Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
kafka for db as postgres
kafka for db as postgreskafka for db as postgres
kafka for db as postgres
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
 
Real time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and CouchbaseReal time Messages at Scale with Apache Kafka and Couchbase
Real time Messages at Scale with Apache Kafka and Couchbase
 
Apache Kafka Security
Apache Kafka Security Apache Kafka Security
Apache Kafka Security
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
 
Kafka clients and emitters
Kafka clients and emittersKafka clients and emitters
Kafka clients and emitters
 
Kafka Security
Kafka SecurityKafka Security
Kafka Security
 
IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloudIBM Message Hub service in Bluemix - Apache Kafka in a public cloud
IBM Message Hub service in Bluemix - Apache Kafka in a public cloud
 
Secure Kafka at Salesforce.com
Secure Kafka at Salesforce.comSecure Kafka at Salesforce.com
Secure Kafka at Salesforce.com
 
Event Hub & Kafka
Event Hub & KafkaEvent Hub & Kafka
Event Hub & Kafka
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
 
Decisions behind hypervisor selection in CloudStack 4.3
Decisions behind hypervisor selection in CloudStack 4.3Decisions behind hypervisor selection in CloudStack 4.3
Decisions behind hypervisor selection in CloudStack 4.3
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips
 
Deploying Apache CloudStack from API to UI
Deploying Apache CloudStack from API to UIDeploying Apache CloudStack from API to UI
Deploying Apache CloudStack from API to UI
 
Introduction to CloudStack
Introduction to CloudStack Introduction to CloudStack
Introduction to CloudStack
 
Apache CloudStack from API to UI
Apache CloudStack from API to UIApache CloudStack from API to UI
Apache CloudStack from API to UI
 
Kafka at Peak Performance
Kafka at Peak PerformanceKafka at Peak Performance
Kafka at Peak Performance
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarUnifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
 

En vedette

Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaGrant Henke
 
Tuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTodd Palino
 
Optimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big DataOptimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big DataCloudera, Inc.
 
Kafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User GroupKafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User GroupJeff Holoman
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingGuozhang Wang
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereGwen (Chen) Shapira
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...Cloudera, Inc.
 

En vedette (9)

Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
 
Tuning Kafka for Fun and Profit
Tuning Kafka for Fun and ProfitTuning Kafka for Fun and Profit
Tuning Kafka for Fun and Profit
 
Optimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big DataOptimizing Regulatory Compliance with Big Data
Optimizing Regulatory Compliance with Big Data
 
Kafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User GroupKafka Reliability Guarantees ATL Kafka User Group
Kafka Reliability Guarantees ATL Kafka User Group
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 

Similaire à intro-kafka

Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Data Con LA
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaChhavi Parasher
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveYifeng Jiang
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxKnoldus Inc.
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 
The Flink - Apache Bigtop integration
The Flink - Apache Bigtop integrationThe Flink - Apache Bigtop integration
The Flink - Apache Bigtop integrationMárton Balassi
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...HostedbyConfluent
 
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBMAvailability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBMHostedbyConfluent
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdfTarekHamdi8
 
Netherlands Tech Tour 02 - MySQL Fabric
Netherlands Tech Tour 02 -   MySQL FabricNetherlands Tech Tour 02 -   MySQL Fabric
Netherlands Tech Tour 02 - MySQL FabricMark Swarbrick
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduJeremy Beard
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Timothy Spann
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache KafkaSaumitra Srivastav
 
BDW Chicago 2016 - Jayesh Thakrar, Sr. Software Engineer, Conversant - Data...
BDW Chicago 2016 -  Jayesh Thakrar, Sr. Software Engineer, Conversant -  Data...BDW Chicago 2016 -  Jayesh Thakrar, Sr. Software Engineer, Conversant -  Data...
BDW Chicago 2016 - Jayesh Thakrar, Sr. Software Engineer, Conversant - Data...Big Data Week
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 

Similaire à intro-kafka (20)

Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
Big Data Day LA 2015 - Introduction to Apache Kafka - The Big Data Message Bu...
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Kinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-diveKinesis vs-kafka-and-kafka-deep-dive
Kinesis vs-kafka-and-kafka-deep-dive
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptx
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
The Flink - Apache Bigtop integration
The Flink - Apache Bigtop integrationThe Flink - Apache Bigtop integration
The Flink - Apache Bigtop integration
 
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
Not Your Mother's Kafka - Deep Dive into Confluent Cloud Infrastructure | Gwe...
 
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBMAvailability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
Availability of Kafka - Beyond the Brokers | Andrew Borley and Emma Humber, IBM
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdf
 
Netherlands Tech Tour 02 - MySQL Fabric
Netherlands Tech Tour 02 -   MySQL FabricNetherlands Tech Tour 02 -   MySQL Fabric
Netherlands Tech Tour 02 - MySQL Fabric
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
 
BDW Chicago 2016 - Jayesh Thakrar, Sr. Software Engineer, Conversant - Data...
BDW Chicago 2016 -  Jayesh Thakrar, Sr. Software Engineer, Conversant -  Data...BDW Chicago 2016 -  Jayesh Thakrar, Sr. Software Engineer, Conversant -  Data...
BDW Chicago 2016 - Jayesh Thakrar, Sr. Software Engineer, Conversant - Data...
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 

Plus de Rahul Shukla

Plus de Rahul Shukla (6)

Akka (1)
Akka (1)Akka (1)
Akka (1)
 
Advance RCP
Advance RCPAdvance RCP
Advance RCP
 
Advance JFACE
Advance JFACEAdvance JFACE
Advance JFACE
 
Jface
JfaceJface
Jface
 
Introduction_To_SWT_Rahul_Shukla
Introduction_To_SWT_Rahul_ShuklaIntroduction_To_SWT_Rahul_Shukla
Introduction_To_SWT_Rahul_Shukla
 
Eclipse_Building_Blocks
Eclipse_Building_BlocksEclipse_Building_Blocks
Eclipse_Building_Blocks
 

intro-kafka

  • 2. 2© 2015 Cloudera, Inc. All rights reserved. Agenda • Introduction • Concepts • Use Cases • Development • Administration
  • 3. 3© 2015 Cloudera and/or its affiliates. All rights reserved. Concepts Basic Kafka Concepts
  • 4. 4© 2015 Cloudera, Inc. All rights reserved. Why Kafka 4 ClientClient SourceSource Data Pipelines Start like this.
  • 5. 5© 2015 Cloudera, Inc. All rights reserved. Why Kafka 5 ClientClient SourceSource ClientClient ClientClient ClientClient Then we reuse them
  • 6. 6© 2015 Cloudera, Inc. All rights reserved. Why Kafka 6 ClientClient BackendBackend ClientClient ClientClient ClientClient Then we add additional endpoints to the existing sources Another Backend Another Backend
  • 7. 7© 2015 Cloudera, Inc. All rights reserved. Why Kafka 7 ClientClient BackendBackend ClientClient ClientClient ClientClient Then it starts to look like this Another Backend Another Backend Another Backend Another Backend Another Backend Another Backend
  • 8. 8© 2015 Cloudera, Inc. All rights reserved. Why Kafka 8 ClientClient BackendBackend ClientClient ClientClient ClientClient With maybe some of this Another Backend Another Backend Another Backend Another Backend Another Backend Another Backend As distributed systems and services increasingly become part of a modern architecture, this makes for a fragile system
  • 9. 9© 2015 Cloudera, Inc. All rights reserved. Kafka decouples Data Pipelines Why Kafka 9 Source System Source System Source System Source System Source System Source System Source System Source System HadoopHadoop Security Systems Security Systems Real-time monitoring Real-time monitoring Data Warehouse Data Warehouse KafkaKafka Producers Brokers Consumers
  • 10. 10© 2015 Cloudera, Inc. All rights reserved. Key terminology • Kafka is run as a cluster comprised of one or more servers each of which is called a broker. • Kafka maintains feeds of messages in categories called topics. • Kafka maintains each topic in 1 or more partitions. • Processes that publish messages to a Kafka topic are called producers. • Processes that subscribe to topics and process the feed of published messages are called consumers. • Communication between all components is done via a high performance simple binary API over TCP protocol
  • 11. 11© 2015 Cloudera, Inc. All rights reserved. Efficiency • Kafka achieves it’s high throughput and low latency primarily from two key concepts • 1) Batching of individual messages to amortize network overhead and append/consume chunks together • 2) Zero copy I/O using sendfile (Java’s NIO FileChannel transferTo method). • Implements linux sendfile() system call which skips unnecessary copies • Heavily relies on Linux PageCache • The I/O scheduler will batch together consecutive small writes into bigger physical writes which improves throughput. • The I/O scheduler will attempt to re-sequence writes to minimize movement of the disk head which improves throughput. • It automatically uses all the free memory on the machine
  • 12. 12© 2015 Cloudera, Inc. All rights reserved. Efficiency - Implication • In a system where consumers are roughly caught up with producers, you are essentially reading data from cache. • This is what allows end-to-end latency to be so low
  • 13. 13© 2015 Cloudera, Inc. All rights reserved. Architecture 13 ProducerProducer ConsumerConsumer ConsumerConsumer Producers Kafka Cluster Consumers BrokerBroker BrokerBroker BrokerBroker BrokerBroker ProducerProducer ZookeeperZookeeper Offsets
  • 14. 14© 2015 Cloudera, Inc. All rights reserved. Topics - Partitions • Topics are broken up into ordered commit logs called partitions. • Each message in a partition is assigned a sequential id called an offset. • Data is retained for a configurable period of time* 00 11 22 33 44 55 66 77 88 99 1 0 1 0 1 1 1 1 1 2 1 2 1 3 1 3 00 11 22 33 44 55 66 77 88 99 1 0 1 0 1 1 1 1 00 11 22 33 44 55 66 77 88 99 1 0 1 0 1 1 1 1 1 2 1 2 1 3 1 3 Partition 1 Partition 2 Partition 3 Writes Old Ne w
  • 15. 15© 2015 Cloudera, Inc. All rights reserved. Message Ordering • Ordering is only guaranteed within a partition for a topic • To ensure ordering: • Group messages in a partition by key (producer) • Configure exactly one consumer instance per partition within a consumer group
  • 16. 16© 2015 Cloudera, Inc. All rights reserved. Guarantees • Messages sent by a producer to a particular topic partition will be appended in the order they are sent • A consumer instance sees messages in the order they are stored in the log • For a topic with replication factor N, Kafka can tolerate up to N-1 server failures without “losing” any messages committed to the log
  • 17. 17© 2015 Cloudera, Inc. All rights reserved. Topics - Replication • Topics can (and should) be replicated. • The unit of replication is the partition • Each partition in a topic has 1 leader and 0 or more replicas. • A replica is deemed to be “in-sync” if • The replica can communicate with Zookeeper • The replica is not “too far” behind the leader (configurable) • The group of in-sync replicas for a partition is called the ISR (In-Sync Replicas) • The Replication factor cannot be lowered
  • 18. 18© 2015 Cloudera, Inc. All rights reserved. Topics - Replication • Durability can be configured with the producer configuration request.required.acks • 0 The producer never waits for an ack • 1 The producer gets an ack after the leader replica has received the data • -1 The producer gets an ack after all ISRs receive the data • Minimum available ISR can also be configured such that an error is returned if enough replicas are not available to replicate data
  • 19. 19© 2015 Cloudera, Inc. All rights reserved. • Producers can choose to trade throughput for durability of writes: • Throughput can also be raised with more brokers… (so do this instead)! • A sane configuration: Durable Writes Durability Behaviour Per Event Latency Required Acknowledgements (request.required.acks) Highest ACK all ISRs have received Highest -1 Medium ACK once the leader has received Medium 1 Lowest No ACKs required Lowest 0 Property Value replication 3 min.insync.replicas 2 request.required.acks -1
  • 20. 20© 2015 Cloudera, Inc. All rights reserved. Producer • Producers publish to a topic of their choosing (push) • Load can be distributed • Typically by “round-robin” • Can also do “semantic partitioning” based on a key in the message • Brokers load balance by partition • Can support async (less durable) sending • All nodes can answer metadata requests about: • Which servers are alive • Where leaders are for the partitions of a topic
  • 21. 21© 2015 Cloudera, Inc. All rights reserved. Producer – Load Balancing and ISRs 00 11 22 00 11 22 00 11 22 ProducerProducer Broker 100 Broker 101 Broker 102 Topic: Partitions: Replicas: my_topic 3 3 Partition: Leader: ISR: 1 101 100,102 Partition: Leader: ISR: 2 102 101,100 Partition: Leader: ISR: 0 100 101,102
  • 22. 22© 2015 Cloudera, Inc. All rights reserved. Consumer • Multiple Consumers can read from the same topic • Each Consumer is responsible for managing it’s own offset • Messages stay on Kafka…they are not removed after they are consumed 12345671234567 12345681234568 12345691234569 12345701234570 12345711234571 12345721234572 12345731234573 12345741234574 12345751234575 12345761234576 12345771234577 ConsumerConsumer ProducerProducer ConsumerConsumer ConsumerConsumer 12345771234577 Send Write Fetch Fetch Fetch
  • 23. 23© 2015 Cloudera, Inc. All rights reserved. Consumer • Consumers can go away 12345671234567 12345681234568 12345691234569 12345701234570 12345711234571 12345721234572 12345731234573 12345741234574 12345751234575 12345761234576 12345771234577 ConsumerConsumer ProducerProducer ConsumerConsumer 12345771234577 Send Write Fetch Fetch
  • 24. 24© 2015 Cloudera, Inc. All rights reserved. Consumer • And then come back 12345671234567 12345681234568 12345691234569 12345701234570 12345711234571 12345721234572 12345731234573 12345741234574 12345751234575 12345761234576 12345771234577 ConsumerConsumer ProducerProducer ConsumerConsumer ConsumerConsumer 12345771234577 Send Write Fetch Fetch Fetch
  • 25. 25© 2015 Cloudera, Inc. All rights reserved. Consumer - Groups • Consumers can be organized into Consumer Groups Common Patterns: • 1) All consumer instances in one group • Acts like a traditional queue with load balancing • 2) All consumer instances in different groups • All messages are broadcast to all consumer instances • 3) “Logical Subscriber” – Many consumer instances in a group • Consumers are added for scalability and fault tolerance • Each consumer instance reads from one or more partitions for a topic • There cannot be more consumer instances than partitions
  • 26. 26© 2015 Cloudera, Inc. All rights reserved. Consumer - Groups P0P0 P3P3 P1P1 P2P2 C1C1 C2C2 C3C3 C4C4 C5C5 C6C6 Kafka ClusterBroker 1 Broker 2 Consumer Group A Consumer Group B Consumer Groups provide isolation to topics and partitions
  • 27. 27© 2015 Cloudera, Inc. All rights reserved. Consumer - Groups P0P0 P3P3 P1P1 P2P2 C1C1 C2C2 C3C3 C4C4 C5C5 C6C6 Kafka ClusterBroker 1 Broker 2 Consumer Group A Consumer Group B Can rebalance themselves X
  • 28. 28© 2015 Cloudera, Inc. All rights reserved. Delivery Semantics • At least once • Messages are never lost but may be redelivered • At most once • Messages are lost but never redelivered • Exactly once • Messages are delivered once and only once Default
  • 29. 29© 2015 Cloudera, Inc. All rights reserved. Delivery Semantics • At least once • Messages are never lost but may be redelivered • At most once • Messages are lost but never redelivered • Exactly once • Messages are delivered once and only once Much Harder (Impossible??)
  • 30. 30© 2015 Cloudera, Inc. All rights reserved. Getting Exactly Once Semantics • Must consider two components • Durability guarantees when publishing a message • Durability guarantees when consuming a message • Producer • What happens when a produce request was sent but a network error returned before an ack? • Use a single writer per partition and check the latest committed value after network errors • Consumer • Include a unique ID (e.g. UUID) and de-duplicate. • Consider storing offsets with data
  • 31. 31© 2015 Cloudera, Inc. All rights reserved. Use Cases • Real-Time Stream Processing (combined with Spark Streaming) • General purpose Message Bus • Collecting User Activity Data • Collecting Operational Metrics from applications, servers or devices • Log Aggregation • Change Data Capture • Commit Log for distributed systems
  • 32. 32© 2015 Cloudera, Inc. All rights reserved. Positioning Should I use Kafka ? • For really large file transfers? • Probably not, it’s designed for “messages” not really for files. If you need to ship large files, consider good-ole-file transfer, or breaking up the files and reading per line to move to Kafka. • As a replacement for MQ/Rabbit/Tibco • Probably. Performance Numbers are drastically superior. Also gives the ability for transient consumers. Handles failures pretty well. • If security on the broker and across the wire is important? • Kafka-0.9.0.0 onwards kafka provides ssl security and authentication(experimental) • To do transformations of data? • Kafka-0.9.0.0 onwards kafka introduce kafka-connect for this.
  • 33. 33© 2015 Cloudera and/or its affiliates. All rights reserved. Developing with Kafka API and Clients
  • 34. 34© 2015 Cloudera, Inc. All rights reserved. Kafka Clients • Remember Kafka implements a binary TCP protocol. • All clients except the JVM-based clients are maintained external to the code base.
  • 35. 35© 2015 Cloudera, Inc. All rights reserved. Producer public static void main(String[] args) throws InterruptedException { String topic = "kafka"; String brokers = "localhost:9092"; String StringSerializer = "org.apache.kafka.common.serialization.StringSerializer"; Random rnd = new Random(); Map<String, Object> config = new HashMap<>(); config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers); config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer); config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer); KafkaProducer producer = new KafkaProducer<String, String>(config); while (true) { LocalDateTime runtime = LocalDateTime.now(); String ip = "192.168.2." + rnd.nextInt(255); String msg = runtime + " www.example.com " + ip; System.out.println(msg); ProducerRecord record = new ProducerRecord<String, String>(topic, ip, msg); producer.send(record); Thread.sleep(100); } }
  • 36. 36© 2015 Cloudera, Inc. All rights reserved. Consumer public static void main(String[] args) throws InterruptedException { String topic = "kafka"; String brokers = "localhost:9092"; String stringSerializer = "org.apache.kafka.common.serialization.StringDeserializer"; Map<String, Object> config = new HashMap<>(); config.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokers); config.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, stringSerializer); config.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, stringSerializer); config.put(ConsumerConfig.GROUP_ID_CONFIG, "test"); KafkaConsumer consumer = new KafkaConsumer<String, String>(config); HashSet<String> topics = new HashSet<>(); topics.add(topic); consumer.subscribe(topics); while (true) { ConsumerRecords<String, String> records = consumer.poll(200); if (!records.isEmpty()) { for (ConsumerRecord<String, String> record : records) { System.out.println(record); } } } }
  • 37. 37© 2015 Cloudera, Inc. All rights reserved. Kafka Clients • Full list on the wiki, some examples…
  • 38. 38© 2015 Cloudera and/or its affiliates. All rights reserved. Administration Basic
  • 39. 39© 2015 Cloudera, Inc. All rights reserved. Installation / Configuration 1. Download kafka from – > wget http://apache.mirror.serversaustralia.com.au/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz – > tar -xvzf kafka_2.11-0.10.0.0.tgz – > cd kafka_2.11-0.10.0.0 2. Start Zookeeper server – > bin/zookeeper-server-start.sh config/zookeeper.properties 3. Start Kafka sever – > bin/kafka-server-start.sh config/server.properties
  • 40. 40© 2015 Cloudera, Inc. All rights reserved. • Create a topic*: • List topics: • Write data: • Read data: Usage bin/kafka­topics.sh ­­zookeeper zkhost:2181 ­­create ­­topic foo ­­replication­factor 1 ­­partitions 1 bin/kafka­topics.sh ­­zookeeper zkhost:2181 ­­list cat data | bin/kafka­console­producer.sh ­­broker­list brokerhost:9092 ­­topic test bin/kafka­console­consumer.sh ­­zookeeper zkhost:2181 ­­topic test ­­from­beginning
  • 41. 41© 2015 Cloudera, Inc. All rights reserved. Kafka Topics - Admin • Commands to administer topics are via shell script: bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test bin/kafka-topics.sh --list --zookeeper localhost:2181 Topics can be created dynamically by producers…don’t do this