Kinesis vs. Kafka –
Kafka Deep Dive
Yifeng Jiang
Solutions Engineer, Hortonworks
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
About Me
蒋 逸峰 (Yifeng Jiang)
•  Solutions Engineer, Hortonworks
•  HBase book author
•  It has been 10 years since I came to Japan…
•  Hobby: mountain climbing
•  Twitter: @uprush
About Hortonworks
Customer Momentum
•  556 customers (as of August 5, 2015)
•  119 customers added in Q2 2015
•  Publicly traded on NASDAQ: HDP
Hortonworks Data Platform
•  Completely open multi-tenant platform
for any app and any data
•  Consistent enterprise services for security,
operations, and governance
Partner for Customer Success
•  Leader in open-source community, focused on
innovation to meet enterprise needs
•  Unrivaled Hadoop support subscriptions
Founded in 2011
Original 24 architects, developers,
operators of Hadoop from Yahoo!
740+ employees
1350+ ecosystem partners
Hortonworks Data Platform (HDP)
Deploy on premises and cloud
Kinesis vs. Kafka
Amazon Kinesis -- Introduction
Amazon Kinesis is a fully managed, cloud-based service for real-time data
processing over large, distributed data streams.
Apache Kafka -- Introduction
Messaging systems
Real-time
Scalable to handle large data volume
Low Latency
Fault tolerant
Originated at LinkedIn
Aimed at solving data movement across systems
Scala and Java
Open Source (Apache 2.0)
Adopted by many companies
Kinesis vs. Kafka – Features
Similar Features
•  Messaging system for large-scale real-time data processing
•  High performance, highly scalable, low latency
•  Fault tolerant
Differences
•  Fully managed cloud service vs. OSS
•  Data durability and performance trade-off
•  Interface
•  AWS service integration vs. OSS or single-platform (e.g., HDP) integration
Kinesis vs. Kafka – Data Durability
Kinesis
•  Synchronously replicates data across three facilities
•  High durability for free
Kafka
•  Replication across servers in the same DC/AZ. Configurable minimum number of in-sync replicas and ACKs.
•  Asynchronously mirrors data across clusters in different datacenters / AZs
Performance trade-off
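The durability/performance trade-off on the Kafka side can be made concrete with a toy model. The sketch below is not Kafka code. It only mirrors the commonly documented meaning of the producer ACK setting (0, 1, all) and the minimum in-sync replica setting. The names confirmations_needed and NotEnoughReplicas are hypothetical and exist only for this illustration.

```python
class NotEnoughReplicas(Exception):
    """Hypothetical error for a write rejected because too few replicas are in sync."""


def confirmations_needed(acks, isr_size, min_insync_replicas=1):
    """Return how many replicas must confirm a write before the producer hears back.

    acks: "0" (fire and forget), "1" (leader only) or "all" (every in-sync replica).
    isr_size: current number of in-sync replicas, leader included.
    """
    if acks == "0":
        return 0  # fastest, but a lost message goes unnoticed
    if acks == "1":
        return 1  # the leader has the write; followers may still lag
    if acks == "all":
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                f"only {isr_size} in-sync replicas, need {min_insync_replicas}"
            )
        return isr_size  # slowest, most durable
    raise ValueError(f"unknown acks value: {acks!r}")
```

Moving from "0" to "all" raises the number of confirmations a write waits for, which is the latency cost paid for durability.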
Kinesis vs. Kafka – Interface
Kinesis
•  REST only
•  Client library wraps REST API
Kafka
•  Low-level API
•  REST API available (wrapping the low-level API)
The choice of interface affects throughput and latency
Kinesis vs. Kafka – Processing
Kafka
•  Custom consumers: event monitoring and alerting use cases
•  Storm: fraud detection, simple aggregation
•  Spark Streaming / Storm Trident: micro-batch, near real-time
•  Camus: batch Hadoop ingestion
Kinesis
•  KCL applications on EC2
•  Storm
•  Spark Streaming
•  EMR for batch ingestion, e.g., write to S3
Kinesis vs. Kafka – Deployment & Operation
Kafka
•  HDP: almost one-click deploy with Ambari
•  Basic monitoring with Ambari
•  Expand and rebalance: partition assignment and consumer rebalance
•  ZooKeeper can also be managed by Ambari
Kinesis
•  Fully managed, one-click deploy
•  CloudWatch monitoring
•  Expand and rebalance: resharding a stream
•  Easy operation
Kafka Deep Dive
Kafka – Concepts
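* ZooKeeper (ZK) is used by brokers and consumers
[Diagram: a producer writes to and a consumer reads from a three-broker cluster]
Broker-0: P0.R0 (L), P1.R0
Broker-1: P0.R1, P2.R1 (L)
Broker-2: P1.R2 (L), P2.R2
Topic with 3 partitions and replication factor 2; (L) marks the leader replica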
Kafka -- Concepts
Topics
Partitions
•  Offset
•  Ordered
Replication
•  Prevents data loss
•  Follower replicas are never read from or written to by clients
•  Does not increase throughput
•  Tolerates (replication factor − 1) failures
[ambari-qa@c6401 bin]$ kafka-topics.sh --zookeeper c6401:2181 --describe --topic page_visits
Topic:page_visits    PartitionCount:4    ReplicationFactor:2    Configs:
    Topic: page_visits    Partition: 0    Leader: 1    Replicas: 0,1    Isr: 1,0
    Topic: page_visits    Partition: 1    Leader: 0    Replicas: 1,0    Isr: 0,1
    Topic: page_visits    Partition: 2    Leader: 1    Replicas: 0,1    Isr: 1,0
    Topic: page_visits    Partition: 3    Leader: 0    Replicas: 1,0    Isr: 0,1
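Describe output of this shape is easy to check programmatically. The sketch below is an illustration only: the helper name parse_describe is hypothetical, and it assumes the whitespace-separated "Name: value" layout of the page_visits listing. It counts leaders per broker and lists partitions whose in-sync replica set (Isr) is smaller than the replica set.

```python
from collections import Counter

SAMPLE = """\
Topic:page_visits PartitionCount:4 ReplicationFactor:2 Configs:
    Topic: page_visits Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1,0
    Topic: page_visits Partition: 1 Leader: 0 Replicas: 1,0 Isr: 0,1
    Topic: page_visits Partition: 2 Leader: 1 Replicas: 0,1 Isr: 1,0
    Topic: page_visits Partition: 3 Leader: 0 Replicas: 1,0 Isr: 0,1
"""


def parse_describe(text):
    """Parse the per-partition lines of the describe output into dictionaries."""
    partitions = []
    for line in text.splitlines():
        tokens = line.split()
        if "Partition:" not in tokens:
            continue  # skip the topic summary line and anything else
        # Tokens alternate between a "Name:" label and its value.
        fields = dict(zip(tokens[0::2], tokens[1::2]))
        partitions.append({
            "partition": int(fields["Partition:"]),
            "leader": int(fields["Leader:"]),
            "replicas": [int(b) for b in fields["Replicas:"].split(",")],
            "isr": [int(b) for b in fields["Isr:"].split(",")],
        })
    return partitions


parts = parse_describe(SAMPLE)
leaders_per_broker = Counter(p["leader"] for p in parts)
under_replicated = [p["partition"] for p in parts
                    if set(p["isr"]) != set(p["replicas"])]
```

For the sample, each of the two brokers leads two partitions and no partition is under-replicated, so with replication factor 2 the topic can lose one broker without losing data.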
  
Kafka Broker
Stores messages (logs) on local disk
•  Messages are appended to a log file
•  Log retention: time and size based
Controller
•  Cluster management
•  Runs on each broker machine
•  One leader; the others are followers
Leader Partition
•  Broker that is the leader for certain partitions
Uses ZooKeeper (ZK) for coordination
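The append-only log with size-based retention can be sketched as a toy class. This is an illustration, not broker code: the class name PartitionLog is hypothetical, and trimming one message at a time is a simplification, since real brokers remove whole log segments.

```python
class PartitionLog:
    """Toy append-only log: every message gets the next offset, old data is trimmed by size."""

    def __init__(self, retention_bytes):
        self.retention_bytes = retention_bytes
        self.start_offset = 0  # offset of the oldest retained message
        self.messages = []     # retained messages, oldest first

    def append(self, payload):
        """Append a bytes payload and return the offset assigned to it."""
        offset = self.start_offset + len(self.messages)
        self.messages.append(payload)
        self._enforce_retention()
        return offset

    def _enforce_retention(self):
        # Drop the oldest messages until the log fits the size limit again,
        # always keeping at least the newest message.
        while (len(self.messages) > 1
               and sum(len(m) for m in self.messages) > self.retention_bytes):
            self.messages.pop(0)
            self.start_offset += 1

    def read(self, offset):
        """Return the message at `offset`, or raise IndexError if it is gone or not written yet."""
        index = offset - self.start_offset
        if index < 0 or index >= len(self.messages):
            raise IndexError(f"offset {offset} is not retained")
        return self.messages[index]
```

Offsets keep growing even after old messages are trimmed, which is why a consumer that falls too far behind can find its offset no longer available.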
Kafka Producer
New Producer API in 0.8.2
•  kafka-clients.jar
•  New Java API
•  Asynchronous mode by default
Creates a new message and publishes it to a topic and partition
•  Takes a topic, a value, and an optional key and partition ID
Kafka Producer API (0.8.2) – Cont.
•  Original messages are partitioned and then split into batches
•  Each batch is sent to the leader broker of its partition (and then replicated to the ISR)
•  Each send is acknowledged by the leader broker alone or by all in-sync replicas (ISR), depending on the ACK setting
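[Diagram: the app hands messages m1 to m5 to the producer library; the partitioner assigns each message to a partition (p1 to p3), the messages are split into batches, and each batch is sent to the leader broker.]
Broker-0: P0.R0 (L), P1.R0
Broker-1: P0.R1, P2.R1 (L)
Broker-2: P1.R2 (L), P2.R2
Topic with 3 partitions and replication factor 2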
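The partition-then-batch flow can be sketched in a few lines. This is a toy model, not the producer library: the function names are hypothetical, CRC32 stands in for whatever hash the real partitioner uses, round robin for keyless messages is an assumption, and the leader map copies the cluster layout from the slide.

```python
import zlib
from collections import defaultdict

# Partition -> leader broker, following the (L) markers in the slide's layout.
LEADERS = {0: "Broker-0", 1: "Broker-2", 2: "Broker-1"}


def choose_partition(key, partition, num_partitions, counter):
    """Pick a partition: explicit ID first, then key hash, then round robin."""
    if partition is not None:
        return partition
    if key is not None:
        return zlib.crc32(key.encode("utf-8")) % num_partitions
    return counter % num_partitions


def build_batches(records, leaders=LEADERS, batch_size=2):
    """Group (key, value, partition) records into per-partition batches.

    Returns a list of (leader broker, partition, values) tuples.
    """
    per_partition = defaultdict(list)
    for i, (key, value, partition) in enumerate(records):
        p = choose_partition(key, partition, len(leaders), i)
        per_partition[p].append(value)
    batches = []
    for p in sorted(per_partition):
        values = per_partition[p]
        for start in range(0, len(values), batch_size):
            batches.append((leaders[p], p, values[start:start + batch_size]))
    return batches
```

Because the hash depends only on the key, all messages with the same key land in the same partition, which is what preserves their relative order.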
Kafka Consumer
Reads data from Kafka brokers
•  JVM APIs supported out of the box by the project
•  Consumers pull data from brokers
•  Consumer apps have to keep track of the topic-partition offsets they have read
Consumer API
Simple API
•  Greater control over consumption of topics/partitions
•  Consumer apps are more complex because they must handle details such as offset management
High-level API
•  Uses the Simple API internally
•  Consumer apps are simple to implement because offset tracking works out of the box
•  Not flexible about which partitions to read
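What "keeping track of the offset" means for a pull-based consumer can be shown with a toy model. The sketch below does not use any Kafka API: the class name OffsetTrackingConsumer is hypothetical, and a plain Python list stands in for one topic-partition.

```python
class OffsetTrackingConsumer:
    """Toy pull consumer that remembers the next offset to read from one partition."""

    def __init__(self, log, committed_offset=0):
        self.log = log                    # list standing in for a topic-partition
        self.position = committed_offset  # next offset to fetch

    def poll(self, max_messages):
        """Pull up to max_messages starting at the current position."""
        batch = self.log[self.position:self.position + max_messages]
        self.position += len(batch)
        return batch


log = ["a", "b", "c", "d", "e"]
first = OffsetTrackingConsumer(log)
first_batch = first.poll(2)

# The application must persist the position itself; a restarted
# consumer resumes from the saved offset instead of from the start.
saved_offset = first.position
restarted = OffsetTrackingConsumer(log, committed_offset=saved_offset)
remaining = restarted.poll(10)
```

With the Simple API this bookkeeping is the application's job; the High-level API does the equivalent on the application's behalf.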
Kafka Consumer – Cont.
Consumer Groups
•  Allow multiple hosts to form a group to access a topic
•  Consumer hosts join a group by using the same group.id
•  Guarantees that a message is read by only one consumer in a group
•  Partitions are assigned to consumers in a group
•  A consumer node may get one or more partitions
•  But one partition is assigned to only one consumer host
•  Message order is guaranteed within a partition
•  Max parallelism is determined by the number of topic partitions
•  More consumers than partitions: some consumers will be idle
[Diagram: Broker-0 holds P0 and P3, Broker-1 holds P1 and P2; Consumer Group 1 (C1, C2) and Consumer Group 2 (C3, C4, C5, C6) each consume all four partitions.]
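The assignment rules for a consumer group can be sketched as follows. This is an illustration under assumptions: the function name assign_partitions is hypothetical, and simple round robin is used here, which is not necessarily the strategy Kafka itself applies.

```python
def assign_partitions(partitions, consumers):
    """Give every partition exactly one owner within a single consumer group."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


partitions = ["P0", "P1", "P2", "P3"]
group_1 = assign_partitions(partitions, ["C1", "C2"])
group_2 = assign_partitions(partitions, ["C3", "C4", "C5", "C6"])
oversized = assign_partitions(partitions, ["C3", "C4", "C5", "C6", "C7"])
```

Each group sees every partition once. In the two-consumer group each consumer owns two partitions, in the four-consumer group one each, and in the five-consumer group one consumer is left without a partition and stays idle.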
Kafka – Why Kafka is fast
Fast Writes
Writes are appends to file system
Partitions improve performance and throughput
Uses OS buffer cache
Lots of memory on the machine helps
Fast Reads
Memory-mapped files
Efficient transfer from file descriptor to socket descriptor
Linux sendfile(), JVM transferTo() implementation
Why Performance?
Disk flushes are delayed
Durability is guaranteed via replication
When consumers read the latest data, they read it from the page cache
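The file-to-socket transfer can be tried out from Python, whose socket.sendfile method uses os.sendfile where the platform provides it and falls back to ordinary sends otherwise. The sketch below is only an analogue of the technique named on the slide, not Kafka's implementation. The helper names and the payload are made up for the demonstration.

```python
import os
import socket
import tempfile

PAYLOAD = b"log segment bytes"


def send_file_over_socket(path, sock):
    """Send a whole file through a stream socket and return the number of bytes sent."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo():
    """Write a small file, push it through a local socket pair, and read it back."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(PAYLOAD)
        path = tmp.name
    try:
        reader, writer = socket.socketpair()
        with reader, writer:
            sent = send_file_over_socket(path, writer)
            received = reader.recv(1024)
    finally:
        os.remove(path)
    return sent, received
```

When the kernel-level path is taken, the file contents move from the page cache to the socket without being copied into the application's own buffers, which is the saving the slide refers to.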
Kafka – Cluster Mirroring
Mirror Maker
•  Mirrors data across clusters, even in different DCs / AZs
•  Standalone tool that uses the Consumer and Producer APIs
•  Reads from one or more source clusters and writes to a target cluster
•  Topic whitelist/blacklist
Kafka REST Interface
REST Interface
•  Wraps Producer and Consumer API
Performance Overhead
•  Two hops
•  Extra REST server to maintain
•  Parse JSON payload
Kinesis vs. Kafka -- Terms
Amazon Kinesis                               Apache Kafka
Streams                                      Topics
Data Records                                 Messages
Producers                                    Producers
Kinesis Producer Library                     Producer API
Consumers                                    Consumers
Kinesis Applications                         Consumer Applications
Kinesis Client Library                       Consumer – High level API
N/A                                          Consumer – Simple API
Shards                                       Partitions
N/A (built-in MD5 hash on partition keys)    Custom partitioner
Sequence Numbers                             Offset
Application Name                             Consumer Group ID
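The "built-in MD5 hash on partition keys" row can be illustrated with a short sketch. Kinesis maps the MD5 hash of a partition key to a 128-bit integer and routes the record to the shard whose hash key range contains it. The helper names below are hypothetical, and the assumption that shards split the hash space into equal contiguous ranges is a simplification; real shard ranges depend on how the stream was created and resharded.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def md5_hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, num_shards):
    """Pick a shard index, assuming equal contiguous hash key ranges per shard."""
    return md5_hash_key(partition_key) * num_shards // HASH_SPACE
```

The Kafka counterpart in the table is the custom partitioner: there the application can replace the key-to-partition mapping, whereas on the Kinesis side the hash function is fixed.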
Tweet: #hadooproadshow
More About Apache Kafka:
http://hortonworks.com/hadoop/kafka/

Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....kzayra69
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 

Dernier (20)

MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....What are the key points to focus on before starting to learn ETL Development....
What are the key points to focus on before starting to learn ETL Development....
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 

Kinesis vs-kafka-and-kafka-deep-dive

  • 1. Kinesis vs. Kafka – Kafka Deep Dive Yifeng Jiang Solutions Engineer, Hortonworks © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 2. Self-introduction: 蒋 逸峰 (Yifeng Jiang) • Solutions Engineer, Hortonworks • HBase book author • It has been 10 years since I came to Japan… • Hobby: mountain climbing • Twitter: @uprush
  • 3. About Hortonworks. Founded in 2011. Original 24 architects, developers, and operators of Hadoop from Yahoo! 740+ employees. 1350+ ecosystem partners. Customer Momentum • 556 customers (as of August 5, 2015) • 119 customers added in Q2 2015 • Publicly traded on NASDAQ: HDP. Hortonworks Data Platform • Completely open multi-tenant platform for any app and any data • Consistent enterprise services for security, operations, and governance. Partner for Customer Success • Leader in open-source community, focused on innovation to meet enterprise needs • Unrivaled Hadoop support subscriptions
  • 4. Hortonworks Data Platform (HDP): deploy on premises and in the cloud
  • 5. Kinesis vs. Kafka
  • 6. Amazon Kinesis -- Introduction. Amazon Kinesis is a fully managed, cloud-based service for real-time data processing over large, distributed data streams.
  • 7. Apache Kafka -- Introduction. Messaging system • Real-time • Scalable to handle large data volumes • Low latency • Fault tolerant. Originated at LinkedIn • Aimed at solving data movement across systems • Written in Scala and Java • Open source (Apache 2.0) • Adopted at many companies
  • 8. Kinesis vs. Kafka – Features. Similar features • Messaging system for large-scale real-time data processing • High performance, highly scalable, low latency • Fault tolerant. Differences • Fully managed cloud service vs. OSS • Data durability and performance trade-off • Interface • AWS service integration vs. OSS or single platform (e.g., HDP) integration
  • 9. Kinesis vs. Kafka – Data Durability. Kinesis • Synchronously replicates data across three facilities • High durability for free. Kafka • Replication across servers in the same DC/AZ, with a configurable minimum number of in-sync replicas and ACKs • Asynchronously mirrors data across clusters in different datacenters / AZs. Performance trade-off.
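
The Kafka side of this durability/performance trade-off can be sketched as a small decision function. This is a simplified model rather than broker code: `replicas_confirming_write` and its parameter names are hypothetical, and the sketch assumes the usual meaning of the producer `acks` setting (`0`, `1`, or `all`) together with the `min.insync.replicas` setting.

```python
def replicas_confirming_write(acks: str, isr_size: int, min_insync_replicas: int) -> int:
    """Return how many replicas hold a record before the producer is acknowledged.

    Raises RuntimeError when the write would be rejected.
    Simplified model; names are illustrative, not Kafka APIs.
    """
    if acks == "0":
        # Fire and forget: the producer does not wait for any broker.
        return 0
    if acks == "1":
        # Only the partition leader has to persist the record.
        return 1
    if acks == "all":
        # The minimum in-sync replica count is only enforced in this mode.
        if isr_size < min_insync_replicas:
            raise RuntimeError("not enough in-sync replicas")
        # Every replica currently in the ISR must have the record.
        return isr_size
    raise ValueError("acks must be '0', '1' or 'all'")


if __name__ == "__main__":
    for acks in ("0", "1", "all"):
        print(acks, replicas_confirming_write(acks, isr_size=2, min_insync_replicas=2))
```

Lower `acks` values acknowledge sooner but leave fewer copies at acknowledgement time, which is the performance trade-off the slide refers to.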
  • 10. Kinesis vs. Kafka – Interface. Kinesis • REST only • Client library wraps the REST API. Kafka • Low-level API • REST API available (wrapping the low-level API), which impacts throughput and latency
  • 11. Kinesis vs. Kafka – Processing. Kafka • Custom consumers: event monitoring and alerting use cases • Storm: fraud detection, simple aggregation • Spark Streaming / Storm Trident: micro-batch, near real-time • Camus: batch Hadoop ingestion. Kinesis • KCL applications on EC2 • Storm • Spark Streaming • EMR for batch ingestion, e.g., write to S3
  • 12. Kinesis vs. Kafka – Deployment & Operation. Kafka • HDP: almost one-click deploy with Ambari • Basic monitoring with Ambari • Expand and rebalance: partition assignment and consumer rebalance • ZooKeeper can also be managed by Ambari. Kinesis • Fully managed, one-click deploy • CloudWatch monitoring • Expand and rebalance: resharding a stream • Easy operation
  • 13. Kafka Deep Dive
  • 14. Kafka – Concepts. [Diagram: a producer and a consumer connected to three brokers hosting a topic with 3 partitions and replication factor 2; (L) marks the leader replica. Broker-0: P0.R0 (L), P1.R0. Broker-1: P0.R1, P2.R1 (L). Broker-2: P1.R2 (L), P2.R2.] * ZK (ZooKeeper) is used by brokers and consumers.
  • 15. Kafka -- Concepts. Topics. Partitions • Offset • Ordered. Replication • Prevents data loss • Follower replicas are never read from or written to by clients • Does not increase throughput • With N replicas, tolerates N − 1 failures. Example:
      [ambari-qa@c6401 bin]$ kafka-topics.sh --zookeeper c6401:2181 --describe --topic page_visits
      Topic:page_visits  PartitionCount:4  ReplicationFactor:2  Configs:
          Topic: page_visits  Partition: 0  Leader: 1  Replicas: 0,1  Isr: 1,0
          Topic: page_visits  Partition: 1  Leader: 0  Replicas: 1,0  Isr: 0,1
          Topic: page_visits  Partition: 2  Leader: 1  Replicas: 0,1  Isr: 1,0
          Topic: page_visits  Partition: 3  Leader: 0  Replicas: 1,0  Isr: 0,1
  • 16. Kafka Broker. Stores messages (logs) on local disk • Messages are appended to a log file • Log retention is time and size based. Controller • Cluster management • Runs on each broker machine • One leader, the others are followers. Leader partition • The broker that is the leader for certain partitions. Uses ZK for coordination.
  • 17. Kafka Producer. New Producer API in 0.8.2 • kafka-clients.jar • New Java API • Asynchronous mode by default. Creates a new message and publishes it to a topic and partition • Takes a topic, a value, and an optional key and partition ID
  • 18. Kafka Producer API (0.8.2) – Cont. • Original messages are partitioned and then split into batches • Each split batch is sent to the leader broker (and then replicated to the ISR) • Each send is acknowledged by the leader broker only or by all in-sync replicas, depending on the ACK setting. [Diagram: an app hands messages m1 to m5 to the producer library, whose partitioner assigns them to partitions p1 to p3 and splits them into per-broker batches. Topic with 3 partitions and replication factor 2: Broker-0: P0.R0 (L), P1.R0. Broker-1: P0.R1, P2.R1 (L). Broker-2: P1.R2 (L), P2.R2.]
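
The partition-then-batch flow on this slide can be sketched in a few lines. This is a minimal illustration, not the producer library itself: `choose_partition`, `split_into_batches`, and `LEADERS` are hypothetical names, CRC32 is only a stand-in for whatever hash the real client uses, and the leader map is read off the slide's diagram.

```python
import zlib
from collections import defaultdict

# Leader broker per partition, taken from the diagram: P0 -> Broker-0,
# P1 -> Broker-2, P2 -> Broker-1.
LEADERS = {0: "Broker-0", 1: "Broker-2", 2: "Broker-1"}


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition (CRC32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions


def split_into_batches(messages, num_partitions, leaders):
    """Group (key, value) messages into one batch per leader broker.

    Returns {broker: {partition: [values in send order]}}.
    """
    batches = defaultdict(lambda: defaultdict(list))
    for key, value in messages:
        partition = choose_partition(key, num_partitions)
        batches[leaders[partition]][partition].append(value)
    return {broker: dict(parts) for broker, parts in batches.items()}


if __name__ == "__main__":
    msgs = [(b"user-%d" % (i % 3), "m%d" % i) for i in range(1, 6)]
    for broker, parts in sorted(split_into_batches(msgs, 3, LEADERS).items()):
        print(broker, parts)
```

Messages with the same key always land in the same partition, so per-key ordering survives batching.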
  • 19. Kafka Consumer. Reads data from Kafka brokers • JVM APIs are supported out of the box by the project • Consumers pull data from brokers • Consumer apps have to keep track of the topic-partition offset they have read. Consumer API: Simple API • Greater control over consumption of topics/partitions • Consumer apps are more complex, as they need to handle things like offset tracking. High-level API • Uses the Simple API internally • Consumer apps are simple to implement, as offset tracking works out of the box • But not flexible in terms of which partitions to read.
  • 20. Kafka Consumer – Cont. Consumer Groups • Allow multiple hosts to form a group to access a topic • Consumer hosts join a group by using the same group.id • Guarantees a message is read by only one consumer in a group • Partitions are assigned to consumers in a group • A consumer node may get one or more partitions • But one partition is assigned to only one consumer host • Order of messages is guaranteed within a partition • Max parallelism is determined by the number of topic partitions • More consumers than partitions: some consumers will be idle. [Diagram: partitions P0 to P3 on Broker-0 and Broker-1, consumed by Consumer Group 1 and Consumer Group 2 (consumers C1 to C6).]
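
The assignment rules above (each partition goes to exactly one consumer in a group, a consumer may hold several partitions, surplus consumers sit idle) can be illustrated with a toy assignment function. This is a simplified round-robin sketch under stated assumptions, not Kafka's actual rebalance algorithm; `assign_partitions` and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Assign every partition to exactly one consumer of a single group.

    Simplified round-robin; returns {consumer: [partitions]}.
    Consumers that receive nothing stay in the result with an empty list.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for index, partition in enumerate(sorted(partitions)):
        assignment[ordered[index % len(ordered)]].append(partition)
    return assignment


if __name__ == "__main__":
    # Two consumers share four partitions.
    print(assign_partitions([0, 1, 2, 3], ["C1", "C2"]))
    # -> {'C1': [0, 2], 'C2': [1, 3]}

    # Six consumers, four partitions: C5 and C6 are idle.
    print(assign_partitions([0, 1, 2, 3], ["C1", "C2", "C3", "C4", "C5", "C6"]))
```

Each group gets its own independent assignment, which is why two groups can both read the full topic.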
  • 21. Kafka – Why Kafka is fast. Fast writes • Writes are appends to the file system • Partitions improve performance and throughput • Uses the OS buffer cache • Lots of memory on the machine helps. Fast reads • Memory-mapped files • Efficient transfer from file descriptor to socket descriptor • Linux sendfile(), JVM transferTo() implementation. Why performance? • Disk flushes are delayed • Durability is guaranteed via replication • When consumers are reading the latest data, they read from the page cache
  • 22. Kafka – Cluster Mirroring. Mirror Maker • Mirrors data across clusters, even in different DCs / AZs • Stand-alone tool that uses the Consumer and Producer APIs • Reads from one or more source clusters and writes to a target cluster • Whitelist/blacklist of topics
  • 23. Kafka REST Interface. REST interface • Wraps the Producer and Consumer APIs. Performance overhead • Two hops • Extra REST server to maintain • JSON payload parsing
  • 24. Kinesis vs. Kafka -- Terms
      Amazon Kinesis | Apache Kafka
      Streams | Topics
      Data Records | Messages
      Producers | Producers
      Kinesis Producer Library | Producer API
      Consumers | Consumers
      Kinesis Applications | Consumer Applications
      Kinesis Client Library | Consumer – High-level API
      N/A | Consumer – Simple API
      Shards | Partitions
      N/A (built-in MD5 hash on partition keys) | Custom partitioner
      Sequence Numbers | Offset
      Application Name | Consumer Group ID
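
The "built-in MD5 hash on partition keys" entry can be made concrete with a short sketch: Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The helper names below are hypothetical, and both the evenly split shard ranges and the big-endian reading of the digest are assumptions made for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def even_shard_ranges(num_shards):
    """Split the hash key space into contiguous (start, end) ranges.

    Assumes evenly sized shards; real streams may have uneven ranges
    after resharding.
    """
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key: str, ranges) -> int:
    """Return the index of the shard whose range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")  # assumed byte order
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key not covered by any shard range")


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    for key in ("user-1", "user-2", "user-3"):
        print(key, "-> shard", shard_for_key(key, ranges))
```

In Kafka the equivalent decision is made by the producer's partitioner, which an application can replace with its own implementation.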
  • 25. Tweet: #hadooproadshow. More about Apache Kafka: http://hortonworks.com/hadoop/kafka/