Real-Time Messages at Scale with Apache Kafka
Will Gardella
Product Manager
©2015 Couchbase Inc. 2
Agenda
• You might need Kafka if…
• Kafka architecture
• Background - Couchbase
• Couchbase & Kafka
• Behind the Scenes
• Demo
• An Example Producer and Consumer
What’s Apache Kafka for?
You might need Kafka if…
You might need Kafka if…
Photo Credit: Cory Doctorow
https://www.flickr.com/photos/doctorow/14638938
Different speeds for different systems
• NoSQL
• RDBMS
• Cache
• Search
• Apps
• Metrics
• Logs
• Hadoop
• Relational Data Warehouse
Source: Confluent
Typical Kafka Use Cases
Kafka Architecture
[Diagram: three producers write to a Kafka cluster of three brokers (Broker 1, Broker 2, Broker 3), and four consumers read from it.]
Kafka Architecture
[Diagram: the same producers, consumers, and three-broker Kafka cluster, now with Zookeeper. The brokers hold the partitions of Topic 1, Topic 2, and Topic 3, each topic having a Partition 1 and a Partition 2.]
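To make the topic and partition layout in the diagram more concrete, here is a minimal sketch of how keyed messages could be routed to partitions and how partitions could be spread over brokers. It is a toy model under stated assumptions, not Kafka's own code: the `zlib.crc32` hash, the round-robin placement in `broker_for`, and the zero-based topic and partition indexes are all choices made for this illustration.

```python
import zlib
from collections import defaultdict

# Assumed layout echoing the diagram's labels: topics with two partitions
# each, spread over three brokers. The placement itself is illustrative.
PARTITIONS_PER_TOPIC = 2
BROKERS = ["Broker 1", "Broker 2", "Broker 3"]


def partition_for(key: str, num_partitions: int = PARTITIONS_PER_TOPIC) -> int:
    """Pick a partition from a stable hash of the message key.

    zlib.crc32 is a stand-in hash for this sketch; it is not Kafka's
    own partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def broker_for(topic_index: int, partition: int) -> str:
    """Place partitions on brokers round-robin (an assumed placement)."""
    slot = topic_index * PARTITIONS_PER_TOPIC + partition
    return BROKERS[slot % len(BROKERS)]


def route(messages):
    """Group (topic_index, key, value) messages by (topic_index, partition)."""
    log = defaultdict(list)
    for topic_index, key, value in messages:
        log[(topic_index, partition_for(key))].append(value)
    return dict(log)


msgs = [(0, "user-1", "a"), (0, "user-2", "b"), (0, "user-1", "c")]
routed = route(msgs)
```

Because the partition depends only on the key, all messages for `user-1` land in the same partition and keep their relative order there, which is the property that keyed partitioning is usually chosen for.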
Couchbase Server 4.0
A brief Introduction
Couchbase Server 4.0 for modern applications
Combines the flexibility of JSON, the power of SQL, and the scale of NoSQL
Develop with Agility
• Flexible JSON data model
• Dynamic schema support
• Powerful query language that extends SQL to JSON
Operate at Any Scale
• Sub-millisecond latencies at scale
• Elastic scaling on commodity servers
• High availability
Couchbase Server Defined
The first NoSQL database that enables you to develop with agility and operate at any scale.
• Managed Cache
• Key-Value Store
• Document Database
• Embedded Database
• Sync Management
The Power Of The Flexible JSON Schema
Ability to store data in multiple ways
• Denormalized single document, as opposed to normalizing data across multiple tables
• Dynamic schema to add new values when needed
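The two points above can be sketched with plain Python dictionaries and the standard `json` module. The order data, the table names, and every field name here are invented for illustration; the sketch only shows the shape of the idea, not how Couchbase stores documents internally.

```python
import json

# Hypothetical order data laid out the normalized way: one "table" per entity.
normalized = {
    "orders": [{"order_id": 1, "customer_id": 7}],
    "customers": [{"customer_id": 7, "name": "Ada"}],
    "order_lines": [
        {"order_id": 1, "sku": "A-100", "qty": 2},
        {"order_id": 1, "sku": "B-200", "qty": 1},
    ],
}


def denormalize(order_id: int, tables: dict) -> dict:
    """Fold the rows that describe one order into a single document."""
    order = next(o for o in tables["orders"] if o["order_id"] == order_id)
    customer = next(
        c for c in tables["customers"]
        if c["customer_id"] == order["customer_id"]
    )
    lines = [
        {"sku": row["sku"], "qty": row["qty"]}
        for row in tables["order_lines"]
        if row["order_id"] == order_id
    ]
    return {
        "type": "order",
        "order_id": order_id,
        "customer": {"name": customer["name"]},
        "lines": lines,
    }


doc = denormalize(1, normalized)

# Dynamic schema: a new field is added to this one document only,
# with no table-wide schema change.
doc["gift_wrap"] = True

print(json.dumps(doc, indent=2))
```

Reading the order back is then a single document fetch rather than a join across three tables, at the cost of repeating the customer name in every order document.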
Couchbase and Other Big Data Systems
[Diagram labels: up to 10^10 application users; NoSQL Database; Kafka; Hadoop, Spark, Elasticsearch, EDW; 10^1 to 10^2 data scientists / engineers.]
Kafka & Couchbase Use Cases
Couchbase & Kafka Use Cases
• Couchbase as the Master Database
  – Changes in the bucket update data elsewhere
• Triggers / Event Handling
  – Handle events like deletions / expirations externally
  – E.g. expiration & replicated session tokens
• Real-time Data Integration
  – Extract from Couchbase, transform and load data in real-time
• Real-time Data Processing
  – Extract from a bucket, process in real-time and load back to another bucket
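The trigger / event handling use case can be sketched as a small consumer-side handler. Everything below is hypothetical: the `ChangeEvent` shape, the `SessionMirror` class, and the event kind names are invented for this illustration and are not the connector's API. The sketch assumes some consumer has already read the events from a Kafka topic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical shape of a change event read from a Kafka topic."""
    kind: str                      # "mutation", "deletion", or "expiration"
    key: str
    value: Optional[dict] = None


class SessionMirror:
    """Keeps an external copy of session tokens in step with the bucket."""

    def __init__(self):
        self.sessions = {}
        self.audit = []

    def handle(self, event: ChangeEvent) -> None:
        if event.kind == "mutation":
            self.sessions[event.key] = event.value
        elif event.kind in ("deletion", "expiration"):
            # The token is gone from the source bucket, so drop it here too
            # and record why it disappeared.
            self.sessions.pop(event.key, None)
            self.audit.append((event.kind, event.key))
        else:
            raise ValueError(f"unknown event kind: {event.kind}")


mirror = SessionMirror()
for ev in [
    ChangeEvent("mutation", "session::42", {"user": "ada"}),
    ChangeEvent("mutation", "session::43", {"user": "bob"}),
    ChangeEvent("expiration", "session::42"),
]:
    mirror.handle(ev)
```

After the three events, only `session::43` remains in the mirror, and the audit list records that `session::42` expired. This is the kind of external reaction the "handle events externally" bullet describes.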
The Couchbase Kafka Connector
How it works
Database Change Protocol (DCP)
Couchbase Server’s internal data sync mechanism since Couchbase Server 3.x
• Used for
  – Intra-cluster replication
  – Indexing
  – XDCR (Cross Datacenter Replication for HA/DR)
  – Some connectors, including Kafka and Spark
    • Use the DCP handling library from the Couchbase 2.x Java SDK JVM Core IO
• Sends mutations
  – Mutations = creation, update, or deletion of an item
  – Each mutation that occurs in a vBucket has a sequence number
Important: DCP is not supported for external clients!
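The sequence-number idea can be illustrated with a toy model. This is not the DCP wire protocol or any Couchbase API: `ToyBucket`, the `zlib.crc32` key-to-vBucket mapping, and the vBucket count of 8 are all assumptions made for the sketch. It shows only that each vBucket numbers its own mutations, so a reader can ask for everything after a sequence number it has already seen.

```python
import zlib
from collections import defaultdict

NUM_VBUCKETS = 8  # kept small for the sketch


class ToyBucket:
    """Toy model: every change in a vBucket gets that vBucket's next seqno."""

    def __init__(self):
        self.data = {}
        self.seqnos = defaultdict(int)  # vBucket id -> last sequence number
        self.stream = []                # emitted mutations, in order

    def _vbucket(self, key: str) -> int:
        # Assumed key-to-vBucket mapping, for illustration only.
        return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS

    def _emit(self, op: str, key: str, value=None) -> None:
        vb = self._vbucket(key)
        self.seqnos[vb] += 1
        self.stream.append(
            {"vbucket": vb, "seqno": self.seqnos[vb],
             "op": op, "key": key, "value": value}
        )

    def upsert(self, key: str, value) -> None:
        op = "update" if key in self.data else "create"
        self.data[key] = value
        self._emit(op, key, value)

    def delete(self, key: str) -> None:
        if key in self.data:
            del self.data[key]
            self._emit("delete", key)

    def changes_since(self, vbucket: int, seqno: int):
        """Return this vBucket's mutations with a sequence number above seqno."""
        return [m for m in self.stream
                if m["vbucket"] == vbucket and m["seqno"] > seqno]


bucket = ToyBucket()
bucket.upsert("k", 1)
bucket.upsert("k", 2)
bucket.delete("k")
```

A reader that has processed sequence number 1 for the key's vBucket could call `changes_since(vb, 1)` and receive only the update and the delete, without replaying the create.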
An Example Producer and Consumer
Connecting Couchbase via Kafka to an Application
Kafka Generator Example
Kafka Producer Example
A Kafka Consumer Example
Demo
Couchbase Kafka Connector Roadmap
Available Now: 1.2 GA
• Kafka Producer or Consumer
• Stream events
• Filters
• Transform events
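As a rough picture of what "filters" and "transform events" mean in a streaming connector, here is a minimal sketch in plain Python. The event shape and the names `stream`, `only_mutations`, and `to_message` are hypothetical and are not taken from the connector's code; see the connector documentation linked on this slide for the real interfaces.

```python
import json
from typing import Callable, Iterable, Iterator

Event = dict  # hypothetical event shape: {"op": ..., "key": ..., "value": ...}


def stream(events: Iterable[Event],
           keep: Callable[[Event], bool],
           transform: Callable[[Event], dict]) -> Iterator[dict]:
    """Yield a transformed message for every event the filter lets through."""
    for event in events:
        if keep(event):
            yield transform(event)


def only_mutations(event: Event) -> bool:
    """Filter: pass document changes, skip everything else."""
    return event["op"] == "mutation"


def to_message(event: Event) -> dict:
    """Transform: reshape an event into a key plus a JSON payload string."""
    return {"key": event["key"],
            "payload": json.dumps(event["value"], sort_keys=True)}


events = [
    {"op": "mutation", "key": "user::1", "value": {"name": "Ada"}},
    {"op": "deletion", "key": "user::2", "value": None},
    {"op": "mutation", "key": "user::3", "value": {"name": "Bob"}},
]
messages = list(stream(events, only_mutations, to_message))
```

Keeping the filter and the transform as separate functions means either one can be swapped without touching the other, for example to pass deletions as well or to emit a different payload format.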
Code: https://github.com/couchbase/couchbase-kafka-connector/
Issues: https://issues.couchbase.com/projects/KAFKAC
Docs: http://developer.couchbase.com/documentation/server/4.1/connectors/kafka-1.2/kafka-intro.html
Planned
• Monthly maintenance releases
Under discussion
• Merge code for Storm connector
• Adopt Kafka Connect (Kafka 0.9)
• ???
Learn More - Couchbase Kafka Connector
Confluent’s Ewen Cheslack-Postava at Couchbase Connect 2015
• Great high-level intro to Kafka in ~20 minutes
• https://youtu.be/fFPVwYKUTHs
Couchbase and Kafka - Up and Running in 10 Minutes
• Run through the sample code yourself
• http://blog.couchbase.com/2015/november/kafka-and-couchbase-up-and-running-in-10-minutes
Product docs
• http://developer.couchbase.com/documentation/server/4.1/connectors/kafka-1.2/kafka-intro.html
Avalon Consulting blog and GitHub repo
• http://blogs.avalonconsult.com/blog/big-data/purchase-transaction-alerting-with-couchbase-and-kafka/
• https://github.com/Avalon-Consulting-LLC/couchbase-kafka
Thank you.
will.gardella@couchbase.com
Twitter: @WillGardella

Contenu connexe

Tendances

Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructuremattlieber
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormEdureka!
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CaptureJeff Klukas
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Knoldus Inc.
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Gwen (Chen) Shapira
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaChhavi Parasher
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Gwen (Chen) Shapira
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache KafkaJoe Stein
 
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...confluent
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingGuozhang Wang
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBrian Ritchie
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebookGwen (Chen) Shapira
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...DataWorks Summit/Hadoop Summit
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Fieldconfluent
 

Tendances (20)

Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructure
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
 
Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1Introduction to Apache Kafka- Part 1
Introduction to Apache Kafka- Part 1
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
What's inside the black box? Using ML to tune and manage Kafka. (Matthew Stum...
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Building Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache KafkaBuilding Event-Driven Systems with Apache Kafka
Building Event-Driven Systems with Apache Kafka
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
 
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the FieldKafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
Kafka Summit SF 2017 - Kafka Connect Best Practices – Advice from the Field
 

Similaire à Real time Messages at Scale with Apache Kafka and Couchbase

Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Data Con LA
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Data Con LA
 
Kafka & Couchbase Integration Patterns
Kafka & Couchbase Integration PatternsKafka & Couchbase Integration Patterns
Kafka & Couchbase Integration PatternsManuel Hurtado
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Denodo
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaSlim Baltagi
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analyticsconfluent
 
Couchbase and Apache Spark
Couchbase and Apache SparkCouchbase and Apache Spark
Couchbase and Apache SparkMatt Ingenthron
 
Confluent Operator as Cloud-Native Kafka Operator for Kubernetes
Confluent Operator as Cloud-Native Kafka Operator for KubernetesConfluent Operator as Cloud-Native Kafka Operator for Kubernetes
Confluent Operator as Cloud-Native Kafka Operator for KubernetesKai Wähner
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...Athens Big Data
 
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Anant Corporation
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023Timothy Spann
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache KafkaJoe Stein
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyKairo Tavares
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Timothy Spann
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Michael Noll
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Lalit Panwar
 

Similaire à Real time Messages at Scale with Apache Kafka and Couchbase (20)

Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
Couchbase Day
Couchbase DayCouchbase Day
Couchbase Day
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
 
Kafka & Couchbase Integration Patterns
Kafka & Couchbase Integration PatternsKafka & Couchbase Integration Patterns
Kafka & Couchbase Integration Patterns
 
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
Unlocking the Power of Apache Kafka: How Kafka Listeners Facilitate Real-time...
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Couchbase and Apache Spark
Couchbase and Apache SparkCouchbase and Apache Spark
Couchbase and Apache Spark
 
Confluent Operator as Cloud-Native Kafka Operator for Kubernetes
Confluent Operator as Cloud-Native Kafka Operator for KubernetesConfluent Operator as Cloud-Native Kafka Operator for Kubernetes
Confluent Operator as Cloud-Native Kafka Operator for Kubernetes
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015Being Ready for Apache Kafka - Apache: Big Data Europe 2015
Being Ready for Apache Kafka - Apache: Big Data Europe 2015
 
Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021Mule soft meetup_chandigarh_#7_25_sept_2021
Mule soft meetup_chandigarh_#7_25_sept_2021
 

Dernier

What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profileakrivarotava
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 

Dernier (20)

What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 

Real time Messages at Scale with Apache Kafka and Couchbase

  • 1. RealTime Messages at Scale with Apache Kafka Will Gardella Product Manager
  • 6. ©2015 Couchbase Inc. Agenda: You might need Kafka if…; Kafka architecture; Background - Couchbase; Couchbase & Kafka; Behind the Scenes; Demo; An Example Producer and Consumer
  • 7. What’s Apache Kafka for? You might need Kafka if…
  • 8. You might need Kafka if… Photo Credit: Cory Doctorow https://www.flickr.com/photos/doctorow/14638938
  • 9. Different speeds for different systems: NoSQL, RDBMS, Cache, Search, Apps, Metrics, Logs, Hadoop, Relational Data Warehouse
  • 12. Kafka Architecture (diagram): producers and consumers connected to a Kafka cluster of three brokers (Broker 1, Broker 2, Broker 3)
  • 13. Kafka Architecture (diagram): producers, consumers and Zookeeper around a Kafka cluster of three brokers (Broker 1, Broker 2, Broker 3), with the partitions of three topics spread across the brokers (Topic 1, Topic 2 and Topic 3, each with Partition 1 and Partition 2)
  • 14. Couchbase Server 4.0 A brief Introduction
  • 15. Couchbase Server 4.0 for modern applications. Combines the flexibility of JSON, the power of SQL, and the scale of NoSQL. Develop with Agility: flexible JSON data model; dynamic schema support; powerful query language that extends SQL to JSON. Operate at Any Scale: sub-millisecond latencies at scale; elastic scaling on commodity servers; high availability.
  • 16. Couchbase Server Defined: the first NoSQL database that enables you to develop with agility and operate at any scale. Managed Cache, Key-Value Store, Document Database, Embedded Database, Sync Management
  • 17. The Power Of The Flexible JSON Schema: ability to store data in multiple ways. Denormalized single document, as opposed to normalizing data across multiple tables. Dynamic schema to add new values when needed.
  • 18. Couchbase and Other Big Data Systems (diagram): up to 10^10 application users (NoSQL Database); 10^1 to 10^2 data scientists / engineers (Kafka, Hadoop, Spark, Elasticsearch, EDW)
  • 19. Kafka & Couchbase Use Cases
  • 20. Couchbase & Kafka Use Cases. Couchbase as the Master Database: changes in the bucket update data elsewhere. Triggers / Event Handling: handle events like deletions / expirations externally, e.g. expiration and replicated session tokens. Real-time Data Integration: extract from Couchbase, transform and load data in real time. Real-time Data Processing: extract from a bucket, process in real time and load back to another bucket.
  • 21. The Couchbase Kafka Connector How it works
  • 22. Database Change Protocol (DCP): Couchbase Server’s internal data sync mechanism since Couchbase Server 3.x. Used for: intra-cluster replication; indexing; XDCR (Cross Datacenter Replication for HA/DR); some connectors, including Kafka and Spark (these use the Couchbase 2.x Java SDK JVM Core IO DCP handling library). Sends mutations: a mutation is the creation, update, or deletion of an item; each mutation that occurs in a vBucket has a sequence number. Important: DCP is not supported for external clients!
  • 23. An Example Producer and Consumer: Connecting Couchbase via Kafka to an Application
  • 24. Kafka Generator Example
  • 25. Kafka Producer Example
  • 26. Kafka Producer Example
  • 27. A Kafka Consumer Example
  • 28. A Kafka Consumer Example
  • 29. Demo
  • 30. Couchbase Kafka Connector Roadmap. Available now: 1.2 GA (Kafka Producer or Consumer; stream events; filters; transform events). Code: https://github.com/couchbase/couchbase-kafka-connector/ Issues: https://issues.couchbase.com/projects/KAFKAC Docs: http://developer.couchbase.com/documentation/server/4.1/connectors/kafka-1.2/kafka-intro.html Planned: monthly maintenance releases. Under discussion: merge code for Storm connector; adopt Kafka Connect (Kafka 0.9); ???
  • 31. Learn More - Couchbase Kafka Connector. Confluent’s Ewen Cheslack-Postava at Couchbase Connect 2015 (great high-level intro to Kafka in ~20 minutes): https://youtu.be/fFPVwYKUTHs ; Couchbase and Kafka - Up and Running in 10 Minutes (run through the sample code yourself): http://blog.couchbase.com/2015/november/kafka-and-couchbase-up-and-running-in-10-minutes ; Product docs: http://developer.couchbase.com/documentation/server/4.1/connectors/kafka-1.2/kafka-intro.html ; Avalon Consulting blog and GitHub repo: http://blogs.avalonconsult.com/blog/big-data/purchase-transaction-alerting-with-couchbase-and-kafka/ and https://github.com/Avalon-Consulting-LLC/couchbase-kafka
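
The DCP description on slide 22 can be made concrete with a small simulation. This is a minimal sketch in Python, not the DCP protocol and not the connector's code: the `Mutation` and `VBucket` classes and the `mutate` helper are hypothetical names, and the key-to-vBucket mapping uses a plain CRC32 modulo as a stand-in, which is an assumption rather than Couchbase's actual hashing. The count of 1024 vBuckets comes from the speaker notes. The only point is that every mutation in a vBucket gets that vBucket's next sequence number.

```python
import zlib
from dataclasses import dataclass, field

NUM_VBUCKETS = 1024  # from the speaker notes: each data set has 1024 vBuckets


@dataclass
class Mutation:
    key: str
    kind: str     # "create", "update", or "delete"
    vbucket: int
    seqno: int    # sequence number within the vBucket


@dataclass
class VBucket:
    vb_id: int
    high_seqno: int = 0
    docs: dict = field(default_factory=dict)

    def apply(self, key, value):
        """Apply a write (or a delete when value is None); return the mutation."""
        if value is None:
            kind = "delete"
            self.docs.pop(key, None)
        else:
            kind = "update" if key in self.docs else "create"
            self.docs[key] = value
        self.high_seqno += 1  # every mutation advances this vBucket's counter
        return Mutation(key, kind, self.vb_id, self.high_seqno)


def vbucket_for(key):
    # ASSUMPTION: illustrative hash only, not Couchbase's exact algorithm.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


vbuckets = {}


def mutate(key, value):
    """Route a write to the key's vBucket and record the mutation."""
    vb_id = vbucket_for(key)
    vb = vbuckets.setdefault(vb_id, VBucket(vb_id))
    return vb.apply(key, value)


stream = [
    mutate("user::1", {"name": "Ada"}),
    mutate("user::1", {"name": "Ada L."}),
    mutate("user::1", None),
]
for m in stream:
    print(m.kind, m.vbucket, m.seqno)
```

The three mutations of the same key land in the same vBucket and carry sequence numbers 1, 2 and 3, so a key plus a sequence number identifies one version of a document.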

Editor's notes

  1. Don’t forget to intro
  2. If you tweet about Kafka, you may be followed by this weird Franz Kafka bot…
  3. … but the real Franz Kafka died of tuberculosis in 1924…
  4. Interestingly, this Franz Kafka bot sounds quite a bit like the real Franz Kafka. Not overly cheery… I thought perhaps it was just real quotes from his work
  5. But no – it just sounds like it could be him…
  6. It helps you decouple systems in time. Systems can be asynchronous, but this is more than that: consuming systems don’t have to be on, or even exist, at the time that producers are making messages. Schema registry that describes what is known about data produced in different systems. Publish / subscribe system. Hooking up application server logs, caches, databases, and so forth. You don’t want each system to have to be matched or hand-integrated with every other system and service, with different adapter code, different error handling behavior, logging, etc. And how do you share metadata? What do you do with different revisions of… That’s madness. Imagine trying to add a new service that needs to read from 10 other services… Organizationally that’s difficult, and every team has the potential to make different decisions about what systems to use…
  7. Kafka helps mitigate different expectations of speed and size of data being ingested in various systems. Hadoop: HDFS can take tons of data, but not in tiny pieces; it’s a batch-oriented system. NoSQL databases like Couchbase can scale to billions of users with sub-millisecond response times, but not with bulk load. Compare application server logs, a bulk database extraction, and processing a stream of Twitter messages. There can be issues with integrations where the slowest…
  8. The vision is to have a scalable, low-latency pub-sub message queue as the standard interface for real-time streaming data. Hadoop, HDFS specifically, fills this role for batch systems and led to a large ecosystem of useful tools that can interoperate via Hadoop data storage. Kafka does the same for real-time data, and can scale to handle your entire organization’s data. Kafka acts as the hub and applications hang off of it, exchanging data through Kafka. We refer to this architecture as a stream data platform. Reminder: on this slide you need to talk about the differences between Couchbase and Hadoop; they are complementary and solve different problems. Messaging: decouple data processing from data producers. Log aggregation: a log as a stream of messages. Stream processing: consume data from one topic and put the filtered/transformed data into another one. Click stream analysis: page views/searches as real-time publish-subscribe feeds.
  9. Publish / subscribe. Broker: stores messages; failover (leader vs. follower); load balanced. Producer: publishes data/messages to the topic. Consumer: applications/processes/threads that are subscribed to the topic. Consumers can be grouped (consumer groups) in order to process messages in parallel. Multiple consumer instances can load balance reading the partitions of a topic. Consumer groups are elastic and fault tolerant.
  10. Topic: a distributed and partitioned message queue. Topics are partitioned so they can scale across multiple servers. Partitions are also replicated for fault tolerance. This is what the producers actually write to and what the consumers actually read. Scales the Kafka brokers. High performance. Log: Kafka operates only on logs; they are always append-only logs, and messages are always read sequentially. Kafka does not track the per-message read state; it doesn’t need to because access is sequential. Retention is based on policy, either time based or size based. Because Kafka keeps no per-message state, multiple consumers can read from the same log and each do what it needs to do (they know where they left off, Kafka doesn’t need to). This is like DCP in Couchbase. Consumer: applications/processes/threads that are subscribed to the topic. Consumers can be grouped (consumer groups) in order to process messages in parallel. Multiple consumer instances can load balance reading the partitions of a topic. Consumer groups are elastic and fault tolerant. Zookeeper: distributed synchronization and configuration store. Needed to partition topics and to support consumer groups (where multiple consumers work together to process (ingest) a Kafka topic in parallel).
  11. Multiple data models. N1QL: SQL-like query language. Multiple indexes. SDKs, ODBC / JDBC drivers and frameworks. Push-button scalability. Consistent high performance. Always on 24x7 with HA / DR. Easy administration with Web UI, REST API and CLI.
  12. KEY POINT: COUCHBASE HAS YOU COVERED FOR YOUR GENERAL PURPOSE DB NEEDS. FROM CACHING TO KV STORE, TO JSON DOCUMENT STORE, TO MOBILE APPS. NO OTHER NOSQL DB VENDOR HAS THIS BREADTH AND DEPTH OF TECHNOLOGY The purpose of this slide is to discuss the high level concepts of Couchbase, and if the SE wants to discuss what parts of Couchbase make up each concept. It is not to go over specific technologies like N1QL, ODBC, etc
  13. KEY POINT: YOU HAVE THE OPTION TO REPRESENT DATA QUITE DIFFERENTLY USING JSON AS OPPOSED TO A RELATIONAL DATABASE. Where in relational databases you might have to have multiple tables to best represent your data, in JSON you can model your data the way an object might already be in your programming language of choice. No ORM (Object-Relational Mapping) needed. You can do relationships in Couchbase, but they are different than in a relational database and normally outside the scope of an intro call. Make sure to stress that normalization is still something that can be done in Couchbase where it makes sense for the application, but this diagram helps people coming from relational understand what is possible with JSON.
  14. Work people do in these systems: training ML models, ETL / data wrangling, aggregations, reporting / BI. Kafka is a data multiplexer. Some people are still going to want to do this, but it’s designed for higher latency applications with a known high complexity (e.g. eBay, with many different consumers for information). Traditional data warehouse: definitely will be a different programming language, so how do you make sense of the data feed? You get into the problem that making changes on one side introduces tons of complexity on the other. Downsides: maturity is not 100% on the Spark side, still in active development on the Couchbase side. KV / N1QL
  15. KEY POINT: ENTERPRISES ARE USING COUCHBASE ACROSS A RANGE OF MISSION CRITICAL USE CASES. As the slide shows, Couchbase supports a wide range of use cases, from Profile Management to HA Cache. Each use case has its own set of requirements: some need very high performance, some need very high availability, some need flexibility of the data model. The ability to meet all of these requirements is what has driven adoption of Couchbase by large enterprise companies. You should memorize a few things about a customer per use case so you can quickly go through these. What you want is a sound bite per use case.
  16. (1) All your data is managed in Couchbase and the other systems record these changes; for example, a user’s purchase might be logg… (2) A user’s web session is being stored in a Couchbase bucket and you want to react to it; for example, delete the session in another system, like people do in Single Sign On. Couchbase can handle hundreds of thousands of operations per second. (3) Real-time data integration: for example, you want to do a quick check on purchases to see if there’s anything suspicious about them, which may be done in another system. (4) In this case, it’s important to note that Couchbase can be a Kafka Consumer or a Kafka Producer, so for tasks like ML you can flow data out, train models and flow data back into Couchbase. This is similar to number 3, but the difference is you’re loading something back into Couchbase so that users can quickly interact with it. You may have systems that build recommendations but then flow those back into Couchbase so that the next visitors get a slightly better mix of product offers. Write data to a topic, process it with a framework and load it back into another separate bucket to serve users.
  17. Skip if short of time; you don’t need to cover anything besides DCP. The punchline is that this mechanism allows Couchbase to scale elastically and without downtime while still enabling any client to find exactly where the active copy of a piece of data is (using the cluster map). Multiple buckets can exist within a single cluster of nodes (1, 2 or 3 extra copies). Each data set has 1024 Virtual Buckets (vBuckets). Each vBucket contains a 1/1024th portion of the data set. vBuckets do not have a fixed physical server location.
  18. Add lots of notes
  19. Add lots of notes
  20. What can possibly go wrong if you write your own connector with DCP? A lot. First of all, you need to be able to drink from the firehose. Couchbase can produce 100K’s of messages per second; the Kafka brokers sit there and soak up those messages and can write them out the other end at whatever speed your consuming systems are capable of. DCP is written for memory-to-memory type replication; if you’re writing to a system that can’t keep up, the client needs to do some fancy footwork to make everything come out OK.
  21. What can possibly go wrong? A lot. First of all, you need to be able to drink from the firehose. Couchbase can produce 100K’s of messages per second; the Kafka brokers sit there and soak up those messages and can write them out the other end at whatever speed your consuming systems are capable of. DCP is written for memory-to-memory type replication; if you’re writing to a system that can’t keep up, the client needs to do some fancy footwork to make everything come out OK.
  22. This just does the work of creating some messages. For demo purposes, I can type things in here and see them show up as documents in Couchbase. The keys are random and limited to 10. DCP is a way of doing mutations, so sometimes we are going to overwrite existing docs and sometimes we will end up making new docs. Overwrites are captured as sequence numbers (when combined with a document key, you have version information), similar to the offset in Kafka.
  23. Producer is going to grab the docs and send them to Kafka
  24. The filter that we’re using prints out the dcpEvent to the console so we can read it, but otherwise does no filtering. You can add logic to the filter to mark events false, in which case they won’t be written to Kafka.
  25. Finally, this is attaching to my Kafka Vagrant image on port 9092 and subscribing to the topic default, partition 0 (we only have one partition).
  26. Fully transparent cluster and bucket management, including direct access if needed
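
Note 10 says that Kafka keeps no per-message read state and that each consumer knows where it left off. The sketch below illustrates that idea in plain Python under stated assumptions: `PartitionLog` and `Consumer` are hypothetical classes for this illustration, not Kafka client APIs, and replication, consumer groups and retention are left out.

```python
class PartitionLog:
    """Append-only list of messages; a message's offset is its position."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the message just written

    def read_from(self, offset, max_messages=10):
        # Sequential read starting at the caller's offset.
        return self.messages[offset:offset + max_messages]


class Consumer:
    """Tracks its own position; the log stores nothing about readers."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.log.read_from(self.offset, max_messages)
        self.offset += len(batch)  # remember where we left off
        return batch


log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}")

fast = Consumer("fast", log)
slow = Consumer("slow", log)

print(fast.poll())                # reads all five messages
print(slow.poll(max_messages=2))  # reads only the first two
print(slow.poll(max_messages=2))  # resumes with the next two
```

Both consumers read the same log at their own pace, and the log itself never changes when a message is read. That is why adding another consumer costs the log nothing.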
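
Note 24 describes a filter that prints each event and passes it through, and mentions that a filter can mark events false so they are not written to Kafka. The following is a rough Python sketch of that behavior, not the connector's Java filter interface: `Event`, `print_all_filter`, `no_deletions_filter` and `forward` are hypothetical names, and the event kinds are assumptions based on the mutations, deletions and expirations mentioned in the slides.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for a DCP event."""
    key: str
    kind: str  # "mutation", "deletion", or "expiration"


def print_all_filter(event):
    """Print every event and let it through, like the demo filter."""
    print(event)
    return True


def no_deletions_filter(event):
    """Mark deletions and expirations false; pass everything else."""
    return event.kind not in ("deletion", "expiration")


def forward(events, event_filter):
    """Return the events that would be written to Kafka."""
    return [e for e in events if event_filter(e)]


events = [
    Event("user::1", "mutation"),
    Event("session::9", "expiration"),
    Event("user::2", "deletion"),
]

kept = forward(events, no_deletions_filter)
print([e.key for e in kept])
```

With `print_all_filter` all three events would be forwarded; with `no_deletions_filter` only the mutation of `user::1` is.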