Streaming Data with Apache Kafka & MongoDB

Data Streaming with
Apache Kafka &
MongoDB
AndrewMorgan–MongoDBProduct
Marketing
DavidTucker–Director,PartnerEngineering
andAlliancesatConfluent
13th September2016

Agenda
Target Audience
Apache Kafka
MongoDB
Integrating MongoDB and Kafka
Kafka – What’s Next
Next Steps

Apache Kafka /
Confluent Platform

What does Kafka
do?
Producers
Consumers
Kafka Connect
Kafka Connect
Topic
Your interfaces to the world
Connected to your systems in real time

What is Streaming Data
Synchronous Req/Response
0 – 100s ms
Near Real Time
> 100s ms
Offline Batch
> 1 hour
KAFKA
Stream Data Platform
Search
RDBMS
Apps Monitoring
Real-time AnalyticsNoSQL Stream Processing
HADOOP
Data Lake
Impala
DWH
Hive
Spark Map-Reduce

Confluent’s Offerings
Core
Connect
Streams
Java Client
Kafka
Confluent Platform EnterpriseConfluent Platform
Stream MonitoringMore Clients
Message DeliveryREST Proxy
Stream MonitoringSchema Registry
Connector ManagementPre-Built Connectors

Confluent Platform: It’s Kafka ++
Feature Benefit Apache Kafka Confluent Platform
Confluent Platform
Enterprise
Apache Kafka
High throughput, low latency, high availability, secure distributed message
system
Kafka Connect
Advanced framework for connecting external sources/destinations into
Kafka
Java Client Provides easy integration into Java applications
Kafka Streams
Simple library that enables streaming application development within the
Kafka framework
Additional Clients Supports non-Java clients; C, C++, Python, etc.
REST Proxy
Provides universal access to Kafka from any network connected device via
HTTP
Schema Registry
Central registry for the format of Kafka data – guarantees all data is always
consumable
Pre-Built Connectors
HDFS, JDBC and other connectors fully Certified
and fully supported by Confluent
Confluent Control Center Includes Connector Management and Stream Monitoring
Support
Enterprise class support to keep your Kafka environment running at top
performance
Community Community 24x7x365
Free Free Subscription

Common Kafka Use Cases
Data transport and integration
• Log data
• Database changes
• Sensors and device data
• Monitoring streams
• Call data records
• Stock ticker data
Real-time stream processing
• Monitoring
• Asynchronous applications
• Fraud and security

People Using Kafka Today
Financial Services
Entertainment & Media
Consumer Tech
Travel & Leisure
Enterprise Tech
Telecom Retail

Relational
Expressive Query Language
& Secondary Indexes
Strong Consistency
Enterprise Management
& Integrations

The World Has Changed
Data Risk Time Cost

NoSQL
Scalability
& Performance
Always On,
Global Deployments
FlexibilityExpressive Query Language
& Secondary Indexes
Strong Consistency
& Integrations

Nexus Architecture
Scalability
& Performance
Always On,
Global Deployments
FlexibilityExpressive Query Language
& Secondary Indexes
Strong Consistency
& Integrations

Where MongoDB Fits
Prod
3
2
4
123
...
Topic A
Prod
9
6
7
123
...
Topic B
Filter
Filter
Merge
5
3
4
123
...
Topic C
Analyze
4
9
6
123
...
Topic D
Take
Action
Take
Action

Where MongoDB Fits
Prod
3
2
4
123
...
Topic A
Prod
9
6
7
123
...
Topic B
Filter
Filter
Merge
5
3
4
123
...
Topic C
Analyze
4
9
6
123
...
Topic D
Take
Action
Store
Results
Operational Database

Where MongoDB Fits
Prod
3
2
4
123
...
Topic A
Prod
9
6
7
123
...
Topic B
Filter
Filter
Merge
5
3
4
123
...
Topic C
Analyze
4
9
6
123
...
Topic D
Take
Action
Store
Results
Key
Events

Where MongoDB Fits
Prod
3
2
4
123
...
Topic A
Prod
9
6
7
123
...
Topic B
Filter
Filter
Merge
5
3
4
123
...
Topic C
Analyze
4
9
6
123
...
Topic D
Take
Action
Store
Results
Key
Events
Reference Data

Where K-Streams Fits
Prod
3
2
4
123
...
Topic A
Prod
9
6
7
123
...
Topic B
5
3
4
123
...
Topic C
Analyze
4
9
6
123
...
Topic D
Take
Action
Store
Results
Key
Events
Reference Data
Kafka
Streams

MessageQueue
Customer Data Mgmt Mobile App IoT App Live Dashboards
Raw Data
Processed
Events
Distributed
Processing
Frameworks
Millisecond latency. Expressive querying & flexible indexing against subsets
of data. Updates-in place. In-database aggregations & transformations
Multi-minute latency with scans across TB/PB of data. No indexes. Data
stored in 128MB blocks. Write-once-read-many & append-only storage model
Sensors
User Data
Clickstreams
Logs
Churn
Analysis
Enriched
Customer
Profiles
Risk
Modeling
Predictive
Analytics
Real-Time Access
Batch Processing, Batch Views
Design Pattern: Operationalized Data Lake
Kafka
Streams

MessageQueue
Raw Data
Processed
Events
Sensors
User Data
Clickstreams
Logs
Churn
Analysis
Enriched
Customer
Profiles
Risk
Modeling
Predictive
Analytics
Real-Time Access
Configure where to
land incoming data
Distributed
Processing
Frameworks
Kafka
Streams

MessageQueue
Raw Data
Processed
Events
Sensors
User Data
Clickstreams
Logs
Churn
Analysis
Enriched
Customer
Profiles
Risk
Modeling
Predictive
Analytics
Real-Time Access
Raw data processed to
generate analytics models
Distributed
Processing
Frameworks
Kafka
Streams

MessageQueue
Raw Data
Processed
Events
Sensors
User Data
Clickstreams
Logs
Churn
Analysis
Enriched
Customer
Profiles
Risk
Modeling
Predictive
Analytics
Real-Time Access
MongoDB exposes
analytics models to
operational apps.
Handles real time
updates
Distributed
Processing
Frameworks
Kafka
Streams

MessageQueue
Raw Data
Processed
Events
Sensors
User Data
Clickstreams
Logs
Churn
Analysis
Enriched
Customer
Profiles
Risk
Modeling
Predictive
Analytics
Real-Time Access
Compute new
models against
MongoDB &
HDFS
Distributed
Processing
Frameworks
Kafka
Streams

https://www.mongodb.c
om/presentations/repla
cing-traditional-
technologies-mongodb-
single-platform-all-
financial-data-ahl

http://www.slideshare.n
et/danharvey/change-
data-capture-with-
mongodb-and-kafka

Kafka Connectors
• Confluent-supported connectors (included in CP)
• Community-written connectors (just a sampling)
JDBC

Kafka Futures
• Apache Core
• Admin API (KIP-4)
• Exactly-once delivery semantics
• Time-based topic indexing
• Kafka Streams
• Exactly-once processing semantics
• Interactive Queries: enable real-time sharing of application state with
other applications
• Confluent Platform Enterprise
• Multi-cluster views and alerting added to Control Center

MongoDB Atlas
Database as a service for MongoDB
MongoDB Atlas is…
• Automated: The easiest way to build, launch, and scale apps on MongoDB
• Flexible: The only database as a service with all you need for modern applications
• Secured: Multiple levels of security available to give you peace of mind
• Scalable: Deliver massive scalability with zero downtime as you grow
• Highly available: Your deployments are fault-tolerant and self-healing by default
• High performance: The performance you need for your most demanding workloads

MongoDB Atlas Features
• Spin up a cluster in
seconds
• Replicated & always-
on deployments
• Fully elastic: scale
out or up in a few
clicks with zero
downtime
• Automatic patches &
simplified upgrades
for the newest
MongoDB features
• Authenticated &
encrypted
• Continuous backup
with point-in-time
recovery
• Fine-grained
monitoring &
custom alerts
Safe & SecureRun for You
• On-demand pricing
model; billed by the
hour
• Multi-cloud support
(AWS available with
others coming
soon)
• Part of a suite of
products & services
designed for all
phases of your app;
migrate easily to
different
environments
(private cloud, on-
prem, etc) when
needed
No Lock-In
Database as a service for MongoDB

MongoDB Enterprise Advanced
• MongoDB Ops
Manager or
MongoDB Cloud
Manager Premium
• MongoDB Compass
• MongoDB
Connector for BI
• Encrypted Storage
Engine
• LDAP / Kerberos
Integration
• DDL & DML
Auditing
• FIPS 140-2 Support
SecurityTooling
• 24 x 7 Support
• 1 hr SLA
• Emergency
Patches
• Customer Success
Program
• On-Demand
Training
Support License
• Commercial
License

Resources
• Data Streaming with Apache Kafka & MongoDB
• https://www.mongodb.com/collateral/data-streaming-with-apache-
kafka-and-mongodb
• Implementing a Kafka Consumer for MongoDB
• https://www.mongodb.com/blog/post/mongodb-and-data-streaming-
implementing-a-mongodb-kafka-consumer
• Tailing the Oplog on a sharded MongoDB Cluster
• https://www.mongodb.com/blog/post/tailing-mongodb-oplog-sharded-
clusters

Old Billingsgate, London
15th November
mongodb.com/europe
Use my discount code for 20% off: andrewmorgan20

Document Data Model
Relational MongoDB
{ customer_id : 1,
first_name : "Mark",
last_name : "Smith",
city : "San Francisco",
phones: [
{
number : “1-212-777-1212”,
dnc : true,
type : “home”
},
number : “1-212-777-1213”,
type : “cell”
}]
}
Customer ID First Name Last Name City
0 John Doe New York
1 Mark Smith San Francisco
2 Jay Black Newark
3 Meagan White London
4 Edward Daniels Boston
Phone Number Type DNC Customer ID
1-212-555-1212 home T 0
1-212-555-1213 home T 0
1-212-555-1214 cell F 0
1-212-777-1212 home T 1
1-212-777-1213 cell (null) 1
1-212-888-1212 home F 2

Document Model Benefits
{
customer_id : 1,
phones: [
{
number : “1-212-777-1212”,
dnc : true,
type : “home”
},
number : “1-212-777-1213”,
type : “cell”
}]
}
Agility and flexibility
Data model supports business change
Rapidly iterate to meet new requirements
Intuitive, natural data representation
Eliminates ORM layer
Developers are more productive
Reduces the need for joins, disk seeks
Programming is more simple
Performance delivered at scale

Rich Functionality
MongoDB
Expressive Queries
• Find anyone with phone # “1-212…”
• Check if the person with number “555…” is on the “do not
call” list
Geospatial
• Find the best offer for the customer at geo coordinates of 42nd
St. and 6th Ave
Text Search • Find all tweets that mention the firm within the last 2 days
Aggregation • Count and sort number of customers by city
Native Binary
JSON support
• Add an additional phone number to Mark Smith’s without
rewriting the document
• Select just the mobile phone number in the list
• Sort on the modified date
{ customer_id : 1,
phones: [ {
number : “1-212-777-1212”,
dnc : true,
type : “home”
},
{
number : “1-212-777-1213”,
type : “cell”
}]
}
Left outer join
($lookup)
• Query for all San Francisco residences, lookup their
transactions, and sum the amount by person

MongoDB Technical Capabilities
Application
Driver
Mongos
Primary
Secondary
Secondary
Shard 1
Primary
Secondary
Secondary
Shard 2
…
Primary
Secondary
Secondary
Shard N
db.customer.insert({…})
db.customer.find({
name: ”John Smith”})
1.Dynamic Document Schema
{ name: “John Smith”,
date: “2013-08-01”,
address: “10 3rd St.”,
phone: {
home: 1234567890,
mobile: 1234568138 }
}
2. Native language drivers
4. High performance
- Data locality
- Indexes
- RAM
3. High availability
- Replica sets
5. Horizontal scalability
- Sharding
… …

MongoDB Use Cases
Single View Internet of Things Mobile Real-Time Analytics
Catalog Personalization Content Management

Streaming Data with Apache Kafka & MongoDB

Recommandé

Recommandé

Contenu connexe

Tendances

Tendances (20)

Similaire à Streaming Data with Apache Kafka & MongoDB

Similaire à Streaming Data with Apache Kafka & MongoDB (20)

Plus de confluent

Plus de confluent (20)

Dernier

Dernier (20)

Streaming Data with Apache Kafka & MongoDB

Notes de l'éditeur