Apache Kafka® and Analytics in a Connected IoT World, Kai Waehner, Sr. Solutions Engineer Advanced Technology Group, Confluent
https://www.meetup.com/Berlin-Apache-Kafka-Meetup-by-Confluent/events/273166575/
1. Apache Kafka and Analytics
in a Connected IoT World
Kai Waehner
Technology Evangelist
contact@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
3. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
STREAM PROCESSING
Filter
Analyze in-flight
Create and store materialized views
Time
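The three stream processing operations named on this slide can be sketched in plain Python, without a Kafka client. The sensor events, the device names, and the 90-degree threshold are hypothetical and chosen only for illustration; a real deployment would express the same logic in Kafka Streams or ksqlDB.

```python
from collections import defaultdict

# Hypothetical sensor events: (timestamp_seconds, device_id, temperature)
events = [
    (0, "car-1", 71.0),
    (1, "car-2", 95.5),
    (2, "car-1", 98.2),
    (3, "car-2", 70.1),
    (4, "car-1", 99.0),
]

def hot_readings(stream, threshold=90.0):
    """Filter: pass through only readings above the threshold."""
    for ts, device, temp in stream:
        if temp > threshold:
            yield ts, device, temp

# Materialized view: alert count per device, updated as each event arrives
# (analysis happens in-flight, not after the data has been stored).
alerts_per_device = defaultdict(int)
for ts, device, temp in hot_readings(events):
    alerts_per_device[device] += 1

print(dict(alerts_per_device))
```

The view is always current: any reader of `alerts_per_device` sees the result of all events processed so far.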
4. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
TRADITIONAL DATABASE
Active Query: SELECT * FROM DB_TABLE
Passive Data: DB Table
EVENT STREAM PROCESSING
Active Data: Event Stream
Passive Query: CREATE TABLE T AS SELECT * FROM EVENT_STREAM
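The contrast between an active query over passive data and a passive query over active data can be sketched as follows. This is a minimal illustration, not ksqlDB itself: the `ContinuousQuery` class, the event fields, and the choice to key the table by user are assumptions made for the example.

```python
# Traditional database: the data is passive, the query is active and runs once.
db_table = [{"user": "jay", "amount": 42}, {"user": "sue", "amount": 18}]
snapshot = [row for row in db_table]   # roughly: SELECT * FROM DB_TABLE

# Event stream processing: the query is registered once and then stays passive;
# every arriving event actively updates the result table T.
class ContinuousQuery:
    def __init__(self):
        self.table = {}   # T, keyed by user (assumption)

    def on_event(self, event):
        self.table[event["user"]] = event   # keep the latest row per key

query = ContinuousQuery()   # roughly: CREATE TABLE T AS SELECT * FROM EVENT_STREAM
event_stream = [
    {"user": "jay", "amount": 42},
    {"user": "sue", "amount": 18},
    {"user": "jay", "amount": 65},
]
for event in event_stream:
    query.on_event(event)

print(query.table["jay"]["amount"])
```

The snapshot never changes after it is taken, while `query.table` keeps following the stream.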
5. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
TABLES
USER    CREDIT_SCORE
JAY     695
SUE     430
FRED    710
Versions: V1, V3, V2
STREAMS
USER    PAYMENTS
JAY     42
SUE     18
FRED    65
...     ...
6. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
PUSH
Jay’s credit score is 670
Jay’s credit score is 710
Jay’s credit score is 695
APP
PULL
APP
What is Jay’s credit score now?
695
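The difference between push and pull can be sketched with a small in-memory table. The class `CreditScoreTable` and its method names are hypothetical; they only mimic the idea behind push queries (subscribe to changes) and pull queries (look up the current value once).

```python
class CreditScoreTable:
    """Hypothetical materialized table of the latest credit score per user."""

    def __init__(self):
        self._scores = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Push: the callback is invoked on every change."""
        self._subscribers.append(callback)

    def update(self, user, score):
        self._scores[user] = score
        for callback in self._subscribers:
            callback(user, score)

    def lookup(self, user):
        """Pull: return the current value once, on request."""
        return self._scores.get(user)


table = CreditScoreTable()
pushed = []
table.subscribe(lambda user, score: pushed.append(f"{user}'s credit score is {score}"))

for score in (670, 710, 695):
    table.update("Jay", score)

print(pushed)                # every change, as it happened
print(table.lookup("Jay"))   # only the current state
```

The push side receives all three updates, while the pull side answers "What is Jay's credit score now?" with the latest value only.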
7. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
The Log
Connectors
Producer Consumer
Streaming Engine
Apache Kafka - The Rise of an Event Streaming Platform = Messaging + Storage + Integration + Processing
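The log at the center of this slide can be sketched as an append-only list in which every consumer tracks its own offset. The classes `Log` and `Consumer` are simplified stand-ins written for this example and are not the Kafka client API; partitions, replication, and consumer groups are left out.

```python
class Log:
    """Append-only log; each record is addressed by its offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1   # offset of the new record

    def read(self, offset, max_records=10):
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own position, so consumers never affect each other."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records


log = Log()
for reading in ("speed=80", "speed=95", "speed=70"):
    log.append(reading)

realtime_app = Consumer(log)
batch_app = Consumer(log)

first = realtime_app.poll()    # reads the three records written so far
log.append("speed=60")
second = realtime_app.poll()   # reads only the new record
replay = batch_app.poll()      # a slower consumer still sees everything
```

Because reading does not remove records, the same log serves messaging (fast consumers) and storage (late or replaying consumers) at once.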
8. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Apache Kafka at Scale at Tech Giants
> 7 trillion messages / day > 6 Petabytes / day
“You name it”
* Kafka is not just used for big data
** Kafka is not just used by tech giants
9. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
10 Reasons for Event Streaming with Apache Kafka
Real Time
Scalable
Cost Reduction
24/7 – Zero downtime, zero data loss
Decoupling – Storage, Domain-driven Design
Data (re-)processing and stateful client applications
Integration – Connectivity to IoT, legacy, big data, everything
Hybrid Architecture – On Premises, multi cloud, edge computing
Fully managed cloud
No vendor lock-in
10. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Device management
Unreliable networks
Connectivity beyond standards
Lightweight edge hardware
…
Apache Kafka is not an IoT Platform!
12. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Ride-Sharing
More than just Messaging! Data correlation in real-time
for map-matching, ETA, cost calculation, and much more…
https://eng.lyft.com/a-new-real-time-map-matching-algorithm-at-lyft-da593ab7b006
13. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Connected Car Infrastructure
https://www.youtube.com/watch?v=yGLKi3TMJv8
• Real Time Data Analysis
• Swarm Intelligence
• Collaboration with Partners
• Predictive AI
• …
14. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Tesla
Trillions of messages per day for IoT use cases
https://www.confluent.io/kafka-summit-san-francisco-2019/0-60-teslas-streaming-data-platform/
https://www.confluent.io/blog/stream-processing-iot-data-best-practices-and-techniques/
15. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Track, manage, and locate tools and other equipment anytime and anywhere, from the warehouse to the jobsite
https://www.confluent.io/customers/bosch/
https://events.confluent.io/online-talks/bosch-power-toolse-nables-real-time-analytics-on-iot-event-streams
16. Deutsche Bahn AG | Reisendeninformation (Traveller Information)
Consistent
real-time information
for travellers
across Germany
RI-Plattform
17. Deutsche Bahn AG | Reisendeninformation (Traveller Information) | RI-Plattform
Customer timetable
Operational timetable
Assignments
Railway station knowledge
Dispositions
Train positions
Matching
Aggregation
Consolidation
Apache Kafka
Analysis
Railway station
Trains
Mobile Apps
Employees
18. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
19. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
20. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Food Value Chain
IoT-Based and Data-Driven
Single source of truth across the food value chain (in the factories, and across regions)
Business critical operations (tracking, calculations, alerts, …)
https://www.confluent.io/blog/creating-iot-based-data-driven-food-value-chain-with-confluent-cloud/
21. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Postmodern ERP (coined by Gartner)
Replace legacy, monolithic and highly customized ERP suites with a mixture of loosely coupled, exchangeable cloud-based and on-premises applications.
TMS: Legacy Proprietary, SOAP Web Services
Supplier, Alert, Forecast, Inventory, Customer, Order
Core ERP
CRM: SaaS, Kafka Interface
MES: Proprietary, HTTP Web Services
LMS: Legacy Homegrown, Database + CDC
SRM: Kafka-native
22. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Real Time Supply Chain and Retailing IoT Platform @ Mojix
https://www.confluent.io/customers/mojix/
Real-time operational intelligence with complex event processing
Inventory accuracy increased from 65% to 99%
Omnichannel sales
Built using Confluent Cloud, Kafka, Kafka Connect and Kafka Streams
Hybrid cloud across the edge – at retail stores and distribution centers – and the cloud
Variety of sources, including RFID readers, camera sensors, beacons, mobile devices and routers
23. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Cross-Company Supply Chain Integration
Streaming Replication and API Management
MirrorMaker 2
Confluent Replicator
Cluster Linking
Tier 2 Supplier
Tier 1 Supplier
OEM
Streaming integration between companies
API Management (REST et al) is not appropriate for streaming data
Infosec and politics are your biggest hurdle
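The idea behind streaming replication between two companies' clusters can be sketched as a loop that copies only the records the target has not seen yet. This is a conceptual illustration and not how MirrorMaker 2, Confluent Replicator, or Cluster Linking are implemented; the function `replicate`, the checkpoint dictionary, and the topic contents are hypothetical.

```python
def replicate(source_log, target_log, checkpoint):
    """Copy records the target has not seen yet, in order.

    checkpoint["next_offset"] is the first source offset not yet replicated.
    """
    start = checkpoint.get("next_offset", 0)
    pending = source_log[start:]
    target_log.extend(pending)
    checkpoint["next_offset"] = start + len(pending)
    return len(pending)


# Hypothetical topic shared by a Tier 1 supplier with an OEM.
supplier_topic = ["part-shipped:A17", "part-shipped:B02"]
oem_topic = []
checkpoint = {}

copied_first = replicate(supplier_topic, oem_topic, checkpoint)    # copies 2 records
supplier_topic.append("part-delayed:C44")
copied_second = replicate(supplier_topic, oem_topic, checkpoint)   # copies only the new record
```

Unlike a request-response API call, the receiving side ends up with its own ordered copy of the stream and can consume or replay it at its own pace.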
24. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Augmented Reality for Smart Assistance
with Apache Kafka, Kafka Connect and ksqlDB
Pre-Processing and Data Correlation (Kafka Streams / ksqlDB)
Receive Command
Operator (REST Proxy)
MES (Java)
Send Live Metrics
Send Command
Send Production Status
Robots (C++)
Receive Correlated Information
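The data correlation step can be sketched as a join between a stream of live metrics and the latest known production status. The example assumes that the robots emit the live metrics and the MES emits the production status; the function names, machine ids, and field names are hypothetical, and in the architecture on this slide the join would run in Kafka Streams or ksqlDB.

```python
production_status = {}   # latest status per machine, as reported by the MES


def on_production_status(machine_id, status):
    """Table side of the join: remember the latest status per machine."""
    production_status[machine_id] = status


def on_live_metric(machine_id, metric, value):
    """Stream side of the join: enrich each metric with the current status."""
    return {
        "machine_id": machine_id,
        "metric": metric,
        "value": value,
        "production_status": production_status.get(machine_id, "unknown"),
    }


on_production_status("robot-7", "welding batch 12")
correlated = on_live_metric("robot-7", "motor_temp_c", 81.5)
uncorrelated = on_live_metric("robot-9", "motor_temp_c", 40.0)

print(correlated)
```

The operator then receives one correlated record instead of having to combine two separate feeds on the device.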
25. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Cybersecurity
The threat is real!
Challenges
Stealing IP
DDoS
Ransomware / wiperware
WannaCry, NotPetya, …
Damage: Billions of dollars
“Supply chain attack”
Industry 4.0
Networking
Communication
Connectivity
Open standards
“Always-on”
26. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Legacy SIEM needs to evolve
Forwarder
Network traffic
Firewall logs
RDBMS
Application logs
Adaptors
Beats
Sensor Data
HTTP proxy logs
Challenges:
● Proprietary forwarders that can only send data to a single source
● Data locked from being shared
● Difficult to scale with growing data volumes
● Prohibitively high indexing costs
● Unable to filter out noisy data
● Slow batch processing
27. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
AI/ML
Modernized security information and event management (SIEM)
Filter, transform, aggregate
APP SIEM Index
Search
Curated streams
Forensic
Archive
HDFS
S3
Big Query
Syslog
CDC
Network traffic
Firewall logs
RDBMS
Application logs
Sensor Data
HTTP proxy logs
QRadar
Arcsight
Splunk
Elastic
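The "filter, transform, aggregate" step in front of the SIEM can be sketched as a function that sends every event to the forensic archive but only a curated subset to the costly SIEM index. The log levels treated as noise, the event fields, and the function name `curate` are assumptions for illustration.

```python
NOISY_LEVELS = {"DEBUG", "TRACE"}   # assumption: what counts as noise


def curate(events):
    """Split raw events into a curated SIEM stream and a full forensic archive."""
    siem_index, archive = [], []
    for event in events:
        archive.append(event)              # archive everything in cheap storage
        if event["level"] in NOISY_LEVELS:
            continue                       # filter: drop noise before indexing
        siem_index.append({                # transform: keep only the needed fields
            "source": event["source"],
            "level": event["level"],
            "message": event["message"],
        })
    return siem_index, archive


raw = [
    {"source": "firewall", "level": "WARN", "message": "port scan", "raw_bytes": 512},
    {"source": "app", "level": "DEBUG", "message": "cache hit", "raw_bytes": 128},
    {"source": "proxy", "level": "ERROR", "message": "blocked URL", "raw_bytes": 256},
]
siem_index, archive = curate(raw)

print(len(siem_index), len(archive))
```

Because the same curated stream can feed several consumers, the data is no longer locked into a single SIEM tool.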
28. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
BMW Group
Industry-ready NLP Service Framework Based on Kafka
https://www.confluent.io/kafka-summit-lon19/industry-ready-nlp-service-framework-kafka/
29. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Streaming Ingestion and Model Training
with Kafka, Tiered Storage and TensorFlow IO
https://github.com/tensorflow/io
Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!)
Producer
Distributed Commit Log
Model A
Model B
Model X (at a later time)
Time
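The idea of training several models directly from the commit log, including one trained later on the replayed history, can be sketched without TensorFlow. The example below deliberately swaps in a one-parameter linear model trained with plain stochastic gradient descent and uses a Python list in place of a Kafka topic; the function `train`, the data, and the hyperparameters are hypothetical.

```python
def train(log, end_offset, learning_rate, epochs=200):
    """Fit y = w * x with plain SGD, reading (x, y) pairs straight from the log."""
    w = 0.0
    for _ in range(epochs):
        for x, y in log[:end_offset]:
            w -= learning_rate * 2 * (w * x - y) * x   # gradient of (w*x - y)^2
    return w


# The commit log holds the training data; no copy to S3 or HDFS is made.
commit_log = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Model A and Model B read the same records with different hyperparameters.
model_a = train(commit_log, len(commit_log), learning_rate=0.01)
model_b = train(commit_log, len(commit_log), learning_rate=0.05)

# More data arrives; Model X is trained later by replaying the log from offset 0.
commit_log.extend([(4.0, 8.0), (5.0, 10.0)])
model_x = train(commit_log, len(commit_log), learning_rate=0.01)
```

All three models recover the same slope because the log retains the full history, which is the property that tiered storage makes affordable for long retention periods.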
30. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Confluent Tiered Storage for Kafka
Object Store
Processing Storage
Transactions, auth, quota enforcement, compaction, ...
Local
Remote
Kafka
Apps
Store Forever
Older data is offloaded to inexpensive object storage, permitting it to be consumed at any time.
Save $$$
Storage limitations, like capacity and duration, are effectively uncapped.
Instantaneously scale up and down
Your Kafka clusters will be able to automatically self-balance load and hence elastically scale
(Only available in Confluent Platform)
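The principle of tiered storage can be sketched as a log that keeps only the newest records locally and moves older ones to a remote tier, while consumers keep reading by offset. The class `TieredLog` and its capacity parameter are hypothetical and say nothing about how Confluent implements the feature; real brokers offload whole log segments rather than single records.

```python
class TieredLog:
    """Keeps the newest records locally and offloads older ones to a remote store."""

    def __init__(self, local_capacity):
        self.local_capacity = local_capacity
        self.local = []    # hot tier: broker-local disk (simulated)
        self.remote = []   # cold tier: object store (simulated)

    def append(self, record):
        self.local.append(record)
        while len(self.local) > self.local_capacity:
            self.remote.append(self.local.pop(0))   # offload the oldest record

    def read(self, offset):
        """Consumers address records by offset, regardless of the tier."""
        if offset < len(self.remote):
            return self.remote[offset]
        return self.local[offset - len(self.remote)]


log = TieredLog(local_capacity=2)
for i in range(5):
    log.append(f"event-{i}")

print(log.read(0), log.read(4))
```

Since the local tier stays small, adding or rebalancing brokers moves little data, which is what makes elastic scaling practical.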
31. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Machine Vision for Quality Assurance and Yield Management
Apache Kafka and Applied Machine Learning
BI Tool
AI/ML
Filter, transform, aggregate, orchestrate
APP
Real-time alerting
Sensor Data
SCADA
MES
PLCs
OT Team
Plant Manager
Images from Products of Assembly Lines
IT Team
Live Ops
Machine Vision for Quality Inspection
Reporting
Backup
Data Science Team
Data Lake
33. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
The Rise of Event Streaming
2010: Apache Kafka created at LinkedIn by Confluent founders
2014
2020: 80% of Fortune 100 companies trust and use Apache Kafka
34. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Event Streaming Maturity Model
Axes: VALUE over INVESTMENT & TIME
Stage 1: Initial Awareness / Pilot (1 Kafka Cluster)
Stage 2: Start to Build Pipeline / Deliver 1 New Outcome (1 Kafka Cluster)
Stage 3: Mission-Critical Deployment (Stretched, Hybrid, Multi-Region)
Stage 4: Build Contextual Event-Driven Apps (Stretched, Hybrid, Multi-Region)
Stage 5: Central Nervous System (Global Kafka)
Product, Support, Training, Partners, Technical Account Management...
35. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Confluent Platform
FREEDOM OF CHOICE: Fully Managed Cloud Service | Self Managed Software
COMMITTER-DRIVEN EXPERTISE: Partners | Training | Professional Services | Enterprise Support
Apache Kafka
UNRESTRICTED DEVELOPER PRODUCTIVITY (Developer)
SQL-based Stream Processing: KSQL (ksqlDB)
Rich Pre-built Ecosystem: Connectors | Hub | Schema Registry
Multi-language Development: non-Java clients | REST Proxy
EFFICIENT OPERATIONS AT SCALE (Operator)
GUI-driven Mgmt & Monitoring: Control Center
Flexible DevOps Automation: Operator | Ansible
Dynamic Performance & Elasticity: Auto Data Balancer | Tiered Storage
PRODUCTION-STAGE PREREQUISITES (Architect)
Enterprise-grade Security: RBAC | Secrets | Audit logs
Data Compatibility: Schema Registry | Schema Validation
Global Resilience: Multi-Region Clusters | Replicator
PARTNERSHIP FOR BUSINESS SUCCESS (Executive Buyer)
Complete Engagement Model
Revenue / Cost / Risk Impact
TCO / ROI
Open Source | Community licensed
36. IoT and Event Streaming – @KaiWaehner - www.kai-waehner.de
Global Event Streaming
Aggregate Small Footprint Edge Deployments with Replication (Aggregation)
Simplify Disaster Recovery Operations with Multi-Region Clusters with RPO=0 and RTO=0
Stream Data Globally with Replication and Cluster Linking