@helenaedelson #kafkasummit 1
Leveraging Kafka for Big Data in Real Time Bidding,
Analytics, ML & Campaign Management
for Globally Distributed
Data Flows
Helena Edelson @helenaedelson Kafka Summit 2016
VP of Engineering, Tuplejump
Previously: Sr Cloud / Big Data / Analytics
Engineer: DataStax, CrowdStrike, VMware,
SpringSource...
Event-Driven systems, Analytics, Machine
Learning, Scala
Committer: Kafka Connect Cassandra,
Spark Cassandra Connector
Contributor: Akka, previously: Spring
Integration
Speaker: Kafka Summit, Spark Summit, Strata,
QCon, Scala Days, Scala World, Philly ETE
twitter.com/helenaedelson
github.com/helena
slideshare.net/helenaedelson
The Real Topic
http://www.slideshare.net/palvaro/ricon-keynote-outwards-from-the-middle-of-the-maze/42
Chaos Of
Distribution
One of the more
fascinating problems is
that of solving the chaos
of distributed systems.
Regardless of the
domain.
Approaching this
within the use case of:
High-Level
Landscape
Platform &
Infrastructure
Strategies and Patterns
Four-Letter Acronyms
Can't Touch This
Architecture
The Landscape
The Digital Ad Industry
An RTB Drive-By
Real time auction for ad
spaces, all devices
High throughput, low latency
(similar to FinTech, but not
quite)
OpenRTB API Spec - but not
everyone uses it
Open protocol for
automated trading of
digital media across
platforms, devices, and
advertising solutions
In A Nutshell
[Diagram] User hits a Publisher's page -> Advertisers send Bid Requests -> Highest Bid Accepted -> Ad Delivered to User
RTB Auction for Impressions
[Diagram: participants]
User Devices
Site: ad-supported content
Publisher (Seller): has ad spaces to sell to highest bidders
Real Time Exchange & Auction (SSP): OpenRTB server used to bid
Bidder Service (DSP): OpenRTB client
Advertiser (Buyer): wants ad impressions, uses bidders to bid on its behalf
[Diagram: message flows]
ad request, bid request, bid response, win notice & settlement price, insert orders, winning ad
Time Is Money
RTB:
Maximum response latency of 100 ms
Time Is Money
Assume some network latency!
Sampling of RTB
Events
Ad Request
Bid Request - JSON 100 bytes
Compute optimal bid for advertiser
Bid Response - JSON 1000 bytes (may include ad metadata)
Win Notification (may or may not exist) with settlement price
Ad Impression - when the ad is viewed
Ad Click
Ad Conversion
Event Streams
Auctions: auction data + bid requests
Ad Impressions: which ad ids were shown
Ad Clicks: which auction ids resulted in a
click
Ad Conversions: streams joined on auction id
Analytics Aggregations & ML to derive
hundreds of metrics and dimensions
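As a rough illustration of joining these streams on auction id, the sketch below works on small in-memory collections rather than on live Kafka topics. The record types (`Impression`, `Click`, `ClickedImpression`) and the click-through metric are assumptions made for the example; the talk names the streams but not their schemas.

```scala
// Hypothetical record types; the talk only names the streams, not their fields.
final case class Impression(auctionId: String, adId: String)
final case class Click(auctionId: String)
final case class ClickedImpression(auctionId: String, adId: String)

object StreamJoin {
  // Join impressions with clicks on auction id (in-memory stand-in for a stream join).
  def join(impressions: Seq[Impression], clicks: Seq[Click]): Seq[ClickedImpression] = {
    val clicked: Set[String] = clicks.map(_.auctionId).toSet
    impressions.collect {
      case Impression(auctionId, adId) if clicked(auctionId) =>
        ClickedImpression(auctionId, adId)
    }
  }

  // One derived metric: click-through rate per ad id (clicked impressions / impressions).
  def ctrByAd(impressions: Seq[Impression], clicks: Seq[Click]): Map[String, Double] = {
    val clicked: Set[String] = clicks.map(_.auctionId).toSet
    impressions.groupBy(_.adId).map { case (adId, imps) =>
      adId -> imps.count(i => clicked(i.auctionId)).toDouble / imps.size
    }
  }
}
```

In a real pipeline the same join would be windowed, since clicks can arrive well after the impression they belong to.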
Real Time
Just means Event-Driven or processing events as they arrive.
Does not automatically equal sub-second latency
requirements.
Seen / Ingestion Time
When an event is ingested into the system
Event Time
When an event is created, e.g. on a device.
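The difference between the two timestamps can be made concrete with a small sketch. The names `AdEvent`, `eventTimeMs`, `seenTimeMs` and `ingestionLagMs` are illustrative assumptions, not taken from the talk.

```scala
// Hypothetical event model: type and field names are assumptions for illustration.
final case class AdEvent(auctionId: String, eventTimeMs: Long, seenTimeMs: Long) {
  // Delay between creation (e.g. on a device) and ingestion into the system.
  def ingestionLagMs: Long = seenTimeMs - eventTimeMs
}

object EventTimeDemo {
  // Order by when events happened, not by when they arrived.
  def byEventTime(events: Seq[AdEvent]): Seq[AdEvent] = events.sortBy(_.eventTimeMs)

  val events: Seq[AdEvent] = Seq(
    AdEvent("a-2", eventTimeMs = 2000L, seenTimeMs = 2050L),
    AdEvent("a-1", eventTimeMs = 1000L, seenTimeMs = 2600L) // created first, ingested last
  )
}
```

Sorting by `seenTimeMs` would keep the arrival order instead, which is why aggregations need to state which clock they use.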
The Platform
Platform Requirements
24 / 7 Uptime
Brokerage model: DSPs only make $ on
successful ad deliveries, so uptime is critical
Security
Enable service across the globe
Handle thousands of concurrent requests per second
Scale to traffic of 700TB per day
Manage 700TB per day of data
Derive Metrics
Business Requirements
Support SLAs for bid transactions
Legal constraints - user data crossing borders
The critical path must be fast to win
No data loss on ingestion path
Bid & Campaign Optimization
Frequency Capping
Management UI for Publishers & Advertisers
Questions To Answer
% Writes on ingestion, analytics pre-aggregation, etc.
% Reads of raw data by analytics, aggregated views by customer
management UI
How much in memory on RTB app nodes?
Dimensions of data in analytics queries
Optimization Algos
What needs real time feedback loops, what does not
Which data flows are low-latency/high-frequency, which are not
Where are potential bottlenecks
Constraints
Resources - I need to build highly
functioning teams that are psyched
about the work and working together
Budget
Cloud Resources
JDK Version (What?!)
Existing infrastructure & technologies
that will be replaced later but you have to
deal with now :(
Pro Tip:
Pay well,
Allow people
to grow & be
creative
Strategies
To Avoid
Beware of the C word
Consistency?
Convergence?
http://www.slideshare.net/palvaro/ricon-keynote-outwards-from-the-middle-of-the-maze/39
he went
there
@palvaro
Complexity
Can't Ops your way out of that
Occam's razor:
Simpler theories are preferable to more complex ones
Strategies
Approaches
Eventual/Tunable consistency
Time & Clocks in globally-distributed
systems
Location Transparency
Asynchrony
Pub-Sub
Design for scale
Design for Failure
Kafka as Platform Fabric
From MVP to Scalable with Kafka
Microservices
Does One Thing, Knows One Thing
Separate low-latency hot path
Separate deploy artifacts
Separate data mgmt clusters by
concern: analytics, timeseries, etc.
CQRS: separate read and write paths
Scalpel...
Separate The Monolith
Immutable events stream to Kafka, partitioned
by event type, time, etc.
Subscribers & Publishers
RTB microservices - receives raw, receives
Analytics cluster - receives raw, publishes
aggregates
Management / Reporting nodes
Services communicate
indirectly via Kafka
CQRS: Command Query
Responsibility Segregation
Decouple Write streams from Read streams
Different schemas / data structures
Writers (Publishers) publish without
knowing who needs to receive the data or how
to reach them (location, protocol...)
Readers (Subscribers) should be able to
subscribe and asynchronously receive
from topics of interest
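A minimal sketch of the write/read split, assuming an in-memory append-only log as a stand-in for a Kafka topic. All names here (`BidEvent`, `EventLog`, `CampaignStats`, `ReadModel`) are hypothetical and chosen only for illustration.

```scala
// Write side: publishers append immutable events and know nothing about readers.
sealed trait BidEvent { def campaignId: String }
final case class BidWon(campaignId: String, priceMicros: Long) extends BidEvent
final case class BidLost(campaignId: String) extends BidEvent

// Stand-in for a topic: an append-only sequence of events.
final case class EventLog(events: Vector[BidEvent] = Vector.empty) {
  def append(e: BidEvent): EventLog = copy(events = events :+ e)
}

// Read side: a different schema, derived by folding over the log.
final case class CampaignStats(wins: Int = 0, losses: Int = 0, spendMicros: Long = 0L)

object ReadModel {
  def project(log: EventLog): Map[String, CampaignStats] =
    log.events.foldLeft(Map.empty[String, CampaignStats]) { (view, e) =>
      val s = view.getOrElse(e.campaignId, CampaignStats())
      val updated = e match {
        case BidWon(_, price) => s.copy(wins = s.wins + 1, spendMicros = s.spendMicros + price)
        case BidLost(_)       => s.copy(losses = s.losses + 1)
      }
      view.updated(e.campaignId, updated)
    }
}
```

Because the read model is derived entirely from the log, it can be rebuilt or reshaped for a new query without touching the writers.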
Eventually Consistent Across DCs
[Architecture diagram: two regions, US-East-1 and EU-west-1, each with its own Kafka cluster (Kafka Cluster Per Region) and ZK, linked by MirrorMaker. RTB microservices and Mgmt microservices act as Publishers and Subscribers. Also labeled: Query Layer, Compute Layer, Analytics & ML Cluster, Timeseries Cluster, with Spark Streaming & ML on Cassandra and topology-aware cross-DC replication.]
Eventually Consistent Across DCs
[Architecture diagram, variant of the previous slide: same regions, MirrorMaker link, per-region Kafka clusters, RTB and Mgmt microservices, and Spark Streaming & ML / Cassandra clusters. This version labels two C* (Cassandra) nodes and no ZK.]
Kafka Cross Datacenter
Mirroring
bin/kafka-run-class.sh kafka.tools.MirrorMaker \
  --consumer.config config/consumer_source_cluster.properties \
  --producer.config config/producer_target_cluster.properties \
  --whitelist bidrequests --num.producers 2 --num.streams 4
Publish messages from various datacenters
around the world
Users in the US and UK connect to DCs in their geo region for
lower latency
Both DCs are part of the same cluster for X-DC Replication
Configure LB policies to prefer local DC
LOCAL_QUORUM reads
Data is available cluster-wide for backup, analytics, and to
account for user travel across regions
Cassandra Cross DC Replication
It's out of the box. Multi-region live backups for free:
[ NetworkTopologyStrategy ]
Cassandra Cross DC Replication
Keep EU User Data in the EU
CREATE KEYSPACE rtb WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'eu-east-dc': '3', 'eu-west-dc': '3'
};
Cassandra Time Windowed Buckets with TTL
CREATE TABLE rtb.fu_events (
  id int,
  seen_time timeuuid,
  event_time timestamp,
  PRIMARY KEY (id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
  AND compaction = {
    'compaction_window_unit': 'DAY',
    'compaction_window_size': '3',
    'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'
  }
  AND compression = {
    'crc_check_chance': '0.5',
    'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'
  }
  AND bloom_filter_fp_chance = 0.01
  AND caching = '{"keys":"ALL", "rows_per_partition":"100"}'
  AND dclocal_read_repair_chance = 0.0
  AND default_time_to_live = 60
  AND gc_grace_seconds = 0
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND read_repair_chance = 0.0
  AND speculative_retry = '99.0PERCENTILE';
3 DAY buckets: larger SSTables on disk minimize bootstrapping issues when adding nodes to a cluster
3 MINUTE buckets
1 HOUR buckets
1 DAY buckets
MICROSECOND resolution:
My Nerdy Chart v2.0
Want | Can Or Currently Use | Status | But
Kafka Security | Kafka Security: TLS, Kerberos, SASL, Auth, Encryption, Authentication | v0.9.0 - Thanks Jun! |
Integrated Streaming | Kafka Streams: processing inside Kafka, no alternate cluster setup or ops | v0.10 - Thanks Guozhang! | It's Java :(
Cassandra CDC | Cassandra CDC. Triggers? Triggers are a pre-commit hook :( | The Epic JIRA: https://issues.apache.org/jira/browse/CASSANDRA-8844 | no comment
And... Kafka Streams & Kafka Connect Integration | | ..wait for it.. | no comment
Always on, X-DC Replication, Flexible Topologies | Kafka, Cassandra | OOTB |
Fault Tolerance | Kafka, Spark, Mesos, Cassandra, Akka | Baked In |
Location Transparency | Kafka, Cassandra, Akka | Check! |
Asynchrony | Kafka, Cassandra, Akka | Check! |
Decoupling | Kafka, Akka | Check! |
Pub-Sub | Kafka, Cassandra, Akka | Check! |
Immutability | Kafka, Akka, Scala | Check! |
Kafka Streams
in v 0.10
val builder = new KStreamBuilder()

// ser / des: serializers and deserializers, elided on the slide
val stream: KStream[K, V] = builder.stream(des, des, "raw.data.topic")
  .flatMapValues(value => Arrays.asList(value.toLowerCase.split(" "): _*))
  .map((k, v) => new KeyValue(k, v))
  .countByKey(ser, ser, des, des, "kTable")
  .toStream
stream.to("results.topic", ...)

val streams = new KafkaStreams(builder, props)
streams.start()
Kafka Streams &
Kafka Connect?
val builder = new KStreamBuilder()
val stream1: KStream[K, V] = builder.stream(new CassandraConnect(configs))
  .flatMapValues(..)
  .map((k, v) => new KeyValue(k, v))
  .countByKey(ser, ser, des, des, "kTable")
  .toStream
stream1.to("results.topic", ...)

val streams = new KafkaStreams(builder, props)
streams.start()
YES
/** Writes records from Kafka to Cassandra asynchronously and non-blocking. */
override def put(records: JCollection[SinkRecord]): Unit
/** Returns a list of records when available by polling for new records. */
override def poll: JList[SourceRecord]
https://github.com/tuplejump/kafka-connect-cassandra
Frequency Capping
1. Count the number of times user X has seen ad Y from
Advertiser A's Campaign C
2. Limit the max number of impressions of an ad within
T1...T2
Use Case:
Continuously count impressions grouped by campaign across DCs
low-latency reads & writes
Must scale
Cross DC Counters
Translation: Distributed Counters
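The two capping rules can be sketched on a single node as follows. This is only an illustration of the counting and limiting logic, not the cross-DC counter the use case calls for; `CapKey`, `FrequencyCap` and the window semantics are assumptions made for the example.

```scala
// Hypothetical key: user X, ad Y, campaign C (names are assumptions).
final case class CapKey(userId: String, adId: String, campaignId: String)

// Immutable store of impression timestamps (ms) per key, with a cap per time window.
final case class FrequencyCap(maxImpressions: Int, windowMs: Long,
                              seen: Map[CapKey, Vector[Long]] = Map.empty) {

  // Impressions recorded for `key` in the half-open window (nowMs - windowMs, nowMs].
  def count(key: CapKey, nowMs: Long): Int =
    seen.getOrElse(key, Vector.empty).count(t => t > nowMs - windowMs && t <= nowMs)

  // True if one more impression would still be within the cap.
  def allowed(key: CapKey, nowMs: Long): Boolean = count(key, nowMs) < maxImpressions

  // Record an impression and return the updated store.
  def record(key: CapKey, nowMs: Long): FrequencyCap =
    copy(seen = seen.updated(key, seen.getOrElse(key, Vector.empty) :+ nowMs))
}
```

Making this work across datacenters is the hard part: each region would hold its own copy of the counts, and the copies would need to converge.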
Redis? Broke under the load
Aerospike? Great candidate
Eventuate? Interesting, much lighter
Kafka streams when it's out? Interesting, already in the
infra
Flink? Very interesting but...
Cassandra Counters - not applicable for this
Frequency Capping
As a distributed counting microservice
As a key-value store for in-memory caching
Fast reads - Very read heavy
99% reads are < 1 ms latency (sweet)
30,000 writes per second
350,000 reads per second on 7 nodes
Replication factor 2:
Cross datacenter replication (XDC), SSD-backed
A few excellent posts by Dag, Tapad's CTO, on in-memory
infrastructure + Ad Tech (see resources slide)
Aerospike
CRDT: Conflict-free
Replicated Data Type
State-based: objects require only eventual communication
between pairs of replicas
Operation-based: replication requires reliable broadcast
communication with delivery in a well-defined delivery
order
Both guaranteed to converge towards common, correct state
Keeping replicas available for writes during a network partition
requires resolving conflicting writes when the partition
heals
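A grow-only counter (G-Counter) is one of the simplest state-based CRDTs and fits the impression-counting use case. The sketch below is a generic textbook construction, not Eventuate's implementation; the replica ids are assumptions.

```scala
// Grow-only counter (G-Counter), a state-based CRDT: one slot per replica.
final case class GCounter(counts: Map[String, Long] = Map.empty) {

  // A replica only ever increments its own slot.
  def increment(replicaId: String, delta: Long = 1L): GCounter = {
    require(delta >= 0, "G-Counter is grow-only")
    copy(counts = counts.updated(replicaId, counts.getOrElse(replicaId, 0L) + delta))
  }

  // The counter value is the sum over all replica slots.
  def value: Long = counts.values.sum

  // Merge takes the per-replica maximum: commutative, associative and idempotent,
  // so replicas converge regardless of the order in which they exchange state.
  def merge(that: GCounter): GCounter =
    GCounter((counts.keySet ++ that.counts.keySet).map { id =>
      id -> math.max(counts.getOrElse(id, 0L), that.counts.getOrElse(id, 0L))
    }.toMap)
}
```

Two replicas that accept writes during a partition can merge in either order afterwards and arrive at the same total, which is the convergence property described above.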
Eventuate
A toolkit for building distributed, HA & partition-tolerant event-sourced applications.
Developed by Martin Krasser (@mrt1nz) for Red Bull Media (open source)
Interactive, automated conflict resolution (via op-based CRDTs)
Separates command side of an app from its query side (CQRS)
Primary Goals: preserving causality, idempotency & event ordering guarantees even under
chaotic conditions
AP of CAP - conflicts cannot be prevented & must be resolved.
Causality - tracked with Vector Clocks
Adapters provide connectivity to other stream processing solutions
Can currently choose Cassandra if desired
Kafka coming soon!
Replication of application state through
async event replication across locations
Locations consume replicated events to re-
construct application state locally
Multiple locations concurrently update as
multi-master
Eventuate as
Distributed CRDT Microservice
Applications can continue
writing to a local replica during
a network partition
-> To Cassandra
-> To Kafka
(soon)
Pass To Pipeline:
import scala.concurrent.Future
import akka.actor.{ActorRef, ActorSystem}
import com.rbmhtechnology.eventuate.crdt.{CRDTServiceOps, Counter, CounterService}

class CappingService(val id: String, override val log: ActorRef)
    (implicit val system: ActorSystem,
     val integral: Integral[Int],
     override val ops: CRDTServiceOps[Counter[Int], Int])
  extends CounterService[Int](id, log) {

  /** Increment only op: adds `delta` to the counter identified by `id`
    * and returns the updated counter value.
    */
  def increment(id: String, delta: Int): Future[Int] =
    value(id) flatMap {
      case v if v >= 0 && (delta > 0 || delta > v) =>
        update(id, delta)
      case v =>
        Future.successful(v)
    }

  start()
}

val a = new CappingService(id1, eventLog)
a.increment(id1, 3)  // Future(3) 3 impressions
a.value(id1)         // Future(3) 3 impressions
a.increment(id1, -2) // increments only, idempotent.

val b = new CappingService(id2, eventLog)
b.value(id1)         // Future(a.value(id1))

Knows the same count over n-instances,
all geo-locations, for the same id

class CounterService[A : Integral](val replicaId: String, val log: ActorRef) {
  def value(id: String): Future[A] = { ... }
  def update(id: String, delta: A): Future[A] = { ... }
}
Eventuate
Eventuate Takeaway
It's just a jar!
OOTB async internal component messaging
and fault tolerance
Integrate with relevant microservices
No store/cache cluster to deploy, just keep
monitoring your apps
Written in Scala
Built on Akka - a toolkit for building highly
concurrent, distributed, and resilient event-driven
applications on the JVM
Analytics & ML
Refresher: Sampling of
RTB Events
Ad Request
Bid Request - JSON 100 bytes
Compute optimal bid for advertiser
Bid Response - JSON 1000 bytes (may include ad metadata)
Win Notification (may or may not exist) with settlement price
Ad Impression - when the ad is viewed
Ad Click
Ad Conversion
OpenRTB: objects in the Bid Request model
Top-K highest-performing
campaigns
Number of views served in the last 7
days, by country, by city
What determined successful ad
conversions
Age distribution per campaign
Streaming Analytics
Spark Streaming Kafka
class KafkaStreamingActor(ssc: StreamingContext) extends MyAggregationActor {
  val stream = KafkaUtils.createDirectStream(...).map(RawData(_))

  // Persist the raw events
  stream
    .foreachRDD(_.toDF.write.format("filodb.spark")
      .option("dataset", "rawdata")
      .save())

  // Pre-aggregate data in the stream for fast querying and aggregation later
  stream.map(hour =>
    (hour.wsid, hour.year, hour.month, hour.day, hour.oneHourPrecip)
  ).saveToCassandra(timeseriesKeyspace, dailyPrecipTable)
}
Can write to
Cassandra,
FiloDB...
Machine Learning
Train on 1+ week of data for
Recommendations
Bid Optimization
Campaign Optimization
Consumer Profiling
...and much more
Machine Learning
The probability of an ad, from a specific
ISP, OS, website, demographic, etc.
resulting in a conversion
Which attributes of impressions are good
predictors of better ad performance?
Bid Optimization &
Predictive Models
Which impressions should an Advertiser bid for?
Per campaign, per country it may run in..?
What is the best bid for each impression?
Machine Learning
[Diagram] Train the model (train on every bid req attribute) -> Score bid requests (determine value of bid request) -> Compute optimal bid price (based on Campaign Objectives, against Budget) -> Send bid decision to bidder
Spark Streaming, MLlib & FiloDB
val ssc = new StreamingContext(sparkConf, Seconds(5))

val kafkaStream = KafkaUtils.createDirectStream[..](..)
  .map(transformFunc)
  .map(LabeledPoint.parse)

// Persist the training data
kafkaStream.foreachRDD(_.toDF.write.format("filodb.spark")
  .option("dataset", "training").save())

// trainOn returns Unit, so keep a reference to the configured model
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.dense(weights))
model.trainOn(kafkaStream.join(historicalEvents))

model.predictOnValues(kafkaStream.map(lp => (lp.label, lp.features)))
  .insertIntoFilo("predictions")
700 Queries Per Second:
Spark Streaming & FiloDB
Even for datasets with 15 million rows! Using FiloDB's
InMemoryColumnStore
Single host / MBP
5GB RAM
SQL to DataFrame caching
https://github.com/tuplejump/FiloDB
Evan Chan's (@velvia) blog post
NoLambda: A new architecture combining streaming, ad hoc,
machine-learning, and batch analytics
Eventually Consistent Across DCs
[Architecture diagram, repeated from the earlier "Eventually Consistent Across DCs" slide: US-East-1 and EU-west-1, Kafka Cluster Per Region with ZK, MirrorMaker, RTB and Mgmt microservices, Spark Streaming & ML with Cassandra, topology-aware cross-DC replication.]
Self-Healing Systems
Massive event spikes & bursty traffic
Fast producers / slow consumers
Network partitioning & out of sync
systems
DC down
Not DDoS'ing ourselves from fast
streams
No data loss when auto-scaling down
Byzantine Fault Tolerance?
Looks like I'll
miss standup
Everything fails, all the time
Monitor
Everything
Non-Monotonic Snapshot Isolation: scalable and strong consistency
for geo-replicated transactional systems
Conflict-free Replicated Data Types
Implementing operation-based CRDTs
http://codebetter.com/gregyoung/2010/02/16/cqrs-task-based-uis-event-sourcing-agh
http://martinfowler.com/bliki/CQRS.html
http://github.com/openrtb/OpenRTB
http://akka.io
http://rbmhtechnology.github.io/eventuate
https://github.com/RBMHTechnology/eventuate
http://rbmhtechnology.github.io/eventuate/user-guide.html#commutative-replicated-data-types
http://www.planetcassandra.org/data-replication-in-nosql-databases-explained
http://wikibon.org/wiki/v/Optimizing_Infrastructure_for_Analytics-Driven_Real-Time_Decision_Making
Resources
twitter.com/helenaedelson
github.com/helena
slideshare.net/helenaedelson
Thanks!
Building Reactive Distributed Systems For Streaming Big Data, Analytics & Mac...
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Idiro Analytics - Identifying Families using Social Network Analysis and Big ...
Idiro Analytics - Identifying Families using Social Network Analysis and Big ...Idiro Analytics - Identifying Families using Social Network Analysis and Big ...
Idiro Analytics - Identifying Families using Social Network Analysis and Big ...
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Deriving economic value for CSPs with Big Data [read-only]
Deriving economic value for CSPs with Big Data [read-only]Deriving economic value for CSPs with Big Data [read-only]
Deriving economic value for CSPs with Big Data [read-only]
 
Idiro Analytics - Analytics & Big Data
Idiro Analytics - Analytics & Big DataIdiro Analytics - Analytics & Big Data
Idiro Analytics - Analytics & Big Data
 
Telco Churn Roi V3
Telco Churn Roi V3Telco Churn Roi V3
Telco Churn Roi V3
 
Idiro Analytics - What is Rotational Churn and how can we tackle it?
Idiro Analytics - What is Rotational Churn and how can we tackle it?Idiro Analytics - What is Rotational Churn and how can we tackle it?
Idiro Analytics - What is Rotational Churn and how can we tackle it?
 
Churn modelling
Churn modellingChurn modelling
Churn modelling
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
 
Idiro Analytics - Social Network Analysis for Online Gaming
Idiro Analytics - Social Network Analysis for Online GamingIdiro Analytics - Social Network Analysis for Online Gaming
Idiro Analytics - Social Network Analysis for Online Gaming
 
Churn Analysis in Telecom Industry
Churn Analysis in Telecom IndustryChurn Analysis in Telecom Industry
Churn Analysis in Telecom Industry
 
Social Network Analysis for Telecoms
Social Network Analysis for TelecomsSocial Network Analysis for Telecoms
Social Network Analysis for Telecoms
 
Predicting churn in telco industry: machine learning approach - Marko Mitić
 Predicting churn in telco industry: machine learning approach - Marko Mitić Predicting churn in telco industry: machine learning approach - Marko Mitić
Predicting churn in telco industry: machine learning approach - Marko Mitić
 
Decide on technology stack & data architecture
Decide on technology stack & data architectureDecide on technology stack & data architecture
Decide on technology stack & data architecture
 
How to use your CRM for upselling and cross-selling
How to use your CRM for upselling and cross-sellingHow to use your CRM for upselling and cross-selling
How to use your CRM for upselling and cross-selling
 
Big Data: Social Network Analysis
Big Data: Social Network AnalysisBig Data: Social Network Analysis
Big Data: Social Network Analysis
 
Big Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachBig Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network Approach
 

Similaire à Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign Management for Globally Distributed Data Flows

Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
Rethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleRethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleHelena Edelson
 
Keynote Roberto Delamora - AWS Cloud Experience Argentina
Keynote Roberto Delamora - AWS Cloud Experience ArgentinaKeynote Roberto Delamora - AWS Cloud Experience Argentina
Keynote Roberto Delamora - AWS Cloud Experience ArgentinaAmazon Web Services LATAM
 
Data Analytics at Altocloud
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud Altocloud
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analyticsconfluent
 
Orchestrate data with agility and responsiveness. Learn how to manage a commo...
Orchestrate data with agility and responsiveness. Learn how to manage a commo...Orchestrate data with agility and responsiveness. Learn how to manage a commo...
Orchestrate data with agility and responsiveness. Learn how to manage a commo...Skender Kollcaku
 
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsCloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsNeil Avery
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
Orchestration Ownage - RSAC 2017
Orchestration Ownage - RSAC 2017Orchestration Ownage - RSAC 2017
Orchestration Ownage - RSAC 2017Bryce Kunz
 
Rethinking Streaming Analytics for Scale
Rethinking Streaming Analytics for ScaleRethinking Streaming Analytics for Scale
Rethinking Streaming Analytics for ScaleC4Media
 
Shared time-series-analysis-using-an-event-streaming-platform -_v2
Shared   time-series-analysis-using-an-event-streaming-platform -_v2Shared   time-series-analysis-using-an-event-streaming-platform -_v2
Shared time-series-analysis-using-an-event-streaming-platform -_v2confluent
 
Time series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalTime series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalconfluent
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkTimothy Spann
 
Re-engineering Engineering: from a cathedral to a bazaar?
Re-engineering Engineering: from a cathedral to a bazaar?Re-engineering Engineering: from a cathedral to a bazaar?
Re-engineering Engineering: from a cathedral to a bazaar?Open Networking Summits
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionTimothy Spann
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...Flink Forward
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshConfluentInc1
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
Serverless integration anatomy
Serverless integration anatomyServerless integration anatomy
Serverless integration anatomyChristina Lin
 

Similaire à Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign Management for Globally Distributed Data Flows (20)

Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Rethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleRethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For Scale
 
Keynote Roberto Delamora - AWS Cloud Experience Argentina
Keynote Roberto Delamora - AWS Cloud Experience ArgentinaKeynote Roberto Delamora - AWS Cloud Experience Argentina
Keynote Roberto Delamora - AWS Cloud Experience Argentina
 
Data Analytics at Altocloud
Data Analytics at Altocloud Data Analytics at Altocloud
Data Analytics at Altocloud
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
Orchestrate data with agility and responsiveness. Learn how to manage a commo...
Orchestrate data with agility and responsiveness. Learn how to manage a commo...Orchestrate data with agility and responsiveness. Learn how to manage a commo...
Orchestrate data with agility and responsiveness. Learn how to manage a commo...
 
Cloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-eventsCloud Native London 2019 Faas composition using Kafka and cloud-events
Cloud Native London 2019 Faas composition using Kafka and cloud-events
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Orchestration Ownage - RSAC 2017
Orchestration Ownage - RSAC 2017Orchestration Ownage - RSAC 2017
Orchestration Ownage - RSAC 2017
 
Rethinking Streaming Analytics for Scale
Rethinking Streaming Analytics for ScaleRethinking Streaming Analytics for Scale
Rethinking Streaming Analytics for Scale
 
Shared time-series-analysis-using-an-event-streaming-platform -_v2
Shared   time-series-analysis-using-an-event-streaming-platform -_v2Shared   time-series-analysis-using-an-event-streaming-platform -_v2
Shared time-series-analysis-using-an-event-streaming-platform -_v2
 
Time series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_finalTime series-analysis-using-an-event-streaming-platform -_v3_final
Time series-analysis-using-an-event-streaming-platform -_v3_final
 
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and FlinkDBA Fundamentals Group: Continuous SQL with Kafka and Flink
DBA Fundamentals Group: Continuous SQL with Kafka and Flink
 
Re-engineering Engineering: from a cathedral to a bazaar?
Re-engineering Engineering: from a cathedral to a bazaar?Re-engineering Engineering: from a cathedral to a bazaar?
Re-engineering Engineering: from a cathedral to a bazaar?
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...
 
Apache Kafka® and the Data Mesh
Apache Kafka® and the Data MeshApache Kafka® and the Data Mesh
Apache Kafka® and the Data Mesh
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Serverless integration anatomy
Serverless integration anatomyServerless integration anatomy
Serverless integration anatomy
 

Plus de Helena Edelson

Toward Predictability and Stability
Toward Predictability and StabilityToward Predictability and Stability
Toward Predictability and StabilityHelena Edelson
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaHelena Edelson
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Helena Edelson
 

Plus de Helena Edelson (7)

Toward Predictability and Stability
Toward Predictability and StabilityToward Predictability and Stability
Toward Predictability and Stability
 
Patterns In The Chaos
Patterns In The ChaosPatterns In The Chaos
Patterns In The Chaos
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
 

Dernier

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 

Dernier (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 

Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign Management for Globally Distributed Data Flows

  • 1. @helenaedelson #kafkasummit Leveraging Kafka for Big Data in Real Time Bidding, Analytics, ML & Campaign Management for Globally Distributed Data Flows. Helena Edelson, @helenaedelson, Kafka Summit 2016
  • 2. VP of Engineering, Tuplejump. Previously: Sr Cloud / Big Data / Analytics Engineer: DataStax, CrowdStrike, VMware, SpringSource... Event-Driven systems, Analytics, Machine Learning, Scala. Committer: Kafka Connect Cassandra, Spark Cassandra Connector. Contributor: Akka, previously: Spring Integration. Speaker: Kafka Summit, Spark Summit, Strata, QCon, Scala Days, Scala World, Philly ETE. twitter.com/helenaedelson, github.com/helena, slideshare.net/helenaedelson
  • 3. The Real Topic. http://www.slideshare.net/palvaro/ricon-keynote-outwards-from-the-middle-of-the-maze/42
  • 4. Chaos Of Distribution: One of the more fascinating problems is that of solving the chaos of distributed systems, regardless of the domain.
  • 5. Approaching this within the use case of: High-Level Landscape Platform & Infrastructure Strategies and Patterns Four-Letter Acronyms Can't Touch This Architecture
  • 7. The Digital Ad Industry
  • 8. An RTB Drive-By: Real-time auction for ad spaces, all devices. High throughput, low latency (similar to FinTech but not quite). OpenRTB API Spec (but not everyone uses it): open protocol for automated trading of digital media across platforms, devices, and advertising solutions.
  • 9. In A Nutshell: User hits a Publisher's page → Advertisers send Bid Requests → Highest Bid Accepted → Ad Delivered to User
  • 10. RTB Auction for Impressions (diagram). Site: ad-supported content. Real Time Exchange & Auction (SSP): OpenRTB server used to bid. Bidder Service (DSP): OpenRTB client. Advertiser: buyer wants ad impressions, uses bidders to bid on its behalf. Publisher: seller has ad spaces to sell to highest bidders. User devices. Flow labels: ad request, bid request, bid response, win notice & settlement price, insert orders, winning ad.
  • 11. Time Is Money. RTB: maximum response latency of 100 ms
  • 12. Time Is Money. Assume some network latency!
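To make the point about network latency concrete, here is a minimal sketch of a per-bid time budget. Only the 100 ms ceiling comes from the slides; the function name, the one-way network figure, and the overhead parameter are hypothetical values chosen for illustration.

```python
# Hypothetical latency budget for one bid.
# Only the 100 ms ceiling comes from the talk; everything else is assumed.
MAX_RESPONSE_MS = 100


def compute_budget_ms(network_one_way_ms: float, overhead_ms: float = 0.0) -> float:
    """Return the time left for bid computation after the network round trip
    and any fixed overhead (parsing, serialization) are subtracted."""
    remaining = MAX_RESPONSE_MS - 2 * network_one_way_ms - overhead_ms
    # A negative budget means the deadline is already missed; clamp to zero.
    return max(remaining, 0.0)
```

Under these assumed numbers, a 20 ms one-way hop plus 10 ms of overhead leaves 50 ms for computing the bid, which is one reason to keep bidders close to the exchange.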
  • 13. Sampling of RTB Events: Ad Request; Bid Request (JSON, 100 bytes); Compute optimal bid for advertiser; Bid Response (JSON, 1000 bytes, may include ad metadata); Win Notification (may or may not exist) with settlement price; Ad Impression (when the ad is viewed); Ad Click; Ad Conversion
  • 14. Event Streams. Auctions: auction data + bid requests. Ad Impressions: which ad ids were shown. Ad Clicks: which auction ids resulted in a click. Ad Conversions: streams joined on auction id. Analytics: aggregations & ML to derive hundreds of metrics and dimensions.
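The idea of joining streams on auction id can be sketched with a small in-memory join. This is an illustration under stated assumptions, not the pipeline from the talk: the field names (`auction_id`, `ad_id`) and the batch-style join are hypothetical, and a real streaming join would also need windowing and state management.

```python
from typing import Dict, Iterable, List


def join_on_auction_id(
    auctions: Iterable[Dict],
    impressions: Iterable[Dict],
    clicks: Iterable[Dict],
) -> List[Dict]:
    """Return one record per auction, flagged with whether the ad was
    shown and whether it was clicked. All field names are assumptions."""
    shown = {e["auction_id"] for e in impressions}
    clicked = {e["auction_id"] for e in clicks}
    return [
        {
            **auction,
            "shown": auction["auction_id"] in shown,
            "clicked": auction["auction_id"] in clicked,
        }
        for auction in auctions
    ]
```

Keying every stream by the same auction id is what makes this join cheap: each lookup is a set membership test rather than a scan.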
  • 15. Real Time: just means event-driven, or processing events as they arrive. It does not automatically equal sub-second latency requirements. Seen / Ingestion Time: when an event is ingested into the system. Event Time: when an event is created, e.g. on a device.
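The distinction between event time and ingestion time can be shown with a short sketch. The `Event` type and helper names below are hypothetical; the only things taken from the slide are the two definitions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    event_time_ms: int      # when the event was created, e.g. on a device
    ingestion_time_ms: int  # when the system first saw the event


def ingest(event_time_ms: int, now_ms: int) -> Event:
    """Stamp the ingestion time on arrival; the event time is left untouched."""
    return Event(event_time_ms=event_time_ms, ingestion_time_ms=now_ms)


def lag_ms(event: Event) -> int:
    """Delay between creation and ingestion."""
    return event.ingestion_time_ms - event.event_time_ms


def by_event_time(events: Iterable[Event]) -> List[Event]:
    """Order events by when they happened, not by when they arrived."""
    return sorted(events, key=lambda e: e.event_time_ms)
```

Events can arrive out of order, so sorting by ingestion time and sorting by event time can give different sequences; analytics that care about what happened first generally need the latter.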
  • 17. @helenaedelson #kafkasummit Platform Requirements: 24 / 7 uptime (brokerage model: DSPs only make $ on successful ad deliveries, so uptime is critical); Security; Enable service across the globe; Handle thousands of concurrent requests per second; Scale to traffic of 700TB per day; Manage 700TB per day of data; Derive Metrics
  • 18. @helenaedelson #kafkasummit Business Requirements: Support SLAs for bid transactions; Legal constraints (user data crossing borders); The critical path must be fast to win; No data loss on ingestion path; Bid & Campaign Optimization; Frequency Capping; Management UI for Publishers & Advertisers
  • 19. @helenaedelson #kafkasummit Questions To Answer: % writes on ingestion, analytics pre-aggregation, etc.; % reads of raw data by analytics, and of aggregated views by the customer management UI; How much in memory on RTB app nodes?; Dimensions of data in analytics queries; Optimization algorithms; What needs real-time feedback loops and what does not; Which data flows are low-latency / high-frequency and which are not; Where are the potential bottlenecks
  • 20. @helenaedelson #kafkasummit Constraints: Resources (I need to build highly functioning teams that are psyched about the work and working together); Budget; Cloud Resources; JDK Version (What?!); Existing infrastructure & technologies that will be replaced later but you have to deal with now :( Pro Tip: Pay well, allow people to grow & be creative
  • 22. @helenaedelson #kafkasummit Beware of the C word: Consistency? Convergence?
  • 25. @helenaedelson #kafkasummit Occam's razor: Simpler theories are preferable to more complex
  • 27. @helenaedelson #kafkasummit Approaches: Eventual / Tunable consistency; Time & Clocks in globally-distributed systems; Location Transparency; Asynchrony; Pub-Sub; Design for scale; Design for Failure
  • 29. @helenaedelson #kafkasummit From MVP to Scalable with Kafka. Scalpel... Separate The Monolith: Microservices (Does One Thing, Knows One Thing); Separate low-latency hot path; Separate deploy artifacts; Separate data mgmt clusters by concern (analytics, timeseries, etc.); CQRS: Separate Read and Write paths
  • 30. @helenaedelson #kafkasummit Services communicate indirectly via Kafka. Immutable events stream to Kafka, partitioned by event type, time, etc. Subscribers & Publishers: RTB microservices - receives raw, receives; Analytics cluster - receives raw, publishes aggregates; Management / Reporting nodes
  • 31. @helenaedelson #kafkasummit CQRS: Command Query Responsibility Segregation. Decouple write streams from read streams; they can use different schemas / data structures. Writers (Publishers) publish without any awareness of who needs to receive the data or how to reach them (location, protocol...). Readers (Subscribers) should be able to subscribe and asynchronously receive from topics of interest.
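The decoupling described above can be sketched with a tiny in-memory topic bus. This is a stand-in to show the shape of the idea, not Kafka: `PubSubSketch` and `Bus` are hypothetical names, and delivery here is synchronous for brevity, whereas the slide calls for asynchronous receipt.

```scala
import scala.collection.mutable

object PubSubSketch {
  /** A tiny in-memory stand-in for a topic-based log (not Kafka). */
  final class Bus {
    private val subscribers =
      mutable.Map.empty[String, mutable.ListBuffer[String => Unit]]

    /** Readers register interest in a topic by name. */
    def subscribe(topic: String)(handler: String => Unit): Unit = {
      subscribers.getOrElseUpdate(topic, mutable.ListBuffer.empty) += handler
      ()
    }

    /** Writers publish to a topic name only; they never see the readers. */
    def publish(topic: String, event: String): Unit =
      subscribers.get(topic).foreach(handlers => handlers.foreach(h => h(event)))
  }
}
```

The writer only knows the topic name, so read-side services can be added, moved or given a different schema without touching the write path.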
  • 32. @helenaedelson #kafkasummit Eventually Consistent Across DCs [Architecture diagram: regions US-East-1 and EU-west-1, each with its own Kafka cluster (with ZK), linked by MirrorMaker; RTB microservices and Mgmt microservices act as Publishers and Subscribers; Compute Layer: Analytics & ML Cluster and Timeseries Cluster, each running Spark Streaming & ML on Cassandra with topology-aware cross-DC replication; Query Layer]
  • 33. @helenaedelson #kafkasummit Eventually Consistent Across DCs [Architecture diagram, variant with C*: regions US-East-1 and EU-west-1, a Kafka cluster per region linked by MirrorMaker; RTB microservices and Mgmt microservices act as Publishers and Subscribers; Compute Layer: Analytics & ML Cluster and Timeseries Cluster, each running Spark Streaming & ML on Cassandra with topology-aware cross-DC replication; Query Layer]
  • 34. @helenaedelson #kafkasummit Kafka Cross Datacenter Mirroring: publish messages from various datacenters around the world
    bin/kafka-run-class.sh kafka.tools.MirrorMaker --consumer.config config/consumer_source_cluster.properties --producer.config config/producer_target_cluster.properties --whitelist bidrequests --num.producers 2 --num.streams 4
  • 35. @helenaedelson #kafkasummit Cassandra Cross DC Replication. It's out of the box. Multi-region live backups for free: [ NetworkTopologyStrategy ]. Users in the US and UK connect to DCs in their geo region for lower latency. Both DCs are part of the same cluster for X-DC replication. Configure LB policies to prefer the local DC. LOCAL_QUORUM reads. Data is available cluster-wide for backup, analytics, and to account for user travel across regions.
  • 36. @helenaedelson #kafkasummit Cassandra Cross DC Replication: Keep EU User Data in the EU
    CREATE KEYSPACE rtb WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'eu-east-dc': '3', 'eu-west-dc': '3' };
  • 37. @helenaedelson #kafkasummit Cassandra Time Windowed Buckets with TTL
    CREATE TABLE rtb.fu_events (
      id int,
      seen_time timeuuid,
      event_time timestamp,
      PRIMARY KEY (id, event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC)
      AND compaction = {
        'compaction_window_unit': 'DAY',
        'compaction_window_size': '3',
        'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy' }
      AND compression = {
        'crc_check_chance': '0.5',
        'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor' }
      AND bloom_filter_fp_chance = 0.01
      AND caching = '{"keys":"ALL", "rows_per_partition":"100"}'
      AND dclocal_read_repair_chance = 0.0
      AND default_time_to_live = 60
      AND gc_grace_seconds = 0
      AND max_index_interval = 2048
      AND memtable_flush_period_in_ms = 0
      AND min_index_interval = 128
      AND read_repair_chance = 0.0
      AND speculative_retry = '99.0PERCENTILE';
    Callouts: 3 DAY buckets: larger SSTables on disk minimize bootstrapping issues when adding nodes to a cluster. Other bucket sizes shown: 3 MINUTE buckets, 1 HOUR buckets, 1 DAY buckets. MICROSECOND resolution:
  • 38. @helenaedelson #kafkasummit My Nerdy Chart v2.0
    Want | Can Or Currently Use | Status | But
    Kafka Security | Kafka Security: TLS, Kerberos, SASL, Auth, Encryption, Authentication | v0.9.0 Thanks Jun! |
    Integrated Streaming | Kafka Streams: processing inside Kafka, no alternate cluster setup or ops | v0.10 Thanks Guozhang! | It's java :(
    Cassandra CDC | Cassandra CDC. Triggers? | Triggers are a pre-commit hook :( The Epic JIRA: https://issues.apache.org/jira/browse/CASSANDRA-8844 | no comment
    And... Kafka Streams & Kafka Connect Integration | ..wait for it.. | | no comment
    Always on, X-DC Replication, Flexible Topologies | Kafka, Cassandra | OOTB |
    Fault Tolerance | Kafka, Spark, Mesos, Cassandra, Akka | Baked In |
    Location Transparency | Kafka, Cassandra, Akka | Check! |
    Asynchrony | Kafka, Cassandra, Akka | Check! |
    Decoupling | Kafka, Akka | Check! |
    Pub-Sub | Kafka, Cassandra, Akka | Check! |
    Immutability | Kafka, Akka, Scala | Check! |
  • 39. @helenaedelson #kafkasummit Kafka Streams in v 0.10
    // ser / des are the slide's placeholders for serializers and deserializers
    val builder = new KStreamBuilder()
    val stream: KStream[K,V] = builder.stream(des, des, "raw.data.topic")
      .flatMapValues(value => Arrays.asList(value.toLowerCase.split(" "): _*))
      .map((k, v) => new KeyValue(k, v))
      .countByKey(ser, ser, des, des, "kTable")
      .toStream
    stream.to("results.topic", ...)
    val streams = new KafkaStreams(builder, props)
    streams.start()
  • 40. @helenaedelson #kafkasummit Kafka Streams & Kafka Connect? YES
    val builder = new KStreamBuilder()
    val stream1: KStream[K,V] = builder.stream(new CassandraConnect(configs))
      .flatMapValues(..)
      .map((k, v) => new KeyValue(k, v))
      .countByKey(ser, ser, des, des, "kTable")
      .toStream
    stream1.to("results.topic", ...)
    val streams = new KafkaStreams(builder, props)
    streams.start()
  • 41. @helenaedelson #kafkasummit https://github.com/tuplejump/kafka-connect-cassandra
    /** Writes records from Kafka to Cassandra asynchronously and non-blocking. */
    override def put(records: JCollection[SinkRecord]): Unit
    /** Returns a list of records when available by polling for new records. */
    override def poll: JList[SourceRecord]
  • 42. @helenaedelson #kafkasummit Frequency Capping: 1. Count the number of times user X has seen ad Y from Advertiser A's Campaign C. 2. Limit the max number of impressions of an ad within T1...T2. Use Case: Continuously count impressions grouped by campaign across DCs; low-latency reads & writes; must scale; Cross DC Counters. Translation: Distributed Counters
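The two steps on the slide above can be sketched on a single node as a count over an impression log followed by a comparison against a cap. Everything named here (`FrequencyCap`, `Impression`, `mayServe`, the `cap` parameter) is a hypothetical illustration; the hard part the talk goes on to address, keeping such counts consistent across datacenters, is deliberately left out.

```scala
object FrequencyCap {
  /** One impression: user X saw ad Y of campaign C at a given time. */
  final case class Impression(userId: String, adId: String, campaignId: String, timeMs: Long)

  /** Step 1: count impressions of `adId` from `campaignId` seen by `userId` within [t1, t2]. */
  def count(log: Seq[Impression], userId: String, adId: String,
            campaignId: String, t1: Long, t2: Long): Int =
    log.count(i =>
      i.userId == userId && i.adId == adId && i.campaignId == campaignId &&
        i.timeMs >= t1 && i.timeMs <= t2)

  /** Step 2: the ad may be served only while the count is still below the cap. */
  def mayServe(log: Seq[Impression], userId: String, adId: String,
               campaignId: String, t1: Long, t2: Long, cap: Int): Boolean =
    count(log, userId, adId, campaignId, t1, t2) < cap
}
```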
  • 43. @helenaedelson #kafkasummit Frequency Capping: Redis? Broke under the load. Aerospike? Great candidate. Eventuate? Interesting, much lighter. Kafka Streams when it's out? Interesting, already in the infra. Flink? Very interesting but... Cassandra Counters: not applicable for this.
  • 44. @helenaedelson #kafkasummit Aerospike: As a distributed counting microservice; as a key-value store for in-memory caching. Fast reads, very read heavy: 99% of reads are < 1 ms latency (sweet). 30,000 writes per second and 350,000 reads per second on 7 nodes. Replication factor 2; cross-datacenter replication (XDC); SSD-backed. A few excellent posts by Dag, Tapad's CTO, on in-memory infrastructure + Ad Tech (see resources slide).
  • 45. @helenaedelson #kafkasummit CRDT: Conflict-Free Replicated Data Type. State-based: objects require only eventual communication between pairs of replicas. Operation-based: replication requires reliable broadcast communication with delivery in a well-defined order. Both are guaranteed to converge towards a common, correct state. Keeping replicas available for writes during a network partition requires resolution of conflicting writes when the partition heals.
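A grow-only counter (G-Counter) is a common small example of the state-based flavour described above, and it fits the impression-counting use case. The sketch below is a generic illustration with hypothetical names (`GCounterSketch`, `GCounter`, the replica ids); it is not Eventuate's implementation, which the talk describes as operation-based.

```scala
object GCounterSketch {
  /** State-based grow-only counter: one non-negative slot per replica. */
  final case class GCounter(slots: Map[String, Long] = Map.empty) {

    /** A replica only ever increments its own slot. */
    def increment(replicaId: String, delta: Long = 1L): GCounter = {
      require(delta >= 0, "grow-only counter: delta must be non-negative")
      GCounter(slots.updated(replicaId, slots.getOrElse(replicaId, 0L) + delta))
    }

    /** The counter value is the sum over all replica slots. */
    def value: Long = slots.values.sum

    /** Merge takes the per-replica maximum, which makes it
      * commutative, associative and idempotent. */
    def merge(other: GCounter): GCounter =
      GCounter((slots.keySet ++ other.slots.keySet).map { id =>
        id -> math.max(slots.getOrElse(id, 0L), other.slots.getOrElse(id, 0L))
      }.toMap)
  }
}
```

Because merge is a per-slot maximum, two datacenters can accept increments during a partition and exchange state in any order afterwards; both end up with the same total.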
  • 46. @helenaedelson #kafkasummit Eventuate: A toolkit for building distributed, HA & partition-tolerant event-sourced applications. Developed by Martin Krasser (@mrt1nz) for Red Bull Media (open source). Interactive, automated conflict resolution (via op-based CRDTs). Separates the command side of an app from its query side (CQRS). Primary goals: preserving causality, idempotency & event ordering guarantees even under chaotic conditions. AP of CAP: conflicts cannot be prevented & must be resolved. Causality: tracked with Vector Clocks. Adapters provide connectivity to other stream processing solutions. Can currently choose Cassandra if desired. Kafka coming soon!
  • 47. @helenaedelson #kafkasummit Eventuate as Distributed CRDT Microservice: Replication of application state through async event replication across locations. Locations consume replicated events to reconstruct application state locally. Multiple locations concurrently update as multi-master.
  • 48. @helenaedelson #kafkasummit Applications can continue writing to a local replica during a network partition. Pass To Pipeline: -> To Cassandra -> To Kafka (soon)
  • 49. @helenaedelson #kafkasummit
    import scala.concurrent.Future
    import akka.actor.{ActorRef, ActorSystem}
    import com.rbmhtechnology.eventuate.crdt.{CRDTServiceOps, Counter, CounterService}

    class CappingService(val id: String, override val log: ActorRef)
        (implicit val system: ActorSystem,
         val integral: Integral[Int],
         override val ops: CRDTServiceOps[Counter[Int], Int])
      extends CounterService[Int](id, log) {

      /** Increment only op: adds `delta` to the counter identified by `id`
        * and returns the updated counter value. */
      def increment(id: String, delta: Int): Future[Int] =
        value(id) flatMap {
          case v if v >= 0 && (delta > 0 || delta > v) =>
            update(id, delta)
          case v =>
            Future.successful(v)
        }

      start()
    }

    import scala.concurrent.Future
    import akka.actor.ActorSystem

    val a = new CappingService(id1, eventLog)
    a.increment(id1, 3)  // Future(3) 3 impressions
    a.value(id1)         // Future(3) 3 impressions
    a.increment(id1, -2) // increments only, idempotent.

    val b = new CappingService(id2, eventLog)
    b.value(id1) // Future(a.value(id1)): knows the same count over n instances, all geo-locations, for the same id

    class CounterService[A : Integral](val replicaId: String, val log: ActorRef) {
      def value(id: String): Future[A] = { ... }
      def update(id: String, delta: A): Future[A] = { ... }
    }
  • 51. @helenaedelson #kafkasummit Eventuate Takeaway: It's just a jar! OOTB async internal component messaging and fault tolerance. Integrate with relevant microservices. No store/cache cluster to deploy, just keep monitoring your apps. Written in Scala. Built on Akka, a toolkit for building highly concurrent, distributed, and resilient event-driven applications on the JVM.
  • 53. @helenaedelson #kafkasummit Refresher: Sampling of RTB Events: Ad Request; Bid Request (JSON, 100 bytes); Compute optimal bid for advertiser; Bid Response (JSON, 1000 bytes, may include ad metadata); Win Notification with settlement price (may or may not exist); Ad Impression (when the ad is viewed); Ad Click; Ad Conversion
  • 54. @helenaedelson #kafkasummit OpenRTB: objects in the Bid Request model
  • 55. @helenaedelson #kafkasummit Streaming Analytics: Top-K highest-performing campaigns; Number of views served in the last 7 days, by country, by city; What determined successful ad conversions; Age distribution per campaign
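Two of the queries above, top-K campaigns and views by country and city, reduce to group-and-count. The sketch below runs over an in-memory sequence to show the shape of the aggregation; `View`, `topK` and `viewsByPlace` are hypothetical names, and ranking campaigns by raw view count is an assumption, since the slide does not define "performing".

```scala
object StreamingAnalyticsSketch {
  /** One served view; field names are illustrative only. */
  final case class View(campaignId: String, country: String, city: String)

  /** Number of views served, grouped by (country, city). */
  def viewsByPlace(views: Seq[View]): Map[(String, String), Int] =
    views.groupBy(v => (v.country, v.city)).map { case (place, vs) => place -> vs.size }

  /** Top-K campaigns by view count; ties are broken by campaign id
    * so that the result is deterministic. */
  def topK(views: Seq[View], k: Int): Seq[(String, Int)] =
    views.groupBy(_.campaignId).toSeq
      .map { case (id, vs) => (id, vs.size) }
      .sortBy { case (id, n) => (-n, id) }
      .take(k)
}
```

In a streaming job the same grouping would be applied per window (for example the last 7 days) instead of over the whole collection.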
  • 56. @helenaedelson #kafkasummit Spark Streaming Kafka (can write to Cassandra, FiloDB...)
    class KafkaStreamingActor(ssc: StreamingContext) extends MyAggregationActor {
      val stream = KafkaUtils.createDirectStream(...).map(RawData(_))

      // Persist the raw stream to FiloDB
      stream.foreachRDD(_.toDF.write.format("filodb.spark")
        .option("dataset", "rawdata")
        .save())

      /* Pre-aggregate data in the stream for fast querying and aggregation later */
      stream.map(hour =>
        (hour.wsid, hour.year, hour.month, hour.day, hour.oneHourPrecip)
      ).saveToCassandra(timeseriesKeyspace, dailyPrecipTable)
    }
  • 57. @helenaedelson #kafkasummit Machine Learning: Train on 1+ week of data for: Recommendations; Bid Optimization; Campaign Optimization; Consumer Profiling; ...and much more
  • 58. @helenaedelson #kafkasummit Machine Learning: The probability of an ad, from a specific ISP, OS, website, demographic, etc., resulting in a conversion. Which attributes of impressions are good predictors of better ad performance?
  • 59. @helenaedelson #kafkasummit Bid Optimization & Predictive Models: Which impressions should an Advertiser bid for? Per campaign, per country it may run in..? What is the best bid for each impression?
  • 60. @helenaedelson #kafkasummit Machine Learning: Compute optimal bid price; Train the model; Score bid requests; Determine value of bid request; Train on every bid request attribute; Based on Campaign Objectives; Against Budget; Send bid decision to bidder
  • 61. @helenaedelson #kafkasummit Spark Streaming, MLlib & FiloDB
    val ssc = new StreamingContext(sparkConf, Seconds(5))
    val dataStream = KafkaUtils.createDirectStream[..](..)
      .map(transformFunc)
      .map(LabeledPoint.parse)

    // Persist the training data to FiloDB
    dataStream.foreachRDD(_.toDF.write.format("filodb.spark")
      .option("dataset", "training").save())

    val model = new StreamingLinearRegressionWithSGD()
      .setInitialWeights(Vectors.dense(weights))
    // trainOn returns Unit, so it is called separately to keep `model` usable
    model.trainOn(dataStream.join(historicalEvents))
    model.predictOnValues(dataStream.map(lp => (lp.label, lp.features)))
      .insertIntoFilo("predictions")
  • 62. @helenaedelson #kafkasummit 700 Queries Per Second: Spark Streaming & FiloDB. Even for datasets with 15 million rows! Using FiloDB's InMemoryColumnStore; single host / MBP, 5GB RAM; SQL to DataFrame caching. https://github.com/tuplejump/FiloDB Evan Chan's (@velvia) blog post: NoLambda: A new architecture combining streaming, ad hoc, machine-learning, and batch analytics
  • 63. @helenaedelson #kafkasummit Eventually Consistent Across DCs [Architecture diagram, repeated from slide 32: regions US-East-1 and EU-west-1, each with its own Kafka cluster (with ZK), linked by MirrorMaker; RTB microservices and Mgmt microservices act as Publishers and Subscribers; Compute Layer: Analytics & ML Cluster and Timeseries Cluster, each running Spark Streaming & ML on Cassandra with topology-aware cross-DC replication; Query Layer]
  • 64. @helenaedelson #kafkasummit Self-Healing Systems: Massive event spikes & bursty traffic; Fast producers / slow consumers; Network partitioning & out of sync systems; DC down; Not DDOS'ing ourselves from fast streams; No data loss when auto-scaling down
  • 65. @helenaedelson #kafkasummit Byzantine Fault Tolerance? Looks like I'll miss standup
  • 66. @helenaedelson #kafkasummit Everything fails, all the time. Monitor Everything
  • 67. @helenaedelson #kafkasummit Resources
    Non-Monotonic Snapshot Isolation: scalable and strong consistency for geo-replicated transactional systems
    Conflict-free Replicated Data Types
    Implementing operation-based CRDTs
    http://codebetter.com/gregyoung/2010/02/16/cqrs-task-based-uis-event-sourcing-agh
    http://martinfowler.com/bliki/CQRS.html
    http://github.com/openrtb/OpenRTB
    http://akka.io
    http://rbmhtechnology.github.io/eventuate
    https://github.com/RBMHTechnology/eventuate
    http://rbmhtechnology.github.io/eventuate/user-guide.html#commutative-replicated-data-types
    http://www.planetcassandra.org/data-replication-in-nosql-databases-explained
    http://wikibon.org/wiki/v/Optimizing_Infrastructure_for_Analytics-Driven_Real-Time_Decision_Making