The gambling industry has arguably been one of the industries most comprehensively affected by the internet revolution, and an organization such as William Hill would have disappeared had it not adapted successfully. We call this “Going Reactive.”
The company's latest innovations are platforms for personalization, recommendation, and big data, built on Akka, Scala, Play Framework, Kafka, Cassandra, Spark, and Mesos.
Using Spark, Kafka, Cassandra and Akka on Mesos for Real-Time Personalization
1. Presented by Patrick Di Loreto
R&D Engineering Lead
14th June 2015
Site: https://developer.williamhill.com/
BLOG: http://patricknoir.blogspot.com
Twitter: https://twitter.com/patricknoir
2. • WH Labs
• Omnia – Data Management Platform
– Omnia Chronos – A distributed Integration Middleware with Akka and Kafka
– Omnia Fates – The long term memory with Apache Cassandra
– Omnia NeoCortex – Real time and Machine Learning using Apache Spark
– Omnia Hermes – Serving layer with Akka CQRS
– Omnia Infrastructure - Mesos, Marathon and Docker
Introduction
9. Omnia Chronos
It is in charge of collecting data from different sources and organising it into a stream of observable events.
Observable [ ]
[Diagram labels: Internal, Product Centric, External, Customer Centric]
• Social media
• Facebook
• Twitter
• Affiliates
• Page viewing
• Articles read, following and followers, bets etc…
• Sports related
• Tweets
• News
• Gaming
• Web Analytics
• Activities within our applications
{
  "type": "bet",
  "version": "1.0",
  "time": "2015-06-03 08:00:31",
  "acquisitionTime": "...",
  "source": "WHBetSystem",
  "payload": { … any valid json }
}
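The envelope above could be modelled in Scala roughly as follows. This is a minimal sketch, not the Omnia implementation: the `Incident` case class, the choice of plain `String` fields and the empty example payload are assumptions made for illustration; only the field names and the sample values come from the slide.

```scala
// Hypothetical model of the incident envelope; field names follow the slide,
// the types are assumptions chosen to keep the sketch dependency-free.
final case class Incident(
  `type`: String,          // kind of event, e.g. "bet"
  version: String,         // schema version, e.g. "1.0"
  time: String,            // event time, formatted "yyyy-MM-dd HH:mm:ss" on the slide
  acquisitionTime: String, // acquisition time (value elided on the slide)
  source: String,          // originating system, e.g. "WHBetSystem"
  payload: String          // any valid JSON, kept as an opaque string here
)

object IncidentExample {
  // The sample event from the slide; "{}" stands in for the elided payload.
  val bet: Incident = Incident(
    `type` = "bet",
    version = "1.0",
    time = "2015-06-03 08:00:31",
    acquisitionTime = "...",
    source = "WHBetSystem",
    payload = "{}"
  )
}
```

Keeping the payload opaque mirrors the slide's "any valid json": the envelope stays stable while each source defines its own payload shape.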
10. Omnia Chronos
In Chronos you define streams that collect data and convert/persist it into a stream of Observable[Incident].
[Diagram: Chronos with Stream 1, Stream 2 and Stream 3]
12. Omnia Chronos
• Each stream is an actor which supervises its children:
– Adapter Actor
– Converter Actor
– Persistence Manager Actor
• Stream actors are location transparent through the use of Akka Cluster: we have extended Akka Cluster to migrate the Stream actors based on resource KPIs
• Data is persisted in Kafka for durability
• Chronos is built on top of Akka, ScalaRx and Play Framework; a migration to Akka Streams is planned
14. Omnia Fates
Fates represents the long-term memory of Omnia. It is in charge of organising all the incidents recorded by Chronos into timelines and of creating new information as views, using machine learning, logical reasoning and time series analysis.
• A timeline represents the history: the sequence of incidents performed by a specific entity over time. Timelines are organised by category. An example is the customer timeline, which might contain all the bets placed, deposit and withdrawal activities, tweets etc. performed by a specific customer. A timeline category is not limited to customers; it can be anything, for example a sport event (football match, competition).
• Views are the result of jobs that process data from:
– Timelines
– Other Views
15. Omnia Fates
Timelines are created from timeline streams: each timeline stream reads data from a Chronos stream and feeds the right timeline.
[Diagram labels: Chronos, Fates]
16. Omnia Fates - Cassandra
• Fates persists timelines of incidents.
• Column Family Name: <TimelineCategory>_tl
• Key Definition: ( (entityId, date), timestamp )
• The partition key is a strong hash key: well balanced Cassandra Cluster
• Composite key: incidents are ordered by timestamp under a specific entity within a day (date = yyyy-MM-dd)
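To make the key definition concrete, here is a small sketch of how a timeline key and a matching table definition might be derived. Everything beyond the `<TimelineCategory>_tl` naming and the `((entityId, date), timestamp)` key shape is an assumption: the `TimelineKeys` helper, the UTC day bucket, and the column names `entity_id`, `day`, `ts` and `incident` are illustrative only.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}

object TimelineKeys {
  // Partition key is (entityId, date); timestamp is the clustering column.
  final case class TimelineKey(entityId: String, date: LocalDate, timestamp: Instant)

  // Naming convention from the slide: <TimelineCategory>_tl
  def tableName(category: String): String = s"${category}_tl"

  // Bucket an incident into its day partition (using UTC is an assumption).
  def keyFor(entityId: String, timestamp: Instant): TimelineKey =
    TimelineKey(entityId, timestamp.atZone(ZoneOffset.UTC).toLocalDate, timestamp)

  // CQL text for the table; column names and types are assumptions.
  def createTable(category: String): String =
    s"""CREATE TABLE IF NOT EXISTS ${tableName(category)} (
       |  entity_id text,
       |  day text,
       |  ts timestamp,
       |  incident text,
       |  PRIMARY KEY ((entity_id, day), ts)
       |)""".stripMargin
}
```

Bucketing by day bounds the size of each partition, while the clustering column keeps the incidents of one entity ordered by time within that day.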
17. Omnia Fates
• We build views with jobs able to perform:
– Logical reasoning: deduction, induction, abduction
– Timeline analysis: trends, cycles, seasonality
– Other ML: classification, clustering, predictions
• Jobs are performed on top of NeoCortex
19. Omnia NeoCortex
• NeoCortex is a library developed on top of Apache Spark to give developers an easy way to write micro services on top of Omnia.
• In NeoCortex we use the distributed nature of Spark to perform fast, real-time data processing, and we hide from the developer the complexity of connecting to the source system (Chronos) and to the publishing layer
• Typeclass definitions for: Timeline, View, ChronosStream etc…
• Typeclass definitions for algebraic structures:
– Monoids, Rings, Groups, providing advanced functions for: moving averages, ARX, ARMA etc.
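As an illustration of how an algebraic typeclass can back a function such as a moving average, here is a minimal plain-Scala sketch. It does not use Spark and is not the NeoCortex API: `Monoid`, `Avg`, `combineAll` and `movingAverage` are hypothetical names introduced only for this example.

```scala
object MovingAverageSketch {
  // Minimal Monoid typeclass: an associative combine with an identity element.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  // A running (sum, count) pair; the average is derived from it on demand.
  final case class Avg(sum: Double, count: Long) {
    def value: Double = if (count == 0L) 0.0 else sum / count
  }

  implicit val avgMonoid: Monoid[Avg] = new Monoid[Avg] {
    val empty: Avg = Avg(0.0, 0L)
    def combine(x: Avg, y: Avg): Avg = Avg(x.sum + y.sum, x.count + y.count)
  }

  // Fold any sequence using the Monoid instance in scope.
  def combineAll[A](xs: Seq[A])(implicit m: Monoid[A]): A =
    xs.foldLeft(m.empty)(m.combine)

  // Simple moving average over full windows of the given size.
  def movingAverage(xs: Seq[Double], window: Int): Seq[Double] =
    xs.sliding(window)
      .filter(_.size == window)
      .map(w => combineAll(w.map(Avg(_, 1L))).value)
      .toList
}
```

Because `combine` is associative, partial aggregates can be computed independently and merged afterwards, which is what makes this kind of structure a natural fit for distributed processing.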
23. Hermes
Hermes is the layer in which data is presented for consumption, both B2B and B2C. At its foundation, micro-services, notifications and data as an API are key aspects of the design.
Scalable and simple full duplex communication for the web
Express the correlation between the entities of the model
Inspired by Falcor (Netflix) and GraphQL (Facebook)
27. Use Omnia on Omnia
[Diagram labels: Mesos; Marathon; Docker (Application Repository); three Docker containers, each running an Omnia App and exposing JMX; Health Stream; Chronos; NeoCortex (Speed Layer); Fates (Batch Layer)]