Natural Sparksmanship
The Art of Making an Analytics Enterprise Cross the Chasm
Rajesh Krishnan
AIMIA Inc.
Our Business
Communication
Personalisation
Insights
Loyalty
Our Business - notes
• AIMIA is a data-driven marketing & loyalty analytics business, regarded as a leader
by Forrester in their loyalty management Wave.
– We manage the UK’s largest coalition loyalty program and also manage proprietary loyalty
programs for customers across verticals and geographies.
– We provide insights through advanced analytics to retailers & CPGs across the globe.
– We manage marketing & smart customer communications through campaigns for our
clients, using the analytics to drive loyalty.
• Personalisation underpins the three focus areas.
• The huge influx of data today means there is an opportunity to know the customer better.
Personalisation can be done like never before.
Our Analytics
Data
Analytics
Products
Diagnostic
Predictive
Prescriptive
Descriptive
Our Analytics - notes
• Analytics has been the backbone of the business.
• We perform the full spectrum of analytics
– What happened (Descriptive)
– Why it happened (Diagnostic)
– What will happen (Predictive)
– What should we do (Prescriptive)
• Our barn has different breeds of horses that are trained to win different forms of
the sport, be it show jumping, endurance racing or dressage.
The Scenario
The challenge
Survival instinct Winning habit
The Challenge - notes
• We have a smart custom analytics solution called Offer Engine
– Crunches ~20 billion rows of transaction data.
– Calculates probabilities and ranks 30 billion customer-product combinations.
– Proven to produce impressive marketing RoI for our clients.
• It is an algorithm built using SQL running on an in-memory MPP database + shell scripts.
• We want to productise it & implement it across our customer base.
• Major technical hurdles in development: less flexibility to use different data sources
with different velocities, and a prolonged time to take code from development to production.
• In short, we wanted to find a new breed of horse with more speed, agility &
flexibility, so that it can win this personalisation game anywhere & not just in one field.
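To make "calculates probabilities and ranks customer-product combinations" concrete, here is a minimal sketch in plain Python. It is not the Offer Engine's actual model: the probability estimate (a product's share of the customer's past purchases) and the names `rank_products` and `top_n` are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def rank_products(transactions, top_n=3):
    """Rank products per customer by a naive purchase-probability estimate.

    transactions: iterable of (customer_id, product_id) rows.
    The estimate (share of the customer's purchases) is a placeholder,
    not the Offer Engine's real scoring model.
    """
    counts = defaultdict(Counter)
    for customer_id, product_id in transactions:
        counts[customer_id][product_id] += 1

    rankings = {}
    for customer_id, product_counts in counts.items():
        total = sum(product_counts.values())
        scored = [(product_id, n / total) for product_id, n in product_counts.items()]
        # Highest probability first; ties broken by product id for determinism.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        rankings[customer_id] = scored[:top_n]
    return rankings


transactions = [
    ("c1", "newspaper"), ("c1", "newspaper"), ("c1", "milk"),
    ("c2", "bread"),
]
rankings = rank_products(transactions)
```

At the scale quoted above (billions of rows), the same group-score-rank shape would have to run on a distributed engine rather than in a single process.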
The Objective
• Rewrite the award-winning concept
“Best Use of Customer Analytics/Data in a Loyalty Programme” – 2014 Loyalty Awards
• Create a flexible framework
• Move from custom code to a configurable product
• Better performance at a lower cost
The Journey
The schematic
[Diagram. Inputs: Metadata, Customer, Products, Transactions, Offers, Audience. Stages: Pre-Processing, Splitter (x2), Algorithm 1, Algorithm 2, ... Algorithm N, Join (x3), Ensemble, Output / Channels]
The Schematic - notes
• The transaction & campaign data from the Enterprise Data Warehouse is pre-processed
to feed the algorithms.
• The algorithms are trained on the resulting training data.
• When a campaign audience & the current in-store offers are available, they
are sent to different algorithms based on the configuration.
• An Ensemble aggregates the results & produces the final ranking.
• After what we heard at Spark meetups, rebuilding the key components of the pipeline
in Spark was the obvious way to make it a successful product.
• The machine learning implementations on Spark presented at these meetups,
from domains ranging from banking to e-commerce, were convincing.
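The flow in these notes (configured algorithms score each offer, then an ensemble produces the final ranking) can be sketched as below. Everything named here is hypothetical: the two scoring functions, the `ALGORITHMS` registry, the weights in `config` and the weighted-mean ensemble are assumptions, since the deck does not say which algorithms or aggregation rule the product uses.

```python
def popularity_score(customer, offer):
    # Hypothetical algorithm 1: score from overall offer popularity.
    return offer["popularity"]


def affinity_score(customer, offer):
    # Hypothetical algorithm 2: 1.0 if the customer has bought the category before.
    return 1.0 if offer["category"] in customer["categories"] else 0.0


ALGORITHMS = {"popularity": popularity_score, "affinity": affinity_score}


def ensemble_rank(customer, offers, config):
    """Score each offer with the configured algorithms and rank by weighted mean."""
    total_weight = sum(config.values())
    ranked = []
    for offer in offers:
        score = sum(
            weight * ALGORITHMS[name](customer, offer)
            for name, weight in config.items()
        ) / total_weight
        ranked.append((offer["id"], score))
    # Best score first; ties broken by offer id for determinism.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


customer = {"id": "c1", "categories": {"news"}}
offers = [
    {"id": "newspaper", "category": "news", "popularity": 0.4},
    {"id": "coffee", "category": "drinks", "popularity": 0.9},
]
config = {"popularity": 1.0, "affinity": 2.0}
ranking = ensemble_rank(customer, offers, config)
```

Keeping the algorithm choice and weights in `config` rather than in code is one way to read "custom code to a configurable product": a new client changes the configuration, not the pipeline.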
The Roles
The Owner / The Sponsor
The Trainer / The Sparksman
The Jockey / The field expert
The bettor / The client
The Roles – notes (1)
• The challenge in enterprise adoption of Spark is different. It is a high stakes game.
• Understanding the roles is a key first step to becoming a great Sparksman.
• The Owner / The Sponsor:
– Probably the most important person in making your project a success.
– They can be a partnership or a single owner. It really helps to have a single owner to start
with, for ease of communication.
– You need to convince them that the identified technology is best suited.
– You had better have solid statistics & proof that your new breed of horse is best suited
to win, even before asking to buy one to test it.
• The Trainer / The Sparksman
– We are the ones who make this tech work for our product, with thorough knowledge of the market.
– The horseman who truly understands the breed & prepares it to win the specific sport.
The Roles – notes (2)
• The Jockey / The field expert:
– The operations team that makes the product run at scale & deals with issues.
– The marketing/sales team that understands the pulse of the client exactly.
– They know the pros & cons of the product.
– They need to believe this new tech makes their life easier & better.
– The jockey plays a key role in making the horse win the race, and a perfect
partnership between the horseman & the jockey is required to make any breed win.
• The bettor / The client
– The client collects the market info, purchases the best product & expects it to help
win their customers. They want your product to be flexible, performant & secure.
– The bettor analyses the horse info and bets on it for a few races. They want to win big
money. They want your horse to be the best of breed, fit & showing potential.
The stages
Fear
Trust
Comfort
The Stage – notes
• Like it or not, we need to go through the following three stages for success.
• Fear
– The established technology has experts available who can bring out the best in it. Why change?
– It is not easy to make the new tech work on-prem, especially with a lack of expertise.
– When you train a young horse, it is natural to fear not understanding what goes on in its
head and not finding the best environment to make it shine.
• Trust
– This comes from well-designed experiments on your different queries.
– So be prepared for early-stage failures when you set up Spark on your laptop or on an on-prem cluster of VMs
without any expertise.
– Try leading Spark distribution vendors to make the ramp-up easy and painless.
– In the right environment, with a natural horseman, the horse gains trust & shows its prowess quickly.
• Comfort
– You need to simulate real scenarios & benchmark to get comfortable, e.g. understand the effect of data skewness
on the full data.
– The horse needs a few real races to get used to other horses & the audience.
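One cheap way to "understand the effect of data skewness" before a full-scale run is to look at how rows are distributed over the key you group or join on. The sketch below is plain Python on a toy key list; the function name `skew_report` and the max/mean ratio as a skew indicator are assumptions, not something the deck prescribes.

```python
from collections import Counter


def skew_report(keys, top_n=3):
    """Return the max/mean rows-per-key ratio and the heaviest keys.

    A ratio near 1 means rows are spread evenly over keys; a large ratio
    means a few keys dominate, so the tasks handling them do most of the work.
    """
    counts = Counter(keys)
    mean_rows = sum(counts.values()) / len(counts)
    heaviest = counts.most_common(top_n)
    return heaviest[0][1] / mean_rows, heaviest


# Toy data: one store accounts for 90% of the rows.
keys = ["store_1"] * 90 + ["store_2"] * 5 + ["store_3"] * 5
ratio, heaviest = skew_report(keys)
```

Running such a check on a sample and again on the full data shows whether the sample hides the heavy keys that only appear at scale.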
The Approach characteristics
Patience
Strong Focus
Unconventional
Positivity
Timely course correction
The characteristics – notes (1)
• Here are the top 5 characteristics of a Sparksman
• Strong Focus
– We all know it is important for any project, but it needs greater emphasis for Spark projects. Why?
– Spark, as a generic execution framework, can solve many different pieces of the analytics pipeline (ETL, data
science, streaming), with associated benefits such as dynamic & linear scalability.
– It is easy to get distracted by the capabilities of Spark. So, define the key problem and work to produce
a solution for it.
– You pick the sport you are preparing the horse for and teach the essential skills needed for it. Although the
horse inherently has the nature to run, jump & dance, we should not try to train the same horse
for racing, show jumping & dressage all at the same time.
• Unconventional Approach
– If you come from a traditional RDBMS/SQL background of manipulating data, be prepared to unlearn.
– If you would like to use machine learning concepts, taking the same approach as you took with a SQL engine
on an MPP database will not work, or will not give the benefits you expect from Spark.
– E.g. we used predicate pushdown to make DataFrames faster & more efficient wherever possible.
– While you may not always figure out what is happening, if you don’t try new options with your horse, you will not
know what works & what does not. Great trainers have always tried unorthodox things.
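The idea behind predicate pushdown is that a filter is applied while the source is being scanned, so fewer rows are ever loaded. The toy below shows only the concept in plain Python; it is not Spark code, and `scan`, `is_recent` and the sample rows are made up for illustration. In Spark, the optimizer can push DataFrame filters down to sources that support it, such as Parquet files.

```python
def scan(rows, predicate=None):
    """Toy data source: yields rows, applying the predicate during the scan if given."""
    for row in rows:
        if predicate is None or predicate(row):
            yield row


def is_recent(row):
    return row["year"] >= 2016


# Nine rows with years cycling through 2014, 2015, 2016.
source = [{"year": 2014 + i % 3, "amount": i} for i in range(9)]

# Without pushdown: every row is loaded, then filtered downstream.
loaded_all = list(scan(source))
filtered_late = [row for row in loaded_all if is_recent(row)]

# With pushdown: the filter runs inside the scan, so fewer rows leave the source.
loaded_pushed = list(scan(source, predicate=is_recent))
```

Both paths give the same result, but the pushed-down version materialises three rows instead of nine, which is where the saving comes from on large tables.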
The characteristics – notes (2)
• Patience & Positivity
– Do not give up when it does not work, although it is incredibly boring at times to do the same thing again.
– Repeat the task, tuning one parameter or one aspect of the code at a time. Repetition is key.
– There are times when it won’t work because of things you don’t know or don’t control.
– E.g. many times our code worked for reasonably sized data & scaled linearly, but broke down at some point. GC
times were suddenly high.
– Various approaches over many days did not help, until we figured out that the issue was not the code or Spark but
data skewness. So giving the benefit of the doubt to the tech worked.
– Horses resist tasks, and sometimes their behaviour bemuses and frustrates you. You cannot give up at the first
sign of resistance; you need patience. Great trainers also give the horse the benefit of the doubt and stay
positive.
• Timely course correction
– Keep the emotion out of the solution. Don’t be rigid about your solution.
– It is important to know when to stop re-running with tweaked parameters & take a new approach by
rewriting the code.
– E.g. when our code tweaking didn’t work, we figured out that some aggregations failed within a window function on the
full data set. So we needed to avoid window functions & change approach when the DataFrame is massive.
– Great trainers keep their emotion out & never take their frustration out on the horse, but make corrections to their
approach, say by using a stronger bridle or a different crop.
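The deck does not say what replaced the window function. One common alternative, shown here as an assumption, is to aggregate to one row per key first and then join that small result back onto the detail rows. The sketch uses plain Python and a hypothetical `add_group_share` helper to show the two-pass shape.

```python
from collections import defaultdict


def add_group_share(rows):
    """Attach each row's share of its customer's total spend.

    Pass 1 aggregates to one total per customer (a small result);
    pass 2 joins that total back onto each row by key.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["spend"]
    return [
        {**row, "share": row["spend"] / totals[row["customer"]]}
        for row in rows
    ]


rows = [
    {"customer": "c1", "spend": 30.0},
    {"customer": "c1", "spend": 10.0},
    {"customer": "c2", "spend": 5.0},
]
result = add_group_share(rows)
```

The result matches what a sum over a per-customer window would give, while the aggregation step only has to hold one value per key.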
The set-up
[Diagram with numbered steps 1 to 5. Labels: Complete Data, Enterprise DWH, Masked Essential Data, Linux Server, The Engine, AWS Cloud, Source]
The Results
The Achievement
80 - 85% reduction in code base
50% less Memory & 25% less Compute used
20% performance gain (with intermediate stages on disk)
Huge savings on cost per run
The enabler
• Deep Spark expertise
• Dev tool with integrated visualisation & collaboration
• Large-scale optimised Spark clusters
• Unified web interface for Dev, Test & Operations
• Language choice
• MLlib
Collect 200 bonus points when you buy any newspaper at the store today
Customer picks up newspaper
One key takeaway
“There is no mysticism, no magic, or
only one method in the realm of good
horsemanship.
It’s knowing that everything you think
you know about horses may change
with the very next horse.”
-Buck Brannaman
Final few slides - notes
• We will start using real-time data, such as the weather / location of the customer, in our personalisation.
• Based on our experiments, we know it is a lot of data to consume & analyse in real time.
• This makes Spark very appealing, given that 2.0 introduces Structured Streaming.
• Also, the end-to-end security roadmap from Databricks might remove our need for data masking.
• Spark projects vary in size, shape & complexity. Spark as a technology is evolving at great speed.
• So be prepared to unlearn & re-learn; be willing, & you can make your analytics enterprise shine
with Spark.
Credits
Musa Bilal for the inspiration
Prasad Deshpande for sharing the passion
Stuart Pearson for true Laissez-faire leadership
John Menhinick & Simon Hawkes for unparalleled support
THANK YOU.
@krajeshiyer
www.linkedin.com/in/rajkri

Contenu connexe

Tendances

Building a Location Based Social Graph in Spark at InMobi-(Seinjuti Chatterje...
Building a Location Based Social Graph in Spark at InMobi-(Seinjuti Chatterje...Building a Location Based Social Graph in Spark at InMobi-(Seinjuti Chatterje...
Building a Location Based Social Graph in Spark at InMobi-(Seinjuti Chatterje...Spark Summit
 
#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and FuturePivotalOpenSourceHub
 
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & GeodePivotalOpenSourceHub
 
Podila mesos con-northamerica_sep2017
Podila mesos con-northamerica_sep2017Podila mesos con-northamerica_sep2017
Podila mesos con-northamerica_sep2017Sharma Podila
 
Top 10 Application Problems
Top 10 Application ProblemsTop 10 Application Problems
Top 10 Application ProblemsAppDynamics
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQLKonstantin Gredeskoul
 
Hazelcast for Terracotta Users
Hazelcast for Terracotta UsersHazelcast for Terracotta Users
Hazelcast for Terracotta UsersHazelcast
 
Transactional Streaming: If you can compute it, you can probably stream it.
Transactional Streaming: If you can compute it, you can probably stream it.Transactional Streaming: If you can compute it, you can probably stream it.
Transactional Streaming: If you can compute it, you can probably stream it.jhugg
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera, Inc.
 
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...Continuent
 
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...Lars Platzdasch
 
UEMB210: Software Delivery: Best Practices
UEMB210: Software Delivery: Best PracticesUEMB210: Software Delivery: Best Practices
UEMB210: Software Delivery: Best PracticesIvanti
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into OverdriveTodd Palino
 
Resource Management in Impala - StampedeCon 2016
Resource Management in Impala - StampedeCon 2016Resource Management in Impala - StampedeCon 2016
Resource Management in Impala - StampedeCon 2016StampedeCon
 
Your Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's PerspectiveYour Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's PerspectiveIlya Ganelin
 
Spark Tuning for Enterprise System Administrators
Spark Tuning for Enterprise System AdministratorsSpark Tuning for Enterprise System Administrators
Spark Tuning for Enterprise System AdministratorsAnya Bida
 

Tendances (20)

PASS Summit 2020
PASS Summit 2020PASS Summit 2020
PASS Summit 2020
 
How to Win When Migrating to Azure
How to Win When Migrating to AzureHow to Win When Migrating to Azure
How to Win When Migrating to Azure
 
Building a Location Based Social Graph in Spark at InMobi-(Seinjuti Chatterje...
Building a Location Based Social Graph in Spark at InMobi-(Seinjuti Chatterje...Building a Location Based Social Graph in Spark at InMobi-(Seinjuti Chatterje...
Building a Location Based Social Graph in Spark at InMobi-(Seinjuti Chatterje...
 
#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future
 
Azure Databases with IaaS
Azure Databases with IaaSAzure Databases with IaaS
Azure Databases with IaaS
 
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
 
Cloudera Impala
Cloudera ImpalaCloudera Impala
Cloudera Impala
 
Podila mesos con-northamerica_sep2017
Podila mesos con-northamerica_sep2017Podila mesos con-northamerica_sep2017
Podila mesos con-northamerica_sep2017
 
Top 10 Application Problems
Top 10 Application ProblemsTop 10 Application Problems
Top 10 Application Problems
 
12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQL
 
Hazelcast for Terracotta Users
Hazelcast for Terracotta UsersHazelcast for Terracotta Users
Hazelcast for Terracotta Users
 
Transactional Streaming: If you can compute it, you can probably stream it.
Transactional Streaming: If you can compute it, you can probably stream it.Transactional Streaming: If you can compute it, you can probably stream it.
Transactional Streaming: If you can compute it, you can probably stream it.
 
Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7Cloudera Manager Webinar | Cloudera Enterprise 3.7
Cloudera Manager Webinar | Cloudera Enterprise 3.7
 
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
 
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
 
UEMB210: Software Delivery: Best Practices
UEMB210: Software Delivery: Best PracticesUEMB210: Software Delivery: Best Practices
UEMB210: Software Delivery: Best Practices
 
Putting Kafka Into Overdrive
Putting Kafka Into OverdrivePutting Kafka Into Overdrive
Putting Kafka Into Overdrive
 
Resource Management in Impala - StampedeCon 2016
Resource Management in Impala - StampedeCon 2016Resource Management in Impala - StampedeCon 2016
Resource Management in Impala - StampedeCon 2016
 
Your Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's PerspectiveYour Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's Perspective
 
Spark Tuning for Enterprise System Administrators
Spark Tuning for Enterprise System AdministratorsSpark Tuning for Enterprise System Administrators
Spark Tuning for Enterprise System Administrators
 

En vedette

Distributed Data Processing using Spark by Panos Labropoulos_and Sarod Yataw...
Distributed Data Processing using Spark by  Panos Labropoulos_and Sarod Yataw...Distributed Data Processing using Spark by  Panos Labropoulos_and Sarod Yataw...
Distributed Data Processing using Spark by Panos Labropoulos_and Sarod Yataw...Spark Summit
 
A Scaleable Implemenation of Deep Leaning on Spark- Alexander Ulanov
A Scaleable Implemenation of Deep Leaning on Spark- Alexander UlanovA Scaleable Implemenation of Deep Leaning on Spark- Alexander Ulanov
A Scaleable Implemenation of Deep Leaning on Spark- Alexander UlanovSpark Summit
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark Summit
 
Netflix branding stumbles
Netflix branding stumblesNetflix branding stumbles
Netflix branding stumblesMayur Verma
 
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On SparkDistributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On SparkSpark Summit
 
Breaking Down Analytical and Computational Barriers Across the Energy Industr...
Breaking Down Analytical and Computational Barriers Across the Energy Industr...Breaking Down Analytical and Computational Barriers Across the Energy Industr...
Breaking Down Analytical and Computational Barriers Across the Energy Industr...Spark Summit
 
Sparkling Random Ferns by P Dendek and M Fedoryszak
Sparkling Random Ferns by  P Dendek and M FedoryszakSparkling Random Ferns by  P Dendek and M Fedoryszak
Sparkling Random Ferns by P Dendek and M FedoryszakSpark Summit
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoSpark Summit
 
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...DataStax
 
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On SparkDistributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On SparkSpark Summit
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSpark Summit
 
Inside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissInside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissSpark Summit
 
Student Presentation Sample (Netflix) -- Information Security 365/765 -- UW-M...
Student Presentation Sample (Netflix) -- Information Security 365/765 -- UW-M...Student Presentation Sample (Netflix) -- Information Security 365/765 -- UW-M...
Student Presentation Sample (Netflix) -- Information Security 365/765 -- UW-M...Nicholas Davis
 
Shifting Data Science into High Gear
Shifting Data Science into High GearShifting Data Science into High Gear
Shifting Data Science into High GearSpark Summit
 
PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive AnalyticsPowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive AnalyticsSpark Summit
 
Netflix - Book de Campanha
Netflix - Book de CampanhaNetflix - Book de Campanha
Netflix - Book de CampanhaRafael Brandani
 
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...Spark Summit
 
The Internet of Everywhere—How IBM The Weather Company Scales
The Internet of Everywhere—How IBM The Weather Company ScalesThe Internet of Everywhere—How IBM The Weather Company Scales
The Internet of Everywhere—How IBM The Weather Company ScalesSpark Summit
 

En vedette (20)

Distributed Data Processing using Spark by Panos Labropoulos_and Sarod Yataw...
Distributed Data Processing using Spark by  Panos Labropoulos_and Sarod Yataw...Distributed Data Processing using Spark by  Panos Labropoulos_and Sarod Yataw...
Distributed Data Processing using Spark by Panos Labropoulos_and Sarod Yataw...
 
A Scaleable Implemenation of Deep Leaning on Spark- Alexander Ulanov
A Scaleable Implemenation of Deep Leaning on Spark- Alexander UlanovA Scaleable Implemenation of Deep Leaning on Spark- Alexander Ulanov
A Scaleable Implemenation of Deep Leaning on Spark- Alexander Ulanov
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
 
Netflix branding stumbles
Netflix branding stumblesNetflix branding stumbles
Netflix branding stumbles
 
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On SparkDistributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On Spark
 
Breaking Down Analytical and Computational Barriers Across the Energy Industr...
Breaking Down Analytical and Computational Barriers Across the Energy Industr...Breaking Down Analytical and Computational Barriers Across the Energy Industr...
Breaking Down Analytical and Computational Barriers Across the Energy Industr...
 
Sparkling Random Ferns by P Dendek and M Fedoryszak
Sparkling Random Ferns by  P Dendek and M FedoryszakSparkling Random Ferns by  P Dendek and M Fedoryszak
Sparkling Random Ferns by P Dendek and M Fedoryszak
 
Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah Guido
 
Netflix in France
Netflix in FranceNetflix in France
Netflix in France
 
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
Leveraging Docker and CoreOS to provide always available Cassandra at Instacl...
 
Distributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On SparkDistributed Heterogeneous Mixture Learning On Spark
Distributed Heterogeneous Mixture Learning On Spark
 
Sparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya HristakevaSparking Science up with Research Recommendations by Maya Hristakeva
Sparking Science up with Research Recommendations by Maya Hristakeva
 
ベンダーロックインフリーのビジネスクラウドの世界
ベンダーロックインフリーのビジネスクラウドの世界ベンダーロックインフリーのビジネスクラウドの世界
ベンダーロックインフリーのビジネスクラウドの世界
 
Inside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick ReissInside Apache SystemML by Frederick Reiss
Inside Apache SystemML by Frederick Reiss
 
Student Presentation Sample (Netflix) -- Information Security 365/765 -- UW-M...
Student Presentation Sample (Netflix) -- Information Security 365/765 -- UW-M...Student Presentation Sample (Netflix) -- Information Security 365/765 -- UW-M...
Student Presentation Sample (Netflix) -- Information Security 365/765 -- UW-M...
 
Shifting Data Science into High Gear
Shifting Data Science into High GearShifting Data Science into High Gear
Shifting Data Science into High Gear
 
PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive AnalyticsPowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics
 
Netflix - Book de Campanha
Netflix - Book de CampanhaNetflix - Book de Campanha
Netflix - Book de Campanha
 
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
 
The Internet of Everywhere—How IBM The Weather Company Scales
The Internet of Everywhere—How IBM The Weather Company ScalesThe Internet of Everywhere—How IBM The Weather Company Scales
The Internet of Everywhere—How IBM The Weather Company Scales
 

Similaire à Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the Chasm

Your Roadmap, Your Product Story & Datadriven Product Management
Your Roadmap, Your Product Story & Datadriven Product ManagementYour Roadmap, Your Product Story & Datadriven Product Management
Your Roadmap, Your Product Story & Datadriven Product ManagementProduct School
 
Data Science Introduction by Emerging India Analytics
Data Science Introduction by Emerging India AnalyticsData Science Introduction by Emerging India Analytics
Data Science Introduction by Emerging India AnalyticsAyeshaSharma29
 
Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Turi, Inc.
 
10 Safe Essential Elements to Achieve the Benefits of SAFe
10 Safe Essential Elements to Achieve the Benefits of SAFe10 Safe Essential Elements to Achieve the Benefits of SAFe
10 Safe Essential Elements to Achieve the Benefits of SAFeCprime
 
Measuring Scrum
Measuring ScrumMeasuring Scrum
Measuring Scrumvireg
 
Sap variant configuation training
Sap variant configuation trainingSap variant configuation training
Sap variant configuation trainingKiranReddy325
 
Agile Marketing & Customer Experience
Agile Marketing & Customer ExperienceAgile Marketing & Customer Experience
Agile Marketing & Customer ExperienceRoland Smart
 
Sap rebates by dilip sadh
Sap rebates by dilip sadhSap rebates by dilip sadh
Sap rebates by dilip sadhdilipsadh
 
SpatzAI - Powering Bold Idea-sharing in Teams Spat by Spat
SpatzAI - Powering Bold Idea-sharing in Teams Spat by SpatSpatzAI - Powering Bold Idea-sharing in Teams Spat by Spat
SpatzAI - Powering Bold Idea-sharing in Teams Spat by SpatDesmond Sherlock
 
SpatzAI - Powering Bold Idea-sharing in Teams Spat by Spat
SpatzAI - Powering Bold Idea-sharing in Teams Spat by SpatSpatzAI - Powering Bold Idea-sharing in Teams Spat by Spat
SpatzAI - Powering Bold Idea-sharing in Teams Spat by SpatDesmond Sherlock
 
The Right Product Owner
The Right Product OwnerThe Right Product Owner
The Right Product OwnerRichard Cheng
 
Traders Cockpit Product Details
Traders Cockpit Product DetailsTraders Cockpit Product Details
Traders Cockpit Product DetailsTraders Cockpit
 
SpatzAI - Powering Bold Idea-sharing in Teams Spat by Spat
SpatzAI - Powering Bold Idea-sharing in Teams Spat by SpatSpatzAI - Powering Bold Idea-sharing in Teams Spat by Spat
SpatzAI - Powering Bold Idea-sharing in Teams Spat by SpatDesmond Sherlock
 
Agile and Scrum Basics
Agile and Scrum BasicsAgile and Scrum Basics
Agile and Scrum BasicsMazhar Khan
 
How to sell Services and Product effectively
How to sell Services and Product effectivelyHow to sell Services and Product effectively
How to sell Services and Product effectivelyNeeraj Yadav
 
How to sell Services & Product effectively.
How to sell Services & Product effectively.How to sell Services & Product effectively.
How to sell Services & Product effectively.Neeraj Yadav
 
Machine Learning in Production: Manu Mukerji, Strata CA March 2018
Machine Learning in Production: Manu Mukerji, Strata CA March 2018 Machine Learning in Production: Manu Mukerji, Strata CA March 2018
Machine Learning in Production: Manu Mukerji, Strata CA March 2018 Manu Mukerji
 
2 Tips on Every Sales Stage: Learning from Our Top Wins and Losses, by Sean C...
2 Tips on Every Sales Stage: Learning from Our Top Wins and Losses, by Sean C...2 Tips on Every Sales Stage: Learning from Our Top Wins and Losses, by Sean C...
2 Tips on Every Sales Stage: Learning from Our Top Wins and Losses, by Sean C...Acumatica Cloud ERP
 
Sanjoy Mondal_Purchase Sr. Executive-SCM and SAP MM end user leveal_
Sanjoy Mondal_Purchase Sr. Executive-SCM and SAP MM end user leveal_Sanjoy Mondal_Purchase Sr. Executive-SCM and SAP MM end user leveal_
Sanjoy Mondal_Purchase Sr. Executive-SCM and SAP MM end user leveal_Sanjoy Mondal
 


Natural Sparksmanship – The Art of Making an Analytics Enterprise Cross the Chasm

  • 1. Natural Sparksmanship: Art of Making an Analytics Enterprise Cross the Chasm. Rajesh Krishnan, AIMIA Inc.
  • 3. Our Business - notes
    • AIMIA is a data-driven marketing & loyalty analytics business, regarded as a leader by Forrester in their loyalty management wave.
      – We manage the UK’s largest coalition loyalty program and also manage proprietary loyalty programs for customers across verticals and geographies.
      – We provide insights through advanced analytics to retailers & CPGs across the globe.
      – We manage marketing & smart customer communications through campaigns for our clients, using the analytics to drive loyalty.
    • Personalisation underpins the three focus areas.
    • The huge influx of data today means there is an opportunity to know the customer better. Personalisation can be done like never before.
  • 5. Our Analytics - notes
    • Analytics has been the backbone of the business.
    • We perform the full spectrum of analytics:
      – What happened (Descriptive)
      – Why it happened (Diagnostic)
      – What will happen (Predictive)
      – What should we do (Prescriptive)
    • Our barn has different breeds of horses that are trained to win different forms of the sport, be it show jumping, endurance racing or dressage.
  • 8. The Challenge - notes
    • We have a smart custom analytics solution called Offer Engine.
      – Crunches ~20 billion rows of transaction data.
      – Calculates probabilities for and ranks 30 billion Customer-Product combinations.
      – Proven to produce impressive marketing RoI for our clients.
    • It is an algorithm built using SQL running on an in-memory MPP database, plus shell scripts.
    • We want to productise it & implement it across our customer base.
    • There were major technical hurdles in development: less flexibility to use different data sources with different velocities, and a prolonged time to take code from development to production.
    • In short, we wanted to find a new breed of horse with more speed, agility & flexibility, so that it can win this personalisation game anywhere & not just in one field.
  • 9. The Objective
    • Rewrite the award-winning concept (“Best Use of Customer Analytics/Data in a Loyalty Programme”, 2014 Loyalty Awards)
    • Create a flexible framework
    • Move from custom code to a configurable product
    • Better performance at a lower cost
  • 11. The schematic (diagram labels): Metadata; Customer; Products; Transactions; Offers; Audience; Pre-Processing; Splitter (two); Algorithm 1, Algorithm 2, Algorithm N; Join (three); Ensemble; Output / Channels
  • 12. The Schematic - notes
    • The transaction & campaign data from the Enterprise Data Warehouse is pre-processed to feed the algorithms.
    • The algorithms are trained on the training data produced by that step.
    • When a campaign audience & the current offers in store are available, they are sent to different algorithms based on the configuration.
    • An Ensemble aggregates the algorithm outputs & produces the final ranking.
    • After what we heard at Spark meetups, it was obvious that rebuilding the key components of the pipeline in Spark was the way to make it a successful product.
    • The best machine learning implementations using Spark presented at these meetups, from domains ranging from banking to e-commerce, were convincing.
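The ensemble step in the schematic notes can be illustrated with a small sketch. The slides do not say how the Ensemble combines algorithm outputs, so the weighted mean used here is an assumption, and `ensemble_rank`, the algorithm names and the sample scores are hypothetical. Plain Python stands in for Spark to keep the example self-contained.

```python
from collections import defaultdict


def ensemble_rank(scores_by_algorithm, weights):
    """Combine per-algorithm scores into one ranked product list per customer.

    Assumption: the ensemble is a weighted mean of scores. A pair that an
    algorithm did not score contributes 0 for that algorithm.
    """
    total_weight = sum(weights[name] for name in scores_by_algorithm)
    combined = defaultdict(float)
    for name, scores in scores_by_algorithm.items():
        for pair, score in scores.items():
            combined[pair] += weights[name] * score / total_weight

    ranked = defaultdict(list)
    for (customer, product), score in combined.items():
        ranked[customer].append((product, score))
    for customer in ranked:
        # Highest combined score first; ties broken by product name.
        ranked[customer].sort(key=lambda item: (-item[1], item[0]))
    return dict(ranked)


# Hypothetical scores from two algorithms for one customer.
scores = {
    "algorithm_1": {("c1", "tea"): 0.9, ("c1", "milk"): 0.2},
    "algorithm_2": {("c1", "tea"): 0.5, ("c1", "milk"): 0.6},
}
weights = {"algorithm_1": 1.0, "algorithm_2": 1.0}
ranking = ensemble_rank(scores, weights)
```

With equal weights, "tea" ranks above "milk" for customer `c1`, because its averaged score is higher even though the second algorithm alone would have preferred "milk".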
  • 13. The Roles: The Owner / The Sponsor; The Trainer / The Sparksman; The Jockey / The field expert; The bettor / The client
  • 14. The Roles – notes (1)
    • The challenge in enterprise adoption of Spark is different. It is a high-stakes game.
    • Understanding the roles is a key first step to becoming a great Sparksman.
    • The Owner / The Sponsor:
      – Probably the most important person in making your project a success.
      – The owner may be a partnership or a single person. A single owner is easier to start with, for ease of communication.
      – You need to convince them that the identified technology is the best suited.
      – You had better have the statistics & proof that your new breed of horse is best suited to win, even before asking to buy one to test it.
    • The Trainer / The Sparksman:
      – We, who make this tech work for our product, with thorough knowledge of the market.
      – The horseman who truly understands the breed & prepares it to win the specific sport.
  • 15. The Roles – notes (2)
    • The Jockey / The field expert:
      – The operations team that makes the product run at scale & deals with issues.
      – The marketing/sales team that understands the pulse of the client exactly.
      – They know the pluses & minuses of the product.
      – They need to believe this new tech makes their life easier & better.
      – The jockey plays a key role in making the horse win the race; a perfect partnership between the horseman & the jockey is required to make any breed win.
    • The bettor / The client:
      – The client collects market information, purchases the best product & expects it to help them win their customers. They want your product to be flexible, performant & secure.
      – The bettor analyses the horse’s information & bets on it for a few races. He wants to win big money. Bettors want your horse to be the best of breed, fit & showing potential.
  • 17. The Stage – notes
    • Like it or not, we need to go through the following three stages for success.
    • Fear
      – The established technology has expertise available that can bring out the best in it. Why change?
      – It is not easy to make the new tech work on-prem, especially with a lack of expertise.
      – When you train a young horse, it is natural to fear that you will not understand what goes on in its head or find the best environment to make it shine.
    • Trust
      – This comes with well-designed experiments on your different queries.
      – Be prepared for early-stage failures when you set up Spark on your laptop, or on a cluster of on-prem VMs, without any expertise.
      – Try leading Spark distribution vendors to make the ramp-up easy & painless.
      – In the right environment, with a natural horseman, the horse gains trust & shows its prowess quickly.
    • Comfort
      – You need to simulate real scenarios & benchmark to get comfortable, e.g. understand the effect of data skewness on the full data.
      – The horse needs a few real races to get used to other horses & the audience.
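As a rough illustration of the benchmarking point under "Comfort", the sketch below compares key skew in a small sample against a fuller data set. The slides give no metric for skewness, so the ratio used here (largest key count divided by mean key count) and the helper `skew_ratio` are assumptions, and the customer keys are made up.

```python
from collections import Counter


def skew_ratio(keys):
    """Return (largest key count) / (mean key count).

    1.0 means rows are spread perfectly evenly over the keys; larger values
    mean one key dominates and would overload a single partition or task.
    """
    counts = Counter(keys)
    mean_count = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean_count


# Hypothetical data: the sample looks balanced, the full data does not.
sample_keys = ["c1", "c2", "c3", "c4"] * 5
full_keys = sample_keys + ["c1"] * 980

sample_skew = skew_ratio(sample_keys)
full_skew = skew_ratio(full_keys)
```

Here the sample gives a ratio of 1.0 while the full data gives roughly 3.9, which is the kind of difference that only shows up when the benchmark runs on realistic volumes.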
  • 18. The characteristics: Strong Focus; Unconventional Approach; Patience; Positivity; Timely course correction
  • 19. The characteristics – notes (1)
    • Here are the top 5 characteristics of a Sparksman.
    • Strong Focus
      – We all know focus is important for any project, but it needs higher emphasis for Spark projects. Why?
      – Spark, as a generic execution framework, can solve many different pieces of the analytics pipeline (ETL, data science, streaming), with associated benefits such as dynamic & linear scalability.
      – It is easy to get distracted by the capability of Spark. So define the key problem & work to produce a solution for it.
      – You pick the sport you are preparing the horse for & teach the essential skills needed for it. Although the horse inherently has the nature to run, jump & dance, we should not try to train the same horse for racing, show jumping & dressage all at the same time.
    • Unconventional Approach
      – If you come from a traditional RDBMS/SQL background of manipulating data, be prepared to unlearn.
      – If you would like to use machine learning concepts, taking the same approach as you took in a SQL engine on an MPP database will not work, or will not give the benefits you expect from Spark.
      – E.g. we used the predicate pushdown concept to make DataFrames faster & more efficient wherever possible.
      – You may not always figure out what is happening, but if you don’t try new options with your horse, you will not know what works & what does not. Great trainers have always tried unorthodox things.
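The predicate pushdown idea mentioned above can be sketched without Spark. The toy reader below keeps min/max statistics per chunk of rows and skips chunks that cannot satisfy the filter, which is the general idea behind pushdown in columnar formats. The `Chunk` type, `scan` function and sample rows are hypothetical and are not the code used in the talk.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """A block of rows plus the min/max of its `day` column."""
    min_day: int
    max_day: int
    rows: list  # list of (day, amount) tuples


def make_chunk(rows):
    days = [day for day, _ in rows]
    return Chunk(min(days), max(days), rows)


def scan(chunks, lo, hi):
    """Return rows with lo <= day <= hi and the number of chunks actually read."""
    chunks_read = 0
    result = []
    for chunk in chunks:
        # Pushdown: use the stored statistics to skip chunks that cannot match,
        # instead of loading every row and filtering afterwards.
        if chunk.max_day < lo or chunk.min_day > hi:
            continue
        chunks_read += 1
        result.extend(row for row in chunk.rows if lo <= row[0] <= hi)
    return result, chunks_read


chunks = [
    make_chunk([(1, 10.0), (2, 12.5)]),
    make_chunk([(3, 7.0), (4, 3.5)]),
    make_chunk([(5, 9.0), (6, 1.0)]),
]
rows, chunks_read = scan(chunks, 4, 5)
```

Only two of the three chunks are read for the filter on days 4 to 5. The saving grows with the data when rows are stored roughly sorted on the filtered column.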
  • 20. The characteristics – notes (2)
    • Patience & Positivity
      – Do not give up when it does not work, although it is incredibly boring at times to do the same thing again.
      – Repeat the task, tuning one parameter or one aspect of the code at a time. Repetition is key.
      – There are times when it won't work for reasons you don't know or have no control over.
      – E.g. many times our code worked on reasonably sized data and scaled linearly, but broke down at some point. GC times were suddenly high.
      – Trying various approaches for many days did not help until we figured out that the issue was data skewness, not the code or Spark. So giving the tech the benefit of the doubt worked.
      – Horses resist tasks, and sometimes their behaviour bemuses and frustrates you. You cannot give up at the first sign of resistance; you need patience. Great trainers also give the horse the benefit of the doubt and stay positive.
    • Timely course correction
      – Keep the emotion out of the solution. Don't be rigid about your solution.
      – It is important to know when to stop re-running with tweaked parameters and take a new approach by rewriting the code.
      – E.g. when tweaking our code didn't work, we figured out that some aggregations fail within a window function on the full data set. So we need to avoid window functions and change approach when the dataframe is massive.
      – Great trainers keep their emotions out of it and never take their frustration out on the horse; they correct their approach, say by using a stronger bridle or a different crop.
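The data skewness described in the notes above is easy to reproduce on a laptop. The following is a minimal plain-Python sketch, not the Offer Engine code: the customer keys, the 80% hot-key share, the partition count and the `partition_of` hash are all assumptions chosen for illustration (Spark uses its own partitioner). It measures how unevenly rows land in partitions, then shows key salting, one common remedy, spreading the hot key out.

```python
import hashlib
import random
from collections import Counter


def partition_of(key, num_partitions):
    # Deterministic stand-in for a shuffle partitioner (an assumption;
    # Spark uses its own hash function).
    digest = hashlib.sha256(repr(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def partition_sizes(keys, num_partitions):
    """Number of rows that land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size (1.0 means even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


def salt_keys(keys, hot_keys, num_salts, rng):
    """Pair each hot key with a random salt so its rows spread across partitions."""
    return [(k, rng.randrange(num_salts) if k in hot_keys else 0) for k in keys]


# Hypothetical data: one customer owns 80% of the rows.
rng = random.Random(42)
keys = ["cust_hot"] * 8000 + ["cust_%d" % i for i in range(2000)]

before = partition_sizes(keys, 8)
after = partition_sizes(salt_keys(keys, {"cust_hot"}, 16, rng), 8)

print("skew ratio before salting:", round(skew_ratio(before), 2))
print("skew ratio after salting:", round(skew_ratio(after), 2))
```

A ratio well above 1.0 means one task does most of the work while the rest sit idle, which is consistent with the sudden slowdown described above. In a real job, a salted aggregation needs a second step that strips the salt and combines the partial results.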
  • 21. The set-up [Diagram with numbered steps 1 to 5; labels: Source, Complete Data, Enterprise DWH, Masked Essential Data, Linux Server, The Engine, AWS Cloud]
  • 23. The Achievement
    • 80 - 85% reduction in code base
    • 50% less Memory & 25% less Compute used
    • 20% performance gain (with intermediate stages on disk)
    • Huge savings on cost per run
  • 24. The enabler
    • Deep Spark Expertise
    • Dev tool with Integrated Visualisation & Collaboration
    • Large scale optimised Spark clusters
    • Unified web interface for Dev, Test & Operations
    • Language choice
    • ML Lib
  • 25. “Collect 200 bonus points when you buy any newspaper at the store today” (Customer picks up newspaper)
  • 26. One key takeaway “There is no mysticism, no magic, or only one method in the realm of good horsemanship. It’s knowing that everything you think you know about horses may change with the very next horse.” -Buck Brannaman
  • 27. Final few slides - notes
    • We will start using real-time data, like the weather or the location of the customer, in our personalisation.
    • Based on our experiments, we know it is a lot of data to consume & analyse in real time.
    • This makes Spark very appealing, given that Spark 2.0 introduces Structured Streaming.
    • Also, the end-to-end security roadmap from Databricks might remove our need for data masking.
    • Spark projects vary in size, shape & complexity. Spark as a technology is evolving at great speed.
    • So be prepared and willing to unlearn & re-learn, and you can make your analytics enterprise shine with Spark.
  • 28. Credits
    • Musa Bilal for the inspiration
    • Prasad Deshpande for sharing the passion
    • Stuart Pearson for true laissez-faire leadership
    • John Menhinick & Simon Hawkes for unparalleled support