5. What is driving Big Data?
1. Rising volumes of data
2. Falling cost of data management tools
3. Rising number of Data Scientists
6. #1: Data volumes are growing
Growth in Unstructured Data
Types of Unstructured Data
• Social Media
• Clickstream data
• Machine-Generated Data (e.g. logs)
• Internal Documents
• Notes (e.g. Patient Charts)
• Images
• Video
• Sound
13. #2: Variety
A few examples of how combining data can dramatically change the way marketers gain
customer intelligence and measure campaign effectiveness:
1. CRM Data + Web Data = Understand actual lead quality, not just lead quantity, and
drive more intelligent drip marketing, lead nurturing, and re-marketing programs.
2. Call-Center Data + Web Data = Analyze calls you can avoid and calls you should avoid.
(Example: calls to the call center for simple customer-support or operational needs
that are already serviced online.)
3. Past Purchase Data + Web Data = Segment customers based on past buying behavior,
and use this to drive targeted web campaigns to loyal customers.
4. Campaign Data + Web Data = Understand multi-touch attribution – and optimize your
campaign mix based on behaviors.
5. Social Media Data + Web Data = Measure traffic to your website from social media
campaigns and track actual conversions.
Source: “Why Web Analytics is Not Enough.” Quantivo.
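As a rough illustration of the first combination above (CRM Data + Web Data), the sketch below joins a few made-up CRM leads to made-up web sessions by email address and ranks the leads by on-site engagement. Every record, field name, and the scoring rule itself are hypothetical and chosen only to show the idea, not taken from the source.

```python
from collections import defaultdict

# Hypothetical CRM export: one record per lead.
crm_leads = [
    {"email": "ann@example.com", "source": "webinar"},
    {"email": "bob@example.com", "source": "webinar"},
    {"email": "cy@example.com", "source": "trade show"},
]

# Hypothetical web analytics export: one record per session.
web_sessions = [
    {"email": "ann@example.com", "pages": 12, "viewed_pricing": True},
    {"email": "ann@example.com", "pages": 5, "viewed_pricing": False},
    {"email": "bob@example.com", "pages": 1, "viewed_pricing": False},
]


def score_leads(leads, sessions):
    """Join leads to sessions by email and attach a simple engagement score."""
    by_email = defaultdict(list)
    for s in sessions:
        by_email[s["email"]].append(s)

    scored = []
    for lead in leads:
        visits = by_email.get(lead["email"], [])
        pages = sum(v["pages"] for v in visits)
        pricing = any(v["viewed_pricing"] for v in visits)
        # Assumed rule: one point per page viewed, plus 10 for a pricing-page visit.
        score = pages + (10 if pricing else 0)
        scored.append({**lead, "sessions": len(visits), "score": score})
    # Highest-scoring (most engaged) leads first.
    return sorted(scored, key=lambda r: r["score"], reverse=True)


ranked = score_leads(crm_leads, web_sessions)
for row in ranked:
    print(row["email"], row["source"], row["sessions"], row["score"])
```

Two leads from the same source (the webinar) end up with very different scores, which is the quality-versus-quantity distinction the first item describes.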
17. Types of Analytics you Could Use…
• ARMA
• CART
• CIR++
• Compression Nets
• Discrete Time Survival Analysis
• D-Optimality
• Ensemble Model
• Gaussian Mixture Model
• Genetic Algorithm
• Gradient Boosted Trees
• Hierarchical Clustering
• Kalman Filter
• K-Means
• KNN
• Linear Regression
• Logistic/Lasso Regression
• Logistic Regression with Adaptive Platform
• Monte Carlo Simulation
• Multinomial Regression
• Neural Networks
• Optimization: LP; IP; NLP
• Poisson Mixture Model
• Random Forests
• Restricted Boltzmann Machine
• Sensitivity Trees
• SVD
• Support Vector Machines
18. Analytics that are Actually Used
Technique                                       Frequently  Occasionally  Not at all
Classification and regression trees /…          69%         25%           6%
Linear Regression                               66%         33%
Logistic regression or other discrete choice…   61%         29%           10%
Association rules                               49%         37%           14%
K-nearest neighbors                             36%         42%           21%
Neural networks                                 30%         36%           34%
Box-Jenkins, Autoregressive…                    30%         35%           35%
Exponential smoothing / double exponential…     22%         43%           34%
Naïve Bayes                                     21%         43%           36%
Support vector machines                         20%         23%           57%
Survival analysis                               15%         41%           44%
Monte Carlo Simulations                         13%         47%           40%
Classification and regression trees / decision trees and Linear Regression are
the most popular predictive analytics techniques used.
Source: Ventana Research Predictive Analytics Benchmark Research
20. Five Common Types of Analytics
• Classify
o Segmentation, discriminant analysis
o Clustering
o Unsupervised and supervised machine learning
• Trend
o Time-series analysis
• Optimize
o Find the optimal outcome of an objective function (min/max)
• Predict
o Predict the outcome of a single event
• Simulate
o Explore the consequences of different choices to help drive decision-making
o Open-ended: Scenario planning, DSS
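To make the "Simulate" category a little more concrete, here is a minimal Monte Carlo sketch that compares two pricing choices by averaging revenue over many random demand draws. The prices, the demand figures, and the assumption that demand is normally distributed are all invented for illustration; the slides do not prescribe any particular model.

```python
import random


def simulate_revenue(price, mean_demand, sd_demand, trials, rng):
    """Average revenue over many random demand draws for one pricing choice."""
    total = 0.0
    for _ in range(trials):
        # Assumed demand model: normal draw, floored at zero units.
        demand = max(0.0, rng.gauss(mean_demand, sd_demand))
        total += price * demand
    return total / trials


# Hypothetical choices: the higher price is assumed to lower average demand.
choices = {
    "A": {"price": 10.0, "mean_demand": 100.0},
    "B": {"price": 12.0, "mean_demand": 90.0},
}

rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
results = {
    name: simulate_revenue(c["price"], c["mean_demand"], 10.0, 10_000, rng)
    for name, c in choices.items()
}
best = max(results, key=results.get)
print(results, best)
```

Under these made-up inputs the simulated averages land near 1,000 for choice A and 1,080 for choice B, so the sketch favors B. Changing the assumed demand response is exactly the kind of "what if" exploration that scenario planning relies on.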
21. So, what is Analytics?
•Descriptive
•Predictive
•Prescriptive
23. What does Big Data Analytics require?
Data: data availability + storage + integration + data management
tools
+
Analytics: analytic formulas + statistical integrity + analytic
applications
+
Interpretation: business problem + domain expertise + visualization
+ decision-making
This typically requires a team of people with different skillsets.
24. What can you do with Big Data & Analytics?
1. New revenue models
o Ex: Rapleaf scraping the web, collecting contact information and
selling full datasets
2. New user experiences
o Ex: Gmail recommendations for people to CC: on your email
3. Cost optimization (i.e. deliver the same product or service at lower
cost)
o Ex: Give your financial advisors tools to help automate your
investment decisions
Editor's notes
In-DB analytics is a trend that has become a reality, and it makes sense for our product. If customers want to use us for analytics but can’t do in-DB analytics, we’re not a full package. Ex: a customer has an algorithm doing time-series analysis in SAS and wants to run that time-series algorithm in InfiniDB. We don’t want to code all of the algorithms ourselves; we want to integrate with the tools already out there.