Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more

Russell Nash
AWS Solutions Architect

Product?
Do we have a product?
Can we ship?
How to develop faster?
Better? Cheaper?
Market?
Can we scale?
What do people do & why?
How do we optimize?

• 10 million guests
• 550,000 properties listed
• Massive growth on AWS
• $776.4M from top investors
• $10B valuation – more than Hyatt

“At Airbnb, we look into all possible ways to improve our product and user experience. Often times this involves lots of analytics behind the scene.”
Henry Cai 蔡明航
Software Engineer, Growth at Airbnb
http://nerds.airbnb.com/redshift-performance-cost/

The best startups use AWS for analytics…

Agenda
• Big Data Overview
• MapReduce / Hadoop
• Case Study: Yelp
• Data Warehousing
• Case Study: Foursquare
• NoSQL
• Case Study: AdRoll
• Streaming
• Case Study: Supercell

KINESIS
EMR Redshift DynamoDB

[Chart: Traditional Database, Hadoop, NoSQL, and MPP DW plotted by Structure (High to Low) and Size (Small to Large)]

Hadoop MPP NoSQL
Structure
Latency
Interfaces

Background
• 2004 – MapReduce
• 2006 – Hadoop

Input File → Functions → Output, running on a Hadoop cluster
1. Very Flexible
2. Very Scalable
3. Often Transient
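
The input file, functions, output flow above can be sketched in a few lines of plain Python. This is a single-process illustration of the map and reduce idea, not Hadoop's API; the helper names (`map_reduce`, `word_mapper`, `sum_reducer`) and the sample lines are made up for this example.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def word_mapper(line):
    # Map step: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1

def sum_reducer(key, values):
    # Reduce step: add up all the counts emitted for one word.
    return sum(values)

# Hypothetical input; on a real cluster this would be split across many nodes.
lines = ["big data on hadoop", "hadoop runs map and reduce", "big clusters"]
word_counts = map_reduce(lines, word_mapper, sum_reducer)
```

Because each mapper call sees only one record and each reducer call sees only one key, the work can be spread over many machines, which is where the flexibility and scalability on the slide come from.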

Big Data Verticals and Use cases
Media/Advertising: Targeted Advertising; Image and Video Processing
Oil & Gas: Seismic Analysis
Retail: Recommendations; Transactions Analysis
Life Sciences: Genome Analysis
Financial Services: Monte Carlo Simulations; Risk Analysis
Security: Anti-virus; Fraud Detection; Image Recognition
Social Network/Gaming: User Demographics; Usage analysis; In-game metrics

Deployment Options
On-premise
Cloud
Managed on Cloud

Amazon
Elastic MapReduce
Manageability
Scalability
Cost

400 GB of logs per day
~12 Terabytes per month

1) Load log file data for six months of user search history into Amazon S3

Search ID   Search Text   Final Selection
12423451    westen        Westin
14235235    wisten        Westin
54332232    westenn       Westin

2) Spin up a 200 node cluster
[Diagram: log files in Amazon S3, Hadoop cluster on Amazon EMR]

3) 200 nodes simultaneously analyze this data looking for common misspellings … this takes a few hours

4) New common misspellings and suggestions loaded back into S3

5) When the job is done, the cluster is shut down.
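
The analysis in steps 3 and 4 could look roughly like the sketch below. It runs on one machine and uses only the three sample rows from the slide; the function names and the `min_count` threshold are assumptions for illustration, and the slides do not show the actual job.

```python
from collections import Counter

# Sample rows from the slide: (search ID, search text, final selection).
search_log = [
    ("12423451", "westen", "Westin"),
    ("14235235", "wisten", "Westin"),
    ("54332232", "westenn", "Westin"),
]

def map_search(row):
    # Map step: emit a count only when the typed text differs from what was selected.
    _search_id, text, selection = row
    typed = text.strip().lower()
    if typed != selection.lower():
        yield (typed, selection), 1

def common_misspellings(rows, min_count=1):
    # Reduce step: total the counts per (typed text, selection) pair,
    # then keep the most frequent selection for each typed text.
    counts = Counter()
    for row in rows:
        for key, n in map_search(row):
            counts[key] += n
    suggestions = {}
    for (typed, selection), n in counts.most_common():
        if n >= min_count and typed not in suggestions:
            suggestions[typed] = selection
    return suggestions

suggestions = common_misspellings(search_log)
```

On six months of logs, `min_count` would presumably be set much higher so that only misspellings seen many times become suggestions.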

E-Commerce Case Study
• Online Marketplace
• EMR
– Weblog analysis
– Recommendations
• Link logs with production database in EMR
“Enables us to focus on developing our…analysis stack
without worrying about the underlying infrastructure”

             Hadoop                        MPP               NoSQL
Structure    Any
Latency      Mins-Hours
Interfaces   Programming, SQL-Like Tools

Background
MPP = Massively Parallel Processing
SQL Databases for analytical workloads
Performance
Scalability
Ease of Use
Cost

1. SQL
2. High Performance
3. Broad Toolset

Amazon Redshift
Manageability
Scalability
Cost

Mobile Case Study
• Location-based social app
• 40 Million users
• 4.5 Billion check-ins
• Multiple terabytes of log data

Who is checking in?
[Chart: distribution of check-ins by Age (0 to 80) and Gender (Female, Male); vertical axis 0 to 0.6]

When do people go to a place?
[Chart: check-ins from Thursday to Sunday for Gorilla Coffee, Gray's Papaya, and Amorino]
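
A question like this maps naturally onto a SQL GROUP BY. The sketch below uses Python's built-in SQLite as a stand-in for a data warehouse so that it runs anywhere; the table name, column names, and check-in rows are all invented for illustration, and Redshift's own date functions differ from SQLite's `strftime`.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkins (venue TEXT, checked_in_at TEXT)")

# Made-up check-ins; 2014-05-01 was a Thursday.
rows = [
    ("Gorilla Coffee", "2014-05-01 08:15:00"),
    ("Gorilla Coffee", "2014-05-02 08:30:00"),
    ("Gorilla Coffee", "2014-05-02 09:00:00"),
    ("Gray's Papaya", "2014-05-03 01:30:00"),
    ("Gray's Papaya", "2014-05-03 23:45:00"),
    ("Amorino", "2014-05-04 15:00:00"),
]
conn.executemany("INSERT INTO checkins VALUES (?, ?)", rows)

# SQLite's %w gives the day of week as 0-6, with Sunday = 0.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

query = """
    SELECT venue, strftime('%w', checked_in_at) AS weekday, COUNT(*) AS n
    FROM checkins
    GROUP BY venue, weekday
    ORDER BY venue, weekday
"""
by_day = {
    (venue, DAY_NAMES[int(weekday)]): n
    for venue, weekday, n in conn.execute(query)
}
conn.close()
```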

“Using Amazon Redshift has enabled the
company to perform more agile analytics
while saving costs.”

Media Case Study
• Placeshifting and media streaming
• Collect terabytes of event logs
• Viewership, devices etc
• Hadoop for transformation
• Redshift for analysis
“Redshift allows us to turn on a dime”

Performance Evaluation on 2B Rows
Systems named on the slide: Traditional SQL Database, Amazon Redshift
Aggregate by month (hh:mm:ss): 02:08:35, 00:35:46, 00:00:12

             Hadoop                        MPP               NoSQL
Structure    Any                           Full
Latency      Mins-Hours                    Seconds-Minutes
Interfaces   Programming, SQL-Like Tools   SQL, BI Tools

Background
Databases for webscale transactions
Performance
Flexibility

Relational Table
ID    Age   State
123   20    CA
345   25    WA
678   40    FL

Non-Relational Table
ID    Attributes
123   Age:20, State:CA
345   Age:25, Country: Australia, Gender: F, Smoker: No
678   Age:40
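
The difference between the two tables can be shown with plain Python data structures. This is only an illustration of the data model, using the values from the slide; it is not the API of any particular NoSQL database, and `get_attribute` is a hypothetical helper.

```python
# Relational: every row has the same fixed columns.
relational_columns = ("id", "age", "state")
relational_rows = [(123, 20, "CA"), (345, 25, "WA"), (678, 40, "FL")]

# Non-relational: each item is keyed by ID and carries its own set of attributes.
items = {
    123: {"Age": 20, "State": "CA"},
    345: {"Age": 25, "Country": "Australia", "Gender": "F", "Smoker": "No"},
    678: {"Age": 40},
}

def get_attribute(item_id, name, default=None):
    # Attributes an item does not have are simply absent, not NULL columns.
    return items.get(item_id, {}).get(name, default)

# The set of attribute names is discovered from the data, not declared up front.
attribute_names = sorted({name for item in items.values() for name in item})
```

Adding a new attribute to one item needs no schema change, which is the flexibility the Background slide refers to.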

Amazon
DynamoDB
Manageability
Scalability
Cost

Pixel “fires”
Serve ad?
Ad served

If you can’t reply in 100ms… It doesn’t matter anymore!
Network: 40 ms
Buffer: 20 ms
Processing: 40 ms
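
The budget on this slide is simple arithmetic. The sketch below reads the three numbers as milliseconds that add up to the 100 ms limit, which is an assumption based on the slide's headline; the function names are hypothetical.

```python
# Assumed split of the 100 ms response budget, taken from the slide's numbers.
TOTAL_BUDGET_MS = 100
BUDGET_MS = {"network": 40, "buffer": 20, "processing": 40}

def processing_deadline_ms(total=TOTAL_BUDGET_MS,
                           network=BUDGET_MS["network"],
                           buffer=BUDGET_MS["buffer"]):
    # Time left for the application itself once network and buffer are reserved.
    return total - network - buffer

def should_reply(elapsed_processing_ms):
    # A reply that would miss the deadline is not worth sending.
    return elapsed_processing_ms <= processing_deadline_ms()

on_time = should_reply(35)   # within the 40 ms processing slice
too_late = should_reply(55)  # over budget
```

With only about 40 ms left for processing, any data lookup made while deciding has to return in well under that, which is why the Latency row for NoSQL reads sub-second.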

             Hadoop                        MPP               NoSQL
Structure    Any                           Full              Semi
Latency      Mins-Hours                    Seconds-Minutes   Sub-second
Interfaces   Programming, SQL-Like Tools   SQL, BI Tools     Programming

Use Cases
• Gaming analytics
• Sensor networks analytics
• Ad network analytics
• Log centralization
• Click stream analysis
• Hardware and software appliance metrics
• …more…

[Architecture diagram: Amazon Kinesis]
Data Sources → AWS Endpoint → Amazon Kinesis (Shard 1, Shard 2, … Shard N, across three Availability Zones)
Consuming applications:
App.1 [Aggregate & De-Duplicate]
App.2 [Metric Extraction]
App.3 [Sliding Window Analysis]
App.4 [Machine Learning]
Destinations: S3, DynamoDB, Redshift, EMR
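
One of the consuming applications in the diagram performs sliding window analysis. A minimal sketch of that idea follows, in plain Python with no AWS calls. It assumes events arrive in timestamp order, and the class name and the 60-second window are made up for illustration.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events whose timestamp falls within the last window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        # Assumes timestamps are non-decreasing (events arrive in order).
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window ending at `now`.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def count(self, now):
        self._evict(now)
        return len(self._timestamps)

counter = SlidingWindowCounter(window_seconds=60)
for event_time in [0, 10, 30, 65, 70]:  # hypothetical event times, in seconds
    counter.add(event_time)

count_at_70 = counter.count(70)    # events at 30, 65, 70 remain in the window
count_at_100 = counter.count(100)  # events at 65 and 70 remain
```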

“Amazon Kinesis enables our business-critical analytics and dashboard
applications to reliably get the data streams they need, without delays. Amazon
Kinesis also offloads a lot of developer burden in building a real-time, streaming
data ingestion platform, and enables Supercell to focus on delivering games that
delight players worldwide.”
Sami Yliharju, Supercell Services Lead

Big Data Tutorials
aws.amazon.com/big-data
Redshift Free Trial
aws.amazon.com/redshift/free-trial

