SlideShare une entreprise Scribd logo
1  sur  55
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
利用 SageMaker 深度學習容器化
在廣告推播之應用
T r a c k 2 | S e s s i o n 5
Young Yang
ML Specialist SA
Amazon Web Services
Hsuan Chiu
Senior Data Engineer
Data Science
VPON
Machine learning workflow
Data acquisition
curation and
labeling
Data preparation
for training
Design and run
experiments
Model
development
Model
optimization
Deployment
Common machine learning setups
1. Code & frameworks
2. Compute
(CPUs, GPUs)
3. Storage
AWS
CLI
On-premises
AWS
CLI
EC2 instance
AWS
Deep Learning AMIs
Amazon Simple Storage
Service
AWS Cloud
Deep learning is computationally expensive,
but can be scaled-out
How do we go from
AWS
CLI
this,
AWS
CLI
to this
. . .
. . .
. . .
. . .
AWS Cloud
EC2 instance
Scaling-out deep learning training
Parallel experiments Distributed training
Distributing training of
a single model to train
faster
Different models
running parallel to find
the best model
But there are challenges to scaling
AWS
CLI
Code and
dependencies
Infrastructure
management
Cluster
management
. . .
. . .
. . .
. . .
AWS Cloud
Machine learning stack is complex
• “My code requires building several dependencies from source”
• “My code isn’t taking advantage of the GPU/GPUs”
• “Is cuDNN, NCCL installed? Is it the right version?”
• “My code is running slow on CPUs”
• “Oh wait, is it taking advantage of AVX instruction set”
• “I updated my drivers and training is now slower/errors out”
• “My cluster runs a different version of framework/Linux distro”
Makes portability, collaboration, and
scaling training really, really hard!
NVIDIA drivers 436.15
Ubuntu 16.04
TensorFlow 1.13
Keras
horovod
numpy
scipy
others…
Mkl 2019 v3CPU:
cuDNN 7.1
cublas 10
nccl 2
CUDA toolkit 10
GPU:
scikit-learn
pandas
openmpi
Python
My code
Development system
NVIDIA drivers 410.68
Centos 7
Training
cluster
TensorFlow 1.14
Keras
horovod
numpy
scipy
others…
Mkl 2019 v2CPU:
cuDNN 7.5
cublas 10
nccl 2.4
CUDA toolkit 10
GPU:
scikit-learn
pandas
openmpi
Python
My code
Multiple
points
of failureDevelopment
system
Training
cluster
Containers
for
machine
learning
Container runtime
Infrastructure
NVIDIA drivers
Host OS
Packages:
• Training code
• Dependencies
• Configurations
TensorFlow
mkl
cuDNN
cuBLAS
NCCL
CUDA toolkit
CPU:
GPU:
TensorFlow
container
image
Keras
horovod
numpy
scipy
others
scikit-
learn
pandas
openmpi
Python
+
Your training
scripts
ML environments that
are:
• Lightweight
• Portable
• Scalable
• Consistent
TensorFlow
mkl
cuDNN
cuBLAS
NCCL
CUDA toolkit
NVIDIA drivers
Host OS
CPU:
GPU:
Container runtime
TensorFlow
container
image
Keras
horovod
numpy
scipy
others
scikit-
learn
pandas
openmpi
Python
Development system
NVIDIA drivers
Host OS
Container runtime
Training cluster
TensorFlow
mkl
cuDNN
cuBLAS
NCCL
CUDA toolkit
CPU:
GPU:
TensorFlow
container
image
Keras
horovod
numpy
scipy
others
scikit-
learn
pandas
openmpi
Python
+
Your training
scripts
+
Your training
scripts
Container
registry
Amazon ECR
AWS Deep Learning Containers
https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers-
images.html
Prepackaged machine learning
container images fully
configured and validated
Optimized for performance with
latest NVIDIA driver, CUDA
libraries, and Intel libraries
Challenges with scaling deep learning
AWS
CLI
Code and
dependencies
Infrastructure
management
Cluster
management
. . .
. . .
. . .
. . .
AWS Cloud
ML infrastructure and cluster management
Compute
Where the containers run
Amazon
EC2
• Getting started easy, scaling hard
• Rely on IT/Ops for setup management
• DIY setup for ML use-cases
• Optimizing cost: DIY
Jupyter notebook
instances
High-performance
algorithms
Large-scale
training
Optimization One-click
deployment
Fully managed with
auto scaling
ML services
Fully managed service that
covers the entire machine
learning workflow
Amazon SageMaker • Easy, couple of LOC to scale
• Fully managed, no infrastructure
effort
• Designed for machine learning
• Optimizing cost: on-demand / Spot
Management
Deployment, scheduling, scaling,
and management of containerized
applications
Amazon Elastic
Kubernetes Service
Amazon Elastic
Container Service
• Getting started hard, scaling easy
• Rely on IT/Ops for setup management
• DIY setup for ML use-cases
• Optimizing cost: DIY
Hyperparameter search experiment using Amazon SageMaker
Local laptop or
desktop with
Amazon SageMaker
SDK
Custom container
Code files
Docker build
Approach:
1. Build a Docker image with your
training scripts
2. Specify instance type (CPU, GPU)
3. Specify number of instances and
hyperparameters to tune
4. Launch the tuning job
AWS CLI
Container
registry
Amazon ECR
Amazon Simple Storage
ServiceFully managed
Amazon SageMaker
cluster
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS如何加速
機器學習專案產品化
Hsuan Chiu
Senior Data Engineer
Data Science
VPON
Agenda
About Vpon
The Critical Question In Digital Marketing
ML Case Study – Gender Prediction
Conclusion
The Leading Big Data Company in Asia
Data Drives Transactions
Milestone
20172014 2015 2016 20182008 2011 2019
 Won Agency &
Advertiser Of The
Year in 4 categories
 Won three Hong
Kong Spark Awards
 Won Agency & Advertiser Of
The Year in 3 categories: 2 Gold
& 1 Sliver
 Festival of Media Global
 MARKies Awards
 ECI Awards
 Top Mobile Awards (TMA)
Founded in
Taipei
Establishment of Hong
Kong andTokyo offices
Won Bronze for Campaign
Greater China Specialist
Agency of the Year
Won Bronze for Campaign
Greater China Specialist
Agency of the Year for two
consecutive years
Establishment of
Singapore office
Top 10 Big Data
Solutions Providers in
the APAC Region
 Won Agency & Advertiser
Of The Year in 4 categories:
3 Golds & 1 Silver
 Campaign Digital Media Awards
 MARKies Awards 2017
 Golden Mouse
 Tiger Roar
 Won Gold for Best In-
App Advertising
 Won Bronze for Best
In-App Advertising
 Won Bronze for Best
Mobile Advertising
Platform
Received$7Min
SeriesAFunding
RaisedUS$10Min
SeriesBFunding
Won 3rd for Forbes
China’s Top 100
Privately Held Small
Businesses
2010
Establishment of
Shanghaioffice
Establishment of
Osaka office
Won one Gold and
two Bronze for
Mob-Ex Awards 2018
Won Gold for Best Location-based at
Mob-Ex Awards 2019
Mob-ex Awards 2017
Japan Office
Expansion
Mob-ex Awards 2016
Launched1st LBS
mobileadnetwork
inAsia
Won Bronze in Best In-app
Advertising
Won Bronze in Best Mobile
Advertising Platform
 Won Gold winner in the Best Data-
driven Marketing Campaign category at
Brain Magazine Awards 2019
 Mediazone’s annual Most Valuable
Services Awards in Hong Kong 2019
‧ “Most Reliable Big Data Analytics and
Application Leader”
‧ “Best Cross-Border Marketing Services”
‧ “Excellence in Corporate Governance
Customer Services”
 Won Silver for Best Idea– Mobile at
Markies Award 2019
Won Silver award for digital
transformation in eASIA Awards 2019
Won Big Data Solution award at the
Capital Magazine’s BOB Awards 2019
Tokyo
Taipei
Shanghai
Hong Kong
Singapore
Osaka
Bangkok
Vpon Big Data Group
The Leading Big Data Company in Asia
900Million
unique devices per month
21Billion
daily biddable inventory
12years
of services across APAC
7offices in Hong Kong,
Shanghai, Singapore,
Taipei, Bangkok,
Tokyo and Osaka
1500renowned brands with
collaborative experiences
Trata DMP - Largest Travel Audience Data Pool in Asia
100M+
Travel Intent
Data in Asia
60M+
China Passport
Holder
1000+
Traveler Tags
Vpon Available Tag Categories
Country
Province/ City
Gender
Income level
Device Language
Age group of family members
Ad Interests
Operation System
Ad Format Preference
Travel Pattern
App Interests
Lifestyle
Age
Fashion Style
DemographicsLocations InterestsBehaviors
Destination Country
Based on multiple combinations of the tags, you can identify some of the
hidden segment groups who may be your potential audiences with high chance.
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
The Critical Questions In Digital Marketing
Vpon AI Technology
More than 10 machine learning models with real case
applications.
Gender Prediction Model
Lookalike Modeling
Recommendation
Fraud Detection
Automated Location
Discovery
Behavior Prediction
Traffic Accident Prediction Intelligent Route Mining
Text Mining
Traveling behaviors
prediction
...
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Research Stage
Research Workflow
Problem
Definition
Dataset
Preparation
EvaluationModel
Training
Dataset
Understanding
Research Stage
• Binary Classification.
• Given a set of available mobile related features during a specific period, we want to predict
whether the device owner behaves more like a man or woman.
• Features
• Device geo info, app usage patterns, device info, active time range, etc.
• Goals
• Accurately identify gender for new incoming devices.
• Improve prediction accuracy for previous mis-labeled devices.
Problem Definition
Research Stage
• Ground-truth Data Analysis
• Gender Label Distribution, Feature Importance, etc.
• Data visualization
• Matplotlib, Tableau.
Dataset Understanding
Research Stage
Research Workflow
Problem
Definition
Dataset
Preparation
EvaluationModel
Training
Dataset
Understanding
Research Stage
• Data Collection
• Collect and retrieve feature data.
• Data Integration
• Enrich raw data with other useful info, e.g. google store metadata, poi.
• Data Cleaning
• Remove null or abnormal values.
Dataset Preparation
Research Stage
Research Workflow
Problem
Definition
Dataset
Preparation
EvaluationModel
Training
Dataset
Understanding
Research Stage
• Feature Engineering
• One-hot encoding, Feature combination, etc.
• Model Training and Parameter Tuning
• Logistic Regression, XGboost, Deep Learning Framework, etc.
• Feature Normalization.
Model Training
Research Stage
Research Workflow
Problem
Definition
Dataset
Preparation
EvaluationModel
Training
Dataset
Understanding
Research Stage
• Model Evaluation
• PR, ROC, F1, etc.
• Prediction Result Evaluation
• 3rd party verification: Google, FB.
Evaluation
Research Stage
Research Workflow
Problem
Definition
Dataset
Preparation
EvaluationModel
Training
Dataset
Understanding
Research Stage
• Research with local machine.
• Pull data from S3 and repeat research work on local machine.
• Research with cloud env.
• EC2.
• EMR.
• SageMaker.
Research Tools and Environments.
Amazon Simple Storage
Service
Client
Amazon Simple Storage
ServiceClient
AWS Cloud
AWS Cloud
VPC
Public subnet
Amazon EC2
Amazon EMR
From Research to Production
Hidden Technical Debt in Machine Learning Systems
Sculley et al., Hidden Technical Debt in Machine Learning Systems. NIPS 2015.
From Research to Production
Code + Model + Data
Breck et al., The ML Test Score: A Rubric for ML Production Readiness and
Technical Debt Reduction. IEEE Big Data 2017.
From Research to Production
Code + Model + Data
From Research to Production
• Composability
• Scalability
• Portability
How to scale to production?
Production Stage
Scalability
From Research to Production
• Pipeline Platform Requirements.
• Reliable.
• Visualization tool.
• Pipeline scripting languages.
Composability
From Research to Production
Composability
Amazon Simple Storage
Service
Client
AWS Cloud
VPC
Public subnet Private subnet
Amazon EMRAmazon EC2
Elastic IP
address
Security group Security group
From Research to Production
• Prediction Service.
• Prediction service is easily to be scaled out.
• Prediction service can process batch or on-line requests interchangeably.
Scalability
From Research to Production
• Multi-platform Model Deployment.
• Model deployment should not be limited to a specific platform.
• Model deployment should be easily to be integrated with other services, e.g. current existed
microservices.
• Model packaging is flexible so that adding self-made functions is achievable.
Portability
From Research to Production
• Open Source ML Pipeline Platform.
• Kubeflow, mlflow, airflow, TFX, etc.
• Prediction Service Framework.
• Sagemaker, self-made restful api service, etc.
Candidate Solutions
From Research to Production
• AWS ML Experts
• Organized 3 one-day offsite workshop together with AWS ML experts.
• Hands-on packaging ML model into container and deploying to SageMaker.
• Practice with cloud9 and SageMaker Notebook.
• Consult with ML marketplace opportunity.
Candidate Solutions
Production Stage
Tradeoff and Decision
ML Pipeline Platform Reason
Compatibility Jenkins. Lowest learning curve.
Portability Jenkins. Lowest platform transferring cost.
Scalability Jenkins. Jenkins can be deployed to k8s.
Production Stage
Tradeoff and Decision
Prediction Service Reason
Compatibility
Deploy customized SageMaker container on
VM.
1. Easily to be integrated with Jenkins.
2. Easily to be deployed back to SageMaker.
Portability
Deploy customized SageMaker container on
VM.
Easily migrate to other platforms.
Scalability Replicate VM and add LB. Easily scale out by LB.
Production Stage
Architecture
Amazon Simple Storage
Service
Client
AWS Cloud
VPC
Public subnet Private subnet
Amazon EMRAmazon EC2
Elastic IP
address
Security group Security group
Security group
Amazon EC2
Elastic IP
address
Production Stage
• Definition
• A containerized application for managing the prediction phase in machine learning product
lifecycle.
• Functions
• Official prediction requests API portal.
• http://[IP]:[port]/api/providers/prediction/create
• http://[IP]:[port]/api/providers/prediction/invoke
• http://[IP]:[port]/api/providers/prediction/check
• Monitoring
• Tracking prediction status.
• Recording and comparing prediction results.
MLapp
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Conclusion
• Project time allocation
• 40% on research.
• 30% on model and prediction results validation.
• 30% on mlops and developments.
• Accuracy
• Precision can achieve more than 80%.
• Total gender prediction pipeline execution time
• Less than 30 minutes with monthly data.
Current Status
Conclusion
• Ensure prediction quality at the top.
• Always understand current data distribution.
• Establish monitors to control data quality from the root.
• Keep production engineering work simple and reliable.
Lessons Learned
Thank you!
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Young Yang
ML Specialist SA
Amazon Web Services
Hsuan Chiu
Senior Data Engineer
Data Science
VPON

Contenu connexe

Tendances

AWS Summit Singapore Keynote with Stephen Orban - Head of Enterprise Strategy
AWS Summit Singapore Keynote with Stephen Orban - Head of Enterprise StrategyAWS Summit Singapore Keynote with Stephen Orban - Head of Enterprise Strategy
AWS Summit Singapore Keynote with Stephen Orban - Head of Enterprise StrategyAmazon Web Services
 
Financial Services Analytics on AWS
Financial Services Analytics on AWSFinancial Services Analytics on AWS
Financial Services Analytics on AWSAmazon Web Services
 
Powering Real-Time Analytics with Data Virtualization on AWS (ASEAN & ANZ)
Powering Real-Time Analytics with Data Virtualization on AWS (ASEAN & ANZ)Powering Real-Time Analytics with Data Virtualization on AWS (ASEAN & ANZ)
Powering Real-Time Analytics with Data Virtualization on AWS (ASEAN & ANZ)Denodo
 
HBX: Harvard Business School's Digital Education Goes Data-Centric with Amaz...
HBX:  Harvard Business School's Digital Education Goes Data-Centric with Amaz...HBX:  Harvard Business School's Digital Education Goes Data-Centric with Amaz...
HBX: Harvard Business School's Digital Education Goes Data-Centric with Amaz...Amazon Web Services
 
Cloud Computing for the Enterprise
Cloud Computing for the EnterpriseCloud Computing for the Enterprise
Cloud Computing for the EnterpriseAmazon Web Services
 
AWS Webcast - Discover Cloud Computing for Government
AWS Webcast - Discover Cloud Computing for GovernmentAWS Webcast - Discover Cloud Computing for Government
AWS Webcast - Discover Cloud Computing for GovernmentAmazon Web Services
 
Demystifying Cloud Economics – Think Big: How to Build an Investment Case for...
Demystifying Cloud Economics – Think Big: How to Build an Investment Case for...Demystifying Cloud Economics – Think Big: How to Build an Investment Case for...
Demystifying Cloud Economics – Think Big: How to Build an Investment Case for...Amazon Web Services
 
The Modern Day Pressures and Trends Driving Cloud Access Requirements
The Modern Day Pressures and Trends Driving Cloud Access RequirementsThe Modern Day Pressures and Trends Driving Cloud Access Requirements
The Modern Day Pressures and Trends Driving Cloud Access RequirementsAmazon Web Services
 

Tendances (20)

AWS 資料數據與 IoT
AWS 資料數據與 IoTAWS 資料數據與 IoT
AWS 資料數據與 IoT
 
AWS in FSI 2019
AWS in FSI 2019AWS in FSI 2019
AWS in FSI 2019
 
AWS Summit Singapore Keynote with Stephen Orban - Head of Enterprise Strategy
AWS Summit Singapore Keynote with Stephen Orban - Head of Enterprise StrategyAWS Summit Singapore Keynote with Stephen Orban - Head of Enterprise Strategy
AWS Summit Singapore Keynote with Stephen Orban - Head of Enterprise Strategy
 
AWS overview
AWS overviewAWS overview
AWS overview
 
AWS案例分享 – Volkswagen
AWS案例分享 – VolkswagenAWS案例分享 – Volkswagen
AWS案例分享 – Volkswagen
 
Financial Services Analytics on AWS
Financial Services Analytics on AWSFinancial Services Analytics on AWS
Financial Services Analytics on AWS
 
AWS in Financial Services
AWS in Financial ServicesAWS in Financial Services
AWS in Financial Services
 
Powering Real-Time Analytics with Data Virtualization on AWS (ASEAN & ANZ)
Powering Real-Time Analytics with Data Virtualization on AWS (ASEAN & ANZ)Powering Real-Time Analytics with Data Virtualization on AWS (ASEAN & ANZ)
Powering Real-Time Analytics with Data Virtualization on AWS (ASEAN & ANZ)
 
HBX: Harvard Business School's Digital Education Goes Data-Centric with Amaz...
HBX:  Harvard Business School's Digital Education Goes Data-Centric with Amaz...HBX:  Harvard Business School's Digital Education Goes Data-Centric with Amaz...
HBX: Harvard Business School's Digital Education Goes Data-Centric with Amaz...
 
CurrencyCloud and AWS
CurrencyCloud and AWSCurrencyCloud and AWS
CurrencyCloud and AWS
 
Cloud Computing for the Enterprise
Cloud Computing for the EnterpriseCloud Computing for the Enterprise
Cloud Computing for the Enterprise
 
Cloud Foundations
Cloud FoundationsCloud Foundations
Cloud Foundations
 
AWS-IoT-工業智造
 AWS-IoT-工業智造 AWS-IoT-工業智造
AWS-IoT-工業智造
 
AWS Webcast - Discover Cloud Computing for Government
AWS Webcast - Discover Cloud Computing for GovernmentAWS Webcast - Discover Cloud Computing for Government
AWS Webcast - Discover Cloud Computing for Government
 
The Future of Enterprise IT
The Future of Enterprise IT The Future of Enterprise IT
The Future of Enterprise IT
 
Demystifying Cloud Economics – Think Big: How to Build an Investment Case for...
Demystifying Cloud Economics – Think Big: How to Build an Investment Case for...Demystifying Cloud Economics – Think Big: How to Build an Investment Case for...
Demystifying Cloud Economics – Think Big: How to Build an Investment Case for...
 
Defining Your Cloud Strategy
Defining Your Cloud StrategyDefining Your Cloud Strategy
Defining Your Cloud Strategy
 
AWS Adoption in FSI
AWS Adoption in FSIAWS Adoption in FSI
AWS Adoption in FSI
 
The Modern Day Pressures and Trends Driving Cloud Access Requirements
The Modern Day Pressures and Trends Driving Cloud Access RequirementsThe Modern Day Pressures and Trends Driving Cloud Access Requirements
The Modern Day Pressures and Trends Driving Cloud Access Requirements
 
Cost Optimization on AWS
Cost Optimization on AWSCost Optimization on AWS
Cost Optimization on AWS
 

Similaire à Track 2 Session 5_ 利用 SageMaker 深度學習容器化在廣告推播之應用

Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)Julien SIMON
 
FSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsFSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsAmazon Web Services
 
Leverage the power of machine learning on windows
Leverage the power of machine learning on windowsLeverage the power of machine learning on windows
Leverage the power of machine learning on windowsJosé António Silva
 
Canada DevOps Summit 2020 Presentation Nov_03_2020
Canada DevOps Summit 2020 Presentation Nov_03_2020Canada DevOps Summit 2020 Presentation Nov_03_2020
Canada DevOps Summit 2020 Presentation Nov_03_2020Varun Manik
 
my ppt preentation.pptx
my ppt preentation.pptxmy ppt preentation.pptx
my ppt preentation.pptxSaikiran447644
 
Sviluppa, addestra e distribuisci modelli di Machine learning su qualsiasi scala
Sviluppa, addestra e distribuisci modelli di Machine learning su qualsiasi scalaSviluppa, addestra e distribuisci modelli di Machine learning su qualsiasi scala
Sviluppa, addestra e distribuisci modelli di Machine learning su qualsiasi scalaAmazon Web Services
 
Opening Keynote by Dr. Werner Vogels
Opening Keynote by Dr. Werner VogelsOpening Keynote by Dr. Werner Vogels
Opening Keynote by Dr. Werner VogelsAmazon Web Services
 
Artificial Intelligence - Get Started - 1 episodio
Artificial Intelligence - Get Started - 1 episodioArtificial Intelligence - Get Started - 1 episodio
Artificial Intelligence - Get Started - 1 episodioAmazon Web Services
 
Leverage the power of machine learning on windows
Leverage the power of machine learning on windowsLeverage the power of machine learning on windows
Leverage the power of machine learning on windowsMia Chang
 
Google cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGoogle cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGDSCNiT
 
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...Andrew Ly
 
Serverless projects at Myplanet
Serverless projects at MyplanetServerless projects at Myplanet
Serverless projects at MyplanetDaniel Zivkovic
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Julien SIMON
 
Integrating Amazon SageMaker into your Enterprise - AWS Online Tech Talks
Integrating Amazon SageMaker into your Enterprise - AWS Online Tech TalksIntegrating Amazon SageMaker into your Enterprise - AWS Online Tech Talks
Integrating Amazon SageMaker into your Enterprise - AWS Online Tech TalksAmazon Web Services
 

Similaire à Track 2 Session 5_ 利用 SageMaker 深度學習容器化在廣告推播之應用 (20)

Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)Scaling Machine Learning from zero to millions of users (May 2019)
Scaling Machine Learning from zero to millions of users (May 2019)
 
FSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsFSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital Markets
 
Leverage the power of machine learning on windows
Leverage the power of machine learning on windowsLeverage the power of machine learning on windows
Leverage the power of machine learning on windows
 
AI & AWS DeepComposer
AI & AWS DeepComposerAI & AWS DeepComposer
AI & AWS DeepComposer
 
Canada DevOps Summit 2020 Presentation Nov_03_2020
Canada DevOps Summit 2020 Presentation Nov_03_2020Canada DevOps Summit 2020 Presentation Nov_03_2020
Canada DevOps Summit 2020 Presentation Nov_03_2020
 
my ppt preentation.pptx
my ppt preentation.pptxmy ppt preentation.pptx
my ppt preentation.pptx
 
Sviluppa, addestra e distribuisci modelli di Machine learning su qualsiasi scala
Sviluppa, addestra e distribuisci modelli di Machine learning su qualsiasi scalaSviluppa, addestra e distribuisci modelli di Machine learning su qualsiasi scala
Sviluppa, addestra e distribuisci modelli di Machine learning su qualsiasi scala
 
Opening Keynote by Dr. Werner Vogels
Opening Keynote by Dr. Werner VogelsOpening Keynote by Dr. Werner Vogels
Opening Keynote by Dr. Werner Vogels
 
Artificial Intelligence - Get Started - 1 episodio
Artificial Intelligence - Get Started - 1 episodioArtificial Intelligence - Get Started - 1 episodio
Artificial Intelligence - Get Started - 1 episodio
 
Machine Learning at the Edge
Machine Learning at the EdgeMachine Learning at the Edge
Machine Learning at the Edge
 
Leverage the power of machine learning on windows
Leverage the power of machine learning on windowsLeverage the power of machine learning on windows
Leverage the power of machine learning on windows
 
Google cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGoogle cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptx
 
Introduction to Sagemaker
Introduction to SagemakerIntroduction to Sagemaker
Introduction to Sagemaker
 
Ml ops on AWS
Ml ops on AWSMl ops on AWS
Ml ops on AWS
 
Keynote & Introduction
Keynote & IntroductionKeynote & Introduction
Keynote & Introduction
 
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
Summit Australia 2019 - Supercharge PowerPlatform with AI - Dipankar Bhattach...
 
Serverless projects at Myplanet
Serverless projects at MyplanetServerless projects at Myplanet
Serverless projects at Myplanet
 
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)Build, Train and Deploy Machine Learning Models at Scale (April 2019)
Build, Train and Deploy Machine Learning Models at Scale (April 2019)
 
qs_presentation_v_1_0
qs_presentation_v_1_0qs_presentation_v_1_0
qs_presentation_v_1_0
 
Integrating Amazon SageMaker into your Enterprise - AWS Online Tech Talks
Integrating Amazon SageMaker into your Enterprise - AWS Online Tech TalksIntegrating Amazon SageMaker into your Enterprise - AWS Online Tech Talks
Integrating Amazon SageMaker into your Enterprise - AWS Online Tech Talks
 

Plus de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Plus de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Track 2 Session 5_ 利用 SageMaker 深度學習容器化在廣告推播之應用

  • 1. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved. 利用 SageMaker 深度學習容器化 在廣告推播之應用 T r a c k 2 | S e s s i o n 5 Young Yang ML Specialist SA Amazon Web Services Hsuan Chiu Senior Data Engineer Data Science VPON
  • 2. Machine learning workflow Data acquisition curation and labeling Data preparation for training Design and run experiments Model development Model optimization Deployment
  • 3. Common machine learning setups 1. Code & frameworks 2. Compute (CPUs, GPUs) 3. Storage AWS CLI On-premises AWS CLI EC2 instance AWS Deep Learning AMIs Amazon Simple Storage Service AWS Cloud
  • 4. Deep learning is computationally expensive, but can be scaled-out How do we go from AWS CLI this, AWS CLI to this . . . . . . . . . . . . AWS Cloud EC2 instance
  • 5. Scaling-out deep learning training Parallel experiments Distributed training Distributing training of a single model to train faster Different models running parallel to find the best model
  • 6. But there are challenges to scaling AWS CLI Code and dependencies Infrastructure management Cluster management . . . . . . . . . . . . AWS Cloud
  • 7. Machine learning stack is complex • “My code requires building several dependencies from source” • “My code isn’t taking advantage of the GPU/GPUs” • “Is cuDNN, NCCL installed? Is it the right version?” • “My code is running slow on CPUs” • “Oh wait, is it taking advantage of AVX instruction set” • “I updated my drivers and training is now slower/errors out” • “My cluster runs a different version of framework/Linux distro” Makes portability, collaboration, and scaling training really, really hard!
  • 8. NVIDIA drivers 436.15 Ubuntu 16.04 TensorFlow 1.13 Keras horovod numpy scipy others… Mkl 2019 v3CPU: cuDNN 7.1 cublas 10 nccl 2 CUDA toolkit 10 GPU: scikit-learn pandas openmpi Python My code Development system NVIDIA drivers 410.68 Centos 7 Training cluster TensorFlow 1.14 Keras horovod numpy scipy others… Mkl 2019 v2CPU: cuDNN 7.5 cublas 10 nccl 2.4 CUDA toolkit 10 GPU: scikit-learn pandas openmpi Python My code Multiple points of failureDevelopment system Training cluster
  • 9. Containers for machine learning Container runtime Infrastructure NVIDIA drivers Host OS Packages: • Training code • Dependencies • Configurations TensorFlow mkl cuDNN cuBLAS NCCL CUDA toolkit CPU: GPU: TensorFlow container image Keras horovod numpy scipy others scikit- learn pandas openmpi Python + Your training scripts ML environments that are: • Lightweight • Portable • Scalable • Consistent
  • 10. TensorFlow mkl cuDNN cuBLAS NCCL CUDA toolkit NVIDIA drivers Host OS CPU: GPU: Container runtime TensorFlow container image Keras horovod numpy scipy others scikit- learn pandas openmpi Python Development system NVIDIA drivers Host OS Container runtime Training cluster TensorFlow mkl cuDNN cuBLAS NCCL CUDA toolkit CPU: GPU: TensorFlow container image Keras horovod numpy scipy others scikit- learn pandas openmpi Python + Your training scripts + Your training scripts Container registry Amazon ECR
  • 11. AWS Deep Learning Containers https://docs.aws.amazon.com/dlami/latest/devguide/deep-learning-containers- images.html Prepackaged machine learning container images fully configured and validated Optimized for performance with latest NVIDIA driver, CUDA libraries, and Intel libraries
  • 12. Challenges with scaling deep learning AWS CLI Code and dependencies Infrastructure management Cluster management . . . . . . . . . . . . AWS Cloud
  • 13. ML infrastructure and cluster management Compute Where the containers run Amazon EC2 • Getting started easy, scaling hard • Rely on IT/Ops for setup management • DIY setup for ML use-cases • Optimizing cost: DIY Jupyter notebook instances High-performance algorithms Large-scale training Optimization One-click deployment Fully managed with auto scaling ML services Fully managed service that covers the entire machine learning workflow Amazon SageMaker • Easy, couple of LOC to scale • Fully managed, no infrastructure effort • Designed for machine learning • Optimizing cost: on-demand / Spot Management Deployment, scheduling, scaling, and management of containerized applications Amazon Elastic Kubernetes Service Amazon Elastic Container Service • Getting started hard, scaling easy • Rely on IT/Ops for setup management • DIY setup for ML use-cases • Optimizing cost: DIY
  • 14. Hyperparameter search experiment using Amazon SageMaker Local laptop or desktop with Amazon SageMaker SDK Custom container Code files Docker build Approach: 1. Build a Docker image with your training scripts 2. Specify instance type (CPU, GPU) 3. Specify number of instances and hyperparameters to tune 4. Launch the tuning job AWS CLI Container registry Amazon ECR Amazon Simple Storage ServiceFully managed Amazon SageMaker cluster
  • 15. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS如何加速 機器學習專案產品化 Hsuan Chiu Senior Data Engineer Data Science VPON
  • 16. Agenda About Vpon The Critical Question In Digital Marketing ML Case Study – Gender Prediction Conclusion
  • 17. The Leading Big Data Company in Asia Data Drives Transactions
  • 18. Milestone 20172014 2015 2016 20182008 2011 2019  Won Agency & Advertiser Of The Year in 4 categories  Won three Hong Kong Spark Awards  Won Agency & Advertiser Of The Year in 3 categories: 2 Gold & 1 Sliver  Festival of Media Global  MARKies Awards  ECI Awards  Top Mobile Awards (TMA) Founded in Taipei Establishment of Hong Kong andTokyo offices Won Bronze for Campaign Greater China Specialist Agency of the Year Won Bronze for Campaign Greater China Specialist Agency of the Year for two consecutive years Establishment of Singapore office Top 10 Big Data Solutions Providers in the APAC Region  Won Agency & Advertiser Of The Year in 4 categories: 3 Golds & 1 Silver  Campaign Digital Media Awards  MARKies Awards 2017  Golden Mouse  Tiger Roar  Won Gold for Best In- App Advertising  Won Bronze for Best In-App Advertising  Won Bronze for Best Mobile Advertising Platform Received$7Min SeriesAFunding RaisedUS$10Min SeriesBFunding Won 3rd for Forbes China’s Top 100 Privately Held Small Businesses 2010 Establishment of Shanghaioffice Establishment of Osaka office Won one Gold and two Bronze for Mob-Ex Awards 2018 Won Gold for Best Location-based at Mob-Ex Awards 2019 Mob-ex Awards 2017 Japan Office Expansion Mob-ex Awards 2016 Launched1st LBS mobileadnetwork inAsia Won Bronze in Best In-app Advertising Won Bronze in Best Mobile Advertising Platform  Won Gold winner in the Best Data- driven Marketing Campaign category at Brain Magazine Awards 2019  Mediazone’s annual Most Valuable Services Awards in Hong Kong 2019 ‧ “Most Reliable Big Data Analytics and Application Leader” ‧ “Best Cross-Border Marketing Services” ‧ “Excellence in Corporate Governance Customer Services”  Won Silver for Best Idea– Mobile at Markies Award 2019 Won Silver award for digital transformation in eASIA Awards 2019 Won Big Data Solution award at the Capital Magazine’s BOB Awards 2019
  • 19. Tokyo Taipei Shanghai Hong Kong Singapore Osaka Bangkok Vpon Big Data Group The Leading Big Data Company in Asia 900Million unique devices per month 21Billion daily biddable inventory 12years of services across APAC 7offices in Hong Kong, Shanghai, Singapore, Taipei, Bangkok, Tokyo and Osaka 1500renowned brands with collaborative experiences
  • 20. Trata DMP - Largest Travel Audience Data Pool in Asia 100M+ Travel Intent Data in Asia 60M+ China Passport Holder 1000+ Traveler Tags
  • 21. Vpon Available Tag Categories Country Province/ City Gender Income level Device Language Age group of family members Ad Interests Operation System Ad Format Preference Travel Pattern App Interests Lifestyle Age Fashion Style DemographicsLocations InterestsBehaviors Destination Country Based on multiple combinations of the tags, you can identify some of the hidden segment groups who may be your potential audiences with high chance.
  • 22. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 23. The Critical Questions In Digital Marketing
  • 24. Vpon AI Technology More than 10 machine learning models with real case applications. Gender Prediction Model Lookalike Modeling Recommendation Fraud Detection Automated Location Discovery Behavior Prediction Traffic Accident Prediction Intelligent Route Mining Text Mining Traveling behaviors prediction ...
  • 25. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 27. Research Stage • Binary Classification. • Given a set of available mobile related features during a specific period, we want to predict whether the device owner behaves more like a man or woman. • Features • Device geo info, app usage patterns, device info, active time range, etc. • Goals • Accurately identify gender for new incoming devices. • Improve prediction accuracy for previous mis-labeled devices. Problem Definition
  • 28. Research Stage • Ground-truth Data Analysis • Gender Label Distribution, Feature Importance, etc. • Data visualization • Matplotlib, Tableau. Dataset Understanding
  • 30. Research Stage • Data Collection • Collect and retrieve feature data. • Data Integration • Enrich raw data with other useful info, e.g. google store metadata, poi. • Data Cleaning • Remove null or abnormal values. Dataset Preparation
  • 32. Research Stage • Feature Engineering • One-hot encoding, Feature combination, etc. • Model Training and Parameter Tuning • Logistic Regression, XGboost, Deep Learning Framework, etc. • Feature Normalization. Model Training
  • 34. Research Stage • Model Evaluation • PR, ROC, F1, etc. • Prediction Result Evaluation • 3rd party verification: Google, FB. Evaluation
  • 36. Research Stage • Research with local machine. • Pull data from S3 and repeat research work on local machine. • Research with cloud env. • EC2. • EMR. • SageMaker. Research Tools and Environments. Amazon Simple Storage Service Client Amazon Simple Storage ServiceClient AWS Cloud AWS Cloud VPC Public subnet Amazon EC2 Amazon EMR
  • 37. From Research to Production Hidden Technical Debt in Machine Learning Systems Sculley et al., Hidden Technical Debt in Machine Learning Systems. NIPS 2015.
  • 38. From Research to Production Code + Model + Data Breck et al., The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction. IEEE Big Data 2017.
  • 39. From Research to Production Code + Model + Data
  • 40. From Research to Production • Composability • Scalability • Portability How to scale to production?
  • 42. From Research to Production • Pipeline Platform Requirements. • Reliable. • Visualization tool. • Pipeline scripting languages. Composability
  • 43. From Research to Production Composability Amazon Simple Storage Service Client AWS Cloud VPC Public subnet Private subnet Amazon EMRAmazon EC2 Elastic IP address Security group Security group
  • 44. From Research to Production • Prediction Service. • Prediction service is easily to be scaled out. • Prediction service can process batch or on-line requests interchangeably. Scalability
  • 45. From Research to Production • Multi-platform Model Deployment. • Model deployment should not be limited to a specific platform. • Model deployment should be easily to be integrated with other services, e.g. current existed microservices. • Model packaging is flexible so that adding self-made functions is achievable. Portability
  • 46. From Research to Production • Open Source ML Pipeline Platform. • Kubeflow, mlflow, airflow, TFX, etc. • Prediction Service Framework. • Sagemaker, self-made restful api service, etc. Candidate Solutions
  • 47. From Research to Production • AWS ML Experts • Organized 3 one-day offsite workshop together with AWS ML experts. • Hands-on packaging ML model into container and deploying to SageMaker. • Practice with cloud9 and SageMaker Notebook. • Consult with ML marketplace opportunity. Candidate Solutions
  • 48. Production Stage Tradeoff and Decision ML Pipeline Platform Reason Compatibility Jenkins. Lowest learning curve. Portability Jenkins. Lowest platform transferring cost. Scalability Jenkins. Jenkins can be deployed to k8s.
  • 49. Production Stage Tradeoff and Decision Prediction Service Reason Compatibility Deploy customized SageMaker container on VM. 1. Easily to be integrated with Jenkins. 2. Easily to be deployed back to SageMaker. Portability Deploy customized SageMaker container on VM. Easily migrate to other platforms. Scalability Replicate VM and add LB. Easily scale out by LB.
  • 50. Production Stage Architecture Amazon Simple Storage Service Client AWS Cloud VPC Public subnet Private subnet Amazon EMRAmazon EC2 Elastic IP address Security group Security group Security group Amazon EC2 Elastic IP address
  • 51. Production Stage • Definition • A containerized application for managing the prediction phase in machine learning product lifecycle. • Functions • Official prediction requests API portal. • http://[IP]:[port]/api/providers/prediction/create • http://[IP]:[port]/api/providers/prediction/invoke • http://[IP]:[port]/api/providers/prediction/check • Monitoring • Tracking prediction status. • Recording and comparing prediction results. MLapp
  • 52. © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 53. Conclusion • Project time allocation • 40% on research. • 30% on model and prediction results validation. • 30% on mlops and developments. • Accuracy • Precision can achieve more than 80%. • Total gender prediction pipeline execution time • Less than 30 minutes with monthly data. Current Status
  • 54. Conclusion • Ensure prediction quality at the top. • Always understand current data distribution. • Establish monitors to control data quality from the root. • Keep production engineering work simple and reliable. Lessons Learned
  • 55. Thank you! © 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved. Young Yang ML Specialist SA Amazon Web Services Hsuan Chiu Senior Data Engineer Data Science VPON

Notes de l'éditeur

  1. 這是今天的摘要 一開始我會先介紹威朋 接著我想先讓大家知道 數位行銷領域 最重要的問題 然後我會分享 AWS如何幫助我們 開發性別預測的模型 以及整個產品化的過程 最後是結論
  2. 首先威朋是亞洲領先的大數據公司
  3. 我們自08年以手機廣告起家,18年我們成功轉型為數據公司 並打入日本市場,奪得各項日本政府合作案, 如JNTO 大阪觀光局等等。
  4. 我們在亞洲 有七個據點。 每個月 可以追蹤到 至少9億支不重複的行動裝置,
  5. 而這些行動裝置遍佈世界各地 以亞洲旅客為主 為此我們提供了超過一千種 數據標籤 來幫助客戶理解 裝置的族群輪廓
  6. 所以威朋提供了上千種標籤 來幫助客戶描繪 目標裝置的族群輪廓
  7. 因為在數位行銷領域上 我們長期觀察到 客戶 和合作夥伴們 最在乎的就是用戶行為輪廓
  8. 而我們 之所以能提供這些標籤 就是因為 我們的資料庫 擁有豐富的數據 這些數據是獲得用戶授權後的 去識別化資料 像是 裝置 地理位置 app操作行為 等等 。隨著時間軸的推進,這些數據呈現出來的用戶族群輪廓也會有所差異。
  9. 而我們的數據團隊 利用這些數據 開發各項演算法 及機器學習的模型 繼而推論出裝置的族群輪廓 並提供標籤服務。 比方說 我們的用戶授權資料是不帶性別的,因此我們需要透過性別預測模型,來判斷現在這個裝置的用戶 其行為是男生 還是女生。
  10. 今天 我想跟各位分享的 便是我們如何透過AWS提供的資源 將性別預測的專案,加速從研究 開發 到可上線服務的過程。
  11. 機器學習專案 在研究階段 概括可分這五步驟 首先是問題定義 和 原始資料集釐清
  12. 所以第一個問題定義上,我們把性別預測 定義成二元分類模型, 也就是 給定一組裝置特徵,模型會判斷這組裝置特徵是像男生 或是像女生。 裝置特徵值 是蒐集 在一個定義好的時間區間內 裝置的各種資料, 比如說 這個裝置的價位區間 是像蘋果這種高端價位 或是平價手機 裝置的app都是什麼類型 用戶使用app的離峰 尖峰時刻等等 隨著時間區間的向後遞推 這些裝置特徵值會動態改變 我們的目標就是隨著時間 根據動態改變的特徵值 逐步改善預測準確度
  13. 在原始資料釐清上 我們必須對性別標籤 和各特徵值進行探索, 性別標籤是來自合作客戶 提供的少量資料 特徵值 是來自前述提到的 用戶裝置數據 探索部分透過視覺化工具 如商用的tableau 或開源的matplotlib呈現
  14. 第三步 是準備模型訓練資料
  15. 從可用特徵值的資料搜集 整理 去除空值 異常職 值域區間正規化 都是這階段的工作
  16. 第四步 是模型訓練。
  17. 模型訓練 會嘗試不同模型類別, 比方說 從最簡單的邏輯斯回歸 看分類效果 判斷特徵值 跟標籤的關聯性。 再逐步調整 特徵值的種類。 並嘗試比較複雜的模型 如 XGBoost或深度學習 來提高預測準確度的可能性 模型訓練成本 和模型的可解釋性都是這步驟的重點
  18. 最後會評估模型表現。
  19. 以分類來說,單就測試資料,最常見的就是Precision Recall F1等。 而模型真實的預測結果 我們會透過第三方工具 如google 和 FB 驗證準確度,讓我們的預測結果更具公信力。
  20. 如果模型評估結果不如預期,隨時都會回到前面的任一階段,最差的就是重頭開始。
  21. 剛剛上述的實驗流程,AWS彈性的提供各項資源,讓研究者 依照各自最有生產力的方式 使用。 因為威朋很早就導入AWS,所以早期小規模的資料探索 研究者會直接從S3 拉資料下來 在本機開jupyter實驗。 稍微大規模非本機可負荷的資料量,就開ec2 或 emr 上搭配客製化的ec2 image或 emr pre-installed package進行實驗。 現在,因為sagemaker已經是十分成熟的工具,搭配cloud9 ide或 sagemaker studio等等,我們的研究人員 可以直接透過這些服務 來建立機器學習模型和資料探索,也就是我上述的五個步驟。這可以大量減少開發時間和資料工程師的工作量,因為上述這些因資料科學需要客製化的環境 AWS都處理掉了。另外sagemaker上可多人協作的開發方式,讓資料科學開發流程變得跟軟體開發一樣有效率,以現在的工具來說,直接用sagemaker執行機器學習研發專案,是我個人非常推薦的方式。 到目前為止是研究階段的工作介紹,現在我們要進入產品化的開發階段。
  22. 機器學習系統產品化是一個複雜的過程,這張圖是截自 2015 NIPS很著名的論文,如圖所描繪的 前述的研究階段產出,其實只佔總體系統非常小一塊,其他的都是要額外開發。
  23. 與傳統軟體系統開發相比,機器學習系統必須多考慮資料分佈 和模型準確度兩件事,只要輸入的資料分布有改變,整個系統的預測結果就會有問題,前面的研究階段就需要重啟。
  24. 更貼切地說,機器學習系統的研究階段和產品化階段兩者耦合度是非常高。以本圖而言,中間橘色的部分是研究階段,右邊藍色部分是產品化階段 ,左邊綠色部分是對資料分布的監控。 研究階段的資料轉換方式 特徵值正規化 和產出的模型 在預測服務中有必須有相對應的模組,可以是直接移植或重新實作, 若預測服務上線後 發現新的預測結果 不如研究階段的測試結果,或是監控模組發現 新進來的資料分布有改變,隨時都必須重啟研究階段,然後預測服務就必須跟著調整。
  25. 為了符合開發成本 和實質應用效益,同時在考慮到剛剛描述的挑戰後,我們規劃產品化的模組必須符合:組合性 擴展性 和 可攜性 。
  26. 而首要不可或缺的兩個模組 第一 是機器學習任務的中控平台 第二 是預測模組。 中控平台是負責組織管理 機器學習的各項任務。 包括從資料流ETL,特徵值萃取,預測結果更新等等。 預測模組則是專職處理預測請求。
  27. 中控平台特別看重組合性,因為中控平台 必須具備高彈性 來組織管理各項任務。 另外平台的視覺化監控工具 和資料流自動化佈署的方式也是很重要的功能。 這個平台也必須對各階段 有可以監控的模組,像是將log 和重要統計數值 整合到既有的服務像aws cloud watch,或搭配視覺化儀表板監控資料分布。
  28. 以我們內部來說,我們很早就採用Jenkins當作中控平台 來調度AWS的資源,像是EMR或S3 以定期處理許多既有的資料流工作。 這些內部資料流,靠著Jenkins來監控,都已達成自動化佈署和即時告警。 所以既有的資料流處理方式搭配jenkins,對我們來說是一種候選方案。
  29. 預測模組強調的 則是擴展能力。 比如說預測服務 必須很容易依照預測請求數量 調整運算資源的分配。 而且 預測模組不能受限於是要批次預測或線上單次預測,這樣才能減少開發成本。
  30. 不論是中控平台或預測模組 都需要具備可攜性, 也就是在部署的時候 可以不受限於特定雲端廠商。 很多公司都像我們一樣是multi-cloud,因為各雲端廠商的資料中心 佈建區域不同 所以依照業務 或各區客戶需求,可移植的系統架構是必須考慮的項目。 另外像是和既有服務的整合性,比方說威朋本身既有的廣告平台,或依自己業務需求 額外添加的客製性, 也都是這兩個模組實做上要考慮的要素。
  31. 在這兩個模組開發前,我們做了很多功課, 平台的部分 我們額外研究了許多現有的開源平台 如 kubeflow, mlflow airflow tfx等等 預測模組 也研究各項restful api service framework的實作方式.
  32. 當AWS知道我們的需求後,特別積極的請了AWS ML專家們,提供相對應的解決方案,這部分我們特別感謝。 當中,我個人覺得最值得學習的 就是第二項 model打包成container的邏輯 然後deploy到sagemaker上的方式 這邊我個人也非常推薦大家去看 amazon sagemaker developer guide 原因是這整個設計的邏輯, 幾乎是amazon自家各式各樣ML project歸納出的精華,aws把這個歸納後的樣板實作到sagemaker上,就技術面來說,都是很值得反覆閱讀的經典,它可以讓你之後在其他類似應用的架構設計上,有很多的啟發。 另外 對於最近一年才開始使用aws 開發ML應用的公司來說,如果要開發ML系統,直接使用sagemaker給定好的樣板 並發布到sagemaker上,是我個人非常建議的方式,原因也是可以省掉很多 瑣碎的客製化技術開發的事情。
  33. 在參考了各項開源專案和AWS的best practice, 最終我們採用了這個開發版本。 中控平台的部份我們還是先使用既有的由jenkins管理ETL的架構,原因是這對我們來說具有最低的學習曲線和轉換成本,jenkins在組織任務上確實也可以滿足我們現有的需求。我們認為,當下 這是最有效率的實作方式。
  34. 預測模組我們決定採用aws workshop所推薦的sagemaker container實作方式,不過我們額外客製化一些功能,像是整合mlflow和告警機制,並且先用標準docker 部署的方式 採用 EC2或EKS為部屬對象,這個目的是可以跟中控平台很快整合,也很容易做scale out. 之後小幅度的修改就很容易部署回sagemaker,由sagemaker完全代管。
  35. 簡單的描繪一下我們系統架構 (描述這張圖) 就像前面有提到的,中控平台還是採用jenkins,簡單但大量的資料流處理 仍然搭配EMR和S3來完成。 與模型特徵值轉換 和預測相關的功能,則交由預測模組,MLapp負責。Mlapp以container的方式部署在EC2或EKS上,並將這預測的任務轉成一項可由jenkins觸發的項目,整合在由jenkins管理的ML pipeleine當中。
  36. Mlapp基本功能有提供restful api的接口和預測階段的監控畫面。
  37. 最後進到了結論
  38. 總體而言,這整個專案的時間 大約分配了40%在研究階段 因為有AWS的資源和sagemaker的各項幫助上,我們大約只有花30%在開發中控平台和預測模組 另外的30%,我們則是拿來驗證和檢討模型預測準確度。 因為預測結果的準確率是整個ML系統首要關鍵的KPI 目前 系統對最新進來裝置資料 至少都可以達到80%的準確率
  39. 最後是一些個人心得分享 在執行這類ML專案 從研發到產品化的過程 首要重視的 永遠是確保預測的準確度。因為一但預測失準 整個系統就失去意義 第二 永遠都要掌握 現在資料分布的狀況。因為一旦資料分布改變,既有的model可能就喪失功能。重啟研究階段是耗時但卻必須的工作,所以越早掌握資料分布的改變,可以越早確保預測的品質。 第三,永遠都要對資料品質從資料最源頭開始監控,每一階段的ETL都需要加掛監控。 最後,在產品開發上,盡可能讓enginnering工作越簡單越穩定,比如說AWS已經提供了sagemaker 或sagemaker studio這類代管服務或ide來開發,盡可能就不要再重造輪子,因為站在巨人的肩膀上,才能一起走得既快又遠。