SlideShare une entreprise Scribd logo
1  sur  18
Télécharger pour lire hors ligne
Trenowanie i wdrażanie modeli uczenia
maszynowego z wykorzystaniem GCP
Maciej Pieńkosz
Data Science Summit 2020 1
What we do at Sotrender
2
Our models
1. Sentiment
2. Hatespeech
3. Topic modelling
4. Keyphrase extractor
5. NER (brands and products)
6. Image Tagger
7. Text Extractor
8. Logo Detector
9. Post Classifier
10. ….
3
ML models lifecycle
1. Planning and project setup
2. Data collection and labeling
3. Modeling and exploration
4. Model training and refinement
5. Testing and evaluation
6. Model deployment
7. Ongoing model maintenance and monitoring
4
https://www.jeremyjordan.me/ml-projects-guide/
Modeling with AI Notebooks
1. We use Google Cloud Platform as our cloud provider
2. AI Platform Notebooks is used for initial data exploration and modeling
3. For the start, we favor faster, simpler model architectures that can be easily built,
validated, iterated and eventually deployed (usually on CPU)
4. Experiment tracking: MlFlow
5
https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-learning-platform.html
https://cloud.google.com/ai-platform-notebooks?hl=id
Structuring training code
• Notebooks disadvantages:
– You pay for the whole time the notebook is running
– Code quality is usually lower
– Hard to parametrize, unit test, and review
• After initial experimentation phase, we try to give more structure to
the model training code:
– Refactor codebase to Python packages and modules and move
to git repository (Gitlab)
– Add tests (more on it later)
– Wrap code into a Docker container
– Use dedicated AI Platform Training service to train in the cloud
6
https://www.jeremyjordan.me/ml-projects-guide/
AI Platform Training with custom containers
7
• Advantages:
– Develop locally, train in the cloud
– Pay only for the time of training
– Broad configuration options
– Job statuses and logs for historical runs
are available in the dashboard
– Easy integration with hyperparameter
tuning
Training job dockerfile
Cloud training script
Google Storage for Models and Datasets
• We use Google Storage as primary Store for models
and datasets
• One bucket per model
• We follow unified bucket and directory structure,
same for every model
– Raw data
– Combined datasets, with predefined splits
– Model files
• Documentation in Knowledge Base (Confluence)
• One can use dedicated systems like DVC, Quilt
8
Additional training tips
• Consider having two validation sets: training-dev and
test-dev, to distinguish between overfitting errors and
distribution shift
• Establish human performance for your task
• Evaluate your model performance on important data
slices
• Do hyperparameter tuning; utilize open source packages
e.g. hyperopt
• Develop a systematic way of analyzing model errors
Recommended resources:
• https://www.coursera.org/learn/machine-learning-projects
• https://www.deeplearning.ai/machine-learning-yearning/
9
https://towardsdatascience.com/some-strategies-for-machine-learning-projects-5f2f32c34635
Model deployment
• Your options:
– Online
– Batch (offline)
• Our approach is to deploy models as services
– Easy to integrate
– Easy to use by other teams
• We serve them as REST service with Flask (or, most
recently, FastApi)
• We wrap them in Docker containers so they can be
easily deployed to cloud and serve with Cloud Run
10
https://mlinproduction.com/batch-inference-vs-online-inference/
Online inference
Batch inference
Cloud Deployment: Cloud Run
• We use Cloud Run to deploy our model services
• Cloud Build for delegating build process to GCP
• GCP has dedicated service for serving models, AI
Platform Prediction, but we use Cloud Run
– It is more flexible for us, we can set up any
environment and add any dependencies
– AI Predictions has limits regarding model
size
– We can add additional endpoints (e.g.
/explain to services)
11
Service dockerfile
Cloud deployment script
Cloud Run c.d.
• Useful features out-of-the box
– Autoscaling
– Multiple Revisions (versions), easy Rollback
– Traffic management
– Multiple Namespaces (dev, prod)
– Resource Monitoring
12
Delivery pipeline automation (CI/CD)
13
• Implemented in Gitlab CI/CD
push Download files
Build image
Run tests
Run static analysis
Push image to registry
Code Review
Canary
rollout
deploy
Testing and evaluation
• Unit and integration tests for:
– Input pipelines
– Preprocessing functions
• “Regression” tests for:
– Performance on validation data
– Predictions on some important, hand-picked examples
– Performance on data slices
14
Monitoring
• System level metrics:
– Resource consumption (RAM, CPU), healthchecks, status codes, latency, etc.
• Data level metrics
– Prediction distributions, input data distributions
– System performance against real time labels (collected automatically or manually)
15
https://mlinproduction.com/
Streamlit
• https://www.streamlit.io/
• Easy tool to create simple web Data Products directly in Python
• You can use it to create Demos, share your work, showcase your models behaviour, debug
• Very intuitive, no Web skills required
16
https://towardsdatascience.com/coding-ml-tools-like-you-code-ml-models-ddba3357eace
Demos with Streamlit
17
Thanks for attending!
18

Contenu connexe

Tendances

Log Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin KropovLog Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin KropovSoftServe
 
Lessons learned from running Pega in Kubernetes
Lessons learned from running Pega in KubernetesLessons learned from running Pega in Kubernetes
Lessons learned from running Pega in KubernetesCatalin Jora
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...Databricks
 
Scala laboratory. Globus. iteration #1
Scala laboratory. Globus. iteration #1Scala laboratory. Globus. iteration #1
Scala laboratory. Globus. iteration #1Vasil Remeniuk
 
From Prototyping to Deployment at Scale with R and sparklyr with Kevin Kuo
From Prototyping to Deployment at Scale with R and sparklyr with Kevin KuoFrom Prototyping to Deployment at Scale with R and sparklyr with Kevin Kuo
From Prototyping to Deployment at Scale with R and sparklyr with Kevin KuoDatabricks
 
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...Databricks
 
Is This Thing On? A Well State Model for the People
Is This Thing On? A Well State Model for the PeopleIs This Thing On? A Well State Model for the People
Is This Thing On? A Well State Model for the PeopleDatabricks
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...SQUADEX
 
Scalable Automatic Machine Learning in H2O
 Scalable Automatic Machine Learning in H2O Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OSri Ambati
 
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.aiIntro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.aiSri Ambati
 
Gulp and Compass
Gulp and CompassGulp and Compass
Gulp and Compassfatleaf
 

Tendances (12)

Log Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin KropovLog Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin Kropov
 
Lessons learned from running Pega in Kubernetes
Lessons learned from running Pega in KubernetesLessons learned from running Pega in Kubernetes
Lessons learned from running Pega in Kubernetes
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
 
Kubeflow repos
Kubeflow reposKubeflow repos
Kubeflow repos
 
Scala laboratory. Globus. iteration #1
Scala laboratory. Globus. iteration #1Scala laboratory. Globus. iteration #1
Scala laboratory. Globus. iteration #1
 
From Prototyping to Deployment at Scale with R and sparklyr with Kevin Kuo
From Prototyping to Deployment at Scale with R and sparklyr with Kevin KuoFrom Prototyping to Deployment at Scale with R and sparklyr with Kevin Kuo
From Prototyping to Deployment at Scale with R and sparklyr with Kevin Kuo
 
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
 
Is This Thing On? A Well State Model for the People
Is This Thing On? A Well State Model for the PeopleIs This Thing On? A Well State Model for the People
Is This Thing On? A Well State Model for the People
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
 
Scalable Automatic Machine Learning in H2O
 Scalable Automatic Machine Learning in H2O Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
 
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.aiIntro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai
 
Gulp and Compass
Gulp and CompassGulp and Compass
Gulp and Compass
 

Similaire à Machine Learning Model Training and Deployment with Google Cloud Platform

Training and deploying ML models with Google Cloud Platform
Training and deploying ML models with Google Cloud PlatformTraining and deploying ML models with Google Cloud Platform
Training and deploying ML models with Google Cloud PlatformSotrender
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-useltonrodriguez11
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on DatabricksDataScienceConferenc1
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?Dan Sullivan, Ph.D.
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
AI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondAI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondProvectus
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with DatabricksLiangjun Jiang
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricksLiangjun Jiang
 
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...DataScienceConferenc1
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)Arnab Biswas
 
Legion - AI Runtime Platform
Legion -  AI Runtime PlatformLegion -  AI Runtime Platform
Legion - AI Runtime PlatformAlexey Kharlamov
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated MLMark Tabladillo
 
Magdalena Stenius: MLOPS Will Change Machine Learning
Magdalena Stenius: MLOPS Will Change Machine LearningMagdalena Stenius: MLOPS Will Change Machine Learning
Magdalena Stenius: MLOPS Will Change Machine LearningLviv Startup Club
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfvitm11
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndCloudera, Inc.
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 

Similaire à Machine Learning Model Training and Deployment with Google Cloud Platform (20)

Training and deploying ML models with Google Cloud Platform
Training and deploying ML models with Google Cloud PlatformTraining and deploying ML models with Google Cloud Platform
Training and deploying ML models with Google Cloud Platform
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
AI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and BeyondAI Stack on AWS: Amazon SageMaker and Beyond
AI Stack on AWS: Amazon SageMaker and Beyond
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
 
Legion - AI Runtime Platform
Legion -  AI Runtime PlatformLegion -  AI Runtime Platform
Legion - AI Runtime Platform
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
 
Magdalena Stenius: MLOPS Will Change Machine Learning
Magdalena Stenius: MLOPS Will Change Machine LearningMagdalena Stenius: MLOPS Will Change Machine Learning
Magdalena Stenius: MLOPS Will Change Machine Learning
 
Ds for finance day 4
Ds for finance day 4Ds for finance day 4
Ds for finance day 4
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 

Plus de Sotrender

Topic modeling - nie tylko LDA w Gensim
Topic modeling - nie tylko LDA w GensimTopic modeling - nie tylko LDA w Gensim
Topic modeling - nie tylko LDA w GensimSotrender
 
Budowa modeli uczenia maszynowego zgodnie z regulacjami o ochronie danych za ...
Budowa modeli uczenia maszynowego zgodnie z regulacjami o ochronie danych za ...Budowa modeli uczenia maszynowego zgodnie z regulacjami o ochronie danych za ...
Budowa modeli uczenia maszynowego zgodnie z regulacjami o ochronie danych za ...Sotrender
 
Facebook Audience Insights – czyli czym interesują się polscy użytkownicy Fac...
Facebook Audience Insights – czyli czym interesują się polscy użytkownicy Fac...Facebook Audience Insights – czyli czym interesują się polscy użytkownicy Fac...
Facebook Audience Insights – czyli czym interesują się polscy użytkownicy Fac...Sotrender
 
Human-in-the-loop (HILT) machine learning i augmentacja danych, czyli jak zbu...
Human-in-the-loop (HILT) machine learning i augmentacja danych, czyli jak zbu...Human-in-the-loop (HILT) machine learning i augmentacja danych, czyli jak zbu...
Human-in-the-loop (HILT) machine learning i augmentacja danych, czyli jak zbu...Sotrender
 
Rozpoznawanie treści obrazów na kreacjach reklam na Facebooku z wykorzystanie...
Rozpoznawanie treści obrazów na kreacjach reklam na Facebooku z wykorzystanie...Rozpoznawanie treści obrazów na kreacjach reklam na Facebooku z wykorzystanie...
Rozpoznawanie treści obrazów na kreacjach reklam na Facebooku z wykorzystanie...Sotrender
 
Predykcja efektywności działań marketingowych w serwisie Facebook
Predykcja efektywności działań marketingowych w serwisie FacebookPredykcja efektywności działań marketingowych w serwisie Facebook
Predykcja efektywności działań marketingowych w serwisie FacebookSotrender
 
Wykrywanie mowy nienawiści w języku polskim
Wykrywanie mowy nienawiści w języku polskimWykrywanie mowy nienawiści w języku polskim
Wykrywanie mowy nienawiści w języku polskimSotrender
 
Federated Learning: Budowanie modeli uczenia maszynowego bez wglądu w rozpros...
Federated Learning: Budowanie modeli uczenia maszynowego bez wglądu w rozpros...Federated Learning: Budowanie modeli uczenia maszynowego bez wglądu w rozpros...
Federated Learning: Budowanie modeli uczenia maszynowego bez wglądu w rozpros...Sotrender
 
Prawdziwe oblicze tekstu, czyli jak rozmawiamy w sieci [WDI 2019]
Prawdziwe oblicze tekstu, czyli jak rozmawiamy w sieci [WDI 2019]Prawdziwe oblicze tekstu, czyli jak rozmawiamy w sieci [WDI 2019]
Prawdziwe oblicze tekstu, czyli jak rozmawiamy w sieci [WDI 2019]Sotrender
 
Ślady cyfrowe - sposoby na analizowanie aktywności internautów i działań rekl...
Ślady cyfrowe - sposoby na analizowanie aktywności internautów i działań rekl...Ślady cyfrowe - sposoby na analizowanie aktywności internautów i działań rekl...
Ślady cyfrowe - sposoby na analizowanie aktywności internautów i działań rekl...Sotrender
 
Bajki robotów? Machine Learning in Digital Marketing | Konferencja In Digital...
Bajki robotów? Machine Learning in Digital Marketing | Konferencja In Digital...Bajki robotów? Machine Learning in Digital Marketing | Konferencja In Digital...
Bajki robotów? Machine Learning in Digital Marketing | Konferencja In Digital...Sotrender
 
Sztuczna inteligencja w marketingu | Infoshare 2019
Sztuczna inteligencja w marketingu | Infoshare 2019Sztuczna inteligencja w marketingu | Infoshare 2019
Sztuczna inteligencja w marketingu | Infoshare 2019Sotrender
 
Pragmatic Machine Learning in Business
Pragmatic Machine Learning in BusinessPragmatic Machine Learning in Business
Pragmatic Machine Learning in BusinessSotrender
 
Wykorzystanie Big Data i cyfrowego śladu w naukach psychologicznych i społecz...
Wykorzystanie Big Data i cyfrowego śladu w naukach psychologicznych i społecz...Wykorzystanie Big Data i cyfrowego śladu w naukach psychologicznych i społecz...
Wykorzystanie Big Data i cyfrowego śladu w naukach psychologicznych i społecz...Sotrender
 
Jak wykorzystać social media w badaniach i jak przełożyć to na decyzje związa...
Jak wykorzystać social media w badaniach i jak przełożyć to na decyzje związa...Jak wykorzystać social media w badaniach i jak przełożyć to na decyzje związa...
Jak wykorzystać social media w badaniach i jak przełożyć to na decyzje związa...Sotrender
 
Obsługa klienta w social media
Obsługa klienta w social mediaObsługa klienta w social media
Obsługa klienta w social mediaSotrender
 
Jakimi wartościami kieruje się Twoja grupa docelowa? [Listonic Case Study]
Jakimi wartościami kieruje się Twoja grupa docelowa? [Listonic Case Study]Jakimi wartościami kieruje się Twoja grupa docelowa? [Listonic Case Study]
Jakimi wartościami kieruje się Twoja grupa docelowa? [Listonic Case Study]Sotrender
 
Każde pokolenie ma swój czas? Różnice generacyjne a dane z mediów społecznośc...
Każde pokolenie ma swój czas? Różnice generacyjne a dane z mediów społecznośc...Każde pokolenie ma swój czas? Różnice generacyjne a dane z mediów społecznośc...
Każde pokolenie ma swój czas? Różnice generacyjne a dane z mediów społecznośc...Sotrender
 
Poszerzanie pola walki - czyli z kim tak naprawdę konkurujecie?
Poszerzanie pola walki - czyli z kim tak naprawdę konkurujecie? Poszerzanie pola walki - czyli z kim tak naprawdę konkurujecie?
Poszerzanie pola walki - czyli z kim tak naprawdę konkurujecie? Sotrender
 
Mallkołaj rozdaje prezenty - Case Study z akcji Mall.pl i Los Videos
Mallkołaj rozdaje prezenty - Case Study z akcji Mall.pl i Los VideosMallkołaj rozdaje prezenty - Case Study z akcji Mall.pl i Los Videos
Mallkołaj rozdaje prezenty - Case Study z akcji Mall.pl i Los VideosSotrender
 

Plus de Sotrender (20)

Topic modeling - nie tylko LDA w Gensim
Topic modeling - nie tylko LDA w GensimTopic modeling - nie tylko LDA w Gensim
Topic modeling - nie tylko LDA w Gensim
 
Budowa modeli uczenia maszynowego zgodnie z regulacjami o ochronie danych za ...
Budowa modeli uczenia maszynowego zgodnie z regulacjami o ochronie danych za ...Budowa modeli uczenia maszynowego zgodnie z regulacjami o ochronie danych za ...
Budowa modeli uczenia maszynowego zgodnie z regulacjami o ochronie danych za ...
 
Facebook Audience Insights – czyli czym interesują się polscy użytkownicy Fac...
Facebook Audience Insights – czyli czym interesują się polscy użytkownicy Fac...Facebook Audience Insights – czyli czym interesują się polscy użytkownicy Fac...
Facebook Audience Insights – czyli czym interesują się polscy użytkownicy Fac...
 
Human-in-the-loop (HILT) machine learning i augmentacja danych, czyli jak zbu...
Human-in-the-loop (HILT) machine learning i augmentacja danych, czyli jak zbu...Human-in-the-loop (HILT) machine learning i augmentacja danych, czyli jak zbu...
Human-in-the-loop (HILT) machine learning i augmentacja danych, czyli jak zbu...
 
Rozpoznawanie treści obrazów na kreacjach reklam na Facebooku z wykorzystanie...
Rozpoznawanie treści obrazów na kreacjach reklam na Facebooku z wykorzystanie...Rozpoznawanie treści obrazów na kreacjach reklam na Facebooku z wykorzystanie...
Rozpoznawanie treści obrazów na kreacjach reklam na Facebooku z wykorzystanie...
 
Predykcja efektywności działań marketingowych w serwisie Facebook
Predykcja efektywności działań marketingowych w serwisie FacebookPredykcja efektywności działań marketingowych w serwisie Facebook
Predykcja efektywności działań marketingowych w serwisie Facebook
 
Wykrywanie mowy nienawiści w języku polskim
Wykrywanie mowy nienawiści w języku polskimWykrywanie mowy nienawiści w języku polskim
Wykrywanie mowy nienawiści w języku polskim
 
Federated Learning: Budowanie modeli uczenia maszynowego bez wglądu w rozpros...
Federated Learning: Budowanie modeli uczenia maszynowego bez wglądu w rozpros...Federated Learning: Budowanie modeli uczenia maszynowego bez wglądu w rozpros...
Federated Learning: Budowanie modeli uczenia maszynowego bez wglądu w rozpros...
 
Prawdziwe oblicze tekstu, czyli jak rozmawiamy w sieci [WDI 2019]
Prawdziwe oblicze tekstu, czyli jak rozmawiamy w sieci [WDI 2019]Prawdziwe oblicze tekstu, czyli jak rozmawiamy w sieci [WDI 2019]
Prawdziwe oblicze tekstu, czyli jak rozmawiamy w sieci [WDI 2019]
 
Ślady cyfrowe - sposoby na analizowanie aktywności internautów i działań rekl...
Ślady cyfrowe - sposoby na analizowanie aktywności internautów i działań rekl...Ślady cyfrowe - sposoby na analizowanie aktywności internautów i działań rekl...
Ślady cyfrowe - sposoby na analizowanie aktywności internautów i działań rekl...
 
Bajki robotów? Machine Learning in Digital Marketing | Konferencja In Digital...
Bajki robotów? Machine Learning in Digital Marketing | Konferencja In Digital...Bajki robotów? Machine Learning in Digital Marketing | Konferencja In Digital...
Bajki robotów? Machine Learning in Digital Marketing | Konferencja In Digital...
 
Sztuczna inteligencja w marketingu | Infoshare 2019
Sztuczna inteligencja w marketingu | Infoshare 2019Sztuczna inteligencja w marketingu | Infoshare 2019
Sztuczna inteligencja w marketingu | Infoshare 2019
 
Pragmatic Machine Learning in Business
Pragmatic Machine Learning in BusinessPragmatic Machine Learning in Business
Pragmatic Machine Learning in Business
 
Wykorzystanie Big Data i cyfrowego śladu w naukach psychologicznych i społecz...
Wykorzystanie Big Data i cyfrowego śladu w naukach psychologicznych i społecz...Wykorzystanie Big Data i cyfrowego śladu w naukach psychologicznych i społecz...
Wykorzystanie Big Data i cyfrowego śladu w naukach psychologicznych i społecz...
 
Jak wykorzystać social media w badaniach i jak przełożyć to na decyzje związa...
Jak wykorzystać social media w badaniach i jak przełożyć to na decyzje związa...Jak wykorzystać social media w badaniach i jak przełożyć to na decyzje związa...
Jak wykorzystać social media w badaniach i jak przełożyć to na decyzje związa...
 
Obsługa klienta w social media
Obsługa klienta w social mediaObsługa klienta w social media
Obsługa klienta w social media
 
Jakimi wartościami kieruje się Twoja grupa docelowa? [Listonic Case Study]
Jakimi wartościami kieruje się Twoja grupa docelowa? [Listonic Case Study]Jakimi wartościami kieruje się Twoja grupa docelowa? [Listonic Case Study]
Jakimi wartościami kieruje się Twoja grupa docelowa? [Listonic Case Study]
 
Każde pokolenie ma swój czas? Różnice generacyjne a dane z mediów społecznośc...
Każde pokolenie ma swój czas? Różnice generacyjne a dane z mediów społecznośc...Każde pokolenie ma swój czas? Różnice generacyjne a dane z mediów społecznośc...
Każde pokolenie ma swój czas? Różnice generacyjne a dane z mediów społecznośc...
 
Poszerzanie pola walki - czyli z kim tak naprawdę konkurujecie?
Poszerzanie pola walki - czyli z kim tak naprawdę konkurujecie? Poszerzanie pola walki - czyli z kim tak naprawdę konkurujecie?
Poszerzanie pola walki - czyli z kim tak naprawdę konkurujecie?
 
Mallkołaj rozdaje prezenty - Case Study z akcji Mall.pl i Los Videos
Mallkołaj rozdaje prezenty - Case Study z akcji Mall.pl i Los VideosMallkołaj rozdaje prezenty - Case Study z akcji Mall.pl i Los Videos
Mallkołaj rozdaje prezenty - Case Study z akcji Mall.pl i Los Videos
 

Dernier

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 

Dernier (20)

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 

Machine Learning Model Training and Deployment with Google Cloud Platform

  • 1. Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem GCP Maciej Pieńkosz Data Science Summit 2020 1
  • 2. What we do at Sotrender 2
  • 3. Our models 1. Sentiment 2. Hatespeech 3. Topic modelling 4. Keyphrase extractor 5. NER (brands and products) 6. Image Tagger 7. Text Extractor 8. Logo Detector 9. Post Classifier 10. …. 3
  • 4. ML models lifecycle 1. Planning and project setup 2. Data collection and labeling 3. Modeling and exploration 4. Model training and refinement 5. Testing and evaluation 6. Model deployment 7. Ongoing model maintenance and monitoring 4 https://www.jeremyjordan.me/ml-projects-guide/
  • 5. Modeling with AI Notebooks 1. We use Google Cloud Platform as our cloud provider 2. AI Platform Notebooks is used for initial data exploration and modeling 3. For the start, we favor faster, simpler model architectures that can be easily built, validated, iterated and eventually deployed (usually on CPU) 4. Experiment tracking: MlFlow 5 https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-learning-platform.html https://cloud.google.com/ai-platform-notebooks?hl=id
  • 6. Structuring training code • Notebooks disadvantages: – You pay for the whole time the notebook is running – Code quality is usually lower – Hard to parametrize, unit test, and review • After initial experimentation phase, we try to give more structure to the model training code: – Refactor codebase to Python packages and modules and move to git repository (Gitlab) – Add tests (more on it later) – Wrap code into a Docker container – Use dedicated AI Platform Training service to train in the cloud 6 https://www.jeremyjordan.me/ml-projects-guide/
  • 7. AI Platform Training with custom containers 7 • Advantages: – Develop locally, train in the cloud – Pay only for the time of training – Broad configuration options – Job statuses and logs for historical runs are available in the dashboard – Easy integration with hyperparameter tuning Training job dockerfile Cloud training script
  • 8. Google Storage for Models and Datasets • We use Google Storage as primary Store for models and datasets • One bucket per model • We follow unified bucket and directory structure, same for every model – Raw data – Combined datasets, with predefined splits – Model files • Documentation in Knowledge Base (Confluence) • One can use dedicated systems like DVC, Quilt 8
  • 9. Additional training tips • Consider having two validation sets: training-dev and test-dev, to distinguish between overfitting errors and distribution shift • Establish human performance for your task • Evaluate your model performance on important data slices • Do hyperparameter tuning; utilize open source packages e.g. hyperopt • Develop a systematic way of analyzing model errors Recommended resources: • https://www.coursera.org/learn/machine-learning-projects • https://www.deeplearning.ai/machine-learning-yearning/ 9 https://towardsdatascience.com/some-strategies-for-machine-learning-projects-5f2f32c34635
  • 10. Model deployment • Your options: – Online – Batch (offline) • Our approach is to deploy models as services – Easy to integrate – Easy to use by other teams • We serve them as REST service with Flask (or, most recently, FastApi) • We wrap them in Docker containers so they can be easily deployed to cloud and serve with Cloud Run 10 https://mlinproduction.com/batch-inference-vs-online-inference/ Online inference Batch inference
  • 11. Cloud Deployment: Cloud Run • We use Cloud Run to deploy our model services • Cloud Build for delegating build process to GCP • GCP has dedicated service for serving models, AI Platform Prediction, but we use Cloud Run – It is more flexible for us, we can set up any environment and add any dependencies – AI Predictions has limits regarding model size – We can add additional endpoints (e.g. /explain to services) 11 Service dockerfile Cloud deployment script
  • 12. Cloud Run c.d. • Useful features out-of-the box – Autoscaling – Multiple Revisions (versions), easy Rollback – Traffic management – Multiple Namespaces (dev, prod) – Resource Monitoring 12
  • 13. Delivery pipeline automation (CI/CD) 13 • Implemented in Gitlab CI/CD push Download files Build image Run tests Run static analysis Push image to registry Code Review Canary rollout deploy
  • 14. Testing and evaluation • Unit and integration tests for: – Input pipelines – Preprocessing functions • “Regression” tests for: – Performance on validation data – Predictions on some important, hand-picked examples – Performance on data slices 14
  • 15. Monitoring • System level metrics: – Resource consumption (RAM, CPU), healthchecks, status codes, latency, etc. • Data level metrics – Prediction distributions, input data distributions – System performance against real time labels (collected automatically or manually) 15 https://mlinproduction.com/
  • 16. Streamlit • https://www.streamlit.io/ • Easy tool to create simple web Data Products directly in Python • You can use it to create Demos, share your work, showcase your models behaviour, debug • Very intuitive, no Web skills required 16 https://towardsdatascience.com/coding-ml-tools-like-you-code-ml-models-ddba3357eace