OSS Meetup
11 Feb 2020
Tools for Data Scientists
6:00 pm Registration, Food, Networking
Faisal Siddiqi 7:00 pm (5m) Welcome
Ville Tuulos 7:05 pm (20m) Metaflow
Jeremy Smith 7:25 pm (20m) Polynote
Matthew Seal 7:45 pm (15m) Papermill
8:00 pm Demo Stations, Networking, Food
Agenda
data scientist
productivity
Basics
Workflow as a DAG
State Transfer and Checkpointing
Versioning and Experiment Tracking
Inspection and Monitoring
Vertical Scalability
Horizontal Scalability
Dependency Management
...and much more!
See metaflow.org for details
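The first two topics above (a workflow as a DAG, with state transferred and checkpointed between steps) can be sketched in plain Python. This is only an illustration of the idea, not Metaflow's API: the step functions, the `DEPENDS_ON` mapping, and `run_workflow` are hypothetical names, and checkpoints are kept in memory rather than persisted.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step workflow: each step reads the shared state dict
# and returns the new values it wants to add to it.
def start(state):
    return {"numbers": [1, 2, 3]}

def total(state):
    return {"total": sum(state["numbers"])}

def end(state):
    return {"report": f"total={state['total']}"}

STEPS = {"start": start, "total": total, "end": end}
# Maps each step to the set of steps it depends on (the DAG edges).
DEPENDS_ON = {"start": set(), "total": {"start"}, "end": {"total"}}

def run_workflow(steps, depends_on):
    """Run steps in dependency order, snapshotting state after each one."""
    state, checkpoints = {}, {}
    for name in TopologicalSorter(depends_on).static_order():
        state = {**state, **steps[name](state)}
        checkpoints[name] = dict(state)  # a copy that could be persisted
    return state, checkpoints

final_state, checkpoints = run_workflow(STEPS, DEPENDS_ON)
print(final_state["report"])
```

Because every step leaves a snapshot behind, a failed run could in principle resume from the last completed step instead of starting over.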
Metaflow @
Google
polynote.org
Polynote is a polyglot notebook environment,
built from scratch.
It supports mixing Scala, Python, SQL, and
Vega in a single notebook.
Data is shared seamlessly* between
languages.
Why did we build it?
Scientists were avoiding Scala notebooks for
experimentation.
It was just a pain to use Scala and Spark in a
notebook.
Scala + Spark pain points
● Interactive autocomplete is practically a
necessity
● Difficult to find compiler errors
● Dependencies are many and varied
● Spark clashes with dependencies –
constantly building shaded JARs
What's different about
Polynote?
Editing improvements
Quality-of-life IDE features like autocomplete and
parameter hints, error highlighting, etc.
Reproducibility
Cells see only the state derived from cells above, no
matter what order they ran in.
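A toy model may make the reproducibility rule concrete. It is not Polynote's implementation: the `Notebook` class and its cells are hypothetical, and each cell is reduced to a function from the visible state to the bindings it produces.

```python
class Notebook:
    """Toy model: each cell maps its input state to the bindings it defines."""

    def __init__(self, cells):
        self.cells = cells                  # ordered top to bottom
        self.outputs = [{} for _ in cells]  # bindings produced by each cell

    def run(self, index):
        # Build the input state only from cells positioned above this one,
        # regardless of when (or whether) the cells below were executed.
        visible = {}
        for produced in self.outputs[:index]:
            visible.update(produced)
        self.outputs[index] = self.cells[index](visible)
        return self.outputs[index]

cells = [
    lambda s: {"x": 1},
    lambda s: {"y": s.get("x", 0) + 1},
    lambda s: {"x": 100},
]
nb = Notebook(cells)
nb.run(0)
nb.run(2)           # executed earlier in time, but sits below cell 1
result = nb.run(1)  # still sees x == 1 from cell 0, not x == 100
print(result)
```

In a notebook with a single shared namespace, the second cell would have picked up `x == 100` here, which is the kind of order-dependent result this rule avoids.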
Visibility
See what the Kernel's up to with the symbol table, task
list and executing expression highlight.
Data Visualization
Use the built-in Data Inspector to browse tabular data
and inspect schema. Plot data with the plot editor, or use
Vega or matplotlib directly.
Polyglot
Scala cells and Python cells together in one notebook.
Variables from each language are available to the other.
Example use case: data prep in Scala+Spark, model
training in Python with TensorFlow/PyTorch/etc
Questions?
(stop by our demo station!)
Papermill
(2.0!)
Matthew Seal
Backend Engineer on the Big Data Platform
Orchestration Team @ Netflix
@codeseal
Speaker Details
Notebook
Wins.
● Shareable
● Easy to Read
● Documentation with
Code
● Outputs as Reports
● Familiar Interface
● Multi-Language
Things to preserve:
● Results linked to code
● Good visuals
● Easy to share
Focus points to extend uses.
Things to improve:
● Not versioned
● Mutable state
● Templating
Jupyter Notebooks:
A REPL Protocol + UIs
[Diagram: Jupyter UIs, Jupyter Server, and Jupyter Kernel. Arrow labels: forward requests, execute code, receive outputs, save / load .ipynb, develop, share.]
It’s more complex than this in reality
A simple library for executing
notebooks.
[Diagram: Papermill reads the input notebook template.ipynb from the input store (EFS, efs://users/mseal/notebooks), parameterizes and runs it, and writes the output notebooks run_1.ipynb through run_4.ipynb to S3 (s3://output/mseal/).]
import papermill as pm
pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb')
…
# Each run can be placed in a unique / sortable path
pprint(files_in_directory('outputs'))
outputs/
...
20190401_run.ipynb
20190402_run.ipynb
Choose an output location.
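One way to get a unique, sortable output location is to derive the file name from the run date, as the `20190402_run.ipynb` example suggests. The helper below is a hypothetical sketch (`output_path` is not part of Papermill); it only builds the string that would be passed as the output argument.

```python
from datetime import datetime
from pathlib import PurePosixPath

def output_path(base_dir, run_time, suffix="run"):
    """Build a date-prefixed notebook path so runs sort chronologically."""
    return str(PurePosixPath(base_dir) / f"{run_time:%Y%m%d}_{suffix}.ipynb")

path = output_path("outputs", datetime(2019, 4, 2))
print(path)
```

Year-month-day prefixes sort the same way alphabetically and chronologically, so a plain directory listing shows runs in order.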
# Pass template parameters to notebook execution
pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb',
                    {'region': 'ca', 'devices': ['phone', 'tablet']})
…
[2] # Default values for our potential input parameters
region = 'us'
devices = ['pc']
date_since = datetime.now() - timedelta(days=30)
[3] # Parameters
region = 'ca'
devices = ['phone', 'tablet']
Add Parameters
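The two cells above show the effect: the defaults stay in place and a new cell with the overriding values follows them. A simplified sketch of that injection on a notebook held as a plain dictionary is given below. It is an illustration rather than Papermill's own code; `inject_parameters` is a hypothetical name, and placing the overrides first when no cell is tagged `parameters` is an assumption of this sketch.

```python
import copy

def inject_parameters(notebook, parameters):
    """Return a copy of the notebook with an overriding cell added after
    the first cell tagged 'parameters' (a simplified illustration)."""
    nb = copy.deepcopy(notebook)  # leave the input template untouched
    source = "# Parameters\n" + "\n".join(
        f"{name} = {value!r}" for name, value in parameters.items()
    )
    new_cell = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            nb["cells"].insert(i + 1, new_cell)
            break
    else:
        nb["cells"].insert(0, new_cell)  # assumption: no tagged cell found
    return nb

template = {"cells": [
    {"cell_type": "code", "metadata": {"tags": ["parameters"]},
     "source": "region = 'us'\ndevices = ['pc']",
     "outputs": [], "execution_count": None},
    {"cell_type": "code", "metadata": {},
     "source": "print(region, devices)",
     "outputs": [], "execution_count": None},
]}

run = inject_parameters(template, {"region": "ca", "devices": ["phone", "tablet"]})
print(run["cells"][1]["source"])
```

Since the injected cell runs after the defaults, its assignments win, and any default that is not overridden (such as `date_since`) keeps its value.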
# Same example as last slide
pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb',
                    {'region': 'ca', 'devices': ['phone', 'tablet']})
…
# Bash version of that input
papermill input_nb.ipynb outputs/20190402_run.ipynb -p region ca -y \
    '{"devices": ["phone", "tablet"]}'
Also Available as a CLI
Let’s use the CLI ...
Notebooks: Programmatically
[Diagram: the same Jupyter UIs, Jupyter Server, and Jupyter Kernel flow as before, with Papermill added alongside. Papermill labels: read, write, Kernel Manager, forward requests, execute code, receive outputs.]
# To add SFTP support you’d add this class
class SFTPHandler:
    def read(self, file_path):
        ...

    def write(self, file_contents, file_path):
        ...

# Then add an entry_point for the handler
from setuptools import setup, find_packages
setup(
    # all the usual setup arguments ...
    entry_points={'papermill.io':
                  ['sftp://=papermill_sftp:SFTPHandler']})

# Use the new prefix to read/write from that location
pm.execute_notebook('sftp://my_ftp_server.co.uk/input.ipynb',
                    'sftp://my_ftp_server.co.uk/output.ipynb')
Entire Library is Component Based
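The prefix-based dispatch behind such handlers can be sketched as a small registry. This is a toy model under stated assumptions, not Papermill's internals: `HandlerRegistry`, `InMemoryHandler`, and the `mem://` prefix are hypothetical, and only the `read`/`write` method shapes are taken from the slide.

```python
class HandlerRegistry:
    """Toy registry that picks an I/O handler by path prefix."""

    def __init__(self):
        self._handlers = {}

    def register(self, prefix, handler):
        self._handlers[prefix] = handler

    def handler_for(self, path):
        # Prefer the longest matching prefix so a more specific one wins.
        matches = [p for p in self._handlers if path.startswith(p)]
        if not matches:
            raise ValueError(f"no handler registered for {path!r}")
        return self._handlers[max(matches, key=len)]

class InMemoryHandler:
    """Hypothetical handler that stores notebook contents in a dict."""

    def __init__(self):
        self.files = {}

    def read(self, file_path):
        return self.files[file_path]

    def write(self, file_contents, file_path):
        self.files[file_path] = file_contents

registry = HandlerRegistry()
memory = InMemoryHandler()
registry.register("mem://", memory)

target = "mem://runs/output.ipynb"
registry.handler_for(target).write("{}", target)
print(registry.handler_for(target).read(target))
```

Keeping storage behind a two-method interface is what lets a new backend be added without touching the execution code.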
Failed Notebooks
A better way to review outcomes
Debugging failed jobs.
[Diagram: five notebook jobs, Job #1 through Job #5, with one of them marked Failed.]
Output notebooks are the place to
look for failures. They have:
● Stack traces
● Re-runnable code
● Execution logs
● Same interface as input
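Since an output notebook is a JSON document, the stack traces mentioned above can also be located programmatically. The sketch below assumes the notebook v4 layout, in which a failed code cell carries an output with `output_type` set to `error` plus `ename`, `evalue`, and `traceback` fields; `find_errors` and the sample notebook are hypothetical.

```python
import json

def find_errors(notebook):
    """Yield (cell_index, error_name, error_value) for each failed cell."""
    for index, cell in enumerate(notebook.get("cells", [])):
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                yield index, output.get("ename"), output.get("evalue")

# A hand-written stand-in for an executed notebook's JSON.
executed = json.loads("""
{"cells": [
  {"cell_type": "code", "source": "x = 1", "outputs": []},
  {"cell_type": "code", "source": "1 / 0", "outputs": [
    {"output_type": "error", "ename": "ZeroDivisionError",
     "evalue": "division by zero", "traceback": ["..."]}]}
]}
""")

errors = list(find_errors(executed))
print(errors)
```

A scheduler could run a check like this over each output notebook to report which cell failed before anyone opens the file.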
Failed outputs
are useful.
Find the issue.
Test the fix.
Update the notebook.
Adds notebook isolation
● Immutable inputs
● Immutable outputs
● Parameterization of notebook runs
● Configurable sourcing / sinking
and gives better control of notebook flows via library calls.
Changes to the notebook experience.
● Platform Scheduler uses Jupyter
Notebooks for all Templates
● Notebooks used to run integration tests,
monitor systems, execute ETL, and wrap
ML flows.
Jupyter Notebooks @Netflix
Questions?
https://slack.nteract.io/
https://discourse.jupyter.org/

System Simulation and Modelling with types and Event Scheduling
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptx
 

Season 7 Episode 1 - Tools for Data Scientists

  • 1. OSS Meetup 11 Feb 2020 Tools for Data Scientists
  • 2. Agenda: 6:00 pm Registration, Food, Networking; 7:00 pm (5m) Welcome, Faisal Siddiqi; 7:05 pm (20m) Metaflow, Ville Tuulos; 7:25 pm (20m) Polynote, Jeremy Smith; 7:45 pm (15m) Papermill, Matthew Seal; 8:00 pm Demo Stations, Networking, Food
  • 22. State Transfer and Checkpointing
  • 28. ...and much more! See metaflow.org for details
  • 33. Polynote is a polyglot notebook environment, built from scratch. It supports mixing Scala, Python, SQL, and Vega in a single notebook. Data is shared seamlessly* between languages.
  • 35. Why did we build it? Scientists were avoiding Scala notebooks for experimentation. It was just a pain to use Scala and Spark in a notebook.
  • 36. Scala + Spark pain points: ● Interactive autocomplete is practically a necessity ● Difficult to find compiler errors ● Dependencies are many and varied ● Spark clashes with dependencies – constantly building shaded JARs
  • 38. Editing improvements: Quality-of-life IDE features like autocomplete and parameter hints, error highlighting, etc.
  • 39. Reproducibility: Cells see only the state derived from cells above, no matter what order they ran in.
  • 40. Visibility: See what the kernel is up to with the symbol table, task list, and executing-expression highlight.
  • 41. Data Visualization: Use the built-in Data Inspector to browse tabular data and inspect schema. Plot data with the plot editor, or use Vega or matplotlib directly.
  • 44. Polyglot: Scala cells and Python cells together in one notebook. Variables from each language are available to the other. Example use case: data prep in Scala + Spark, model training in Python with TensorFlow, PyTorch, etc.
  • 45. Questions? (stop by our demo station!)
  • 47. Speaker Details: Matthew Seal, Backend Engineer on the Big Data Platform Orchestration Team @ Netflix (@codeseal)
  • 48. Notebook Wins. ● Shareable ● Easy to Read ● Documentation with Code ● Outputs as Reports ● Familiar Interface ● Multi-Language
  • 49. Focus points to extend uses. Things to preserve: ● Results linked to code ● Good visuals ● Easy to share. Things to improve: ● Not versioned ● Mutable state ● Templating
  • 50. Jupyter Notebooks: A REPL Protocol + UIs. It’s more complex than this in reality. [Diagram labels: Jupyter UIs, Jupyter Server, Jupyter Kernel; execute code, receive outputs, forward requests, save / load .ipynb; develop, share]
  • 51. A simple library for executing notebooks. [Diagram labels: input store, efs://users/mseal/notebooks, input notebook, template.ipynb; Papermill, parameterize & run; output notebooks, run_1.ipynb, run_2.ipynb, run_3.ipynb, run_4.ipynb, s3://output/mseal/; EFS, S3]
  • 52. Choose an output location.
        import papermill as pm
        pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb')
        …
        # Each run can be placed in a unique / sortable path
        pprint(files_in_directory('outputs'))
        outputs/
        ...
        20190401_run.ipynb
        20190402_run.ipynb
  • 53. Add Parameters
        # Pass template parameters to notebook execution
        pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb',
                            {'region': 'ca', 'devices': ['phone', 'tablet']})
        …
        [2] # Default values for our potential input parameters
            region = 'us'
            devices = ['pc']
            date_since = datetime.now() - timedelta(days=30)
        [3] # Parameters
            region = 'ca'
            devices = ['phone', 'tablet']
  • 54. Also Available as a CLI
        # Same example as last slide
        pm.execute_notebook('input_nb.ipynb', 'outputs/20190402_run.ipynb',
                            {'region': 'ca', 'devices': ['phone', 'tablet']})
        …
        # Bash version of that input
        papermill input_nb.ipynb outputs/20190402_run.ipynb -p region ca -y '{"devices": ["phone", "tablet"]}'
  • 55. Let’s use the CLI ...
  • 56. Notebooks: Programmatically. [Diagram labels: Jupyter UIs, Jupyter Server, Jupyter Kernel; execute code, receive outputs, forward requests, save / load .ipynb; develop, share; Papermill, Kernel Manager; read, write, execute code, receive outputs, forward requests]
  • 57. Entire Library is Component Based
        # To add SFTP support you’d add this class
        class SFTPHandler():
            def read(self, file_path):
                ...
            def write(self, file_contents, file_path):
                ...
        # Then add an entry_point for the handler
        from setuptools import setup, find_packages
        setup(
            # all the usual setup arguments ...
            entry_points={'papermill.io': ['sftp://=papermill_sftp:SFTPHandler']})
        # Use the new prefix to read/write from that location
        pm.execute_notebook('sftp://my_ftp_server.co.uk/input.ipynb',
                            'sftp://my_ftp_server.co.uk/output.ipynb')
  • 58. Failed Notebooks: A better way to review outcomes
  • 59. Debugging failed jobs. Notebook Job #1 Notebook Job #2 Failed Notebook Job #3 Notebook Job #4 Notebook Job #5
  • 60. Failed outputs are useful. Output notebooks are the place to look for failures. They have: ● Stack traces ● Re-runnable code ● Execution logs ● Same interface as input
  • 61. Find the issue. Test the fix. Update the notebook. Output notebooks are the place to look for failures. They have: ● Stack traces ● Re-runnable code ● Execution logs ● Same interface as input
  • 62. Changes to the notebook experience. Adds notebook isolation: ● Immutable inputs ● Immutable outputs ● Parameterization of notebook runs ● Configurable sourcing / sinking; and gives better control of notebook flows via library calls.
  • 63. Jupyter Notebooks @Netflix ● Platform Scheduler uses Jupyter Notebooks for all Templates ● Notebooks used to run integration tests, monitor systems, execute ETL, and wrap ML flows.
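
The "Add Parameters" slide shows a cell of default values followed by a generated "# Parameters" cell that overrides them. The sketch below illustrates that idea on a plain notebook dictionary, using only the standard library. It is a simplified illustration and not Papermill's actual implementation: the function name `inject_parameters`, the sample `template` notebook, and the choice to place the new cell at the top when no cell is tagged `parameters` are assumptions made for this example.

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of `notebook` with an override cell added.

    Hypothetical helper: the new code cell is placed right after the first
    cell tagged "parameters", or at the top if no cell carries that tag.
    The input notebook is left untouched (immutable input).
    """
    nb = copy.deepcopy(notebook)

    # Render each override as a Python assignment, e.g. region = 'ca'
    lines = ["# Parameters"]
    lines += [f"{name} = {value!r}" for name, value in parameters.items()]

    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": "\n".join(lines),
        "outputs": [],
        "execution_count": None,
    }

    cells = nb["cells"]
    # Index of the first cell tagged "parameters", or -1 if there is none
    index = next(
        (
            i
            for i, cell in enumerate(cells)
            if "parameters" in cell.get("metadata", {}).get("tags", [])
        ),
        -1,
    )
    cells.insert(index + 1, injected)
    return nb


# A tiny template notebook (illustrative data, mirroring the slide's defaults)
template = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Report"},
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": "region = 'us'\ndevices = ['pc']",
            "outputs": [],
            "execution_count": None,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": "print(region, devices)",
            "outputs": [],
            "execution_count": None,
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

run = inject_parameters(template, {"region": "ca", "devices": ["phone", "tablet"]})
print(run["cells"][2]["source"])
```

Because the overrides run after the defaults cell, later cells see the overridden values, while the template itself stays unchanged and can be reused for the next run.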