Computable Content: Lessons Learned

Paco Nathan @pacoid
Director, Learning Group @ O’Reilly Media
2017-05-25

An observation

Oriole

One approach

Jupyter use @ O’Reilly Media
▪ Embracing Jupyter notebooks at O’Reilly
  oreilly.com/ideas/jupyter-at-oreilly
▪ Learning alongside innovators, thought-by-thought, in context
  oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context
▪ Oriole online tutorials
  safaribooksonline.com/oriole/
▪ How do you learn?
  oreilly.com/learning/how-do-you-learn

For example
▪ A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
▪ Purely browser-based UX; zero installation required
▪ Substantially higher engagement metrics
▪ Opens the door for live coding in assessments
oreilly.com/learning/regex-golf-with-peter-norvig

Motivations
O’Reilly needed a way for authors to use Jupyter notebooks to create professional publications. We also wanted to integrate video narration into the UX. The result is a unique new medium called Oriole:
▪ Context as a “unit of thought”
▪ Code and video synced together
▪ Each web session gets its own Docker container in the cloud
▪ 100% HTML experience, no download/install/config needed
▪ Jupyter notebooks used in the middleware
▪ Leverage interactive, data-driven graphics

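The "one Docker container per web session" idea can be sketched as follows. This is a minimal illustration, not Oriole's actual middleware: the function name, the session naming scheme, the image name, and the port numbers are all assumptions made for the example. Only the `docker run` flags themselves (`--rm`, `--detach`, `--name`, `--publish`) are real CLI options.

```python
def docker_run_command(session_id, image, host_port, container_port=8888):
    """Build the argv for starting one notebook container for one web session.

    Hypothetical helper: the naming scheme and ports are illustrative only.
    """
    return [
        "docker", "run",
        "--rm",                                   # remove the container when it stops
        "--detach",                               # run in the background
        "--name", f"session-{session_id}",        # one container per session
        "--publish", f"{host_port}:{container_port}",  # map host port to the notebook port
        image,
    ]


# Hypothetical image name and port, chosen only for the example.
cmd = docker_run_command("abc123", "example/tutorial-notebook:latest", 49152)
print(" ".join(cmd))
```

The sketch only assembles the command; a real deployment would hand this work to a scheduler or cluster manager rather than shelling out per request.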
Outcomes
▪ Tutorials are now much quicker to publish than “traditional” books and videos
▪ Less time required for innovators in programming, data science, devops, design, etc. – who tend to be really busy people
▪ Audience gets direct, hands-on, contextualized experience across a wide variety of programming environments

Limitations
▪ Notebook kernels run REPLs, so older languages were not feasible
▪ Brief code blocks with tangible outcomes – precludes business topics, systems engineering, etc.
▪ What materials will fit within a Docker container?

Third iteration of Jupyter @ O’Reilly
1. notebooks as supplemental material to other published work
2. notebooks published as HTML, as articles
3. computable content, containerized notebooks + video narratives
4. hosted notebooks

Long-term goal: make learning materials more powerful by integrating compute engines + data services

Project Jupyter
▪ The evolution of IPython notebooks, applied to a range of different programming languages and environments
▪ https://jupyter.org/
▪ https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages

Projects
▪ JupyterHub
  github.com/jupyterhub/jupyterhub
▪ Jupyter in Education
  groups.google.com/forum/#!forum/jupyter-education
▪ JupyterLab
  github.com/jupyterlab/jupyterlab
▪ Jupyter Kernels
  github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages

A suite of network protocols
Think of Jupyter, at its core, as a suite of network protocols:
Jupyter is to the remote semantics of a REPL as HTTP is to the remote semantics of file sharing

A suite of network protocols
[Diagram labels: Kernel (code runs in a REPL); Network Protocol; Notebook (editing + results)]

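To make the "network protocol" framing concrete, here is a minimal sketch of the message a notebook front end sends when it asks a kernel to run code. The field names follow the Jupyter messaging specification's `execute_request` message; the helper name, the session id, the username, and the protocol version string are placeholders chosen for the example. The sketch builds only the dictionary form and leaves out the wire framing (ZeroMQ multipart frames and the HMAC signature).

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="reader"):
    """Build the dict form of a Jupyter `execute_request` message.

    The session id and username are illustrative placeholders.
    """
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    content = {
        "code": code,            # source text the kernel should run
        "silent": False,         # False: broadcast output to front ends
        "store_history": True,   # record this execution in kernel history
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {
        "header": header,
        "parent_header": {},     # empty: this message starts a new exchange
        "metadata": {},
        "content": content,
    }


msg = make_execute_request("1 + 1", session_id=uuid.uuid4().hex)
print(json.dumps(msg["content"], indent=2))
```

The kernel answers with its own messages (status, results, errors) that carry this header as their `parent_header`, which is how a front end matches output to the cell that produced it.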
History, Context

Notebook metaphor
Wolfram Research introduced notebooks in 1988 for working with Mathematica

Related work

Literate programming
Don Knuth
literateprogramming.com/
Paraphrased:
Instead of telling computers what to do, tell other people what you want the computers to do

Speech acts
PyCon 2016 Keynote, Lorena Barba
youtu.be/ckW1xuGVpug?t=35m11s (video)
figshare.com/articles/PyCon2016_Keynote/3407779 (slides)
Highly recommended: speech acts (based on Winograd and Flores) as theory here

Best Practices

The following lessons learned in using Jupyter notebooks + video for learning materials apply well in many situations for data science teams working across an organization

Teaching with Jupyter – 1 of 2
▪ focus on a concise “unit of thought”
▪ invest the time and editorial effort to create a good intro
▪ keep your narrative simple and reasonably linear
▪ “chunk” the text and code into understandable parts
▪ alternate between text, code, output, further links, etc.
▪ code cells should be brief (< 10 lines), must show output

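The last guideline above (brief code cells that show output) is easy to check mechanically. The sketch below is one possible check, assuming the notebook is available as the plain dictionary stored in an `.ipynb` file (nbformat version 4). The function name and the example notebook are made up for illustration.

```python
MAX_LINES = 10  # the slide's "< 10 lines" guideline


def lint_notebook(nb):
    """Return (cell_index, problem) pairs for code cells that break the guidelines."""
    problems = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of line strings.
        text = "".join(source) if isinstance(source, list) else source
        line_count = len(text.splitlines())
        if line_count >= MAX_LINES:
            problems.append((index, f"{line_count} lines of code"))
        if not cell.get("outputs"):
            problems.append((index, "no output shown"))
    return problems


# A tiny in-memory notebook (hypothetical content) to exercise the check.
example = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["print('hello')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "source": ["x = 1\n"] * 12,
            "outputs": [],
        },
    ],
}

print(lint_notebook(example))
```

For a notebook on disk, the same function can be applied to the result of `json.load` on the `.ipynb` file.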
Teaching with Jupyter – 2 of 2
▪ load data+libraries from the container, not the network
▪ clear all output then “Run All” – or it didn’t happen
▪ video narratives: there’s text, and there’s subtext...
▪ pause after each “beat”: smile, breathe, let people follow you
For JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid, Knuth1984)
BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK!
(apologies for shouting)

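The "clear all output" half of the "clear all output then Run All" rule can be sketched on the notebook's JSON structure directly. This assumes an nbformat version 4 notebook loaded as a dictionary. The function name and the sample notebook are illustrative only, and the "Run All" step still happens in Jupyter itself.

```python
import copy


def clear_outputs(nb):
    """Return a copy of a v4 notebook dict with every code cell's output removed."""
    cleaned = copy.deepcopy(nb)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical notebook with one stale result left over from an earlier session.
stale = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Some text"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 7,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

fresh = clear_outputs(stale)
print(fresh["cells"][1]["outputs"], fresh["cells"][1]["execution_count"])
```

Re-running the cleared notebook from top to bottom then shows whether it still works from a clean kernel, which is the point of the rule.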
Sharing is caring
In data science, we see the benefits to teams for shared insights, storytelling, etc.
Meanwhile domain expertise is generally more important than knowledge about tools
There’s a value for developers to use notebooks in lieu of IDEs in some cases – what are those cases?
GitHub now renders notebooks, so they can be used for documentation, reporting, etc.
Digital Object Identifiers (DOI) can be assigned through Zenodo, making notebooks citable for academic publication

Authoring and Scale-Out

Launchbot.io

Achieving scale
▪ Launchbot.io allows a notebook author to build a container that includes the required Jupyter kernel, installed libraries, datasets, etc.
▪ Install Docker on your laptop
▪ Backend uses Git and DockerHub to manage containers
▪ For scale, deploy to DC/OS or a cloud

“A notebook, a container, and ~20 minutes of informal video walk into a bar…”

System architecture
[Diagram labels: Tutorial; Middleware; Cluster]

O’Reilly Strata
NY, Sep 25-28
SG, Dec 4-7
O’Reilly Artificial Intelligence
NY, Jun 26-29
SF, Sep 17-20
JupyterCon
NY, Aug 22-25

Learn Alongside Innovators
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?
periodic newsletter with updates, events, conference summaries…
liber118.com/pxn/
@pacoid
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 

Computable Content: Lessons Learned

  • 1. Computable Content: Lessons Learned
    Paco Nathan @pacoid
    Director, Learning Group @ O’Reilly Media
    2017-05-25
  • 5. Jupyter use @ O’Reilly Media
    ▪ Embracing Jupyter notebooks at O’Reilly
      oreilly.com/ideas/jupyter-at-oreilly
    ▪ Learning alongside innovators, thought-by-thought, in context
      oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context
    ▪ Oriole online tutorials
      safaribooksonline.com/oriole/
    ▪ How do you learn?
      oreilly.com/learning/how-do-you-learn
  • 6. For example
    ▪ A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
    ▪ Purely browser-based UX; zero installation required
    ▪ Substantially higher engagement metrics
    ▪ Opens the door for live coding in assessments
    oreilly.com/learning/regex-golf-with-peter-norvig
  • 7. Motivations
    O’Reilly needed a way for authors to use Jupyter notebooks to create professional publications. We also wanted to integrate video narration into the UX. The result is a unique new medium called Oriole:
    ▪ Context as a “unit of thought”
    ▪ Code and video sync’ed together
    ▪ Each web session gets its own Docker container in the cloud
    ▪ 100% HTML experience, no download/install/config needed
    ▪ Jupyter notebooks used in the middleware
    ▪ Leverage interactive, data-driven graphics
  • 8. Outcomes
    ▪ Tutorials are now much quicker to publish than “traditional” books and videos
    ▪ Less time required for innovators in programming, data science, devops, design, etc. – who tend to be really busy people
    ▪ Audience gets direct, hands-on, contextualized experience across a wide variety of programming environments
  • 9. Limitations
    ▪ Notebook kernels run REPLs, so older languages were not feasible
    ▪ Brief code blocks with tangible outcomes – precludes business topics, systems engineering, etc.
    ▪ What materials will fit within a Docker container?
  • 10. Third iteration of Jupyter @ O’Reilly
    1. notebooks as supplemental material to other published work
    2. notebooks published as HTML, as articles
    3. computable content, containerized notebooks + video narratives
    4. hosted notebooks
  • 11. Long-term goal: make learning materials more powerful by integrating compute engines + data services
  • 13. Project Jupyter
    ▪ The evolution of IPython notebooks, applied to a range of different programming languages and environments
    ▪ https://jupyter.org/
    ▪ https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  • 14. Projects
    ▪ JupyterHub
      github.com/jupyterhub/jupyterhub
    ▪ Jupyter in Education
      groups.google.com/forum/#!forum/jupyter-education
    ▪ JupyterLab
      github.com/jupyterlab/jupyterlab
    ▪ Jupyter Kernels
      github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  • 15. A suite of network protocols
    Think of Jupyter, at its core, as a suite of network protocols:
    Jupyter is to the remote semantics of a REPL as…
    HTTP is to the remote semantics of file share
  • 16. A suite of network protocols
    [Diagram: Kernel (code runs in a REPL) – Network Protocol – Notebook (editing + results)]
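To make the "network protocol" framing concrete, here is a minimal sketch of the kind of message a notebook front end sends to a kernel when a cell is run. The field names follow the published Jupyter messaging specification as best understood here; the helper name `make_execute_request`, the default username, and the version string are illustrative assumptions, and the real wire format (ZeroMQ framing, HMAC signing) is deliberately left out.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="reader"):
    """Build an execute_request message as a plain dict (hypothetical helper)."""
    header = {
        "msg_id": uuid.uuid4().hex,        # unique per message
        "session": session_id,             # unique per front-end session
        "username": username,              # assumed placeholder value
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",                  # assumed protocol version
    }
    content = {
        "code": code,                      # the cell source to run in the REPL
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # Every message carries the same four top-level parts.
    return {
        "header": header,
        "parent_header": {},
        "metadata": {},
        "content": content,
    }


msg = make_execute_request("1 + 1", session_id=uuid.uuid4().hex)
wire = json.dumps(msg)  # serialized form; framing and signing omitted
```

The point of the sketch is the separation the slide describes: the notebook only edits code and displays results, the kernel only runs code, and everything between them is a message like this one.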
  • 18. Notebook metaphor
    Wolfram Research introduced notebooks in 1988 for working with Mathematica
  • 20. Literate programming
    Don Knuth
    literateprogramming.com/
    Paraphrased: Instead of telling computers what to do, tell other people what you want the computers to do
  • 21. Speech acts
    PyCon 2016 Keynote, Lorena Barba
    youtu.be/ckW1xuGVpug?t=35m11s (video)
    figshare.com/articles/PyCon2016_Keynote/3407779 (slides)
    Highly recommended: speech acts (based on Winograd and Flores) as theory here
  • 23. The following lessons learned in using Jupyter notebooks + video for learning materials apply well in many situations for data science teams working across an organization
  • 24. Teaching with Jupyter – 1 of 2
    ▪ focus on a concise “unit of thought”
    ▪ invest the time and editorial effort to create a good intro
    ▪ keep your narrative simple and reasonably linear
    ▪ “chunk” the text and code into understandable parts
    ▪ alternate between text, code, output, further links, etc.
    ▪ code cells should be brief (< 10 lines), must show output
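The last guideline on this slide (brief code cells that show output) is easy to check mechanically. The following is a minimal sketch, assuming the notebook has already been loaded as an nbformat v4 style dictionary (for example with `json.load`); the function names and the choice to ignore blank lines when counting are assumptions made for illustration, not part of any Jupyter tool.

```python
MAX_LINES = 10  # threshold from the slide: code cells should stay under 10 lines


def source_lines(cell):
    """Return the non-blank source lines of a cell (source may be str or list)."""
    src = cell.get("source", "")
    text = "".join(src) if isinstance(src, list) else src
    return [line for line in text.splitlines() if line.strip()]


def check_code_cells(notebook):
    """Report code cells that are too long or that show no output."""
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        n = len(source_lines(cell))
        if n >= MAX_LINES:
            problems.append((index, f"{n} lines of code"))
        if not cell.get("outputs"):
            problems.append((index, "no output"))
    return problems


# A tiny hand-built notebook used only to exercise the checker.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Intro\n"]},
        {
            "cell_type": "code",
            "source": ["print('hi')\n"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
        },
        {
            "cell_type": "code",
            "source": [f"x{i} = {i}\n" for i in range(12)],
            "outputs": [],
        },
    ]
}

report = check_code_cells(example)
```

Here `report` flags only the third cell, once for its length and once for its missing output; a check like this could run before a tutorial is handed to an editor.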
  • 25. Teaching with Jupyter – 2 of 2
    ▪ load data + libraries from the container, not the network
    ▪ clear all output then “Run All” – or it didn’t happen
    ▪ video narratives: there’s text, and there’s subtext...
    ▪ pause after each “beat”: smile, breathe, let people follow you
    For JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid, Knuth1984)
    BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK!
    (apologies for shouting)
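The "clear all output" half of the second guideline can be sketched in a few lines. This assumes the notebook is an nbformat v4 style dictionary; `clear_outputs` is a hypothetical helper, not a Jupyter API. The "Run All" half needs a live kernel and is left to the notebook tooling.

```python
import copy


def clear_outputs(notebook):
    """Return a copy of the notebook with every code cell's output removed."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny hand-built notebook with stale output from an earlier run.
stale = {
    "cells": [
        {"cell_type": "markdown", "source": ["Some text\n"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "source": ["1 + 1\n"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 7,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ]
}

fresh = clear_outputs(stale)
```

Re-running the cleared notebook from top to bottom inside the same container it will ship in is what shows that the published outputs really come from the published code.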
  • 26. Sharing is caring
    In data science, we see the benefits to teams for shared insights, storytelling, etc.
    Meanwhile domain expertise is generally more important than knowledge about tools
    There’s value for developers in using notebooks in lieu of IDEs in some cases – what are those cases?
    GitHub now renders notebooks, so they can be used for documentation, reporting, etc.
    Digital Object Identifiers (DOI) can be assigned through Zenodo, making notebooks citable for academic publication
  • 29. Achieving scale
    ▪ Launchbot.io allows a notebook author to build a container that includes the required Jupyter kernel, installed libraries, datasets, etc.
    ▪ Install Docker on your laptop
    ▪ Backend uses Git and DockerHub to manage containers
    ▪ For scale, deploy to DC/OS or a cloud
  • 30. “A notebook, a container, and ~20 minutes of informal video walk into a bar…”
  • 31. System architecture
    [Diagram: Tutorial, Middleware, Cluster]
  • 32. O’Reilly Strata: NY, Sep 25-28; SG, Dec 4-7
    O’Reilly Artificial Intelligence: NY, Jun 26-29; SF, Sep 17-20
    JupyterCon: NY, Aug 22-25
  • 33. Learn Alongside Innovators
    Just Enough Math
    Building Data Science Teams
    Hylbert-Speys
    How Do You Learn?
    periodic newsletter with updates, events, conference summaries…
    liber118.com/pxn/
    @pacoid