Computable Content: Lessons Learned
Paco Nathan @pacoid
Director, Learning Group @ O’Reilly Media
2017-06-21

An observation

Oriole

One approach

Jupyter use @ O’Reilly Media
▪ Embracing Jupyter notebooks at O’Reilly
  oreilly.com/ideas/jupyter-at-oreilly
▪ Learning alongside innovators, thought-by-thought, in context
  oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context
▪ Oriole online tutorials
  safaribooksonline.com/oriole/
▪ How do you learn?
  oreilly.com/learning/how-do-you-learn

For example
▪ A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
▪ Purely browser-based UX; zero installation required
▪ Substantially higher engagement metrics
▪ Opens the door for live coding in assessments
oreilly.com/learning/regex-golf-with-peter-norvig

Motivations
O’Reilly needed a way for authors to use Jupyter notebooks to create professional publications. We also wanted to integrate video narration into the UX. The result is a unique new medium called Oriole:
▪ Context as a “unit of thought”
▪ Code and video sync’ed together
▪ Each web session gets its own Docker container in the cloud
▪ 100% HTML experience, no download/install/config needed
▪ Jupyter notebooks used in the middleware
▪ Leverage interactive, data-driven graphics

Outcomes
▪ Tutorials are now much quicker to publish than “traditional” books and videos
▪ Less time required for innovators in programming, data science, devops, design, etc., who tend to be really busy people
▪ Audience gets direct, hands-on, contextualized experience across a wide variety of programming environments

Limitations
▪ Notebook kernels run REPLs, so older languages were not feasible
▪ Brief code blocks with tangible outcomes: precludes business topics, systems engineering, etc.
▪ What materials will fit within a Docker container?

Third iteration of Jupyter @ O’Reilly
1. notebooks as supplemental material to other published work
2. notebooks published as HTML, as articles
3. computable content, containerized notebooks + video narratives
4. hosted notebooks
Long-term goal: make learning materials more powerful by integrating compute engines + data services

Project Jupyter
▪ The evolution of IPython notebooks, applied to a range of different programming languages and environments
▪ https://jupyter.org/
▪ https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages

Example Notebook

Projects
▪ JupyterHub
  github.com/jupyterhub/jupyterhub
▪ Jupyter in Education
  groups.google.com/forum/#!forum/jupyter-education
▪ JupyterLab
  github.com/jupyterlab/jupyterlab
▪ Jupyter Kernels
  github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages

A suite of network protocols
Think of Jupyter, at its core, as a suite of network protocols:
Jupyter is to the remote semantics of a REPL as…
HTTP is to the remote semantics of file share
[Diagram: Notebook (editing + results), Network Protocol, Kernel (code runs in a REPL)]

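To make the analogy concrete, here is a minimal sketch of the kind of message a notebook front end might send to a kernel. The field names follow the `execute_request` message of the Jupyter messaging specification as commonly documented. The helper name `make_execute_request`, the username, and the version string are illustrative assumptions, and the transport and message signing that a real client handles are left out.

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id=None):
    """Build an execute_request-style message as a plain dict (hypothetical helper)."""
    session_id = session_id or uuid.uuid4().hex
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": "reader",  # assumption: any identifier will do for a sketch
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",  # assumption: protocol version string
    }
    content = {
        "code": code,  # the source the kernel's REPL should evaluate
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {"header": header, "parent_header": {}, "metadata": {}, "content": content}


msg = make_execute_request("1 + 1")
print(json.dumps(msg["content"], indent=2))
```

The notebook only needs to know how to build and read such messages, which is why the same front end can talk to kernels for many languages.
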
History, Context

Notebook metaphor
Wolfram Research introduced notebooks in 1988 for working with Mathematica

Related work

Literate programming
Don Knuth
literateprogramming.com/
Paraphrased: Instead of telling computers what to do, tell other people what you want the computers to do

Speech acts
PyCon 2016 Keynote, Lorena Barba
youtu.be/ckW1xuGVpug?t=35m11s (video)
figshare.com/articles/PyCon2016_Keynote/3407779 (slides)
Highly recommended: speech acts (based on Winograd and Flores) as theory here

Best Practices
The following lessons learned in using Jupyter notebooks + video for learning materials apply well in many situations for data science teams working across an organization

Teaching with Jupyter (1 of 2)
▪ focus on a concise “unit of thought”
▪ invest the time and editorial effort to create a good intro
▪ keep your narrative simple and reasonably linear
▪ “chunk” the text and code into understandable parts
▪ alternate between text, code, output, further links, etc.
▪ code cells should be brief (< 10 lines), must show output

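The last guideline lends itself to an automated check. The sketch below assumes the usual notebook JSON layout (a top-level `cells` list whose entries carry `cell_type`, `source`, and `outputs`). The function name `check_cells` and the `max_lines` parameter are hypothetical, not part of any Jupyter tool.

```python
def check_cells(notebook, max_lines=10):
    """Return (cell_index, problem) pairs for code cells that break the guidelines."""
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        line_count = len(text.splitlines())
        if line_count >= max_lines:
            problems.append((index, f"{line_count} lines (want < {max_lines})"))
        if not cell.get("outputs"):
            problems.append((index, "no output shown"))
    return problems


# A tiny in-memory notebook standing in for a real .ipynb file.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Intro"]},
        {"cell_type": "code", "source": ["x = 1\n", "x"],
         "outputs": [{"output_type": "execute_result"}]},
        {"cell_type": "code", "source": "import os", "outputs": []},
    ]
}
print(check_cells(example))  # [(2, 'no output shown')]
```

A saved notebook could be read with `json.load` and passed to the same function.
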
Teaching with Jupyter (2 of 2)
▪ load data+libraries from the container, not the network
▪ clear all output then “Run All”, or it didn’t happen
▪ video narratives: there’s text, and there’s subtext...
▪ pause after each “beat”: smile, breathe, let people follow you
For JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid, Knuth1984)
BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK!
(apologies for shouting)

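The "clear all output" step above can be sketched as a small function over the notebook's JSON. This assumes the usual layout in which code cells carry `outputs` and `execution_count` fields. The name `clear_outputs` is hypothetical, and re-running the cells afterwards is still left to the notebook's “Run All”.

```python
import copy


def clear_outputs(notebook):
    """Return a copy of the notebook with code-cell outputs and counters removed."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A stale notebook: the recorded output may no longer match the code.
stale = {
    "cells": [
        {"cell_type": "markdown", "source": "Some text"},
        {"cell_type": "code", "source": "2 + 2", "execution_count": 7,
         "outputs": [{"output_type": "execute_result", "data": {"text/plain": "5"}}]},
    ]
}
fresh = clear_outputs(stale)
print(fresh["cells"][1]["outputs"])  # []
```

Starting from an empty state and running top to bottom is what shows that the notebook actually reproduces its own results.
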
Sharing is caring
In data science, we see the benefits to teams for shared insights, storytelling, etc.
Meanwhile domain expertise is generally more important than knowledge about tools
There’s a value for developers to use notebooks in lieu of IDEs in some cases: what are those cases?
GitHub now renders notebooks, so they can be used for documentation, reporting, etc.
Digital Object Identifiers (DOI) can be assigned through Zenodo, making notebooks citable for academic publication

Authoring and Scale-Out

Launchbot.io

Achieving scale
▪ Launchbot.io allows a notebook author to build a container that includes the required Jupyter kernel, installed libraries, datasets, etc.
▪ Install Docker on your laptop
▪ Backend uses Git and DockerHub to manage containers
▪ For scale, deploy to DC/OS or a cloud

“A notebook, a container, and ~20 minutes of informal video walk into a bar…”

System architecture
[Diagram: Tutorial, Middleware, Cluster]

O’Reilly Strata
NY, Sep 25-28
SG, Dec 4-7
O’Reilly Artificial Intelligence
NY, Jun 26-29
SF, Sep 17-20
JupyterCon
NY, Aug 22-25

Learn Alongside Innovators
Just Enough Math
Building Data Science Teams
Hylbert-Speys
How Do You Learn?
periodic newsletter with updates, events, conference summaries…
liber118.com/pxn/
@pacoid
 

Computable Content

  • 1. Computable Content: Lessons Learned
    Paco Nathan @pacoid
    Director, Learning Group @ O’Reilly Media
    2017-06-21
  • 5. Jupyter use @ O’Reilly Media
    ▪ Embracing Jupyter notebooks at O’Reilly: oreilly.com/ideas/jupyter-at-oreilly
    ▪ Learning alongside innovators, thought-by-thought, in context: oreilly.com/ideas/oreilly-oriole-learn-alongside-innovators-thought-by-thought-in-context
    ▪ Oriole online tutorials: safaribooksonline.com/oriole/
    ▪ How do you learn? oreilly.com/learning/how-do-you-learn
  • 6. For example
    ▪ A unique new medium blends code, data, text, and video into a narrated learning experience with computable content
    ▪ Purely browser-based UX; zero installation required
    ▪ Substantially higher engagement metrics
    ▪ Opens the door for live coding in assessments
    oreilly.com/learning/regex-golf-with-peter-norvig
  • 7. Motivations
    O’Reilly needed a way for authors to use Jupyter notebooks to create professional publications. We also wanted to integrate video narration into the UX. The result is a unique new medium called Oriole:
    ▪ Context as a “unit of thought”
    ▪ Code and video sync’ed together
    ▪ Each web session gets its own Docker container in the cloud
    ▪ 100% HTML experience, no download/install/config needed
    ▪ Jupyter notebooks used in the middleware
    ▪ Leverage interactive, data-driven graphics
  • 8. Outcomes
    ▪ Tutorials are now much quicker to publish than “traditional” books and videos
    ▪ Less time required for innovators in programming, data science, devops, design, etc. – who tend to be really busy people
    ▪ Audience gets direct, hands-on, contextualized experience across a wide variety of programming environments
  • 9. Limitations
    ▪ Notebook kernels run REPLs, so older languages were not feasible
    ▪ Brief code blocks with tangible outcomes – precludes business topics, systems engineering, etc.
    ▪ What materials will fit within a Docker container?
  • 10. Third iteration of Jupyter @ O’Reilly
    1. notebooks as supplemental material to other published work
    2. notebooks published as HTML, as articles
    3. computable content, containerized notebooks + video narratives
    4. hosted notebooks
  • 11. Long-term goal: make learning materials more powerful by integrating compute engines + data services
  • 13. Project Jupyter
    ▪ The evolution of IPython notebooks, applied to a range of different programming languages and environments
    ▪ https://jupyter.org/
    ▪ https://github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  • 15. Projects
    ▪ JupyterHub: github.com/jupyterhub/jupyterhub
    ▪ Jupyter in Education: groups.google.com/forum/#!forum/jupyter-education
    ▪ JupyterLab: github.com/jupyterlab/jupyterlab
    ▪ Jupyter Kernels: github.com/ipython/ipython/wiki/IPython-kernels-for-other-languages
  • 16. A suite of network protocols
    Think of Jupyter, at its core, as a suite of network protocols:
    Jupyter is to the remote semantics of a REPL as… HTTP is to the remote semantics of file share
  • 17. A suite of network protocols
    [Diagram: Kernel (code runs in a REPL), linked by a network protocol to Notebook (editing + results)]
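To make the "suite of network protocols" idea concrete, here is a minimal sketch of how a notebook front end might frame a request asking a kernel to run code. It follows the general shape of the Jupyter messaging format (an `execute_request` with header, parent header, metadata, and content, signed with HMAC-SHA256), but the function names `build_execute_request` and `sign`, the username, and the session and key values are assumptions made for illustration. It only builds the frames; it does not open a socket or talk to a real kernel.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

# Separates routing identities from the signed message parts on the wire.
DELIMITER = b"<IDS|MSG>"


def sign(parts, key):
    """Return the hex HMAC-SHA256 digest over the serialized message parts."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest()


def build_execute_request(code, session_id, key):
    """Build the wire frames for an execute_request (hypothetical helper)."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": "reader",  # assumed value, for illustration only
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # Order matters: header, parent_header, metadata, content.
    parts = [json.dumps(p).encode("utf-8") for p in (header, {}, {}, content)]
    return [DELIMITER, sign(parts, key).encode("ascii"), *parts]


frames = build_execute_request("1 + 1", session_id="demo", key=b"not-a-real-key")
print(json.loads(frames[2])["msg_type"])
```

The point of the sketch is the separation of concerns: the notebook side only needs to produce and consume messages like this one, while any kernel that speaks the same protocol can run the code in its own REPL.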
  • 19. Notebook metaphor
    Wolfram Research introduced notebooks in 1988 for working with Mathematica
  • 21. Literate programming
    Don Knuth
    literateprogramming.com/
    Paraphrased: Instead of telling computers what to do, tell other people what you want the computers to do
  • 22. Speech acts
    PyCon 2016 Keynote, Lorena Barba
    youtu.be/ckW1xuGVpug?t=35m11s (video)
    figshare.com/articles/PyCon2016_Keynote/3407779 (slides)
    Highly recommended: speech acts (based on Winograd and Flores) as theory here
  • 24. The following lessons learned in using Jupyter notebooks + video for learning materials apply well in many situations for data science teams working across an organization
  • 25. Teaching with Jupyter – 1 of 2
    ▪ focus on a concise “unit of thought”
    ▪ invest the time and editorial effort to create a good intro
    ▪ keep your narrative simple and reasonably linear
    ▪ “chunk” the text and code into understandable parts
    ▪ alternate between text, code, output, further links, etc.
    ▪ code cells should be brief (< 10 lines), must show output
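The last guideline above (brief code cells that show output) is easy to check mechanically. The following is a rough sketch, assuming the notebook has already been loaded as a dict in the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `source` and `outputs`). The function name `check_code_cells` and the choice to count only non-blank lines are assumptions for illustration, not part of any O’Reilly tooling.

```python
MAX_LINES = 10  # the "< 10 lines" guideline for code cells


def check_code_cells(notebook, max_lines=MAX_LINES):
    """Return (cell_index, problem) pairs for code cells that break the guidelines."""
    problems = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a string or a list of lines
            source = "".join(source)
        # Count only non-blank lines (an assumption about what "brief" means).
        line_count = len([line for line in source.splitlines() if line.strip()])
        if line_count >= max_lines:
            problems.append((index, f"{line_count} lines of code"))
        if not cell.get("outputs"):
            problems.append((index, "no output"))
    return problems


# Tiny in-memory notebook used only to demonstrate the check.
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Intro\n"]},
        {"cell_type": "code", "source": ["print(1)\n"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}]},
        {"cell_type": "code", "source": "x = 1\n" * 12, "outputs": []},
    ]
}
for index, problem in check_code_cells(sample):
    print(f"cell {index}: {problem}")
```

For a notebook on disk, the same function could be applied to the result of `json.load` on the `.ipynb` file.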
  • 26. Teaching with Jupyter – 2 of 2
    ▪ load data+libraries from the container, not the network
    ▪ clear all output then “Run All” – or it didn’t happen
    ▪ video narratives: there’s text, and there’s subtext...
    ▪ pause after each “beat”: smile, breathe, let people follow you
    For JVM people: stop thinking only about IDEs, Ivy, Maven, etc. (ibid, Knuth 1984)
    BUILD UBER JARS, LOAD LIBS FROM CONTAINER, NOT THE NETWORK!
    (apologies for shouting)
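The “clear all output then Run All” rule can be partly automated. This is a minimal sketch under two assumptions: the notebook is a dict in the nbformat 4 JSON layout, and a clean top-to-bottom run on a fresh kernel numbers the non-empty code cells 1, 2, 3, and so on. The helper names `clear_outputs` and `ran_top_to_bottom` are hypothetical.

```python
import copy


def _source_text(cell):
    """Return a cell's source as one string (nbformat allows a string or a list)."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def clear_outputs(notebook):
    """Return a copy of the notebook with outputs and execution counts removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def ran_top_to_bottom(notebook):
    """True if non-empty code cells carry execution counts 1, 2, 3, ... in order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code" and _source_text(cell).strip()
    ]
    return counts == list(range(1, len(counts) + 1))


# A notebook whose cells were run out of order fails the check.
stale = {
    "cells": [
        {"cell_type": "code", "source": "a = 1", "outputs": [], "execution_count": 3},
        {"cell_type": "code", "source": "a", "outputs": [{"output_type": "execute_result"}],
         "execution_count": 1},
    ]
}
print(ran_top_to_bottom(stale))
```

A check like `ran_top_to_bottom` only detects stale numbering; it does not re-run anything. Re-executing the cleared notebook is still needed to confirm that the cells actually work in order.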
  • 27. Sharing is caring
    In data science, we see the benefits to teams for shared insights, storytelling, etc.
    Meanwhile domain expertise is generally more important than knowledge about tools
    There’s a value for developers to use notebooks in lieu of IDEs in some cases – what are those cases?
    GitHub now renders notebooks, so they can be used for documentation, reporting, etc.
    Digital Object Identifiers (DOI) can be assigned through Zenodo, making notebooks citable for academic publication
  • 30. Achieving scale
    ▪ Launchbot.io allows a notebook author to build a container that includes the required Jupyter kernel, installed libraries, datasets, etc.
    ▪ Install Docker on your laptop
    ▪ Backend uses Git and DockerHub to manage containers
    ▪ For scale, deploy to DC/OS or a cloud
  • 31. “A notebook, a container, and ~20 minutes of informal video walk into a bar…”
  • 32. System architecture
    [Diagram: Tutorial, Middleware, Cluster]
  • 33. O’Reilly Strata: NY, Sep 25-28; SG, Dec 4-7
    O’Reilly Artificial Intelligence: NY, Jun 26-29; SF, Sep 17-20
    JupyterCon: NY, Aug 22-25
  • 34. Learn Alongside Innovators
    Just Enough Math
    Building Data Science Teams
    Hylbert-Speys
    How Do You Learn?
    periodic newsletter with updates, events, conference summaries… liber118.com/pxn/
    @pacoid