Jupyter notebooks are transforming the way we look at computing, coding and problem solving. But is this the only “data scientist experience” that this technology can provide?
In this webinar, Natalino will sketch how you can use Jupyter to create interactive and compelling data science web applications and provide new ways of exploring and analyzing data. Behind the scenes, these apps are still powered by well-understood and documented Jupyter notebooks.
He will present an architecture composed of four parts: a server-only Jupyter gateway, a Scala/Spark Jupyter kernel, a Spark cluster and an Angular/Bootstrap web application.
3.
Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC BY 3.0
Learning: The Scientific Method
Hans Christian Ørsted, "First Introduction to General Physics" (1811)
https://en.m.wikipedia.org/wiki/History_of_scientific_method
observation, hypothesis, deduction, experiment, synthesis
8.
Jupyter notebook: why?
Language of choice
The Notebook has support for over 40 programming languages, including those popular in Data Science such as Python, R, Julia and Scala.
Share notebooks
Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.
Interactive widgets
Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time.
Big data integration
Leverage big data tools, such as Apache Spark, from Python, R and Scala. Explore that same data with pandas, scikit-learn, ggplot2, dplyr, etc.
11.
Architecture of a Jupyter Notebook
• Modular architecture: Web App, Server, Kernel
• Kernels: Python, R, Scala, Julia, Bash, SPARKQL
• Web App: asynchronous, rich editing, syntax highlighting, export and share
12.
Jupyter Notebook
● Narratives and Use Cases
Narratives are collaborative, shareable, publishable, and reproducible. We believe that Narratives help both yourself and other researchers by sharing your use of Jupyter projects, technical specifics of your deployment, and installation and configuration tips so that others can learn from your experiences.
From https://jupyter.readthedocs.io/en/latest/use-cases/content-user.html
18.
Build your own narrative!
What do you need?
1. Understand how to communicate with the Jupyter server
Two ways: WebSockets or HTTP API endpoints
2. Build your own web application
Many ways: e.g. Angular, Polymer, Dart, etc.
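The HTTP route can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `StubGateway` and `fetch_dataset` are hypothetical names, and the stub server merely stands in for a real gateway so the snippet runs on its own. The `/cog/datasets/<id>` path is the one used later in the deck.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubGateway(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the gateway: answers GET /cog/datasets/<id>."""

    def do_GET(self):
        prefix = "/cog/datasets/"
        if self.path.startswith(prefix):
            # Echo the requested id back inside a small JSON document.
            body = json.dumps({"ds": {"id": self.path[len(prefix):]}}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


def fetch_dataset(host, port, dataset_id):
    """GET /cog/datasets/<dataset_id> from host:port and decode the JSON body."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", f"/cog/datasets/{dataset_id}")
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"gateway answered with status {resp.status}")
        return json.loads(resp.read().decode("utf-8"))
    finally:
        conn.close()


# Port 0 asks the OS for any free port, so the demo cannot clash with a real gateway.
server = HTTPServer(("127.0.0.1", 0), StubGateway)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    result = fetch_dataset("127.0.0.1", server.server_port, 1)
finally:
    server.shutdown()
    server.server_close()
print(result)
```

Against a real gateway only `fetch_dataset` would be needed, pointed at the gateway's host and port.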
24.
Jupyter Gateway: expose API endpoints
Declare the endpoint
Produce the JSON payload
GET http://localhost:8800/cog/datasets/1
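A minimal sketch of such a notebook cell, assuming a Python kernel and the kernel gateway's notebook-http mode: the leading comment declares the endpoint, the gateway supplies the incoming request as a JSON string in the `REQUEST` variable, and whatever the cell prints becomes the response body. The helper `dataset_response` and every value in the payload are hypothetical placeholders; only the field names are taken from the template shown later in the deck.

```python
# GET /cog/datasets/:id
import json


def dataset_response(request_json):
    """Build the JSON body for one dataset from the gateway's request string."""
    request = json.loads(request_json)
    dataset_id = request["path"]["id"]  # value bound to :id in the annotation
    # Placeholder values only: a real cell would compute them, e.g. with Spark.
    payload = {
        "ds": {"id": dataset_id, "rows": 0, "cols": 0, "na": {"rows": 0, "cols": 0}},
        "vars": [
            {
                "id": 0,
                "name": "example_variable",
                "sample": [],
                "type": {
                    "vtype": "unknown",
                    "tcoerce": "unknown",
                    "unique": 0,
                    "nan": 0,
                    "valid": 0,
                    "quality": 0,
                },
            }
        ],
    }
    return json.dumps(payload)


# The gateway injects REQUEST; this fallback only lets the cell run stand-alone.
try:
    REQUEST
except NameError:
    REQUEST = json.dumps({"path": {"id": "1"}})

print(dataset_response(REQUEST))
```

Keeping the logic in a function makes the cell easy to exercise inside an ordinary notebook session before serving it through the gateway.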
25.
Jupyter Gateway: consume the data
Consume the JSON payload
GET http://localhost:8800/cog/datasets/1
app.controller('datasetCtrl', function ($scope, $routeParams, $http) {
  // dataset id taken from the route parameters
  var id = $routeParams.id;
  $http({
    method: 'GET',
    url: '/cog/datasets/' + id
  }).then(function successCallback(response) {
    // called asynchronously when the response is available
    $scope.d = response.data;
  }, function errorCallback(response) {
    // called asynchronously if an error occurs
    // or the server returns a response with an error status
    console.error('GET /cog/datasets/' + id + ' failed', response.status);
  });
});
26.
Jupyter Gateway: consume the data
Render the Angular scope object ($scope.d)
<div class="row">
  <div class="col-md-9 offset-md-2">
    <p class="small">{{d.ds.rows}} obs. of {{d.ds.cols}} variables <br/>
    NA rows:{{d.ds.na.rows}}, columns:{{d.ds.na.cols}}</p>
  </div>
</div>
...
<tr ng-repeat="v in d.vars">
  <td><a href="#/ds/{{d.ds.id}}/variables/{{v.id}}">{{v.name}}</a></td>
  <td class="small">{{ v.sample.toString() }}</td>
  <td>{{v.type.vtype}}</td>
  <td>{{v.type.tcoerce}}</td>
  <td>{{v.type.unique}}</td>
  <td>{{v.type.nan}}</td>
  <td>{{v.type.valid}}</td>
  <td>{{v.type.quality}}</td>
...
29.
Dockerize your Jupyter gateway API
Add the Jupyter gateway
FROM jupyter/all-spark-notebook
...
# add some extra packages
ADD packages /srv/
RUN pip install -r /srv/packages
# install the kernel gateway
RUN pip install jupyter_kernel_gateway
ENV JUPYTER_GATEWAY=1
# REST API is designed as notebooks
ADD notebooks /srv/notebooks
Add the notebook which powers the API
30.
Dockerize your Jupyter gateway API
# shell form; inside a Makefile the variable is written $(IMAGE)
IMAGE=autoscience/kernel_gateway
docker build -t "$IMAGE" .
# the trailing backslashes keep the gateway options on one command line
docker run --rm -ti -p 8888:8888 "$IMAGE" \
  jupyter kernelgateway \
    --KernelGatewayApp.ip=0.0.0.0 \
    --KernelGatewayApp.port=8888 \
    --KernelGatewayApp.api=notebook-http \
    --KernelGatewayApp.seed_uri=/srv/notebooks/autoscience.ipynb
32.
Summary
• Jupyter notebook is a great way to create and share data-driven use cases and projects
• Jupyter is more than notebooks
– gateway, kernels, hub, etc.
• Narratives powered by Jupyter
– O’Reilly Orioles
– build your own: autoscience example