Data Science Apps: Beyond Notebooks
Natalino Busa
2 Natalino Busa - @natbusa
LinkedIn + Twitter + GitHub:
@natbusa
DBS
Teradata
Cognitive Finance
ING Group
O’Reilly
Philips
Icons made by Gregor Cresnar
from www.flaticon.com, licensed under CC
Learning: The Scientific Method
Ørsted's "First Introduction to General Physics" (1811)
https://en.m.wikipedia.org/wiki/History_of_scientific_method
Figure (Hans Christian Ørsted): observation, hypothesis, deduction, synthesis, experiment
Data Scientist Experience
Cloud, Tools, Math, Humans
The Jupyter Project
http://jupyter.org
Jupyter notebook: what is it?
The Jupyter Notebook
The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text.
Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
credit : Jupyter project
extracted from http://jupyter.org/index.html
Jupyter notebook: why?
Language of choice
The Notebook has support for over 40 programming languages, including those popular in Data Science such as Python, R, Julia and Scala.
Share notebooks
Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.
Interactive widgets
Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in realtime.
Big data integration
Leverage big data tools, such as Apache Spark, from Python, R and Scala. Explore that same data with pandas, scikit-learn, ggplot2, dplyr, etc.
credit : Jupyter project
extracted from http://jupyter.org/index.html
Annotated screenshot of a notebook:
• Text cell
• Code cell, with its cell input and cell output
• Edit, Run, Kernel and Widgets menus
• Kernel type
• Cell output: ASCII, HTML, images, etc.
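Behind the editor, every notebook is saved as a JSON file (the .ipynb format). A minimal sketch, assuming the nbformat version 4 schema, of how the parts labelled above map onto that file: text and code cells, a code cell's input (`source`) and output (`outputs`), and the kernel type (`metadata.kernelspec`). The helper `make_notebook` is hypothetical; real projects would normally build notebooks with the `nbformat` package rather than by hand.

```python
import json


def make_notebook(markdown_text: str, code: str, output_text: str) -> dict:
    """Build a minimal nbformat-4 notebook: one text cell and one code cell."""
    text_cell = {"cell_type": "markdown", "metadata": {}, "source": markdown_text}
    code_cell = {
        "cell_type": "code",
        "execution_count": 1,
        "metadata": {},
        "source": code,  # cell input
        "outputs": [  # cell output(s)
            {
                "output_type": "execute_result",
                "execution_count": 1,
                "metadata": {},
                # One entry per MIME type; rich output would add e.g.
                # "text/html" or "image/png" next to "text/plain".
                "data": {"text/plain": output_text},
            }
        ],
    }
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {  # the kernel type the notebook runs on
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
        },
        "cells": [text_cell, code_cell],
    }


nb = make_notebook("# Counting", "1 + 1", "2")
serialized = json.dumps(nb, indent=1)  # the text that would be saved as an .ipynb file
```

Because outputs are stored next to the code, a shared notebook shows its results even before anyone re-runs it.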
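Architecture of a Jupyter Notebook
Diagram: the Web Browser runs the Jupyter Notebook Web App, which talks to the Jupyter Notebook Server over HTTP and WebSockets; the server reads and writes the notebook files and talks to the Kernel over ØMQ (ZeroMQ).
https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html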
Architecture of a Jupyter Notebook
• Modular architecture: Web App, Server, Kernel
• Kernels: Python, R, Scala, Bash, SQL
• Web App: asynchronous, rich editing, syntax highlighting, export and share
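The Web App never talks to the kernel directly: the server relays JSON messages to the browser over WebSockets and to the kernel over ZeroMQ. A minimal sketch, assuming version 5 of the Jupyter messaging protocol, of the `execute_request` message produced when a code cell is run. The function name `execute_request` and the `PROTOCOL_VERSION` value are illustrative assumptions; production clients normally use `jupyter_client`, which builds and signs these messages.

```python
import json
import uuid
from datetime import datetime, timezone

PROTOCOL_VERSION = "5.3"  # assumption: any 5.x version uses this message shape


def execute_request(code: str, session: str) -> dict:
    """Build an execute_request message (header, parent_header, metadata, content)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo",
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": PROTOCOL_VERSION,
        },
        "parent_header": {},  # empty: this message is not a reply to another one
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


msg = execute_request("print(6 * 7)", session=uuid.uuid4().hex)
# Over the server's WebSocket the browser also names the ZeroMQ channel to use.
ws_frame = json.dumps({**msg, "channel": "shell"})
```

The kernel answers on its own channels (stream output on iopub, an `execute_reply` on shell), each reply carrying this message's header as its `parent_header`; that is how the asynchronous Web App matches outputs to cells.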
Jupyter Notebook
● Narratives and Use Cases
Narratives are collaborative, shareable, publishable, and reproducible. We believe that
Narratives help both yourself and other researchers by sharing your use of Jupyter
projects, technical specifics of your deployment, and installation and configuration tips so
that others can learn from your experiences.
From https://jupyter.readthedocs.io/en/latest/use-cases/content-user.html
Jupyter is more than Notebooks
“What if I told you that the notebook is NOT the only sort of narrative that you can create with the Jupyter project?”
Examples of Jupyter-powered narratives
● O’Reilly Orioles
● Examples - build your own!
Orioles: A powerful educational narrative
Geolocated clustering and prediction
services with scikit-learn
Learn how to build a venue
recommender and a geofencing
alerting engine using geolocated data,
ML clustering algorithms, and
scikit-learn
Build your own narrative!
What do you need?
1. Understand how to communicate with the Jupyter server.
   Two ways: WebSockets or HTTP API endpoints.
2. Build your own web application.
   Many ways: e.g. Angular, Polymer, Dart, etc.
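With the HTTP route, any web application or script can call notebook-backed endpoints like an ordinary REST service. A minimal standard-library sketch; the base URL and the `/counters/<name>` path mirror the demo request GET http://localhost:8800/counters/my_counter, and the helper names `counter_request` and `fetch_counter` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8800"  # assumption: where the kernel gateway listens


def counter_request(name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for the (hypothetical) /counters/<name> endpoint."""
    path = "/counters/" + urllib.parse.quote(name, safe="")
    return urllib.request.Request(
        base_url + path, method="GET", headers={"Accept": "application/json"}
    )


def fetch_counter(name: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON body (needs a running gateway)."""
    with urllib.request.urlopen(counter_request(name, base_url), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = counter_request("my_counter")
print(req.get_method(), req.full_url)  # GET http://localhost:8800/counters/my_counter
```

Keeping request construction separate from sending makes the client easy to test without a live server; the browser-side equivalent would be a `fetch` call from Angular, Polymer or Dart code.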
Demos: kernel gateway
Purpose:
- Understand how to expose API endpoints
- Build your own narrative!
- Productivity gain: faster app prototyping
Jupyter Gateway: expose API endpoints
Declare the endpoint
Declare the MIME type, headers and status
GET http://localhost:8800/counters/my_counter
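In the kernel gateway's notebook-http mode, a code cell whose first line is a comment such as `# GET /counters/:name` becomes an endpoint: the gateway sets a `REQUEST` variable (a JSON string with the body, query args, path parameters and headers) before running the cell, and returns what the cell prints. A companion `# ResponseInfo GET /counters/:name` cell can print JSON with the status and headers. A minimal sketch of the counter demo under those conventions; the counter logic and the names `counters`, `get_counter` and `response_info` are hypothetical, and the last lines simulate the gateway so the code runs on its own.

```python
import json

# Setup cell (no annotation): run once when the gateway loads the notebook.
counters = {}


def get_counter(request_json: str) -> str:
    """Increment and return the counter named by the :name path parameter."""
    request = json.loads(request_json)
    name = request["path"]["name"]
    counters[name] = counters.get(name, 0) + 1
    return json.dumps({"name": name, "count": counters[name]})


def response_info() -> str:
    """Status and headers the gateway should send with the endpoint's output."""
    return json.dumps({"headers": {"Content-Type": "application/json"}, "status": 200})

# Endpoint cell, as it would appear in the notebook:
#   # GET /counters/:name
#   print(get_counter(REQUEST))
#
# Response-info cell:
#   # ResponseInfo GET /counters/:name
#   print(response_info())

# Simulate the gateway handling GET /counters/my_counter.
REQUEST = json.dumps({"body": "", "args": {}, "path": {"name": "my_counter"}, "headers": {}})
print(get_counter(REQUEST))  # {"name": "my_counter", "count": 1}
print(response_info())
```

State lives in the kernel, so the counter keeps growing across requests for as long as the gateway's kernel is alive.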
Jupyter: docker stacks
Docker container: Jupyter Notebook + Apache Toree
https://github.com/jupyter/docker-stacks
Dockerize your Jupyter Kernel Gateway API
# Build the image, then start the gateway in notebook-http mode, seeded with the demo notebook
IMAGE=demos/kernel_gateway_demo
docker build -t "$IMAGE" .
docker run -p 8888:8888 "$IMAGE" \
  jupyter kernelgateway \
    --KernelGatewayApp.ip=0.0.0.0 \
    --KernelGatewayApp.port=8888 \
    --KernelGatewayApp.api=notebook-http \
    --KernelGatewayApp.seed_uri=/srv/notebooks/autoscience.ipynb
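Big Data apps:
Dockerize your Jupyter Kernel Gateway API with Toree
Diagram: in the Web Browser, your own Web App calls the Jupyter Kernel Gateway through an HTTP REST API; the gateway is seeded from notebook files and talks to the Toree Kernel over ØMQ. The components run in Docker containers.
One web session = one server on a cloud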
Summary
• The Jupyter Notebook is a great way to create and share data-driven use cases and projects
• Jupyter is more than notebooks
– gateway, kernels, hub, etc.
• Narratives powered by Jupyter
– O’Reilly Orioles
– build your own narrative
Resources
Jupyter
http://jupyter.org/index.html
https://jupyter.readthedocs.io/en/latest/index.html#
Jupyter Kernel Gateway
https://github.com/jupyter/kernel_gateway
http://jupyter-kernel-gateway.readthedocs.io/en/latest/
JupyterCon (first of its kind!)
https://conferences.oreilly.com/jupyter/jup-ny
Apache Toree (Spark Kernel)
https://toree.apache.org/
Web application dev
https://angular.io/
https://www.polymer-project.org/1.0/
Docker
https://github.com/jupyter/docker-stacks
https://www.docker.com/
LinkedIn and Twitter:
@natbusa

Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 

Data science apps powered by Jupyter Notebooks

  • 1. Data Science Apps: Beyond Notebooks, by Natalino Busa
  • 2. Natalino Busa - @natbusa. LinkedIn + Twitter + GitHub: @natbusa. DBS, Teradata, Cognitive Finance, ING Group, O’Reilly, Philips
  • 3. Learning: The Scientific Method. Diagram stages: observation, hypothesis, deduction, synthesis, experiment (after Hans Christian Ørsted, "First Introduction to General Physics", 1811; https://en.m.wikipedia.org/wiki/History_of_scientific_method). Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC.
  • 4. Data Scientist Experience
  • 5. Cloud, Tools, Math, Humans
  • 6. The Jupyter Project: http://jupyter.org
  • 7. Jupyter notebook: what is it? The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more. (Credit: Jupyter Project, extracted from http://jupyter.org/index.html)
  • 8. Jupyter notebook: why? Language of choice: the Notebook has support for over 40 programming languages, including those popular in Data Science such as Python, R, Julia and Scala. Share notebooks: notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer. Interactive widgets: code can produce rich output such as images, videos, LaTeX, and JavaScript, and interactive widgets can be used to manipulate and visualize data in real time. Big data integration: leverage big data tools, such as Apache Spark, from Python, R and Scala, and explore that same data with pandas, scikit-learn, ggplot2, dplyr, etc. (Credit: Jupyter Project, extracted from http://jupyter.org/index.html)
  • 9. Anatomy of a notebook (annotated screenshot): text cell, code cell, cell input, cell output; Edit, Run, Kernel and Widgets menus; kernel type. Cell output can be ASCII, HTML, images, etc.
  • 10. Architecture of a Jupyter Notebook (diagram): Web Browser running the Jupyter Notebook Web App ↔ HTTP and Websockets ↔ Jupyter Notebook Server ↔ ØMQ ↔ Kernel; the server also loads and saves the notebook files. Source: https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html
  • 11. Architecture of a Jupyter Notebook: modular architecture (Web App, Server, Kernel); kernels: Python, R, Scala, Bash, SQL; Web App: asynchronous, rich editing, syntax highlighting, export and share.
  • 12. Jupyter Notebook: Narratives and Use Cases. "Narratives are collaborative, shareable, publishable, and reproducible. We believe that Narratives help both yourself and other researchers by sharing your use of Jupyter projects, technical specifics of your deployment, and installation and configuration tips so that others can learn from your experiences." (From https://jupyter.readthedocs.io/en/latest/use-cases/content-user.html)
  • 13. Jupyter is more than Notebooks: “What if I told you that the notebook is NOT the only sort of narrative that you can create with the Jupyter project?”
  • 14. Examples of Jupyter-powered narratives: O’Reilly Orioles, and examples to build your own!
  • 15. Orioles: A powerful educational narrative
  • 16. Geolocated clustering and prediction services with scikit-learn: learn how to build a venue recommender and a geofencing alerting engine using geolocated data, ML clustering algorithms, and scikit-learn.
  • 17. Build your own narrative! What do you need? (1) Understand how to communicate with the Jupyter server, in one of two ways: websockets or HTTP API endpoints. (2) Build your own web application, in any of many ways, e.g. Angular, Polymer, Dart, etc.
  • 18. Demos: kernel gateway. Purpose: understand how to expose API endpoints; build your own narrative; gain productivity through faster app prototyping.
  • 19. [Image-only slide]
  • 20. Jupyter Gateway: expose API endpoints. Declare the endpoint; declare the MIME type, headers and status. Example request: GET http://localhost:8800/counters/my_counter
  • 21. Jupyter: Docker stacks. Docker container with Jupyter Notebook + Apache Toree: https://github.com/jupyter/docker-stacks
  • 22. Dockerize your Jupyter gateway API:
      IMAGE=demos/kernel_gateway_demo
      docker build -t "$IMAGE" .
      docker run -p 8888:8888 "$IMAGE" jupyter kernelgateway --KernelGatewayApp.ip=0.0.0.0 --KernelGatewayApp.port=8888 --KernelGatewayApp.api=notebook-http --KernelGatewayApp.seed_uri=/srv/notebooks/autoscience.ipynb
  • 23. Big Data apps: dockerize your Jupyter gateway API with Toree (diagram): Web Browser running your own Web App ↔ HTTP REST API ↔ Jupyter Kernel Gateway ↔ ØMQ ↔ Toree Kernel, plus the notebook files, all in Docker containers. One web session = one server on a cloud.
  • 24. Summary: Jupyter Notebook is a great way to create and share data-driven use cases and projects; Jupyter is more than notebooks (gateway, kernels, hub, etc.); narratives can be powered by Jupyter, such as O’Reilly Orioles, or you can build your own narrative.
  • 25. Resources. Jupyter: http://jupyter.org/index.html, https://jupyter.readthedocs.io/en/latest/index.html#. Jupyter Kernel Gateway: https://github.com/jupyter/kernel_gateway, http://jupyter-kernel-gateway.readthedocs.io/en/latest/. JupyterCon (first of its kind!): https://conferences.oreilly.com/jupyter/jup-ny. Apache Toree (Spark kernel): https://toree.apache.org/. Web application development: https://angular.io/, https://www.polymer-project.org/1.0/. Docker: https://github.com/jupyter/docker-stacks, https://www.docker.com/.
  • 26. LinkedIn and Twitter: @natbusa
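Slide 9 labels the parts of a notebook: text cells, code cells with their input and output, and outputs rendered as ASCII, HTML or images. On disk a notebook is a JSON document in the nbformat v4 layout. The sketch below builds a minimal notebook with the standard library only; the cell contents are hypothetical, and in practice the `nbformat` package would normally be used to create and validate notebooks.

```python
import json

def markdown_cell(text):
    """A text cell: Markdown source and no outputs."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}

def code_cell(source, count, mime_bundle):
    """A code cell (the input) with one execute_result (the output).

    mime_bundle maps MIME types to alternative representations of the same
    result; the front end displays the richest type it supports.
    """
    return {
        "cell_type": "code",
        "execution_count": count,
        "metadata": {},
        "source": source,
        "outputs": [{
            "output_type": "execute_result",
            "execution_count": count,
            "data": mime_bundle,
            "metadata": {},
        }],
    }

notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "cells": [
        markdown_cell("# Word count demo"),
        # Plain ASCII output only
        code_cell("len('hello jupyter world'.split())", 1, {"text/plain": "3"}),
        # Same result offered as plain text and as HTML
        code_cell(
            "from IPython.display import HTML\nHTML('<b>bold</b>')",
            2,
            {"text/plain": "<IPython.core.display.HTML object>", "text/html": "<b>bold</b>"},
        ),
    ],
}

# Serialized the same way an .ipynb file is stored on disk
ipynb_text = json.dumps(notebook, indent=1)
```

Writing `ipynb_text` to a file such as `demo.ipynb` should give a notebook that Jupyter can open; image outputs would follow the same pattern, with a base64-encoded `image/png` entry in the MIME bundle.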
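Slides 10 and 11 show the notebook server talking to the kernel over ØMQ. Messages on those sockets follow the Jupyter messaging protocol: a header, parent header, metadata and content, each serialized as JSON, preceded by a delimiter and an HMAC-SHA256 signature computed with a key shared through the kernel's connection file. The sketch below builds and verifies the frames of an `execute_request` using the standard library; the key, username and session id are made-up values, routing identities are omitted, and a real client would normally rely on `jupyter_client` rather than hand-rolling this.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIM = b"<IDS|MSG>"  # separates routing identities from the signed message

def sign(key, parts):
    """HMAC-SHA256 over header, parent_header, metadata and content, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest().encode("ascii")

def execute_request_frames(key, session, code):
    """Return [delimiter, signature, header, parent_header, metadata, content]."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": "demo",  # hypothetical user name
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # Empty parent_header and metadata: this message is not a reply to anything
    parts = [json.dumps(d).encode("utf-8") for d in (header, {}, {}, content)]
    return [DELIM, sign(key, parts)] + parts

def verify(key, frames):
    """Recompute the signature as the receiving side would, in constant time."""
    i = frames.index(DELIM)
    signature, parts = frames[i + 1], frames[i + 2:i + 6]
    return hmac.compare_digest(signature, sign(key, parts))

key = b"not-a-real-secret"  # hypothetical; real keys come from the connection file
frames = execute_request_frames(key, uuid.uuid4().hex, "print(1 + 1)")
print(verify(key, frames))  # True
```

Because the signature covers all four JSON parts, tampering with any of them (for example, swapping the `code` in the content) makes verification fail.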
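Slide 17 names two ways to communicate with the Jupyter server: HTTP API endpoints and websockets. As a sketch, the server's REST API starts a kernel with `POST /api/kernels`, and the kernel's messages are then exchanged over a websocket at `/api/kernels/<kernel_id>/channels`; both accept a token for authentication. The helpers below only build the request and the URL (nothing is sent), and the base URL, token and kernel id are placeholders.

```python
import json
import urllib.parse
import urllib.request

def start_kernel_request(base_url, token, kernel_name="python3"):
    """Build (but do not send) POST /api/kernels, which starts a new kernel."""
    body = json.dumps({"name": kernel_name}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/kernels",
        data=body,
        method="POST",
        headers={"Authorization": "token " + token, "Content-Type": "application/json"},
    )

def channels_url(base_url, kernel_id, token):
    """Websocket URL on which a kernel's shell, iopub and stdin messages travel."""
    parts = urllib.parse.urlsplit(base_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    path = parts.path.rstrip("/") + "/api/kernels/" + kernel_id + "/channels"
    query = urllib.parse.urlencode({"token": token})
    return urllib.parse.urlunsplit((scheme, parts.netloc, path, query, ""))

req = start_kernel_request("http://localhost:8888", "my-token")
print(req.get_method(), req.full_url)  # POST http://localhost:8888/api/kernels
print(channels_url("http://localhost:8888", "1234", "my-token"))
```

Sending the request with `urllib.request.urlopen(req)` against a running server returns the new kernel's model as JSON; its `id` field is the kernel id to pass to `channels_url`.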
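Slide 20 declares an endpoint, plus its MIME type, headers and status, inside the notebook itself. In the kernel gateway's notebook-http mode this is done with a comment annotation on the first line of a cell: `# GET /counters/:name` marks the handler cell, `# ResponseInfo GET /counters/:name` marks a cell that prints the response headers and status as JSON, and the gateway injects the incoming request as a JSON string named `REQUEST`. The counter logic below is a hypothetical reconstruction of the demo, written as one script with a simulated `REQUEST` so it runs outside the gateway; in a notebook, each commented section would be its own cell.

```python
import json

# Simulated request (hypothetical values). Under the gateway, REQUEST is
# defined for you; path parameters such as :name arrive under "path".
REQUEST = json.dumps({"body": "", "args": {}, "path": {"name": "my_counter"}, "headers": {}})

# Cell 1: state that lives in the kernel between requests
counters = {}

# GET /counters/:name
# (cell 2: in the notebook, the annotation above must be the cell's first line)
req = json.loads(REQUEST)
name = req["path"]["name"]
counters[name] = counters.get(name, 0) + 1
print(json.dumps({"counter": name, "value": counters[name]}))  # response body

# ResponseInfo GET /counters/:name
# (cell 3: response metadata for the same endpoint)
print(json.dumps({"headers": {"Content-Type": "application/json"}, "status": 200}))
```

Incrementing on GET keeps the sketch short; a stricter API would increment on a POST endpoint and keep GET free of side effects.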
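Once the container from slide 22 is running with port 8888 published, the notebook-defined endpoints are plain HTTP resources, so "your own web app" (slide 23) or any script can consume them with an ordinary HTTP client. This standard-library sketch assumes the container serves a notebook defining the hypothetical `/counters/:name` endpoint that returns JSON; the `fetch` parameter exists only so the function can be exercised without a live server.

```python
import json
import urllib.parse
import urllib.request

def fetch_bytes(url, timeout=5):
    """Default transport: GET the URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()

def get_counter(name, base_url="http://localhost:8888", fetch=fetch_bytes):
    """Call the hypothetical GET /counters/<name> endpoint and decode its JSON.

    urlopen raises urllib.error.HTTPError for 4xx/5xx replies, so only a
    successful body reaches json.loads.
    """
    url = base_url.rstrip("/") + "/counters/" + urllib.parse.quote(name, safe="")
    return json.loads(fetch(url).decode("utf-8"))
```

With the container up, `get_counter("my_counter")` should return the decoded JSON body produced by the handler cell.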