1
Natalino Busa - @natbusa
Natalino Busa
Head of Data Science, Teradata
Are we reaching a data science singularity?
6
What about (data) science?
- technologies and tools are driving innovation in data analytics -
7
Man - Machine
as integrated cognitive systems
8
Learning: The Scientific Method
Ørsted's "First Introduction to General Physics" (1811)
https://en.m.wikipedia.org/wiki/History_of_scientific_method
Hans Christian Ørsted
Steps shown: observation, hypothesis, deduction, experiment, synthesis
Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC BY 3.0
9
Innovation in Data Analytics
Cloud, Community, AI & ML
10
Cloud
11
“we live in an age of open source datacenters, so
we can stack all these things together and we
have open source from the ground to ceiling.”
Sam Ramji, CEO of Cloud Foundry
https://www.youtube.com/watch?v=7oCSFcUW-Qk
12
Analytics in the cloud
Bare Metal: Physical Machines
IAAS: Virtual Resources
CAAS: Containers
dPAAS: Datastores, Data Engines
iPAAS: Tools Integration, Flows & Processes
DAAAS: Data Analytics as a Service
13
DAAAS: AI and ML APIs
Cloud Computing for Deep Neural Networks
> Models, Compute (Train, Score), and Data
AI and ML models for:
● Speech (audio)
● Language (text)
● Vision (images/video)
● Data (classification, regression, clustering, anomaly detection)
14
Ephemeral Computing Clusters on a Cloud
Timeline: create, load data, compute, store, destroy
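The create, load, compute, store, destroy timeline on this slide can be sketched as a context manager. This is only an illustration: `EphemeralCluster`, `ephemeral_cluster`, and `run_job` are hypothetical names invented here, the "cluster" is an in-memory object, and no real cloud API is called.

```python
from contextlib import contextmanager


class EphemeralCluster:
    """Hypothetical stand-in for a short-lived analytics cluster (not a real cloud API)."""

    def __init__(self, name):
        self.name = name
        self.data = None
        self.alive = True
        self.log = ["create"]

    def load(self, records):
        self.data = list(records)
        self.log.append("load")

    def compute(self, fn):
        self.log.append("compute")
        return fn(self.data)

    def destroy(self):
        self.data = None
        self.alive = False
        self.log.append("destroy")


@contextmanager
def ephemeral_cluster(name):
    """Create a cluster and guarantee it is destroyed, even if the job fails."""
    cluster = EphemeralCluster(name)
    try:
        yield cluster
    finally:
        cluster.destroy()


def run_job(records, store):
    """Run the full lifecycle; results survive in `store`, the cluster does not."""
    with ephemeral_cluster("exploration") as cluster:
        cluster.load(records)
        result = cluster.compute(lambda rows: sum(rows) / len(rows))
        store["mean"] = result  # persist outside the cluster before teardown
        cluster.log.append("store")
    return cluster


durable_store = {}
finished = run_job([3, 5, 10], durable_store)
print(finished.log)
print(durable_store)
```

The point of the design is that anything worth keeping is written to durable storage before the cluster is torn down, so the cluster itself can stay disposable.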
15
dPaaS: Analytical clusters
Ephemeral: short-lived; data exploration; isolated, personal; simple access management
vs
Permanent: long-lived; production / operations; coordinated; complex access management
16
GPUs and Distributed Computing
GPU support is coming in Kubernetes, Mesos, Spark
https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus
http://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark
[Diagram labels: out, up; CPU; R, Python; Spark; TensorFrames]
17
Community
18
Community
Develop - Use - Share
19
Sharing is caring … speed
github.com + Jupyter notebooks,
share ideas, code, and data
arxiv.org
share innovation and scientific results
20
Artificial Intelligence
Machine Learning
21
Google: open-sources NLP parser
scoring 95% in grammar accuracy
https://github.com/tensorflow/models/tree/master/syntaxnet
22
Deep Learning in Language Parsing
https://github.com/tensorflow/models/blob/master/syntaxnet/ff_nn_schematic.png
23
Semantic Search: TDA + NNs Word2Vec, Par2Vec, Doc2Vec
https://arxiv.org/pdf/1405.4053v2.pdf
https://arxiv.org/pdf/1301.3781v3.pdf
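Word and paragraph embeddings turn semantic search into a nearest-neighbour lookup: texts whose vectors point in a similar direction are treated as related. A minimal sketch of that lookup using cosine similarity follows. The three-dimensional vectors in `EMBEDDINGS` are invented for illustration; real vectors would come from a trained Word2Vec or paragraph-vector model.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# Real vectors would come from a trained Word2Vec or paragraph-vector model.
EMBEDDINGS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.7, 0.2],
    "apple":  [0.1, 0.2, 0.9],
    "banana": [0.2, 0.1, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, top_n=2):
    """Return the top_n entries closest to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [
        (cosine(q, vec), word)
        for word, vec in embeddings.items()
        if word != query
    ]
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


print(most_similar("king", EMBEDDINGS, top_n=1))
print(most_similar("apple", EMBEDDINGS, top_n=1))
```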
24
Lip reading
LipNet achieves 93.4% accuracy on the GRID corpus.
https://arxiv.org/pdf/1611.01599v1.pdf
25
Ask me Anything
Dynamic Memory Networks
for Natural Language
Processing
https://arxiv.org/pdf/1603.01417v1.pdf
https://youtu.be/oGk1v1jQITw
Caiming Xiong,
Stephen Merity,
Richard Socher
26
Ask me Anything
http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial
Dynamic Memory Networks for Natural Language Processing
https://arxiv.org/pdf/1603.01417v1.pdf
http://www.socher.org/
Local context, wider context
NLP, Attention Masks
Semantic Embeddings from Text, Images
27
Network Traffic Patterns Classification
28
Network Intrusion Detection
http://billsdata.net/?p=105
The dataset contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release).
Each record consists of: time (to the nearest second), duration, source and destination computer ids, source and destination ports, protocol, number of packets, and number of bytes.
Techniques: TDA, Dimensionality Reduction
https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
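The record layout on this slide maps naturally onto a typed tuple. The sketch below assumes a comma-separated line whose fields appear in the order listed on the slide; that file format, the field names, and the sample rows are assumptions made for illustration, not taken from the dataset itself. It aggregates a few per-computer features of the kind a dimensionality-reduction step could consume.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    time: int            # seconds, to the nearest second
    duration: int        # seconds
    src_computer: str
    dst_computer: str
    src_port: str
    dst_port: str
    protocol: str
    packets: int
    n_bytes: int


def parse_flow(line):
    """Parse one comma-separated flow line (assumed field order, see above)."""
    t, dur, src, dst, sport, dport, proto, pkts, nbytes = line.strip().split(",")
    return FlowRecord(int(t), int(dur), src, dst, sport, dport, proto,
                      int(pkts), int(nbytes))


def per_source_features(records):
    """Per source computer: (number of flows, total bytes, distinct destinations)."""
    totals = defaultdict(lambda: {"flows": 0, "bytes": 0, "peers": set()})
    for r in records:
        entry = totals[r.src_computer]
        entry["flows"] += 1
        entry["bytes"] += r.n_bytes
        entry["peers"].add(r.dst_computer)
    return {
        host: (e["flows"], e["bytes"], len(e["peers"]))
        for host, e in totals.items()
    }


# Invented sample rows, for illustration only.
SAMPLE = [
    "1,0,C1,C2,N10,N20,6,5,400",
    "2,3,C1,C3,N11,N20,6,8,600",
    "2,1,C2,C1,N12,N21,17,1,60",
]
features = per_source_features(parse_flow(line) for line in SAMPLE)
print(features)
```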
29
Approaching (Almost) Any Machine Learning Problem
- Abhishek Thakur, Kaggle Grandmaster -
Inputs: data, labels
raw data (tables, files) → data munging → useful data → feature engineering → tabular data ready for ML
30
AutoML challenge
- based on scikit-learn
- 15 classifiers
- 14 feature preprocessing methods
- 4 data preprocessing methods
- 110 hyperparameters
- Supervised classification challenge: 100 different datasets
https://arxiv.org/abs/1611.03824v1
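A configuration space like the one on this slide (a choice of classifier plus its hyperparameters) can be searched automatically. The sketch below uses plain random search over a deliberately tiny space, which is a simplification swapped in for illustration and not the method used in the challenge. The two toy classifiers, the 1-D dataset, and all names are invented, and only the standard library is used.

```python
import random

# Toy 1-D dataset: label is 1 when x > 5. Invented for illustration.
TRAIN = [(x, int(x > 5)) for x in [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]]
VALID = [(x, int(x > 5)) for x in [0.5, 2.5, 4.5, 5.5, 7.5, 9.5]]


def threshold_classifier(train, cutoff):
    """Predict 1 when x exceeds a fixed cutoff (ignores the training data)."""
    return lambda x: int(x > cutoff)


def knn_classifier(train, k):
    """Predict the majority label among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return int(votes * 2 > len(nearest))
    return predict


# Configuration space: each model name maps to a hyperparameter sampler.
SPACE = {
    "threshold": lambda rng: {"cutoff": rng.uniform(0, 10)},
    "knn": lambda rng: {"k": rng.choice([1, 3, 5])},
}
BUILDERS = {"threshold": threshold_classifier, "knn": knn_classifier}


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def random_search(n_trials, seed=0):
    """Sample (model, hyperparameters) pairs and keep the best on validation data."""
    rng = random.Random(seed)
    best = (-1.0, None, None)
    for _ in range(n_trials):
        name = rng.choice(sorted(SPACE))
        params = SPACE[name](rng)
        model = BUILDERS[name](TRAIN, **params)
        score = accuracy(model, VALID)
        if score > best[0]:
            best = (score, name, params)
    return best


score, name, params = random_search(n_trials=20)
print(score, name, params)
```

With 15 classifiers and 110 hyperparameters the same loop structure applies, but the space becomes far too large for naive sampling, which is why smarter search strategies matter.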
31
Artificial + Human Intelligence
32
Human cognitive biases:
Too much information
Not enough meaning
What should we
remember?
Need to act fast
https://en.wikipedia.org/wiki/List_of_cognitive_biases
33
Man vs Machine cognitive limits
Machine: model generation, explanation, unsupervised, planning
Man: too much information, not enough meaning, need to act quickly, memory limits
34
Theorems often tell us complex truths about the simple things,
but only rarely tell us simple truths about the complex ones
Marvin Minsky
K-Lines: A Theory of Memory (1980)
35
Data Science: wear the AI/ML Lenses
We are entering a new era of intelligent machines
Boost our understanding of data
Focus on higher level analyses
36
Intelligent Data Systems:
Long live the “database”
Wikipedia: A database is an organized collection of data.
[Diagram labels: DATA; New-SQL; ML; AI; SQL; Python - Scala - R; NLP; UX; Speech; COG]
37
The database is never going to be the same.
38
Thank you.
@natbusa
39
Credits
Cover: courtesy of Big Data Spain - https://www.bigdataspain.org/
Pictures:
https://commons.wikimedia.org/wiki/File:PurportedUFO2.jpg
https://commons.wikimedia.org/wiki/File:Amazing_Stories_October_1957.jpg
https://commons.wikimedia.org/wiki/File:DJI_Phantom_2_Vision%2B_V3_hovering_over_Weissfluhjoch_(cropped).jpg
https://commons.wikimedia.org/wiki/File:Leonard_Nimoy_as_Spock_1967.jpg
https://en.wikipedia.org/wiki/File:STUltimate_Cp.jpg
https://github.com/tensorflow/models/blob/master/syntaxnet/ff_nn_schematic.png
http://billsdata.net/wordpress/wp-content/uploads/2015/11/wikimap2.jpg
http://billsdata.net/wordpress/wp-content/uploads/2015/11/netflow.png
https://commons.wikimedia.org/wiki/File:Girls_learning_sign_language.jpg
https://arxiv.org/pdf/1603.01417v1.pdf
http://www.socher.org/uploads/Main/RichardSocher.jpg
https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf
https://commons.wikimedia.org/wiki/File:Cognitive_Bias_Codex_-_180%2B_biases,_designed_by_John_Manoogian_III_(jm3).jpg
Visualizations:
https://github.com/caffeinalab/siriwavejs
https://gist.github.com/AnanthaRajuC/91beee3eb04d11cb3af5
https://dribbble.com/shots/1714369-Cortana-Animation
Icons:
Icons made by Gregor Cresnar from www.flaticon.com, licensed under CC BY 3.0
40
bonus slides
41
AI & ML: curated list of links
Applications
http://www.wsj.com/articles/googles-self-driving-car-program-odometer-reaches-2-million-miles-1475683321
http://www.nature.com/articles/srep26286
Why is AI so difficult?
http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html
http://www.forbes.com/sites/gilpress/2016/10/31/12-observations-about-artificial-intelligence-from-the-oreilly-ai-conference/
http://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai/
https://www.safaribooksonline.com/library/view/oreilly-ai-conference/9781491973912/video260721.html
YouTube: great videos on AI
Yann LeCun: https://youtu.be/_1Cyyt-4-n8
Andrej Karpathy: https://youtu.be/u6aEYuemt0M
Nando de Freitas: https://youtu.be/bEUX_56Lojc
Richard Socher: https://youtu.be/oGk1v1jQITw
42
AI & ML: curated list of links
NLP
https://github.com/tensorflow/models/tree/master/syntaxnet
https://arxiv.org/abs/1405.4053v2
https://arxiv.org/abs/1603.06042
https://arxiv.org/abs/1301.3781v3
Video, Images, Hybrid Deep Learning Networks
https://arxiv.org/abs/1611.01599v1
https://arxiv.org/abs/1603.01417v1
Topological Data Analysis (TDA), Dimensionality Reduction:
https://en.wikipedia.org/wiki/Topological_data_analysis
https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
Meta Learning:
http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/
https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf
https://arxiv.org/abs/1611.03824v1
43
Curated list of links
Cognitive sciences:
https://en.wikipedia.org/wiki/History_of_scientific_method
https://en.wikipedia.org/wiki/List_of_cognitive_biases
Cloud:
The Making of a Cloud Native Application Platform - Sam Ramji https://www.youtube.com/watch?v=7oCSFcUW-Qk
https://en.wikipedia.org/wiki/Ephemerality
http://conferences.oreilly.com/oscon/oscon2011/public/schedule/detail/19812
GPU and distributed Computing:
https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus
http://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark
Collaborative coding and research:
https://github.com/tensorflow/models
https://github.com/jupyter
http://www.arxiv-sanity.com/

Are we reaching a data science singularity ?

  • 1. 1 Natalino Busa - @natbusa Natalino Busa Head of Data Science Teradata Are we reaching a data science singularity?
  • 2. 2 Natalino Busa - @natbusa
  • 3. 3 Natalino Busa - @natbusa
  • 4. 4 Natalino Busa - @natbusa
  • 5. 5 Natalino Busa - @natbusa
  • 6. 6 Natalino Busa - @natbusa What about (data) science? - technologies and tools are driving innovation in data analytics -
  • 7. 7 Natalino Busa - @natbusa Man - Machine as integrated cognitive systems
  • 8. 8 Natalino Busa - @natbusa Learning: The Scientific Method Ørsted's "First Introduction to General Physics" (1811) https://en.m.wikipedia.org/wiki/History_of_scientific_method observation hypothesis deduction synthesis Hans Christian Ørsted experiment Icons made by Gregor Cresnar from www.flaticon.com is licensed by CC 3.0 BY
  • 9. Innovation in Data Analytics: Cloud, Community, AI & ML
  • 10. Cloud
  • 11. “We live in an age of open source datacenters, so we can stack all these things together and we have open source from the ground to ceiling.” Sam Ramji, CEO of Cloud Foundry. https://www.youtube.com/watch?v=7oCSFcUW-Qk
  • 12. Analytics in the cloud. Bare Metal: Physical Machines; IAAS: Virtual Resources; CAAS: Containers; dPAAS: Datastores, Data Engines; iPAAS: Tools Integration, Flows & Processes; DAAAS: Data Analytics as a Service
  • 13. DAAAS: AI and ML APIs. Cloud Computing for Deep Neural Networks > Models, Compute (Train, Score), and Data. AI and ML models for: ● Speech (audio) ● Language (text) ● Vision (images/video) ● Data (classification, regression, clustering, anomaly detection)
  • 14. Ephemeral Computing Clusters on a Cloud. Timeline (data): create, load, compute, store, destroy
  • 15. dPaaS: Analytical clusters. Ephemeral: Short-Lived, Data Exploration, Isolated, Personal, Simple Access Management. vs Permanent: Long-Lived, Production / Operations, Co-Ordinated, Complex Access Management
  • 16. GPUs and Distributed Computing. GPU support is coming in Kubernetes, Mesos, Spark. https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus http://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark Diagram labels: out, up, CPU, R, Python, Spark, TensorFrames
  • 17. Community
  • 18. Community: Develop - Use - Share
  • 19. Sharing is caring … speed. github.com + Jupyter notebooks: share ideas, code, and data. arxiv.org: share innovation and scientific results
  • 20. Artificial Intelligence, Machine Learning
  • 21. Google open-sources an NLP parser scoring 95% in grammar accuracy. https://github.com/tensorflow/models/tree/master/syntaxnet
  • 22. Deep Learning in Language Parsing. https://github.com/tensorflow/models/blob/master/syntaxnet/ff_nn_schematic.png
  • 23. Semantic Search: TDA + NNs. Word2Vec, Par2Vec, Doc2Vec. https://arxiv.org/pdf/1405.4053v2.pdf https://arxiv.org/pdf/1301.3781v3.pdf
  • 24. Lip reading: LipNet achieves 93.4% accuracy on the GRID corpus. https://arxiv.org/pdf/1611.01599v1.pdf
  • 25. Ask me Anything: Dynamic Memory Networks for Natural Language Processing. https://arxiv.org/pdf/1603.01417v1.pdf https://youtu.be/oGk1v1jQITw Caiming Xiong, Stephen Merity, Richard Socher
  • 26. Ask me Anything. http://www.socher.org/index.php/DeepLearningTutorial/DeepLearningTutorial Dynamic Memory Networks for Natural Language Processing. https://arxiv.org/pdf/1603.01417v1.pdf http://www.socher.org/ Local context, Wider context. NLP, Attention Masks. Semantic Embeddings from Text, Images
  • 27. Network Traffic Patterns Classification
  • 28. Network Intrusion Detection. http://billsdata.net/?p=105 It contains 130 million flow records involving 12,027 distinct computers over 36 days (not the full 58 days claimed for the entire data release). Each record consists of: time (to nearest second), duration, source and destination computer ids, source and destination ports, protocol, number of packets and number of bytes. Techniques: TDA, Dimensionality Reduction. https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction
  • 29. Approaching (Almost) Any Machine Learning Problem - Abhishek Thakur, Kaggle Grandmaster. Diagram labels: data, labels, raw data (tables, files), useful data, data munging, feature engineering, tabular data ready for ML
  • 30. AutoML challenge: based on scikit-learn; 15 classifiers; 14 feature preprocessing methods; 4 data preprocessing methods; 110 hyperparameters. Supervised classification challenge: 100 different datasets. https://arxiv.org/abs/1611.03824v1
  • 31. Artificial + Human Intelligence
  • 32. Human cognitive biases: Too much information; Not enough meaning; What should we remember?; Need to act fast. https://en.wikipedia.org/wiki/List_of_cognitive_biases
  • 33. Man vs Machine cognitive limits. Model generation, Explanation, Unsupervised, Planning. Too much information, Not enough meaning, Need to act quickly, Memory limits
  • 34. “Theorems often tell us complex truths about the simple things, but only rarely tell us simple truths about the complex ones.” Marvin Minsky, K-Lines: A Theory of Memory (1980)
  • 35. Data Science: wear the AI/ML Lenses. We are entering a new era of intelligent machines. Boost our understanding of data. Focus on higher level analyses
  • 36. Intelligent Data Systems: Long live the “database”. Wikipedia: A database is an organized collection of data. Diagram labels: DATA, New-SQL, ML, AI, SQL, Python - Scala - R, NLP, UX, Speech, COG
  • 37. The Database is never going to be the same.
  • 38. Thank you. @natbusa
  • 39. Credits. Cover: courtesy of Big Data Spain - https://www.bigdataspain.org/ Pictures: https://commons.wikimedia.org/wiki/File:PurportedUFO2.jpg https://commons.wikimedia.org/wiki/File:Amazing_Stories_October_1957.jpg https://commons.wikimedia.org/wiki/File:DJI_Phantom_2_Vision%2B_V3_hovering_over_Weissfluhjoch_(cropped).jpg https://commons.wikimedia.org/wiki/File:Leonard_Nimoy_as_Spock_1967.jpg https://en.wikipedia.org/wiki/File:STUltimate_Cp.jpg https://github.com/tensorflow/models/blob/master/syntaxnet/ff_nn_schematic.png http://billsdata.net/wordpress/wp-content/uploads/2015/11/wikimap2.jpg http://billsdata.net/wordpress/wp-content/uploads/2015/11/netflow.png https://commons.wikimedia.org/wiki/File:Girls_learning_sign_language.jpg https://arxiv.org/pdf/1603.01417v1.pdf http://www.socher.org/uploads/Main/RichardSocher.jpg https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf https://commons.wikimedia.org/wiki/File:Cognitive_Bias_Codex_-_180%2B_biases,_designed_by_John_Manoogian_III_(jm3).jpg Visualizations: https://github.com/caffeinalab/siriwavejs https://gist.github.com/AnanthaRajuC/91beee3eb04d11cb3af5 https://dribbble.com/shots/1714369-Cortana-Animation Icons: made by Gregor Cresnar from www.flaticon.com, licensed under CC BY 3.0
  • 40. Bonus slides
  • 41. AI & ML: curated list of links. Applications: http://www.wsj.com/articles/googles-self-driving-car-program-odometer-reaches-2-million-miles-1475683321 http://www.nature.com/articles/srep26286 Why is AI so difficult? http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html http://www.forbes.com/sites/gilpress/2016/10/31/12-observations-about-artificial-intelligence-from-the-oreilly-ai-conference/ http://www.tor.com/2011/06/21/norvig-vs-chomsky-and-the-fight-for-the-future-of-ai/ https://www.safaribooksonline.com/library/view/oreilly-ai-conference/9781491973912/video260721.html YouTube, great videos on AI: Yann LeCun: https://youtu.be/_1Cyyt-4-n8 Andrej Karpathy: https://youtu.be/u6aEYuemt0M Nando de Freitas: https://youtu.be/bEUX_56Lojc Richard Socher: https://youtu.be/oGk1v1jQITw
  • 42. AI & ML: curated list of links. NLP: https://github.com/tensorflow/models/tree/master/syntaxnet https://arxiv.org/abs/1405.4053v2 https://arxiv.org/abs/1603.06042 https://arxiv.org/abs/1301.3781v3 Video, Images, Hybrid Deep Learning Networks: https://arxiv.org/abs/1611.01599v1 https://arxiv.org/abs/1603.01417v1 Topological Data Analysis (TDA), Dim Reduction: https://en.wikipedia.org/wiki/Topological_data_analysis https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction Meta Learning: http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/ https://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf https://arxiv.org/abs/1611.03824v1
  • 43. Curated list of links. Cognitive sciences: https://en.wikipedia.org/wiki/History_of_scientific_method https://en.wikipedia.org/wiki/List_of_cognitive_biases Cloud: The Making of a Cloud Native Application Platform - Sam Ramji https://www.youtube.com/watch?v=7oCSFcUW-Qk https://en.wikipedia.org/wiki/Ephemerality http://conferences.oreilly.com/oscon/oscon2011/public/schedule/detail/19812 GPU and distributed Computing: https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus http://www.slideshare.net/databricks/tensorframes-google-tensorflow-on-apache-spark Collaborative coding and research: https://github.com/tensorflow/models https://github.com/jupyter http://www.arxiv-sanity.com/
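
The ephemeral cluster timeline on slide 14 (create, load, compute, store, destroy) could be sketched roughly as below. This is an illustration only, not the author's setup: `ephemeral_cluster`, `run_job`, and the in-memory dictionary standing in for a cluster are hypothetical names, and no real cloud API is called.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_cluster(name, log):
    """Hypothetical stand-in for a cloud cluster: created on entry, destroyed on exit."""
    log.append("create")
    cluster = {"name": name, "data": None}
    try:
        yield cluster
    finally:
        # The cluster is torn down even if the job fails part-way through.
        log.append("destroy")


def run_job(rows, log):
    """Load data, compute a result, and store it outside the short-lived cluster."""
    store = {}  # assumed long-lived storage that survives the cluster
    with ephemeral_cluster("exploration", log) as cluster:
        log.append("load")
        cluster["data"] = list(rows)
        log.append("compute")
        mean = sum(cluster["data"]) / len(cluster["data"])
        log.append("store")
        store["mean"] = mean
    return store


log = []
results = run_job([1, 2, 3, 4], log)
```

The point of the context manager is that only the stored result outlives the job; the cluster itself exists just long enough to load and compute.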