H2O.ai

Machine Intelligence
Data Science for
Non-Data Scientists
Erin LeDell Ph.D.
Silicon Valley Big Data Science
August 2015
H2O.ai
H2O Company
• Team: 35. Founded in 2012, Mountain View, CA
• Stanford Math & Systems Engineers
• Open Source Software
H2O Software
• Ease of Use via Web Interface
• R, Python, Scala, Spark & Hadoop Interfaces
• Distributed Algorithms Scale to Big Data
Scientific Advisory Council
Dr. Trevor Hastie
• John A. Overdeck Professor of Mathematics, Stanford University
• PhD in Statistics, Stanford University
• Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining
• Co-author with John Chambers, Statistical Models in S
• Co-author, Generalized Additive Models
• 108,404 citations (via Google Scholar)
Dr. Rob Tibshirani
• Professor of Statistics and Health Research and Policy, Stanford University
• PhD in Statistics, Stanford University
• COPSS Presidents’ Award recipient
• Co-author, The Elements of Statistical Learning: Prediction, Inference and Data Mining
• Author, Regression Shrinkage and Selection via the Lasso
• Co-author, An Introduction to the Bootstrap
Dr. Stephen Boyd
• Professor of Electrical Engineering and Computer Science, Stanford University
• PhD in Electrical Engineering and Computer Science, UC Berkeley
• Co-author, Convex Optimization
• Co-author, Linear Matrix Inequalities in System and Control Theory
• Co-author, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
What is Data Science?
Problem Formulation
• Identify an outcome of interest and the type of task: classification / regression / clustering
• Identify the potential predictor variables
• Identify the independent sampling units
Collect & Process Data
• Conduct research experiment (e.g. Clinical Trial)
• Collect examples / randomly sample the population
• Transform, clean, impute, filter, aggregate data
• Prepare the data for machine learning: X, Y
Machine Learning
• Modeling using a machine learning algorithm (training)
• Model evaluation and comparison
• Sensitivity & Cost Analysis
Insights & Action
• Translate results into action items
• Feed results into research pipeline
Source: marketingdistillery.com
What is Machine Learning?
What it is:
✤ “Field of study that gives computers the ability to learn without being explicitly programmed.” (Samuel, 1959)
✤ “Machine learning and statistics are closely related fields. The ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.” (Jordan, 2014)
✤ M.I. Jordan also suggested the term “data science” as a placeholder name for the overall field.
What it’s not:
Unlike rules-based systems, which require a human expert to hard-code domain knowledge directly into the system, a machine learning algorithm learns how to make decisions from the data alone.
Machine Learning Overview
Regression
• Predict a real-valued response (viral load, weight)
• Gaussian, Gamma, Poisson and Tweedie
• MSE and R^2
Classification
• Multi-class or Binary classification
• Ranking
• Accuracy and AUC
Clustering
• Unsupervised learning (no training labels)
• Partition the data / identify clusters
• AIC and BIC
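As a rough illustration of the two regression metrics named on this slide, the sketch below computes MSE and R^2 in plain Python. The function names and the weight values are made up for the example; they are not from the talk or from H2O.

```python
def mse(y_true, y_pred):
    """Mean squared error: average squared gap between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R^2: share of the variance in y_true that the predictions explain."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical real-valued responses (e.g. weights) and model predictions.
weights_true = [60.0, 72.0, 81.0, 95.0]
weights_pred = [62.0, 70.0, 80.0, 96.0]

print("MSE:", mse(weights_true, weights_pred))
print("R^2:", r_squared(weights_true, weights_pred))
```

A lower MSE and an R^2 closer to 1 both indicate predictions that track the observed values more closely.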
Machine Learning Workflow
Source: NLTK
Example of a supervised machine learning workflow.
ML Model Performance
Test & Train
• Partition the original data (randomly) into a training set and a test set (e.g. 70/30).
• Train a model using the “training set” and evaluate performance on the “test set” or “validation set.”
K-fold Cross-validation
• Train & test K models as shown.
• Average the model performance over the K test sets.
• Report cross-validated metrics.
Performance Metrics
• Regression: R^2, MSE, RMSE
• Classification: Accuracy, F1, H-measure
• Ranking (Binary Outcome): AUC, Partial AUC
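The train/test split and K-fold procedure described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the helper names, the toy data, and the "predict the training mean" model are all hypothetical, and the K-fold helper assumes the rows were already shuffled.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Randomly partition rows into (train, test), e.g. 70/30."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs; every row is tested exactly once."""
    indices = list(range(n_rows))
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(rows, k, fit, score):
    """Train and test K models, then average the K test-set scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(rows), k):
        model = fit([rows[i] for i in train_idx])
        scores.append(score(model, [rows[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Toy data and a toy "model" that always predicts the training mean.
data = [float(v) for v in range(10)]
train, test = train_test_split(data, test_fraction=0.3)
folds = list(k_fold_indices(len(data), k=3))

fit_mean = lambda rows: sum(rows) / len(rows)
score_mse = lambda model, rows: sum((r - model) ** 2 for r in rows) / len(rows)
cv_mse = cross_validate(data, 3, fit_mean, score_mse)
print("cross-validated MSE:", cv_mse)
```

Because each fold's model never sees its own test rows, the averaged score is a less optimistic estimate than one measured on the training data.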
What is Deep Learning?
What it is:
✤ “A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, composed of multiple non-linear transformations.” (Wikipedia, 2015)
✤ Deep neural networks have more than one hidden layer in their architecture. That’s what’s “deep.”
✤ Very useful for complex input data such as images, video, audio.
What it’s not:
Deep learning architectures, specifically artificial neural networks (ANNs), have been around since 1980, so they are not new. However, there were breakthroughs in training techniques that led to their recent resurgence (mid-2000s). Combined with modern computing power, they are quite effective.
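To make "multiple non-linear transformations" concrete, here is a small sketch of a forward pass through a network with two hidden layers. The weights are arbitrary made-up numbers (nothing is trained), and the layer sizes and the tanh activation are assumptions chosen only for illustration.

```python
import math


def dense(inputs, weights, biases):
    """One layer: weighted sums of the inputs, then a non-linear tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense(activations, weights, biases)
    return activations


# Arbitrary, untrained weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.2, -0.5, 0.3], [0.7, 0.1, -0.4]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]

n_hidden_layers = len(layers) - 1  # more than one hidden layer is what makes it "deep"
output = forward([1.0, 2.0], layers)
print("hidden layers:", n_hidden_layers, "output:", output)
```

Training would adjust the weights so the output matches known labels; that step is omitted here.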
Deep Learning Architecture
Example of a deep neural net architecture.
What is Ensemble Learning?
What it is:
✤ “Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms.” (Wikipedia, 2015)
✤ Random Forests and Gradient Boosting Machines (GBM) are both ensembles of decision trees.
✤ Stacking, or Super Learning, is a technique for combining various learners into a single, powerful learner using a second-level metalearning algorithm.
What it’s not:
Ensembles typically achieve superior model performance over singular methods. However, this comes at a price: computation time.
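The simplest way to combine several learners is a plain vote or average, sketched below. This is not Random Forest, GBM, or stacking; it is a hypothetical toy with hand-written threshold "models", meant only to show the combining step and why an ensemble costs more compute (every member is evaluated for every prediction).

```python
from collections import Counter


def majority_vote(models, x):
    """Classification ensemble: return the label most members agree on."""
    votes = [model(x) for model in models]  # one call per member
    return Counter(votes).most_common(1)[0][0]


def average_prediction(models, x):
    """Regression ensemble: return the mean of the members' predictions."""
    predictions = [model(x) for model in models]
    return sum(predictions) / len(predictions)


# Three weak, hand-written classifiers with different made-up thresholds.
classifiers = [
    lambda x: x > 3,
    lambda x: x > 5,
    lambda x: x > 7,
]

# Three made-up regressors that each give a slightly different estimate.
regressors = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 3.0,
    lambda x: 2.0 * x - 3.0,
]

print(majority_vote(classifiers, 6))
print(average_prediction(regressors, 5.0))
```

With an odd number of binary voters there are no ties; a real ensemble would also need a tie-breaking rule for other cases.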
Where to learn more?
• H2O Online Training (free): http://learn.h2o.ai
• H2O Slidedecks: http://www.slideshare.net/0xdata
• H2O Video Presentations: https://www.youtube.com/user/0xdata
• H2O Community Events & Meetups: http://h2o.ai/events
• Machine Learning & Data Science courses: http://coursebuffet.com
Customers • Community • Evangelists
November 9, 10, 11
Computer History Museum
H2OWORLD.H2O.AI
20% off registration using code: h2ocommunity
Questions?
@ledell on Twitter, GitHub
erin@h2o.ai
http://www.stat.berkeley.edu/~ledell
Data Science for Non-Data Scientists
aka How the Business Views Data Science
Chen Huang
August 20, 2015
Agenda
•  Introduction
•  Data Science Primer
•  Working with Data Scientists
•  Decoding the Data Science Lingo
•  Q&A
Introduction
•  Who am I?
•  Why am I giving this talk?
Who am I?
•  Data Strategist
•  Career in Business Intelligence,
Analytics, and Big Data
•  Various roles
•  Consultant
•  Developer
•  Business and Data Analyst
•  Product Manager
•  Functional and Technical Trainer
•  Client Services
•  Worked in various industries
•  Health care, pharmaceutics,
communications and high tech,
consumer products, automotive,
finance, government contracting
August, 2015 – San Francisco, CA
Why am I giving this talk?
July, 2011 – Beijing, China
Data Science Primer
•  What can Data Science do for the Business?
•  Applications of Data Science
•  Data-Driven Decisions
•  What does a Data Scientist do?
•  Data Science Skills
What can Data Science do for the
Business?
A: Data science! It extracts useful information and knowledge from large volumes of data to improve business decision-making, giving the business the insights to make data-driven decisions.
Data / Business
What can Data do?
Image: http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
Applications of Data Science
Image: http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
Data-Driven Decisions
•  Practice of basing decisions on data, rather than purely
on intuition
•  There is evidence that data-driven decision making and
big data technologies substantially improve business
performance
The Art and Science of Data Science
•  Discover unknowns in data
•  Obtain predictive, actionable insights
•  Communicate business data stories
•  Build confidence in decision making
•  Create valuable Data Products that have business impact
http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
What does a Data Scientist do?
•  Is curious about data: explores data and discovers unknowns
•  Understands data relationships
•  Understands the business; has domain knowledge
•  Can tell relevant stories with data
•  Has a holistic view of the business
•  Knows machine learning, statistics, probability
•  Can hack and code
•  Defines and tests hypotheses; runs experiments
•  Asks good questions
http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
Data Science Skills
Image: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Image: http://www.slideshare.net/galvanizeHQ/how-to-become-a-data-scientist-by-ryan-orban-vp-of-operations-and-expansion-galvanize
Working with Data Scientists
•  Collaboration
•  Data Science Cycle
•  Organizational Models for Data Science Teams
Working with Data Scientists
Data Science
Business
Data Engineering
Data Science Cycle
Image: https://en.wikipedia.org/wiki/Data_science
Organizational Models for Data
Science Teams
Image: http://www.slideshare.net/emcacademics/building-data-science-teams-31057129
Decoding the Data Science Lingo
Machine Learning
Data Science Definition:
•  A subfield of computer science and artificial intelligence (AI) that focuses on the design of systems that can learn from and make decisions and predictions based on data.
•  Machine learning enables computers to act and make data-driven decisions rather than being explicitly programmed to carry out a certain task.
•  Machine learning programs are also designed to learn and improve over time when exposed to new data.
Business Application:
•  Everything!
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Unsupervised Learning
Data Science Definition:
•  Where a program, given a dataset, can automatically find patterns and relationships within the dataset.
•  The business decides how fine-grained the grouping is, i.e. how many categories there are.
•  Clustering or grouping of like data.
•  Examples: k-means clustering, hierarchical clustering
Business Application:
•  Customer segmentation
•  Understanding users and behaviors
•  Classifying unknown and pre-defined images into categories
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
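Since the slide names k-means as an example, here is a bare-bones sketch of it applied to a made-up customer segmentation. The customer data, the choice of k = 2, and the "first k points as starting centers" initialization are all assumptions for illustration; production implementations use better initialization and a convergence check.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly re-centering."""
    centers = points[:k]  # assumption: the first k points seed the clusters
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customers: (visits per month, average spend).
customers = [(1, 10), (9, 95), (2, 12), (8, 90), (1, 14), (10, 100)]
centers, clusters = k_means(customers, k=2)

for center, members in zip(centers, clusters):
    print("segment center:", center, "members:", members)
```

Note that no labels are supplied: the business picks k, and the algorithm finds the groups.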
Supervised Learning
Data Science Definition:
•  Where a program is “trained” on a pre-defined dataset.
•  Based on its training data, the program can make accurate decisions when given new data.
Business Application:
•  Classifying Twitter sentiments
•  Recommender systems
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Score
Data Science Definition:
•  Any of a number of ways to evaluate how well the model assigns the correct class value to the test instances.
Business Application:
•  Confidence gauge
Definition: https://mlcorner.wordpress.com/tag/scoring/
Score Cont.
•  True Positive (TP): the instance is positive and it is classified as positive
•  False Negative (FN): the instance is positive but it is classified as negative
•  True Negative (TN): the instance is negative and it is classified as negative
•  False Positive (FP): the instance is negative but it is classified as positive
•  Classification problems:
•  Precision = proportion of instances classified as positive that are actually positive = TP/(TP+FP)
•  Accuracy = proportion of correctly classified instances = (TP+TN)/(TP+TN+FP+FN)
•  Recall or Sensitivity = proportion of all the actual positives that you correctly classify = TP/(TP+FN)
•  Specificity = classifier’s ability to identify negative results = TN/(TN+FP)
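The four counts and the four formulas above translate directly into code. The sketch below uses made-up actual and predicted labels; the function names are hypothetical, and there is no guard against division by zero (for instance when nothing is predicted positive).

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for boolean labels (True = positive)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted positive, actually positive
        elif p and not a:
            fp += 1      # predicted positive, actually negative
        elif not p and a:
            fn += 1      # predicted negative, actually positive
        else:
            tn += 1      # predicted negative, actually negative
    return tp, fp, tn, fn


def scores(tp, fp, tn, fn):
    """Apply the slide's formulas to the four counts."""
    return {
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Made-up test instances.
actual    = [True, True, True, False, False, False, False, True, False, False]
predicted = [True, False, True, False, True, False, False, True, False, False]

tp, fp, tn, fn = confusion_counts(actual, predicted)
result = scores(tp, fp, tn, fn)
print(tp, fp, tn, fn, result)
```

Which score matters most depends on the cost of each error type: fraud detection usually cares about recall, while a marketing campaign may care more about precision.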
Classification
Data Science Definition:
•  Sub-category of Supervised Learning
•  Classification is the process of taking some sort of input and assigning a label to it. The predictions are discrete: categories, or “yes or no” in nature.
•  Examples: Logistic Regression, Random Forest
Business Application:
•  What customers should a company target with its marketing campaigns?
•  Is this Nigerian prince committing fraud? (Spam classification)
•  Is this actually Barack Obama’s Facebook profile and review on Amazon? (Fraud detection)
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Regression
Data Science Definition:
•  Sub-category of Supervised Learning
•  Regression is a type of algorithm that predicts continuous values.
Business Application:
•  How much would a user spend on a mobile game like CandyCrush?
•  How much would someone spend on healthcare out of pocket?
•  How many attendees will come to this event based on past registration?
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
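As one concrete instance of predicting a continuous value, the sketch below fits a straight line by ordinary least squares to the "attendees from registrations" question. The numbers are invented and a single-predictor line is an assumption; real regression models usually use many predictors.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted continuous value for a new input."""
    return slope * x + intercept


# Made-up history: registrations for past events and who actually showed up.
registrations = [100, 200, 300, 400]
attendees = [60, 110, 160, 210]

slope, intercept = fit_line(registrations, attendees)
forecast = predict(slope, intercept, 500)
print("slope:", slope, "intercept:", intercept, "forecast for 500:", forecast)
```

The output is a number on a continuous scale, which is what separates regression from classification.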
Decision Trees
Data Science Definition:
•  Using a tree-like graph or model of decisions and their possible consequences.
Business Application:
•  Medical Testing (e.g. health incidences, etc.)
•  Genealogy breakdowns (e.g. eye color, blood type, etc.)
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
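A decision tree is easy to picture as nested questions. The sketch below hand-writes a tiny tree as nested dictionaries and walks it for one record. The features, thresholds, and outcomes are entirely hypothetical (a marketing example chosen for simplicity); a learning algorithm would normally pick them from training data.

```python
# Each internal node asks "is record[feature] >= threshold?"; leaves are decisions.
tree = {
    "question": ("visits_per_month", 5),
    "yes": {
        "question": ("average_spend", 50),
        "yes": "offer loyalty upgrade",
        "no": "send discount coupon",
    },
    "no": "send reminder email",
}


def decide(node, record):
    """Follow yes/no branches from the root until a leaf is reached."""
    while isinstance(node, dict):
        feature, threshold = node["question"]
        node = node["yes"] if record[feature] >= threshold else node["no"]
    return node


frequent_big_spender = {"visits_per_month": 9, "average_spend": 95}
frequent_small_spender = {"visits_per_month": 6, "average_spend": 20}
rare_visitor = {"visits_per_month": 1, "average_spend": 80}

for customer in (frequent_big_spender, frequent_small_spender, rare_visitor):
    print(customer, "->", decide(tree, customer))
```

Each path from the root to a leaf reads as a plain rule, which is why trees are popular when decisions must be explained to the business.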
Deep Learning
Data Science Definition:
•  A category of machine learning algorithms that often use Artificial Neural Networks to generate models.
Business Application:
•  Image classification
•  Language processing
•  Audio processing
•  Outlier and fraud detection
Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
Questions?

 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 

Intro to Data Science for Non-Data Scientists

  • 1. H2O.ai
 Machine Intelligence Data Science for Non-Data Scientists Erin LeDell Ph.D. Silicon Valley Big Data Science August 2015
  • 2. H2O.ai
 H2O Company: • Team: 35. Founded in 2012, Mountain View, CA • Stanford Math & Systems Engineers
 H2O Software: • Open Source Software • Ease of Use via Web Interface • R, Python, Scala, Spark & Hadoop Interfaces • Distributed Algorithms Scale to Big Data
  • 3. Scientific Advisory Council
 Dr. Trevor Hastie: • John A. Overdeck Professor of Mathematics, Stanford University • PhD in Statistics, Stanford University • Co-author, The Elements of Statistical Learning: Data Mining, Inference, and Prediction • Co-author with John Chambers, Statistical Models in S • Co-author, Generalized Additive Models • 108,404 citations (via Google Scholar)
 Dr. Rob Tibshirani: • Professor of Statistics and Health Research and Policy, Stanford University • PhD in Statistics, Stanford University • COPSS Presidents’ Award recipient • Co-author, The Elements of Statistical Learning: Data Mining, Inference, and Prediction • Author, Regression Shrinkage and Selection via the Lasso • Co-author, An Introduction to the Bootstrap
 Dr. Stephen Boyd: • Professor of Electrical Engineering and Computer Science, Stanford University • PhD in Electrical Engineering and Computer Science, UC Berkeley • Co-author, Convex Optimization • Co-author, Linear Matrix Inequalities in System and Control Theory • Co-author, Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
  • 4. What is Data Science?
 Problem Formulation: • Identify an outcome of interest and the type of task: classification / regression / clustering • Identify the potential predictor variables • Identify the independent sampling units
 Collect & Process Data: • Conduct research experiment (e.g. Clinical Trial) • Collect examples / randomly sample the population • Transform, clean, impute, filter, aggregate data
 Machine Learning: • Prepare the data for machine learning (X, Y) • Modeling using a machine learning algorithm (training) • Model evaluation and comparison
 Insights & Action: • Sensitivity & Cost Analysis • Translate results into action items • Feed results into research pipeline
  • 5. Source: marketingdistillery.com
  • 6. What is Machine Learning?
 What it is: ✤ “Field of study that gives computers the ability to learn without being explicitly programmed.” (Samuel, 1959) ✤ “Machine learning and statistics are closely related fields. The ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics.” (Jordan, 2014) ✤ M.I. Jordan also suggested the term data science as a placeholder to call the overall field.
 What it’s not: Unlike rules-based systems, which require a human expert to hard-code domain knowledge directly into the system, a machine learning algorithm learns how to make decisions from the data alone.
  • 7. Machine Learning Overview
 Regression: • Predict a real-valued response (viral load, weight) • Gaussian, Gamma, Poisson and Tweedie • MSE and R^2
 Classification: • Multi-class or Binary classification • Ranking • Accuracy and AUC
 Clustering: • Unsupervised learning (no training labels) • Partition the data / identify clusters • AIC and BIC
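 The two regression metrics named on this slide, MSE and R^2, can be computed by hand. The sketch below is illustrative only: the function names and the observed/predicted weights are made up for the example and do not come from the talk or from H2O.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed weights (kg) and the predictions of some model.
observed = [60.0, 72.0, 81.0, 95.0]
predicted = [62.0, 70.0, 80.0, 96.0]

mse = mean_squared_error(observed, predicted)
r2 = r_squared(observed, predicted)
print(f"MSE = {mse:.2f}, R^2 = {r2:.3f}")
```

 A lower MSE and an R^2 closer to 1 both indicate predictions that track the observed response more closely.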
  • 8. Machine Learning Workflow
 Example of a supervised machine learning workflow. Source: NLTK
  • 9. ML Model Performance
 Test & Train: • Partition the original data (randomly) into a training set and a test set. (e.g. 70/30) • Train a model using the “training set” and evaluate performance on the “test set” or “validation set.”
 K-fold Cross-validation: • Train & test K models as shown. • Average the model performance over the K test sets. • Report cross-validated metrics.
 Performance Metrics: • Regression: R^2, MSE, RMSE • Classification: Accuracy, F1, H-measure • Ranking (Binary Outcome): AUC, Partial AUC
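 The K-fold procedure described on this slide can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the helper names, the toy (x, y) data, and the deliberately trivial "model" (the mean of the training outcomes) are not from the talk, and a real project would normally use a library routine instead.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, k, train_fn, score_fn, seed=0):
    """Train k models, score each on the fold it never saw, and average the scores."""
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held_out_set = set(held_out)
        train = [rows[i] for i in range(len(rows)) if i not in held_out_set]
        test = [rows[i] for i in held_out]
        model = train_fn(train)
        scores.append(score_fn(model, test))
    return sum(scores) / k


# Toy stand-ins: the "model" is just the mean training outcome,
# scored by mean squared error on the held-out fold.
def train_mean(train_rows):
    return sum(y for _, y in train_rows) / len(train_rows)


def mse_score(model, test_rows):
    return sum((y - model) ** 2 for _, y in test_rows) / len(test_rows)


data = [(x, 2.0 * x + 1.0) for x in range(20)]  # hypothetical (x, y) pairs
cv_mse = cross_validate(data, k=5, train_fn=train_mean, score_fn=mse_score)
print(f"5-fold cross-validated MSE: {cv_mse:.2f}")
```

 Each row is used for testing exactly once, which is why the averaged score is a fairer estimate than a single 70/30 split on a small dataset.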
  • 10. What is Deep Learning?
 What it is: ✤ “A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, composed of multiple non-linear transformations.” (Wikipedia, 2015) ✤ Deep neural networks have more than one hidden layer in their architecture. That’s what’s “deep.” ✤ Very useful for complex input data such as images, video, audio.
 What it’s not: Deep learning architectures, specifically artificial neural networks (ANNs), have been around since 1980, so they are not new. However, breakthroughs in training techniques led to their recent resurgence (mid-2000s). Combined with modern computing power, they are quite effective.
  • 11. Deep Learning Architecture
 Example of a deep neural net architecture.
  • 12. What is Ensemble Learning?
 What it is: ✤ “Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms.” (Wikipedia, 2015) ✤ Random Forests and Gradient Boosting Machines (GBM) are both ensembles of decision trees. ✤ Stacking, or Super Learning, is a technique for combining various learners into a single, powerful learner using a second-level metalearning algorithm.
 What it’s not: Ensembles typically achieve superior model performance over singular methods. However, this comes at a price: computation time.
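 The simplest way to combine several learners is majority voting, sketched below. Note that this is plain voting, not the stacking (Super Learning), Random Forest, or GBM methods named on the slide. The three rule functions are hypothetical stand-ins for trained models, and the applicant record is invented for the example.

```python
from collections import Counter


def majority_vote(classifiers, example):
    """Ask every classifier for a label and return the most common answer."""
    votes = [classify(example) for classify in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for three trained models (1 = approve, 0 = decline).
def model_income(applicant):
    return 1 if applicant["income"] >= 50_000 else 0


def model_debt(applicant):
    return 1 if applicant["debt_ratio"] < 0.4 else 0


def model_history(applicant):
    return 1 if applicant["late_payments"] == 0 else 0


applicant = {"income": 62_000, "debt_ratio": 0.55, "late_payments": 0}
decision = majority_vote([model_income, model_debt, model_history], applicant)
print("approve" if decision == 1 else "decline")
```

 Using an odd number of binary classifiers avoids ties. The cost noted on the slide is visible even here: every prediction requires running all of the constituent models.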
  • 13. Where to learn more? • H2O Online Training (free): http://learn.h2o.ai • H2O Slidedecks: http://www.slideshare.net/0xdata • H2O Video Presentations: https://www.youtube.com/user/0xdata • H2O Community Events & Meetups: http://h2o.ai/events • Machine Learning & Data Science courses: http://coursebuffet.com
  • 14. Customers • Community • Evangelists. November 9, 10, 11, Computer History Museum. H2OWORLD.H2O.AI. 20% off registration using code: h2ocommunity
  • 15. Questions? @ledell on Twitter, GitHub erin@h2o.ai http://www.stat.berkeley.edu/~ledell
  • 16. Data Science for Non-Data Scientists, aka How the Business Views Data Science
 Chen Huang, August 20, 2015
  • 17.
  • 18. Agenda •  Introduction •  Data Science Primer •  Working with Data Scientists •  Decoding the Data Science Lingo •  Q&A
  • 19. Introduction •  Who am I? •  Why am I giving this talk?
  • 20. Who am I? •  Data Strategist •  Career in Business Intelligence, Analytics, and Big Data •  Various roles: Consultant, Developer, Business and Data Analyst, Product Manager, Functional and Technical Trainer, Client Services •  Worked in various industries: health care, pharmaceuticals, communications and high tech, consumer products, automotive, finance, government contracting. August, 2015 – San Francisco, CA
  • 21. Why am I giving this talk? July, 2011 – Beijing, China
  • 22. Data Science Primer •  What can Data Science do for the Business? •  Applications of Data Science •  Data-Driven Decisions •  What does a Data Scientist do? •  Data Science Skills
  • 23. What can Data Science do for the Business? A: Data science! It extracts useful information and knowledge from large volumes of data to improve business decision-making, giving the business the insights it needs to make data-driven decisions. (Diagram labels: Data, Business)
  • 24. What can Data do? Image: http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
  • 25. Applications of Data Science Image: http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
  • 26. Data-Driven Decisions •  Practice of basing decisions on data, rather than purely on intuition •  There is evidence that data-driven decision making and big data technologies substantially improve business performance
  • 27. The Art and Science of Data Science •  Discover unknowns in data •  Obtain predictive, actionable insights •  Communicate business data stories •  Build confidence in decision making •  Create valuable Data Products that have business impact http://www.slideshare.net/datasciencelondon/big-data-sorry-data-science-what-does-a-data-scientist-do
  • 28.
  • 29. What does a Data Scientist do? •  Data curiosity. Explore data. Discover unknowns •  Understand data relationships •  Understand the business, has domain knowledge •  Can tell relevant stories with data •  Holistic view of the business •  Knows machine learning, statistics, probability •  Can hack and code •  Define and test a hypothesis, run experiments •  Asks good questions http://www.slideshare.net/andrewgardner5811/big-data-and-the-art-of-data-science
  • 30. Data Science Skills Image: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
  • 33. Working with Data Scientists •  Collaboration •  Data Science Cycle •  Organizational Models for Data Science Teams
  • 34.
  • 35. Working with Data Scientists Data Science Business Data Engineering
  • 36. Data Science Cycle Image: https://en.wikipedia.org/wiki/Data_science
  • 37. Organizational Models for Data Science Teams Image: http://www.slideshare.net/emcacademics/building-data-science-teams-31057129
  • 38. Decoding the Data Science Lingo
  • 39. Machine Learning
 Data Science Definition: •  A subfield of computer science and artificial intelligence (AI) that focuses on the design of systems that can learn from and make decisions and predictions based on data. •  Machine learning enables computers to act and make data-driven decisions rather than being explicitly programmed to carry out a certain task. •  Machine Learning programs are also designed to learn and improve over time when exposed to new data.
 Business Application: •  Everything!
 Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 40. Unsupervised Learning
 Data Science Definition: •  Where a program, given a dataset, can automatically find patterns and relationships within the dataset. •  The business decides how fine-grained the grouping is, or how many categories there are. •  Clustering or grouping of like data. •  Examples: k-means clustering, hierarchical clustering
 Business Application: •  Customer segmentation •  Understanding users and behaviors •  Classifying unknown and pre-defined images into categories
 Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
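 To make the k-means example concrete, here is a minimal one-dimensional sketch of the standard assign-then-average loop (Lloyd's algorithm). The function name, the way the starting centres are picked, and the monthly-spend figures are all assumptions made for illustration. The number of clusters, k, is the choice the slide leaves to the business.

```python
def k_means_1d(values, k, iterations=20):
    """Lloyd's algorithm on 1-D data: assign to nearest centre, then re-average."""
    # Assumption: start from k sorted values spaced evenly by position.
    ordered = sorted(values)
    step = max(1, len(ordered) // k)
    centres = [ordered[min(i * step, len(ordered) - 1)] for i in range(k)]

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Keep the previous centre if a cluster ends up empty.
        centres = [
            sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)
        ]
    return centres, clusters


# Hypothetical monthly spend per customer, with no labels attached.
monthly_spend = [12, 15, 14, 80, 85, 90, 300, 310]
centres, segments = k_means_1d(monthly_spend, k=3)
for centre, segment in zip(centres, segments):
    print(f"centre {centre:.1f}: {sorted(segment)}")
```

 No labels were supplied: the three customer segments emerge from the data alone, which is what distinguishes this from supervised learning.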
  • 41. Supervised Learning
 Data Science Definition: •  Where a program is “trained” on a pre-defined dataset. •  Based on its training data, the program can make accurate decisions when given new data.
 Business Application: •  Classifying Twitter sentiments •  Recommender systems
 Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 42. Score
 Data Science Definition: •  A number of ways to evaluate how well the model assigns the correct class value to the test instances.
 Business Application: •  Confidence gauge
 Definition: https://mlcorner.wordpress.com/tag/scoring/
  • 43. Score Cont.
 •  True Positive (TP): the instance is positive and it is classified as positive •  False Negative (FN): the instance is positive but it is classified as negative •  True Negative (TN): the instance is negative and it is classified as negative •  False Positive (FP): the instance is negative but it is classified as positive
 Classification problems: •  Precision = proportion of instances classified as positive that are actually positive = TP/(TP+FP) •  Accuracy = proportion of correctly classified instances = (TP+TN)/(TP+TN+FP+FN) •  Recall or Sensitivity = proportion of actual positives that are correctly classified = TP/(TP+FN) •  Specificity = proportion of actual negatives that are correctly classified = TN/(TN+FP)
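 The four counts and the four formulas on this slide translate directly into code. The sketch below assumes binary labels coded as 1 (positive) and 0 (negative). The function names and the ten example labels are invented for illustration, and it assumes every denominator is non-zero.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if a == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Apply the slide's formulas; assumes no denominator is zero."""
    return {
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical test set: 4 actual positives, 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
metrics = classification_metrics(*counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

 Here accuracy (0.8) is higher than precision and recall (0.75 each) because negatives outnumber positives, which is one reason accuracy alone can mislead on imbalanced data.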
  • 44. Classification
 Data Science Definition: •  Sub-category of Supervised Learning •  Classification is the process of taking some sort of input and assigning a label to it. The predictions are discrete: categories, or “yes or no” in nature. •  Examples: Logistic Regression, Random Forest
 Business Application: •  What customers should a company target with its marketing campaigns? •  Is this Nigerian prince committing fraud? (Spam classification) •  Is this actually Barack Obama’s Facebook profile and review on Amazon? (Fraud detection)
 Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 45. Regression
 Data Science Definition: •  Sub-category of Supervised Learning •  Regression is a type of algorithm that predicts continuous values.
 Business Application: •  How much would a user spend on a mobile game like CandyCrush? •  How much would someone spend on healthcare out of pocket? •  How many attendees will come to this event based on past registration?
 Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 46. Decision Trees
 Data Science Definition: •  Using a tree-like graph or model of decisions and their possible consequences.
 Business Application: •  Medical Testing (e.g. health incidences, etc.) •  Genealogy breakdowns (e.g. eye color, blood type, etc.)
 Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple
  • 47. Deep Learning
 Data Science Definition: •  A category of machine learning algorithms that often use Artificial Neural Networks to generate models.
 Business Application: •  Image classification •  Language processing •  Audio processing •  Outlier and fraud detection
 Definition: http://blog.aylien.com/post/121281850733/10-machine-learning-terms-explained-in-simple