What's New!
Python
Flow
R
Cliff Click
CTO, Co-Founder
What do these stickers mean?
Pick up stickers by getting installed!!!!
or get install help at the information booth
Hands-On sessions need H2O installed!
Generally need Datasets
Generally need one of Python or R (or both!)
Python 2.7 is a must (3.0 still WIP!)
R 3.2.2 is recommended
R 3.1.0 will not work
H2O-3

Complete rewrite of H2O Core Internals
– 14700 commits, 526 releases, 94 branches

Complete {Rapids, Model building, API} rewrite

Loads of Algo improvements; here's a few:
– Grid search, stopping criteria, row weights,
n-folds CV, checkpoints, distributions
• New data source ORC; Strings: UTF-8 handling
[Chart: H2O-2 commits vs. H2O-3 commits]
H2O-3 Algos
• Old ones improve...
– GLM beta constraints, offsets, multinomial
– GBM, DRF stochastic sampling, nbins_cats,
poisson, gamma, tweedie, huber, laplace

Generalized Low Rank Modeling - Anqi
– Dimensionality reduction, imputation

Ensembles - Erin
Python

Fully functional Python client
 Parity with R client
 Uses your 2.7 Python install directly (3.0 WIP)
 IPython, Jupyter

Load data, munge & clean, build models
– Pythonic column & row selection & slicing
– dataframe.apply(lambda x : …)

Interactive console response on Big Data
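# Mean of the squares minus square of the means (i.e., the variance)
means = H2OFrame(zip(*[df.mean()]))
# Despite its name, sdev holds the variance; take the square root for the std dev
sdev = df.apply(lambda x: (x*x).sum(),0) / df.nrow - means*means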
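The identity behind the snippet above (variance = mean of the squares minus the square of the mean) can be checked on a small in-memory column. This is a minimal plain-Python sketch, not H2O code; `column_variance` is a hypothetical helper introduced only for illustration.

```python
def column_variance(values):
    """Population variance: mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return mean_of_squares - mean * mean


col = [1.0, 2.0, 3.0, 4.0]
var = column_variance(col)   # 7.5 - 2.5 * 2.5
sdev = var ** 0.5            # the standard deviation is the square root
```

Note that this one-pass formula can lose precision when the mean is large relative to the spread, so it is best read as an illustration of the idea rather than a numerically robust routine.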
Pythonic

Load from csv, hdfs, nfs, hive, s3, or
any 1-d or 2-d python obj:

Full set of Pythonic column & row selectors

Iteration, list comprehension
df = h2o.import_file('file_path.csv')
df = h2o.H2OFrame([any python obj])
df # All columns & rows
df['sepal_len'] # Column by name
df[2] # Column by index
df[-2] # 2nd column from end
df[0:5,:] # First 5 rows, all cols
df[[1,3,5]] # List of columns (or rows)
df[df[0]>0.5,:] # Filter rows (or cols)
df[df[0]==None,0] = mean # Impute the mean
sum_sqs = [(col*col).sum() for col in df]
Python - Data Munging Pipelines

Full Data Munging Pipeline Support
– Complex munging and string ops
– Feature generation; outliers; imputation
– All at H2O in-memory speeds

Generate POJOs with Pipelines
– POJOs do data munging! then run Model

Plug into e.g. Storm Bolt, any Java App

See Spencer, Pipelines @ 11:30am; Hank tomorrow
[Pipeline diagram: DB, CSV logs; split(), asDate, join; impute, outliers; groupby, sort; train; model; POJO]
Python – Big Data, Big Temps

Temps managed by Python's Ref-counting
– Aggressively removes temps
– No need for explicit management
– User named objects, loaded datasets,
models must be explicitly removed

Standard Python reference-semantics

Backed by copy-on-write optimization in H2O
– i.e., defensive copies are “free” until modified
Long running loops:
tmp = …
...tmp...
reclaim tmp!
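The loop pattern above can be sketched in plain Python. This is an illustration of reference-counted cleanup, not the H2O client: `FakeCluster` and `TempFrame` are hypothetical stand-ins, and the immediate reclamation shown relies on CPython's reference counting.

```python
class FakeCluster:
    """Hypothetical stand-in for a remote key/value store (not the H2O API)."""

    def __init__(self):
        self.keys = set()
        self._next = 0

    def new_key(self):
        self._next += 1
        key = "tmp_%d" % self._next
        self.keys.add(key)
        return key

    def remove(self, key):
        self.keys.discard(key)


class TempFrame:
    """Client-side handle that frees its remote key when the last reference dies."""

    def __init__(self, cluster):
        self._cluster = cluster
        self.key = cluster.new_key()

    def __del__(self):
        # Runs as soon as the reference count reaches zero (CPython behaviour).
        self._cluster.remove(self.key)


cluster = FakeCluster()
for _ in range(3):
    tmp = TempFrame(cluster)            # rebinding tmp drops the previous handle
    live_during_loop = len(cluster.keys)
del tmp                                 # the last temp is reclaimed here
```

Only one temporary is alive at the end of each iteration, which is why long-running loops do not accumulate temps. A handle bound to a name that stays in scope would keep its key alive until explicitly deleted.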
R – Big Data Temp Management

Temps managed by R's GC
– Run gc() to flush extra temps
– No need for explicit management
– User named objects, loaded datasets,
models must be explicitly removed

Full R copy-by-value semantics

Backed by copy-on-write optimization in H2O
– i.e., copies are “free” until modified
Long running loops:
tmp = …
...tmp…
gc()
reclaim tmp!
Rapids – Driving H2O for Munging

A Big Data Language for Machines
– Used by R and Python clients, via REST
– Simple LISP syntax (1st-class functions!)
– Optimized for bulk Array ops

Functional LISP semantics – Pass-by-Value

COW: Copy-on-Write optimization
– Copies are “free” unless data is modified

Lifetimes tracked by the client
REST
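The copy-on-write idea can be sketched as follows. This is a toy single-process model, assuming a column backed by a Python list; `CowColumn` is a hypothetical class and says nothing about how H2O implements COW internally.

```python
class CowColumn:
    """Column handle that shares its backing list until a write occurs."""

    def __init__(self, data, _shared=False):
        self._data = data
        self._shared = _shared

    def copy(self):
        # O(1): both handles point at the same list and are marked as shared.
        self._shared = True
        return CowColumn(self._data, _shared=True)

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        if self._shared:
            # First write after a copy: take a private copy of the data.
            # (Conservative: the other handle keeps its flag and will also
            # copy on its first write, which is safe but not minimal.)
            self._data = list(self._data)
            self._shared = False
        self._data[i] = value

    def shares_storage_with(self, other):
        return self._data is other._data


a = CowColumn([1, 2, 3])
b = a.copy()                                # "free": no data is duplicated yet
shared_before = a.shares_storage_with(b)
b[0] = 99                                   # the write triggers the real copy
shared_after = a.shares_storage_with(b)
```

The copy gives pass-by-value semantics to the caller while paying for duplication only when one side actually modifies the data.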
Rapids – Join, Sort, & GroupBy

Big Sort, Big Join by Matt Dowle (of data.table)
– Parallel & Distributed; data too big for one machine
– Working: 1B rows x 5 cols joined with 1B rows x 5 cols, yielding 1B rows x 9 cols
– Now testing 10b rows by 10b rows join on 10 nodes
– Stable sort, index built, can binary search
– Opens door for rolling joins

Any lambda function on Group-Bys
data.table 505s
H2O 1 node 236s
H2O 4 node 113s
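Why a stable sort plus a binary-searchable index opens the door to rolling joins can be shown with a small single-machine sketch. This is not the distributed H2O implementation; `rolling_join` is a hypothetical helper that, for each left key, picks the right-hand row with the largest key not exceeding it.

```python
from bisect import bisect_right


def rolling_join(left_keys, right_rows):
    """For each left key, find the right row with the largest key <= that key.

    right_rows is a list of (key, value) pairs. It is sorted once (Python's
    sort is stable) so the sorted keys can serve as a binary-search index.
    """
    right_sorted = sorted(right_rows, key=lambda kv: kv[0])
    index = [k for k, _ in right_sorted]
    out = []
    for key in left_keys:
        pos = bisect_right(index, key)   # number of right keys <= key
        match = right_sorted[pos - 1][1] if pos > 0 else None
        out.append((key, match))
    return out


quotes = [(10, "a"), (20, "b"), (30, "c")]
joined = rolling_join([5, 10, 25, 40], quotes)
```

Each lookup costs O(log n) against the index, and a left key with no earlier right key yields `None` instead of a match.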
Flow

Improvements:
– CMs, ROC, scoring history, deviance plots, cross-validation metrics, POJO listings, parameter selection
– Grid search. Model & Frame import & export. Change column type, impute, split frame
– Save/load/share flows
– Diagnostics: cluster status, log files, network tests,
profiler, stack trace, timeline
– Faster for wide datasets
INSTALLED
What's (Not) New!

Same commitment to Quality, Speed, Size, Scale
– 10b x 10b row joins! GLRM! Grid search!

Same Rapid pace of Innovation
– Tons of new code! ~15000 commits!

Same Quality-Driven Culture
– Bigger team! New faces!

Community, Culture, Code

 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

What's New in H2O Python, R, and Flow

  • 2.
  • 3. What do these stickers mean?
      - Pick up stickers by getting H2O installed, or get install help at the information booth.
      - Hands-on sessions need H2O installed.
      - Sessions generally need datasets.
      - Sessions generally need one of Python or R (or both).
      - Python 2.7 is a must (3.0 still WIP).
      - R 3.2.2 is recommended; R 3.1.0 will not work.
  • 4. H2O-3
      - Complete rewrite of H2O core internals: 14700 commits, 526 releases, 94 branches
      - Complete rewrite of Rapids, model building, and the API
      - Loads of algorithm improvements; here are a few: grid search, stopping criteria, row weights, n-fold CV, checkpoints, distributions
      - New data source: ORC. Strings: UTF-8 handling
      - [Chart: H2O-2 commits vs. H2O-3 commits]
  • 5. H2O-3 Algos
      - Old ones improve:
          - GLM: beta constraints, offsets, multinomial
          - GBM, DRF: stochastic sampling, ncats_bin, poisson, gamma, tweedie, huber, laplace
      - Generalized Low Rank Modeling (Anqi): dimensionality reduction, imputation
      - Ensembles (Erin)
  • 6. Python
      - Fully functional Python client
          - Parity with R client
          - Uses your Python 2.7 install directly (3.0 WIP)
          - IPython, Jupyter
      - Load data, munge & clean, build models
          - Pythonic column & row selection & slicing
          - dataframe.apply(lambda x : …)
      - Interactive console response on Big Data

        # Mean of the squares minus square of the means
        # (this quantity is the variance; take its square root for the standard deviation)
        means = H2OFrame(zip(*[df.mean()]))
        sdev = df.apply(lambda x: (x*x).sum(),0) / df.nrow - means*means
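The snippet on this slide relies on the identity Var(x) = E[x^2] - (E[x])^2. As a rough illustration, here is a minimal pure-Python sketch of the same arithmetic on a small in-memory column, so it runs without an H2O cluster. The function name `population_variance` and the sample data are assumptions made for this example and are not part of the H2O API.

```python
import math


def population_variance(column):
    """Mean of the squares minus the square of the mean (population variance)."""
    n = len(column)
    mean = sum(column) / n
    mean_of_squares = sum(x * x for x in column) / n
    return mean_of_squares - mean * mean


# Illustrative data only (not from the talk).
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
variance = population_variance(column)
sdev = math.sqrt(variance)  # the standard deviation needs the extra square root
```

This one-pass form is convenient on distributed data because it only needs sums, but it can lose precision when the mean is large relative to the spread.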
  • 7. Pythonic
      - Load from csv, hdfs, nfs, hive, s3, or any 1-d or 2-d Python object
      - Full set of Pythonic column & row selectors
      - Iteration, list comprehension

        df = h2o.import_file('file_path.csv')
        df = h2o.H2OFrame([any python obj])
        df                         # All columns & rows
        df['sepal_len']            # Column by name
        df[2]                      # Column by index
        df[-2]                     # 2nd column from end
        df[0:5,:]                  # First 5 rows, all cols
        df[[1,3,5]]                # List of columns (or rows)
        df[df[0]>0.5,:]            # Filter rows (or cols)
        df[df[0]==None,0] = mean   # Impute the mean
        sum_sqs = [(col*col).sum() for col in df]
  • 8. Python - Data Munging Pipelines
      - Full data munging pipeline support
          - Complex munging and string ops
          - Feature generation; outliers; imputation
          - All at H2O in-memory speeds
      - Generate POJOs with pipelines
          - POJOs do the data munging, then run the model
      - Plug into e.g. a Storm Bolt or any Java app
      - See Spencer, Pipelines @ 11:30am; Hank tomorrow
      - [Diagram labels: DB, CSV, logs, split(), asDate, join, impute, outliers, groupby, sort, train model, POJO]
  • 9. Python - Big Data, Big Temps
      - Temps managed by Python's ref-counting
          - Aggressively removes temps
          - No need for explicit management
          - User-named objects, loaded datasets, and models must be explicitly removed
      - Standard Python reference semantics
      - Backed by copy-on-write optimization in H2O, i.e., defensive copies are "free" until modified
      - Long-running loops:

        tmp = …
        ...tmp...
        reclaim tmp!
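The ref-counting behaviour described on this slide can be sketched without H2O. In the example below, `TempFrame` and `live_keys` are hypothetical stand-ins for a client-side handle and the set of temporaries held by the cluster; they are not H2O classes. It assumes CPython, where an object is finalized as soon as its last reference goes away.

```python
live_keys = set()  # stands in for temporaries currently held server-side


class TempFrame:
    """Hypothetical client-side handle for a server-side temporary."""

    def __init__(self, key):
        self.key = key
        live_keys.add(key)

    def __del__(self):
        # Under CPython ref-counting this runs as soon as the last
        # reference to the handle is dropped.
        live_keys.discard(self.key)


peak = 0
for i in range(1000):
    # Rebinding tmp drops the previous handle, which releases its key.
    tmp = TempFrame("tmp_%d" % i)
    peak = max(peak, len(live_keys))

del tmp  # named objects have to be removed explicitly
```

After each iteration only one temporary is alive, so a long-running loop does not accumulate temps. Other Python implementations with tracing garbage collectors may reclaim the handles later.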
  • 10. R - Big Data Temp Management
      - Temps managed by R's GC
          - Run gc() to flush extra temps
          - No need for explicit management
          - User-named objects, loaded datasets, and models must be explicitly removed
      - Full R copy-by-value semantics
      - Backed by copy-on-write optimization in H2O, i.e., copies are "free" until modified
      - Long-running loops:

        tmp = …
        ...tmp…
        gc()    # reclaim tmp!
  • 11. Rapids - Driving H2O for Munging
      - A Big Data language for machines
          - Used by R and Python clients, via REST
          - Simple LISP syntax (1st-class functions!)
          - Optimized for bulk array ops
      - Functional LISP semantics: pass-by-value
      - COW: copy-on-write optimization; copies are "free" unless data is modified
      - Lifetimes tracked by the client
      - [Diagram label: REST]
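To make the copy-on-write idea concrete, here is a minimal single-process sketch. `CowColumn` is a hypothetical class written for this illustration and says nothing about how H2O stores data internally; it only shows why a copy can be "free" until one side writes.

```python
class CowColumn:
    """Hypothetical column handle: copies share storage until one is written."""

    def __init__(self, data, owns=True):
        self._data = data
        self._owns = owns  # False while the buffer may be shared with another handle

    def copy(self):
        # O(1): no data is duplicated; both handles lose exclusive ownership.
        self._owns = False
        return CowColumn(self._data, owns=False)

    def __getitem__(self, i):
        return self._data[i]

    def __setitem__(self, i, value):
        if not self._owns:
            # First write after a copy: materialize a private buffer.
            self._data = list(self._data)
            self._owns = True
        self._data[i] = value

    def shares_storage_with(self, other):
        return self._data is other._data


a = CowColumn([1, 2, 3])
b = a.copy()                          # pass-by-value semantics, no data copied yet
shared_before = a.shares_storage_with(b)
b[0] = 99                             # the write triggers the real copy
shared_after = a.shares_storage_with(b)
```

The ownership flag here is deliberately conservative: the original handle also copies on its next write, even if the other handle has already diverged. A reference count on the buffer would avoid that extra copy.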
  • 12. Rapids - Join, Sort, & GroupBy
      - Big Sort, Big Join by Matt Dowle (of data.table)
          - Parallel & distributed; data too big for one machine
          - Working: 1b x 5 joined with 1b x 5 yielding 1b x 9
          - Now testing 10b rows by 10b rows join on 10 nodes
          - Stable sort, index built, can binary search
          - Opens door for rolling joins
      - Any lambda function on Group-Bys
      - Timings: data.table 505s; H2O 1 node 236s; H2O 4 nodes 113s
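The "stable sort, index built, can binary search" step can be illustrated on one machine with the standard library. This is only a sketch of the general technique, not the parallel, distributed implementation the slide describes; `build_index`, `inner_join`, and the sample rows are names and data invented for the example.

```python
from bisect import bisect_left, bisect_right


def build_index(rows, key):
    """Stable sort of row positions by key; ties keep their original order."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][key])
    keys = [rows[i][key] for i in order]
    return keys, order


def inner_join(left, right, key):
    """Join each left row to the matching right rows via binary search."""
    keys, order = build_index(right, key)
    out = []
    for lrow in left:
        lo = bisect_left(keys, lrow[key])    # first matching position
        hi = bisect_right(keys, lrow[key])   # one past the last match
        for pos in order[lo:hi]:
            merged = dict(right[pos])
            merged.update(lrow)
            out.append(merged)
    return out


left = [{"id": 2, "x": "a"}, {"id": 1, "x": "b"}, {"id": 3, "x": "c"}]
right = [{"id": 1, "y": 10}, {"id": 2, "y": 20}, {"id": 2, "y": 21}]
joined = inner_join(left, right, "id")
```

Because the index is sorted, each lookup costs O(log n), and the same index could in principle serve a rolling join by taking the nearest position when `lo == hi`.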
  • 13. Flow
      - Improvements:
          - CMs, ROC, scoring history, deviance plots, cross-validation metrics, POJO listings, parameter selection
          - Grid search. Model & Frame import & export. Change column type, impute, split frame
          - Save/load/share flows
          - Diagnostics: cluster status, log files, network tests, profiler, stack trace, timeline
          - Faster for wide datasets
  • 14. What's (Not) New!
      - Same commitment to Quality, Speed, Size, Scale: 10b x 10b row joins! GLRM! Grid search!
      - Same rapid pace of innovation: tons of new code! ~15000 commits!
      - Same quality-driven culture: bigger team! New faces!
      - Community, Culture, Code