© 2012 IBM Corporation
Revolution Confidential
Revolution R Enterprise for IBM Netezza
IBM Netezza with Revolution Analytics
 High-performance, in-database analytics platform for Big Data
– Massively parallel processing delivers 10-100x performance
– Run analytics in-database and eliminate data movement
– Scalable architecture fosters experimentation
 Innovation with Advanced Analytics
– Analytic modeling with most current statistical methods and 2,500+
open source packages
 Enterprise ready advanced analytics software, services &
support
– Security, IDE, training, professional services
– Web Services stack enables integration with front-end
presentation layer
© 2012 IBM Corporation, March 1, 2012
Revolution Analytics
What is R?
 Data analysis software
 A programming language
– Development platform designed by and for statisticians
– Object-oriented: vector, matrix, model, …
– Built-in libraries of algorithms
 An environment
– Huge library of algorithms for data access, data manipulation, analysis
and graphics
 An open-source software project
– Free, open, and active
 A community
– Thousands of contributors, 2 million users
– Resources and help in every domain
Download the White Paper
R is Hot
bit.ly/r-is-hot
The professor who invented analytic software for
the experts now wants to take it to the masses
Most advanced statistical
analysis software available
Half the cost of
commercial alternatives
2M+ Users
2,500+ Applications
Statistics
Predictive
Analytics
Data Mining
Visualization
Finance
Life Sciences
Manufacturing
Retail
Telecom
Social Media
Government
Power
Productivity
Enterprise
Readiness
Revolution R Enterprise has the Open-Source R Engine at the core
2,500 community packages and growing exponentially
R Engine
Language Libraries
Open Source R
Packages
Technical
Support
Web Services
API
Big Data
Analysis
Revolution
Productivity
Environment
Build
Assurance
Parallel
Tools
Multi-Threaded
Math Libraries
Technology
Partners
Working with Revolution R
Enterprise for IBM Netezza
Revolution R Enterprise for IBM Netezza
inside the IBM Netezza Architecture
IBM Netezza
Analytics
In-Database Paradigms for using R
 In-database Scoring
– Family of apply functions which score
analytic models by using data
parallelism
– Underlying truism is that there is a fact
that can be applied across all data
 Big Data Analytics
– Family of parallelized, in-database
analytics that have R wrappers and
work on entire data set
– Underlying truism exists across all
data
 Grouped by Row (tapply)
– Data and Task Parallelism
• Data flow technique to apply analytics to
naturally occurring groups of data using
non-parallelized analytics
– Underlying relationship in data is by a
group
 Examples
– Customer lifetime value
– Credit score
– Affinity
– Good stock/bad stock
Big data analytics
– Clustering of all data to determine groupings
– Models that are applied across a whole data set – decision trees
– Data transformation – variable selection, correlation
Group
– Forecasting – by store, stock symbol, etc.
– Build a model for each customer, product, etc.
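The three paradigms above can be sketched locally in base R. This is only an analogy on the built-in `iris` data set, assuming nothing about the in-database API: the variable names (`model`, `scores`, `clusters`, `per_group_models`) are illustrative and the code runs on the workstation, not on the appliance.

```r
# Local, base-R analogy of the three paradigms (not the in-database nz* API).
data(iris)

# 1. Scoring: apply an already-fitted model to every row independently.
model  <- lm(Sepal.Length ~ Petal.Length, data = iris)
scores <- predict(model, newdata = iris)

# 2. Whole-data analytics: one algorithm sees the entire data set at once.
set.seed(1)  # k-means starts from random centers
clusters <- kmeans(iris[, 1:4], centers = 3)

# 3. Grouped apply: fit one (non-parallelized) model per natural group.
per_group_models <- lapply(
  split(iris, iris$Species),
  function(d) lm(Sepal.Length ~ Petal.Length, data = d)
)

print(names(per_group_models))
```

In the in-database setting, the first and third patterns correspond to the apply family and the second to the parallelized analytics with R wrappers.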
Access In-Database Language Support from R
SQL, Java, Python, C, Fortran, C++
Open Source R Package Support
Vertical
• Econometrics
• Experimental Design
• Computational
Physics
• Clinical Trials
• Environmetrics
• Finance
• Genetics
• Medical Imaging
• Pharmacokinetics
• Phylogenetics
• Psychometrics
• Social Sciences
Horizontal
• Bayesian
• Cluster
• Distributions
• Graphics
• Graphical Models
• Machine Learning
• Multivariate
• Natural Language
Processing
• Optimization
• Robust Statistical
Metrics
• Spatial
• Survival Analysis
• Time Series
2500+
community
packages
Using Revolution R Enterprise with IBM Netezza
R Packages integrate and
push analytics processing
in-database
Revolution R Enterprise - Workstation
HTTP
Revolution R Enterprise - Server
RevoDeployR Server
Web Services Interface for R
Business Intelligence, Excel
or Third-Party Application
[Diagram: Host and five S-Blades, each running IBM Netezza Analytics; the workstation and the server each connect via RODBC and nzODBC]
Deploying Revolution R Enterprise to IBM Netezza
•Remote terminal connection to Host
•Create your R Script
•Compile and Register your R Script as an AE (UDAP)
•Execute SQL that will invoke the registered AE
•Go back to the Revolution R client to retrieve results and continue
additional analysis
[Diagram: Host and five S-Blades, each running IBM Netezza Analytics]
Revolution R Enterprise Client Configuration
 Revolution R Enterprise
– Productivity Environment
 Netezza ODBC Drivers
 ‘nz’ R Packages
– nzA, nzR, nzMatrix
 R Package Dependencies
– RODBC
– caTools
– tree
– bitops
– e1071
– rgl
– ca
– MASS
– XML
IBM Netezza In-Database Analytics from Revolution R
nzR
Package
Encapsulate database and
expose “R”-like constructs
R data.frame =
database table
Apply an R function to a row
of data or grouped rows of
data
nzA
Package
Entry point to the
nzAnalytics
Explicitly parallelized
algorithms that run in
database
nzMatrix
Package
Encapsulation of Matrices
and operations in Database
nz.matrix construct in
R to access matrices in the
database
R operations on
nz.matrix translate to
matrix stored procedure
operations
nzR Package
 Basic Functions
Database Connection: nzConnect, nzConnectDSN
SQL Execution: nzQuery, nzScalarQuery, nzDeleteTable
Data Management: as.nz.data.frame, nz.data.frame
Apply an R function: nzApply, nzTApply, nzGroupedApply
R Package Management: nzInstallPackages, nzIsPackageInstalled
 Sample Code
# Load the package
library(nzr)
# Connect to a database via ODBC
nzConnect("admin", "xyz", "127.0.0.1", "iclasstest")
# Point to the iris table in the database
nzdf <- nz.data.frame("iris")
# Run nzTApply against the nz.data.frame: for each group defined by
# column 5, return the maximum of column 1
fun <- function(x) max(x[,1])
nzTApply(nzdf, nzdf[,5], fun)
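For readers without access to an appliance, the same computation can be tried locally. The sketch below is a base-R stand-in, assuming the database table mirrors R's built-in `iris` data set (species in column 5); it is not how nzTApply is implemented.

```r
# Local stand-in for the in-database call: same function, same grouping column.
data(iris)

fun <- function(x) max(x[, 1])  # maximum of the first column (Sepal.Length)

# split() builds one data frame per value of column 5 (Species);
# sapply() then calls fun once per group.
result <- sapply(split(iris, iris[, 5]), fun)
print(result)
```

The in-database version differs in where the work happens: each group is processed on the appliance, so only the per-group results travel back to the client.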
nzA Package
 Data Manipulation
Moments: nz.moments
Quantiles: nz.quantile, nz.quartile
Outlier Detection: nz.outliers
Frequency Table: nz.bitable
Histogram: nz.hist
Pearson's Correlation: nz.corr
Spearman's Correlation: nz.spearman.corr, nz.spearman.corr.s
Covariance: nz.cov, nz.cov.matrix
Mutual Information: nz.mutualinfo
Chi-Square Test: nzChisq.test, nz.chisq.test
t-Test: t.ls.test, t.me.test, t.pmd.test, t.umd.test
Mann-Whitney-Wilcoxon Test: nz.mww.test
Wilcoxon Test: nz.wilcoxon.test
Canonical Correlation: nz.canonical.corr
One-Way ANOVA: nzAnova, nz.anova.CRD.test, nz.anova.RBD.test
Principal Component Analysis: nzPCA
Tree-Shaped Bayesian Networks: nz.TBNet Apply, nz.TBNet Grow, nz.BigBNControl, nz.TBNet1g2p, nz.TBNet1g, nz.TBNet2g
nzA Package
 Data Transformations
Discretization: nz.efdisc, nz.emdisc, nz.ewdisc
Standardization and Normalization: nz.std.norm
Data Imputation: nz.impute.data
 Model Diagnostics
Misclassification Error: nz.cerror
Confusion Matrix: nz.acc, nz.CMATRIX STATS
Mean Absolute Error: nz.mae
Mean Square Error: nz.mse
Relative Absolute Error: nz.rae
Percentage Split: nz.percentage.split
Cross-Validation: nz.cross.validation
nzA Package
 Classification
Naive Bayes: nzNaiveBayes, nz.naivebayes, nz.predict.naivebayes
Decision Trees: nzDecTree, nz.dectree, nz.grow.dectree, nz.print.dectree, nz.prune.dectree, nz.predict.dectree
Nearest Neighbors: nz.knn
 Regression
Linear Regression: nzLm
Regression Trees: nzRegTree, nz.regtree, nz.grow.regtree, nz.print.regtree, nz.predict.regtree
 Clustering
K-Means Clustering: nzKMeans, nz.kmeans, nz.predict.kmeans
Divisive Clustering: nz.divcluster, nz.predict.divcluster
 Associative Rule Mining
FP-Growth: nz.fpgrowth, nz.prepare.fpgrowth
nzMatrix Package
 Data Manipulation
Coerce or point to a nz.matrix: as.nz.matrix, as.nz.matrix.matrix, nz.matrix
Combine Matrices: nzCBind, nzRBind
Create Matrices From Tables: nzCreateMatrixFromTable, nzCreateTableFromMatrix
Create Special Matrices: nzIdentityMatrix, nzNormalMatrix, nzOnesMatrix, nzRandomMatrix, nzVecToDiag
Decomposition: nzSVD, svd, nzEigen
Delete Matrices: nzDeleteMatrix, nzDeleteMatrixByName
Dimensions: dim, NCOL, ncol, NROW, nrow
Mathematical Functions: abs, add, subtr, ceiling, div, exp, floor, ln, log10, mod, mult, nzPowerMatrix, pow, rounding, sqrt, trunc
Matrix Engine Initialization: nzMatrixEngineInitialization
Matrix Info: is.nz.matrix, isSparse, nzExistMatrix, nzExistMatrixByName, nzGetValidMatrixName
Operators: *, +, -, <, ==, >, nzKronecker, nzPMax, nzPMin, nzSetValue, [, scale, t
Printing Matrices: print.nz.matrix
Solve: nzInv, nzSolve, nzSolveLLS
Sparse Matrices: isSparse, nzSparse2matrix
Summaries: nzAll, nzAny, nzMax, nzMin, nzSsq, nzSum, nzTr
Demonstration
Using Revolution R
with IBM Netezza
Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise
Presented by:
Derek M Norton, Senior Sales Engineer
Use Case – Credit Risk
 We have a dataset comprised of individuals
and their credit risk
 stored on the Netezza Appliance
 The goal is to model if someone is
“approvable” for a loan.
 This use case will follow a modeling process
(though condensed) from start to finish.
 I will discuss each of the parts and at the end
there will be a demo of the code
Modeling Exercise
1. Learning more about the data
2. Prepare the data for modeling
3. Fit models to the data
4. Model Performance
1. Learning more about the data
 Connect to the IBM Netezza appliance
 Summarize the data
 Visualize the data
[Figure: histogram of a continuous variable (x vs. frequency) and bar chart of a discrete variable (categories: High School Diploma, Bachelors Degree, Masters Degree, Professional Degree, PhD)]
2. Prepare the data for modeling
 Split the data into 70/30 Training/Test sets
 Transform some variables
 Discretize numeric variables for later use
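The preparation steps above can be sketched in base R. The demo's credit table is not shown in the slides, so the data frame `credit` and its columns (`income`, `age`) are hypothetical stand-ins, and equal-width binning is just one assumed discretization choice.

```r
set.seed(42)  # make the random split reproducible

# Hypothetical stand-in data; the real credit table lives on the appliance.
credit <- data.frame(
  income = runif(200, min = 20000, max = 120000),
  age    = sample(18:75, 200, replace = TRUE)
)

# 70/30 split: sample 70% of the row indices for training, keep the rest for test.
train_idx <- sample(nrow(credit), size = round(0.7 * nrow(credit)))
train <- credit[train_idx, ]
test  <- credit[-train_idx, ]

# Transform a variable: put income on a log scale.
train$log_income <- log(train$income)
test$log_income  <- log(test$income)

# Discretize a numeric variable into four equal-width bins,
# using the same break points for both sets.
age_breaks    <- seq(18, 75, length.out = 5)
train$age_bin <- cut(train$age, breaks = age_breaks, include.lowest = TRUE)
test$age_bin  <- cut(test$age,  breaks = age_breaks, include.lowest = TRUE)
```

Fixing the break points before binning keeps the training and test sets on identical categories, which matters when the discretized variable later feeds a classifier.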
3. Fit models to the data
 Build two different models to predict if an
individual is “approvable”
 Decision Tree
 Naïve Bayes
4. Model Performance
 Examine confusion matrices to determine:
 Training performance
 Test performance
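A confusion matrix can be built in base R with `table()`. The labels below are made-up illustrative values, not results from the demo; in practice `predicted` would come from the fitted decision tree or Naïve Bayes model, computed once on the training set and once on the test set.

```r
# Hypothetical labels for eight individuals ("yes" = approvable).
actual <- factor(c("yes", "no", "yes", "yes", "no", "no", "yes", "no"),
                 levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "no", "yes", "no", "yes", "yes", "no"),
                    levels = c("no", "yes"))

# Rows are predictions, columns are the true classes.
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Accuracy is the share of cases on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Comparing the training and test matrices side by side shows whether a model generalizes or merely fits the training data.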
Demo
Summary
 Familiar environment for R Developers
– World-class productivity tools
– Enterprise class service, support and integration
 Execution of analytics in-database
– Analytic computing distributed across Netezza nodes and run
in a massively parallel manner
– Each Netezza node gets a data slice and analytics are pushed
down from the Host to the individual nodes
 Capabilities
– R Code executed on Netezza nodes in row-by-row fashion or
on groups of rows
– Enables access to explicitly parallelized algorithms running on
entire data set
– Large-scale parallel matrix operations on database tables
 Performance
– 10-100x Performance improvements
Contact Us
Derek Norton
Solutions Executive
Revolution Analytics
derek.norton@revolutionanalytics.com
www.revolutionanalytics.com +1 (650) 646 9545 Twitter: @RevolutionR
Bill Zanine
Business Solutions Executive, Analytics Solutions
IBM Netezza
wzanine@us.ibm.com

Skilwise Big data
 
Real-time Analytics with Redis
Real-time Analytics with RedisReal-time Analytics with Redis
Real-time Analytics with Redis
 
Get Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a ServiceGet Started Quickly with IBM's Hadoop as a Service
Get Started Quickly with IBM's Hadoop as a Service
 
SnappyFlow Presentation.pdf
SnappyFlow Presentation.pdfSnappyFlow Presentation.pdf
SnappyFlow Presentation.pdf
 
How to Increase Performance in IBM Cognos
How to Increase Performance in IBM CognosHow to Increase Performance in IBM Cognos
How to Increase Performance in IBM Cognos
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Webinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDBWebinar: Faster Big Data Analytics with MongoDB
Webinar: Faster Big Data Analytics with MongoDB
 

Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise

  • 1. © 2012 IBM Corporation. Revolution Confidential. Revolution R Enterprise for IBM Netezza
  • 2. IBM Netezza with Revolution Analytics
      • High-performance, in-database analytics platform for Big Data
          - Massively parallel processing delivers 10-100x performance
          - Run analytics in-database and eliminate data movement
          - Scalable architecture fosters experimentation
      • Innovation with Advanced Analytics
          - Analytic modeling with most current statistical methods and 2,500+ open source packages
      • Enterprise-ready advanced analytics software, services & support
          - Security, IDE, training, professional services
          - Web Services stack enables integration with front-end presentation layer
  • 3. Revolution Analytics (March 1, 2012)
  • 4. What is R?
      • Data analysis software
      • A programming language
          - Development platform designed by and for statisticians
          - Object-oriented: vector, matrix, model, …
          - Built-in libraries of algorithms
      • An environment
          - Huge library of algorithms for data access, data manipulation, analysis and graphics
      • An open-source software project
          - Free, open, and active
      • A community
          - Thousands of contributors, 2 million users
          - Resources and help in every domain
      • Download the white paper "R is Hot": bit.ly/r-is-hot
  • 5. "The professor who invented analytic software for the experts now wants to take it to the masses"
      • Power: most advanced statistical analysis software available
      • Productivity: half the cost of commercial alternatives
      • Enterprise Readiness
      • 2M+ users, 2,500+ applications
      • Statistics, Predictive Analytics, Data Mining, Visualization
      • Finance, Life Sciences, Manufacturing, Retail, Telecom, Social Media, Government
  • 6. Revolution R Enterprise has the Open-Source R Engine at the core
      • 2,500 community packages and growing exponentially
      • Diagram labels: R Engine, Language Libraries, Open Source R Packages, Technical Support, Web Services API, Big Data Analysis, Revolution Productivity Environment, Build Assurance, Parallel Tools, Multi-Threaded Math Libraries, Technology Partners
  • 7. Working with Revolution R Enterprise for IBM Netezza
  • 8. Revolution R Enterprise for IBM Netezza inside the IBM Netezza Architecture (diagram label: IBM Netezza Analytics)
  • 9. In-Database Paradigms for using R
      • In-database Scoring
          - Family of apply functions which score analytic models by using data parallelism
          - Underlying truism: there is a fact that can be applied across all data
          - Examples: customer lifetime value, credit score, affinity, good stock/bad stock
      • Big Data Analytics
          - Family of parallelized, in-database analytics that have R wrappers and work on the entire data set
          - Underlying truism exists across all data
          - Examples: clustering of all data to determine groupings; models that are applied across a whole data set (decision trees); data transformation (variable selection, correlation)
      • Grouped by Row (tapply): Data and Task Parallelism
          - Data flow technique to apply analytics to naturally occurring groups of data using non-parallelized analytics
          - Underlying relationship in data is by a group
          - Examples: forecasting by store, stock symbol, etc.; building a model for each customer, product, etc.
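The "grouped by row" paradigm on slide 9 is named after base R's `tapply`. The following is a minimal local sketch of that idea, assuming R's built-in `iris` data frame stands in for a database table. It uses only base R and does not call any of the nz packages, so it illustrates the pattern rather than the in-database execution.

```r
# Base R analogue of the "grouped by row" paradigm: split rows into
# naturally occurring groups and run an ordinary R function on each group.
data(iris)

# Data parallelism: largest sepal length within each species
max_by_species <- tapply(iris$Sepal.Length, iris$Species, max)
print(max_by_species)

# Task parallelism in miniature: fit one small model per group
models <- lapply(split(iris, iris$Species),
                 function(d) lm(Sepal.Length ~ Petal.Length, data = d))

# Collect one coefficient per group into a named vector
slopes <- sapply(models, function(m) coef(m)[["Petal.Length"]])
print(slopes)
```

Each group is processed independently of the others, which is the property that lets a system distribute this kind of work across nodes.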
  • 10. Access In-Database Language Support from R: SQL, Java, Python, C, Fortran, C++
  • 11. Open Source R Package Support (2,500+ community packages)
      • Vertical: Econometrics, Experimental Design, Computational Physics, Clinical Trials, Environmetrics, Finance, Genetics, Medical Imaging, Pharmacokinetics, Phylogenetics, Psychometrics, Social Sciences
      • Horizontal: Bayesian, Cluster, Distributions, Graphics, Graphical Models, Machine Learning, Multivariate, Natural Language Processing, Optimization, Robust Statistical Metrics, Spatial, Survival Analysis, Time Series
  • 12. Using Revolution R Enterprise with IBM Netezza
      • R packages integrate and push analytics processing in-database
      • Diagram labels: Revolution R Enterprise Workstation; Revolution R Enterprise Server; RevoDeployR Server (Web Services Interface for R); HTTP; Business Intelligence, Excel or Third-Party Application; RODBC & nzODBC; Host (IBM Netezza Analytics); five S-Blades (each IBM Netezza Analytics)
  • 13. Deploying Revolution R Enterprise to IBM Netezza
      • Open a remote terminal connection to the Host
      • Create your R script
      • Compile and register your R script as an AE (UDAP)
      • Execute SQL that will invoke the registered AE
      • Go back to the Revolution R client to retrieve results and continue additional analysis
      • Diagram labels: Host (IBM Netezza Analytics); five S-Blades (each IBM Netezza Analytics)
  • 14. Revolution R Enterprise Client Configuration
      • Revolution R Enterprise: Productivity Environment
      • Netezza ODBC Drivers
      • 'nz' R Packages: nzA, nzR, nzMatrix
      • R Package Dependencies: RODBC, caTools, tree, bitops, e1071, rgl, ca, MASS, XML
  • 15. IBM Netezza In-Database Analytics from Revolution R
      • nzR Package
          - Encapsulate database and expose "R"-like constructs
          - R data.frame = database table
          - Apply an R function to a row of data or grouped rows of data
      • nzA Package
          - Entry point to the nzAnalytics
          - Explicitly parallelized algorithms that run in database
      • nzMatrix Package
          - Encapsulation of matrices and operations in database
          - nz.matrix construct in R to access matrices in the database
          - R operations on nz.matrix translate to matrix stored procedure operations
  • 16. nzR Package
      • Basic Functions
          - Database Connection: nzConnect, nzConnectDSN
          - SQL Execution: nzQuery, nzScalarQuery, nzDeleteTable
          - Data Management: as.nz.data.frame, nz.data.frame
          - Apply an R function: nzApply, nzTApply, nzGroupedApply
          - R Package Management: nzInstallPackages, nzIsPackageInstalled
      • Sample Code
          #load packages
          library(nzr)
          #connect to a database via ODBC
          nzConnect("admin", "xyz", "127.0.0.1", "iclasstest")
          #load the iris table
          nzdf <- nz.data.frame("iris")
          #run a nzTApply against the nz dataframe
          fun <- function(x) max(x[,1])
          nzTApply(nzdf, nzdf[,5], fun)
  • 17. nzA Package: Data Manipulation
      • Moments: nz.moments
      • Quantiles: nz.quantile, nz.quartile
      • Outlier Detection: nz.outliers
      • Frequency Table: nz.bitable
      • Histogram: nz.hist
      • Pearson's Correlation: nz.corr
      • Spearman's Correlation: nz.spearman.corr, nz.spearman.corr.s
      • Covariance: nz.cov, nz.cov.matrix
      • Mutual Information: nz.mutualinfo
      • Chi-Square Test: nzChisq.test, nz.chisq.test
      • t-Test: t.ls.test, t.me.test, t.pmd.test, t.umd.test
      • Mann-Whitney-Wilcoxon Test: nz.mww.test
      • Wilcoxon Test: nz.wilcoxon.test
      • Canonical Correlation: nz.canonical.corr
      • One-Way ANOVA: nzAnova, nz.anova.CRD.test, nz.anova.RBD.test
      • Principal Component Analysis: nzPCA
      • Tree-Shaped Bayesian Networks: nz.TBNet Apply, nz.TBNet Grow, nz.BigBNControl, nz.TBNet1g2p, nz.TBNet1g, nz.TBNet2g
  • 18. nzA Package: Data Transformations and Model Diagnostics
      • Data Transformations
          - Discretization: nz.efdisc, nz.emdisc, nz.ewdisc
          - Standardization and Normalization: nz.std.norm
          - Data Imputation: nz.impute.data
      • Model Diagnostics
          - Misclassification Error: nz.cerror
          - Confusion Matrix: nz.acc, nz.CMATRIX STATS
          - Mean Absolute Error: nz.mae
          - Mean Square Error: nz.mse
          - Relative Absolute Error: nz.rae
          - Percentage Split: nz.percentage.split
          - Cross-Validation: nz.cross.validation
  • 19. nzA Package: Classification, Regression, Clustering, Associative Rule Mining
      • Classification
          - Naive Bayes: nzNaiveBayes, nz.naivebayes, nz.predict.naivebayes
          - Decision Trees: nzDecTree, nz.dectree, nz.grow.dectree, nz.print.dectree, nz.prune.dectree, nz.predict.dectree
          - Nearest Neighbors: nz.knn
      • Regression
          - Linear Regression: nzLm
          - Regression Trees: nzRegTree, nz.regtree, nz.grow.regtree, nz.print.regtree, nz.predict.regtree
      • Clustering
          - K-Means Clustering: nzKMeans, nz.kmeans, nz.predict.kmeans
          - Divisive Clustering: nz.divcluster, nz.predict.divcluster
      • Associative Rule Mining
          - FP-Growth: nz.fpgrowth, nz.prepare.fpgrowth
  • 20. nzMatrix Package: Data Manipulation
      • Coerce or point to a nz.matrix: as.nz.matrix, as.nz.matrix.matrix, nz.matrix
      • Combine Matrices: nzCBind, nzRBind
      • Create Matrices From Tables: nzCreateMatrixFromTable, nzCreateTableFromMatrix
      • Create Special Matrices: nzIdentityMatrix, nzNormalMatrix, nzOnesMatrix, nzRandomMatrix, nzVecToDiag
      • Decomposition: nzSVD, svd, nzEigen
      • Delete Matrices: nzDeleteMatrix, nzDeleteMatrixByName
      • Dimensions: dim, NCOL, ncol, NROW, nrow
      • Mathematical Functions: abs, add, aubtr, ceiling, div, exp, floor, ln, log10, mod, mult, nzPowerMatrix, pow, rounding, sqrt, trunc
      • Matrix Engine Initialization: nzMatrixEngineInitialization
      • Matrix Info: is.nz.matrix, isSparse, nzExistMatrix, nzExistMatrixByName, nzGetValidMatrixName
      • Operators: *, +, -, <, ==, >, nzKronecker, nzPMax, nzPMin, nzSetValue, [, scale, t
      • Printing Matrices: print.nz.matrix
      • Solve: nzInv, nzSolve, nzSolveLLS
      • Sparse Matrices: isSparse, nzSparse2matrix
      • Summaries: nzAll, nzAny, nzMax, nzMin, nzSsq, nzSum, nzTr
  • 21. Demonstration: Using Revolution R with IBM Netezza
  • 22. Turbo-Charge Your Analytics with IBM Netezza and Revolution R Enterprise. Presented by: Derek M Norton, Senior Sales Engineer
  • 23. Use Case: Credit Risk
      • We have a dataset comprised of individuals and their credit risk, stored on the Netezza appliance
      • The goal is to model if someone is "approvable" for a loan
      • This use case will follow a modeling process (though condensed) from start to finish
      • I will discuss each of the parts, and at the end there will be a demo of the code
  • 24. Modeling Exercise
      1. Learning more about the data
      2. Prepare the data for modeling
      3. Fit models to the data
      4. Model Performance
  • 25. 1. Learning more about the data
      • Connect to the IBM Netezza appliance
      • Summarize the data
      • Visualize the data
      • [Figures: frequency plot of a continuous variable (x vs. Frequency); frequency plot of a discrete variable with categories High School Diploma, Bachelors Degree, Masters Degree, Professional Degree, PhD]
  • 26. 2. Prepare the data for modeling
      • Split the data into 70/30 Training/Test sets
      • Transform some variables
      • Discretize numeric variables for later use
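The preparation steps on slide 26 can be sketched locally in base R. This is an illustration under stated assumptions, not the demo code from the talk: the `credit` data frame, its column names (`income`, `age`, `approvable`), the log transform, and the quartile bands are all hypothetical, and the deck's in-database equivalents (for example `nz.percentage.split` or the `nz.*disc` functions) are not called here.

```r
set.seed(42)  # make the random split reproducible

# Hypothetical stand-in for the credit-risk table
n <- 200
credit <- data.frame(
  income     = round(runif(n, 20000, 120000)),
  age        = sample(21:70, n, replace = TRUE),
  approvable = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# 70/30 training/test split by random row index
train_idx <- sample(n, size = round(0.7 * n))
train <- credit[train_idx, ]
test  <- credit[-train_idx, ]

# Cut points come from the training set only, so the test set
# does not leak into the preparation step
breaks <- quantile(train$income, probs = seq(0, 1, by = 0.25))
breaks[c(1, 5)] <- c(-Inf, Inf)  # open-ended outer bins

prepare <- function(d) {
  d$log_income  <- log(d$income)                 # transform a variable
  d$income_band <- cut(d$income, breaks = breaks,
                       labels = c("q1", "q2", "q3", "q4"))  # discretize
  d
}

train <- prepare(train)
test  <- prepare(test)
str(train)
```

Deriving the bin boundaries from the training rows and reusing them on the test rows keeps the two sets comparable when the models are evaluated later.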
  • 27. 3. Fit models to the data
      • Build two different models to predict if an individual is "approvable"
          - Decision Tree
          - Naïve Bayes
  • 28. 4. Model Performance
      • Examine confusion matrices to determine:
          - Training performance
          - Test performance
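A confusion matrix as used on slide 28 cross-tabulates actual against predicted classes. The sketch below builds one in base R with `table()`. The `actual` and `predicted` vectors are made-up illustrative labels, not results from the presentation, and the deck's own `nz.acc` and related functions are not used.

```r
# Hypothetical labels for eight loan applicants
lv <- c("no", "yes")
actual    <- factor(c("yes", "yes", "no", "no", "yes", "no", "yes", "no"),
                    levels = lv)
predicted <- factor(c("yes", "no", "no", "no", "yes", "yes", "yes", "no"),
                    levels = lv)

# Rows are actual classes, columns are predicted classes
cm <- table(actual = actual, predicted = predicted)
print(cm)

# Accuracy is the share of cases on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

# Misclassification error is its complement
error_rate <- 1 - accuracy
cat("accuracy:", accuracy, " error:", error_rate, "\n")
```

Computing the same table once on the training predictions and once on the test predictions gives the two comparisons the slide lists; a large gap between them suggests overfitting.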
  • 30. Summary
      • Familiar environment for R developers
          - World-class productivity tools
          - Enterprise-class service, support and integration
      • Execution of analytics in-database
          - Analytic computing distributed across Netezza nodes and run in a massively parallel manner
          - Each Netezza node gets a data slice, and analytics are pushed down from the Host to the individual nodes
      • Capabilities
          - R code executed on Netezza nodes in row-by-row fashion or on groups of rows
          - Enables access to explicitly parallelized algorithms running on the entire data set
          - Large-scale parallel matrix operations on database tables
      • Performance
          - 10-100x performance improvements
  • 31. Contact Us
      • Derek Norton, Solutions Executive, Revolution Analytics: derek.norton@revolutionanalytics.com, www.revolutionanalytics.com, +1 (650) 646 9545, Twitter: @RevolutionR
      • Bill Zanine, Business Solutions Executive, Analytics Solutions, IBM Netezza: wzanine@us.ibm.com