5. Revolution Confidential
The professor who invented analytic software for
the experts now wants to take it to the masses
Most advanced statistical analysis software available
Half the cost of commercial alternatives
2M+ Users
2,500+ Applications
Statistics, Predictive Analytics, Data Mining, Visualization
Finance, Life Sciences, Manufacturing, Retail, Telecom, Social Media, Government
Power, Productivity, Enterprise Readiness
Revolution R Enterprise has the Open-Source R Engine at its core
2,500 community packages and growing exponentially
R Engine: Language Libraries
Open Source R Packages
Technical Support
Web Services API
Big Data Analysis
Revolution Productivity Environment
Build Assurance
Parallel Tools
Multi-Threaded Math Libraries
Technology Partners
Use Case – Credit Risk
We have a dataset of individuals and their credit risk,
stored on the IBM Netezza appliance.
The goal is to model whether someone is
“approvable” for a loan.
This use case follows a (condensed) modeling process
from start to finish.
I will discuss each part, and at the end
there will be a demo of the code.
Modeling Exercise
1. Learning more about the data
2. Prepare the data for modeling
3. Fit models to the data
4. Model Performance
1. Learning more about the data
Connect to the IBM Netezza appliance
Summarize the data
Visualize the data
Figure: example plots of a continuous variable (histogram of x, range 0 to 25, vs. frequency) and a discrete variable (counts by education level: High School Diploma, Bachelors Degree, Masters Degree, Professional Degree, PhD)
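The summarize and visualize steps come down to two kinds of counts. A frequency plot of a continuous variable shows counts per bin, and a plot of a discrete variable shows counts per level. The plain-Python sketch below illustrates both under stated assumptions. The demo itself runs in R against the Netezza appliance. The values below are hypothetical toy data, not the credit-risk records. The helper names `summarize`, `histogram`, and `frequency_table` are made up for this example.

```python
import statistics
from collections import Counter


def summarize(values):
    """Basic summary of a continuous variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def histogram(values, width):
    """Counts per equal-width bin; each key is the bin's lower edge."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))


def frequency_table(labels):
    """Counts per level of a discrete variable, most frequent first."""
    return Counter(labels).most_common()


# Hypothetical toy values, not the credit-risk data from the demo
x = [1, 3, 7, 12, 4, 21, 9]
education = ["Bachelors Degree", "High School Diploma", "Bachelors Degree",
             "PhD", "Masters Degree"]

print(summarize(x))
print(histogram(x, width=5))  # → {0: 3, 5: 2, 10: 1, 20: 1}
print(frequency_table(education))
```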
2. Prepare the data for modeling
Split the data into 70/30 training/test sets
Transform some variables
Discretize numeric variables for later use
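The deck does not say how the 70/30 split or the discretization is performed. The sketch below assumes a seeded random shuffle for the split and equal-width binning for discretization. Quantile binning is a common alternative. `split_train_test` and `equal_width_bins` are hypothetical helpers written for this illustration, not part of any R, Netezza, or scikit-learn API.

```python
import random


def split_train_test(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of `rows` with a fixed seed and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def equal_width_bins(values, k):
    """Map each numeric value to one of k equal-width bins, labelled 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # avoid dividing by zero on a constant column
    # v >= lo, so int() acts as floor here; the maximum would land in bin k, so clamp it.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical rows standing in for the credit-risk records
rows = list(range(100))
train, test = split_train_test(rows)
print(len(train), len(test))  # → 70 30
print(equal_width_bins([0, 2.5, 5, 7.5, 10], k=4))  # → [0, 1, 2, 3, 3]
```

Fixing the seed makes the split reproducible, so training and test performance can be compared across reruns.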
3. Fit models to the data
Build two different models to predict whether an
individual is “approvable”
Decision Tree
Naïve Bayes
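Of the two models, Naïve Bayes is short enough to sketch from scratch. The version below is a categorical Naïve Bayes with add-one (Laplace) smoothing, which operates on discretized features. That may be one reason the preparation step discretizes numeric variables, though the deck does not say so. This is not the implementation used in the demo. The decision tree is not sketched, and the feature values and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(X, y):
    """Count classes and, per class, how often each feature takes each value."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    levels = defaultdict(set)            # feature index -> distinct values seen in training
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            levels[j].add(value)
    return class_counts, value_counts, levels


def predict_naive_bayes(model, row, alpha=1.0):
    """Pick the class with the highest log prior plus summed smoothed log likelihoods."""
    class_counts, value_counts, levels = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n)  # log P(class)
        for j, value in enumerate(row):
            # Add-alpha smoothing so an unseen (value, class) pair never gives log(0).
            num = value_counts[(j, label)][value] + alpha
            den = count + alpha * len(levels[j])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical discretized training data: (feature_1_bin, feature_2_bin) -> approvable?
X_train = [("low", "high"), ("high", "high"), ("high", "low"),
           ("low", "low"), ("high", "high"), ("low", "low")]
y_train = ["no", "yes", "yes", "no", "yes", "no"]

model = fit_naive_bayes(X_train, y_train)
print(predict_naive_bayes(model, ("high", "high")))  # → yes
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.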
4. Model Performance
Examine confusion matrices to determine:
Training performance
Test performance
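A confusion matrix cross-tabulates actual classes against predicted classes. The sketch below builds one from two label lists and computes accuracy from its diagonal. Running it once on the training predictions and once on the test predictions gives the two comparisons listed above. The labels and predictions are hypothetical, and the helper names are made up for this example.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes, in the order of `labels`."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of cases on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical test-set labels and model predictions
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes"]

cm = confusion_matrix(actual, predicted, labels=["yes", "no"])
print(cm)            # → [[2, 1], [1, 1]]
print(accuracy(cm))  # → 0.6
```

A test accuracy well below the training accuracy is the usual sign that a model is overfitting.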