Introduction to Predictive
Modeling
Venkat Reddy
Contents
• From fancy visualizations to predictive modeling
• The Business Problem
• What is Predictive Modeling
• The Horse Race Analogy
• Credit Risk Model Building
• Other Applications of Predictive Modeling
Data Visualizations
Data visualizations can help you make the right decisions quickly…
that will ultimately lead to success.
Data Analysis & Predictive Modeling
Increasingly, businesses rely on intelligent tools and techniques to analyze data systematically and
improve decision-making.
• Retail sales analytics
• Financial services analytics
• Telecommunications
• Supply chain analytics
• Transportation analytics
• Risk & credit analytics
• Talent analytics
• Marketing analytics
• Behavioral analytics
• Collections analytics
• Fraud analytics
• Pricing analytics
Dude we all know that
Data analytics rocks &
Predictive modeling is
going to rule!!!
Business Problem
• Credit cards and loans: how does a bank approve or decline credit card (CC) and loan
applications?
• Citi Bank: present in more than 90 countries.
• More than 100,000 customers apply for credit cards/loans every month
• How do they approve or reject credit card applications?
How does a bank approve or decline CC and loan
applications?
• Approve if the applicant is an auto driver in Bangalore
• Approve if the applicant’s boyfriend/girlfriend is rich
• Approve if the applicant is a u-19 cricketer
• Decline if the applicant is a software professional from
Bangalore
Any suggestions…
Chill
maddi
The Problem
Who will run away with my money?
• Citi Bank : Present in more than 90 countries.
• More than 100,000 customers apply for credit
cards/loans every month
• All of them have different characteristics
• Out of 100,000 customers, which ones have a higher
probability of default/charge-off?
• Basically, who will run away with my money?
• We need to predict the probability of “Running away”
How to predict?
We have historical data and we have statistics
The bank builds a model that gives a score to each
customer
“Developing set of equations or mathematical formulation to forecast future
behaviors based on current or historical data.”
Applicants
Predictive Modeling
Let's try to understand predictive modeling
Predictive Modeling – Fitting a model to the data to predict the future.
• Predicting the future: sometimes it is easy
• Who is going to score more runs in IPL 2014?
• That’s it, you predicted the future.
• By the way, how did you predict?
• Predicting the future based on historical data is
nothing but predictive modeling
• Who is going to score more runs in IPL 2014?
• Predicting the future …well it is not that easy …
Horse Race Analogy
How to bet on the best horse in a horse race
The Historical Data
Win vs. loss record in the past 2 years
• Long legs: 75% (horses with long legs won 75% of the time)
• Breed A: 55%, Breed B: 15%, Others: 30%
• T/L (tummy-to-length) ratio < 1/2: 75%
• Gender: Male 68%
• Head size: Small 10%, Medium 15%, Large 75%
• Country: Africa 65%
Given the historical data
Which one of these two horses would you bet on?
                Kalyan    Chethak
Length of legs  150 cm    110 cm
Breed           A         F
T/L ratio       0.3       0.6
Gender          Male      Female
Head size       Large     Small
Country         India     India
Given the historical data
Which one would you bet on….now?
                Kalyan    Chethak
Length of legs  110 cm    150 cm
Breed           C         A
T/L ratio       0.45      0.60
Gender          Male      Female
Head size       Small     Large
Country         Africa    India
Given the historical data
What about best one in this lot?
                Horse-1   Horse-2   Horse-3   Horse-4   Horse-5   Horse-6   Horse-7   Horse-8   Horse-9   Horse-10
Length of legs  109       114       134       130       149       120       104       117       115       135
Breed           C         A         B         A         F         K         L         B         C         A
T/L ratio       0.1       0.8       0.5       1.0       0.3       0.3       0.3       0.6       0.7       0.9
Gender          Male      Female    Male      Female    Male      Female    Male      Female    Male      Female
Head size       L         S         M         M         L         L         S         M         L         M
Country         Africa    India     Aus       NZ        Africa    Africa    India     India     Aus       Africa
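One crude way to pick from this lot is to give every horse points for each trait, using the historical win percentages as the points. The Python sketch below does that. It is only an illustration under several assumptions that are not in the slides: the percentages are simply added up, an unfavorable trait earns the complement of the quoted percentage (for example, Female gets 1 - 0.68), breeds other than A and B get the "Others" figure, and "long legs" is taken to mean 130 cm or more (a hypothetical threshold).

```python
# Points per trait, taken from the historical win record on the slides.
BREED = {"A": 0.55, "B": 0.15}
OTHER_BREED = 0.30
HEAD = {"S": 0.10, "M": 0.15, "L": 0.75}
LONG_LEGS_CM = 130  # hypothetical: the slides do not define "long legs"

# (leg length in cm, breed, T/L ratio, gender, head size, country)
HORSES = {
    "Horse-1": (109, "C", 0.1, "Male", "L", "Africa"),
    "Horse-2": (114, "A", 0.8, "Female", "S", "India"),
    "Horse-3": (134, "B", 0.5, "Male", "M", "Aus"),
    "Horse-4": (130, "A", 1.0, "Female", "M", "NZ"),
    "Horse-5": (149, "F", 0.3, "Male", "L", "Africa"),
    "Horse-6": (120, "K", 0.3, "Female", "L", "Africa"),
    "Horse-7": (104, "L", 0.3, "Male", "S", "India"),
    "Horse-8": (117, "B", 0.6, "Female", "M", "India"),
    "Horse-9": (115, "C", 0.7, "Male", "L", "Aus"),
    "Horse-10": (135, "A", 0.9, "Female", "M", "Africa"),
}

def score(horse):
    """Add up the points for each trait (a crude, additive heuristic)."""
    legs, breed, tl_ratio, gender, head, country = horse
    points = 0.75 if legs >= LONG_LEGS_CM else 0.25
    points += BREED.get(breed, OTHER_BREED)
    points += 0.75 if tl_ratio < 0.5 else 0.25
    points += 0.68 if gender == "Male" else 0.32
    points += HEAD[head]
    points += 0.65 if country == "Africa" else 0.35
    return points

scores = {name: score(h) for name, h in HORSES.items()}
best = max(scores, key=scores.get)
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:9s} {s:.2f}")
print("Bet on:", best)
```

Under these assumptions Horse-5 comes out on top. The point of the exercise is the same as in credit scoring: each characteristic gets a weight learned from history, and the weights are combined into one score per candidate.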
Citi has a similar problem
Who is going to run away with my money?
• Given the historical data of the customers, we want to predict the probability of bad
• We have data on each customer:
• Previous loans, previous payments, length of credit
history, other credit cards and loans, job type, income band, etc.
• We want to predict the probability of default
Credit Risk Model Building: Four main steps
1. Study historical data
• What are the causes (customer characteristics)?
• What are the effects (charge-off)?
2. Identify the most impacting factors
3. Find the exact impact of each factor (quantify it)
4. Use these coefficients for future predictions
The Historical Data of Customers
• Contains all the information about customers
• Contains information across more than 500
variables
• A portion of the data is present in the application form
• A portion of the data is available with the bank
• A lot of the data is maintained by the credit bureau
• Social Security number in the US
• PAN number in India
Attribute                                               Value
SSN                                                     111259005
Age                                                     27
Number of dependents                                    2
Number of current loans                                 1
Number of credit cards                                  1
Number of installments 30 days late in last 2 years     4
Average utilization % in last 2 years                   30%
Time since accounts opened                              60 months
Number of previous applications for credit card         2
Bankrupt                                                NO
The Historical Data of Customers: Types of variables
Payment History
• Account payment information on specific types of
accounts (credit cards, retail accounts, installment
loans, finance company accounts, mortgage, etc.)
• Presence of adverse public records (bankruptcy,
judgements, suits, liens, wage attachments, etc.),
collection items, and/or delinquency (past due items)
• Severity of delinquency (how long past due)
• Amount past due on delinquent accounts or
collection items
• Time since (recency of) past due items (delinquency),
adverse public records (if any), or collection items (if
any)
• Number of past due items on file
• Number of accounts paid as agreed
More variables
Amounts Owed
• Amount owing on accounts
• Amount owing on specific types of accounts
• Lack of a specific type of balance, in some cases
• Number of accounts with balances
• Proportion of credit lines used (proportion of
balances to total credit limits on certain types of
revolving accounts)
• Proportion of installment loan amounts still owing
(proportion of balance to original loan amount on
certain types of installment loans)
Length of Credit History
• Time since accounts opened
• Time since accounts opened, by specific type of account
• Time since account activity
New Credit
• Number of recently opened accounts, and proportion of
accounts that are recently opened, by type of account
• Number of recent credit inquiries
• Time since recent account opening(s), by type of account
• Time since credit inquiry(s)
• Re-establishment of positive credit history following past
payment problems
Even more variables
Identifying the most impacting factors
Out of 500, which are the 20 important attributes?
Selecting the variables: how do you do that?
Num_Enq   Good     Bad
0         6400     5
1         5800     69
2         5445     134
3         4500     250
4         4070     375
5         3726     470
6         2879     650
7         1893     876
8         1236     987
9         354      1298
Total     36303    5114

Num_companies   Good     Bad
0               4858     594
1               4291     602
2               4410     577
3               4738     529
4               4506     707
5               4976     796
6               4723     770
7               4407     583
8               4157     825
9               4970     964
Total           46036    6947
Bad Rate
Here the "Bad Rate" column is each group's share of all bad accounts (for example, 1298 / 5114 ≈ 25%).

Num_Enq   Good     Bad     Bad Rate
0         6400     5       0%
1         5800     69      1%
2         5445     134     3%
3         4500     250     5%
4         4070     375     7%
5         3726     470     9%
6         2879     650     13%
7         1893     876     17%
8         1236     987     19%
9         354      1298    25%
Total     36303    5114

Num_companies   Good     Bad     Bad Rate
0               4858     594     9%
1               4291     602     9%
2               4410     577     8%
3               4738     529     8%
4               4506     707     10%
5               4976     796     11%
6               4723     770     11%
7               4407     583     8%
8               4157     825     12%
9               4970     964     14%
Total           46036    6947
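The comparison of the two variables can be reproduced with a few lines of Python. The sketch below recomputes the percentage column as each group's share of all bad accounts, which is the calculation that matches the figures on the slide, and also computes the bad rate within each group (bad / (good + bad)) as a second view. The function and variable names are illustrative, not from the original deck.

```python
# (good, bad) counts per value 0..9, copied from the two tables.
NUM_ENQ = [(6400, 5), (5800, 69), (5445, 134), (4500, 250), (4070, 375),
           (3726, 470), (2879, 650), (1893, 876), (1236, 987), (354, 1298)]
NUM_COMPANIES = [(4858, 594), (4291, 602), (4410, 577), (4738, 529), (4506, 707),
                 (4976, 796), (4723, 770), (4407, 583), (4157, 825), (4970, 964)]

def share_of_bads(groups):
    """Percentage of all bad accounts that fall in each group."""
    total_bad = sum(bad for _, bad in groups)
    return [100 * bad / total_bad for _, bad in groups]

def within_group_bad_rate(groups):
    """Percentage of accounts in each group that are bad."""
    return [100 * bad / (good + bad) for good, bad in groups]

enq_share = [round(p) for p in share_of_bads(NUM_ENQ)]
comp_share = [round(p) for p in share_of_bads(NUM_COMPANIES)]
enq_rate = within_group_bad_rate(NUM_ENQ)
comp_rate = within_group_bad_rate(NUM_COMPANIES)

print("Num_Enq share of bads (%):      ", enq_share)
print("Num_companies share of bads (%):", comp_share)
print("Num_Enq spread in bad rate:       %.1f points" % (max(enq_rate) - min(enq_rate)))
print("Num_companies spread in bad rate: %.1f points" % (max(comp_rate) - min(comp_rate)))
```

On either measure, the number of enquiries separates good from bad customers sharply, while the number of companies barely moves the rate. That is the kind of evidence used to keep the first variable and drop the second.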
Information value

Utilization %   # of Good   # of Bad   %Good [X]   %Bad [Y]   %Good / %Bad   X - Y    WOE = Log(X/Y)   IV
< 5             1850        150        29%         5%         6.31           0.25     0.80             0.20
5 - 30          1600        400        25%         12%        2.05           0.13     0.31             0.04
31 - 60         1200        600        19%         18%        1.02           0.00     0.01             0.00
60 - 90         900         900        14%         28%        0.51           (0.14)   -0.29            0.04
>= 91           800         1200       13%         37%        0.34           (0.24)   -0.47            0.11
Total           6350        3250                                                                       0.39
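The weight of evidence (WOE) and information value (IV) figures in the utilization table can be recomputed as follows. This is a minimal Python sketch: the counts come from the table, while the function name and layout are illustrative. The table's WOE values match a base-10 logarithm, so `math.log10` is used here; many scorecard references use the natural logarithm instead, which rescales WOE and IV by a constant factor.

```python
import math

# (utilization band, number of good accounts, number of bad accounts)
BINS = [
    ("< 5", 1850, 150),
    ("5 - 30", 1600, 400),
    ("31 - 60", 1200, 600),
    ("60 - 90", 900, 900),
    (">= 91", 800, 1200),
]

def woe_iv(bins):
    """Return per-band (label, WOE, IV contribution) rows and the total IV."""
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    rows = []
    for label, good, bad in bins:
        x = good / total_good          # %Good: share of all goods in this band
        y = bad / total_bad            # %Bad: share of all bads in this band
        woe = math.log10(x / y)        # base 10, to match the table
        rows.append((label, woe, (x - y) * woe))
    iv = sum(part for _, _, part in rows)
    return rows, iv

rows, iv = woe_iv(BINS)
for label, woe, part in rows:
    print(f"{label:>8}  WOE={woe:+.2f}  IV part={part:.2f}")
print(f"Information value = {iv:.2f}")
```

Each band contributes (X - Y) × WOE, which is never negative because both factors share the same sign, and the contributions add up to the variable's information value.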
Model Building
Historical Data → The model equation
Model Building
A logistic regression model to predict the probability of default
• Log-odds of bad = w1(Var1) + w2(Var2) + w3(Var3) + …… + w20(Var20)
• Logistic regression gives us those weights
• Predicting the probability
• Log-odds of bad = 0.13(number of cards) + 0.21(utilization) + ……. + 0.06(number of loan
applications)
• That’s it …. we are done
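In logistic regression the weighted sum of the variables is usually read as the log-odds of the bad outcome, and the logistic function turns it into a probability between 0 and 1. The Python sketch below shows that last step using only the three weights quoted on the slide. The elided terms are left out rather than guessed, the intercept of 0 is a hypothetical placeholder, and the applicant values are made up for illustration.

```python
import math

# Weights quoted on the slide; the elided terms ("……") are left out here.
WEIGHTS = {
    "number_of_cards": 0.13,
    "utilization": 0.21,
    "number_of_loan_applications": 0.06,
}
INTERCEPT = 0.0  # hypothetical: the slide does not give an intercept

def log_odds(applicant):
    """Weighted sum of the applicant's attributes."""
    return INTERCEPT + sum(w * applicant[name] for name, w in WEIGHTS.items())

def probability_of_bad(applicant):
    """Logistic function applied to the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(applicant)))

# Made-up applicant: 2 cards, 30% utilization, 1 loan application.
applicant = {"number_of_cards": 2, "utilization": 0.30, "number_of_loan_applications": 1}
z = log_odds(applicant)
p = probability_of_bad(applicant)
print(f"log-odds = {z:.3f}, probability of bad = {p:.3f}")
```

Because the intercept and most of the terms are missing, the resulting number is not a realistic default probability. The sketch only shows how a score is turned into a probability.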
Credit Risk Model Building: Example
Attributes used in the model
1. Age → Application form
2. Monthly income → Application form
3. Dependents → Application form
4. Number of times 30 DPD in last two years → Bureau
5. Debt ratio → Bureau/Bank
6. Utilization → Bureau/Bank
7. Number of loans → Bureau/Bank
Model building and Implementation
Data: http://www.kaggle.com/c/GiveMeSomeCredit
Model building script:
The final Model
Actual Model Building Steps
1. Objective & portfolio identification
2. Customer score vs. account score
3. Exclusions
4. Observation point and/or window
5. Performance window
6. Bad definition
7. Segmentation
8. Validation plan and samples
9. Variable selection
10. Model building
11. Scoring and validation
12. Score implementation
Marketing Example
Predicting the response probability to a marketing Campaign
• Selling Mobile phones – Marketing campaign
• Who should we target?
• Consider historical data of mobile phone buyers
• See their characteristics
• Find top impacting characteristics
• Find weight of each characteristic
• Score new population
• Decide on the cutoff
• Try to sell to people who score above the cutoff
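The last three steps (score the new population, choose a cutoff, target those above it) can be sketched in Python as below. Everything in it is hypothetical: the characteristic names, the weights, the customers and the cutoff are invented for illustration, whereas a real campaign model would estimate the weights from historical buyer data.

```python
# Hypothetical weights for illustration only.
WEIGHTS = {
    "bought_phone_before": 2.0,
    "visits_last_month": 0.5,
    "age_under_35": 1.0,
}

def score(customer):
    """Weighted sum of the customer's characteristics."""
    return sum(w * customer.get(name, 0) for name, w in WEIGHTS.items())

def select_targets(population, cutoff):
    """Return the ids of customers whose score is above the cutoff."""
    return [c["id"] for c in population if score(c) > cutoff]

# Invented new population to score.
population = [
    {"id": "c1", "bought_phone_before": 1, "visits_last_month": 4, "age_under_35": 1},
    {"id": "c2", "bought_phone_before": 0, "visits_last_month": 1, "age_under_35": 0},
    {"id": "c3", "bought_phone_before": 0, "visits_last_month": 6, "age_under_35": 1},
]

targets = select_targets(population, cutoff=3.0)
print("Target customers:", targets)
```

Raising the cutoff shrinks the target list and lowers campaign cost, while lowering it reaches more people with a lower expected response rate.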
Other Applications of Model Building
• Fraud transaction scorecard – Fraud identification based on attributes such as
transaction amount, place, time, frequency of transactions, etc.
• Attrition modeling – Predicting employee attrition based on their characteristics
Thank You
Venkat Reddy
http://www.trendwiseanalytics.com/training/venkat.php

 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Comparative Literature in India by Amiya dev.pptx
Comparative Literature in India by Amiya dev.pptxComparative Literature in India by Amiya dev.pptx
Comparative Literature in India by Amiya dev.pptx
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 

Introduction to predictive modeling v1

  • 2. Contents • Fancy visualizations to Predictive Modeling • The Business Problem • What is Predictive Modeling • The Horse Race Analogy • Credit Risk Model Building • Other Applications of Predictive Modeling
  • 4. Data Visualizations: Data visualizations can help you make the right decisions quickly… that will ultimately lead to success.
  • 5. Data Analysis & Predictive Modeling: Increasingly, businesses rely on intelligent tools and techniques to analyze data systematically and improve decision-making.
      • Retail sales analytics
      • Financial services analytics
      • Telecommunications
      • Supply chain analytics
      • Transportation analytics
      • Risk & credit analytics
      • Talent analytics
      • Marketing analytics
      • Behavioral analytics
      • Collections analytics
      • Fraud analytics
      • Pricing analytics
  • 6. Dude we all know that Data analytics rocks & Predictive modeling is going to rule!!!
  • 7. Business Problem • Credit cards & loans: how does a bank approve or decline credit card (CC) and loan applications? • Citi Bank: present in more than 90 countries. • More than 100,000 customers apply for credit cards/loans every month. • How do they approve or reject credit card applications?
  • 8. How does a bank approve or decline CC and loan applications? • Approve if the applicant is an auto driver in Bangalore • Approve if the applicant’s boyfriend/girlfriend is rich • Approve if the applicant is a U-19 cricketer • Decline if the applicant is a software professional from Bangalore Any suggestions… Chill maddi
  • 9. The Problem: Who will run away with my money? • Citi Bank: present in more than 90 countries. • More than 100,000 customers apply for credit cards/loans every month. • All of them have different characteristics. • Out of 100,000 customers, which ones have a higher probability of default/charge-off? • Basically, who will run away with my money? • We need to predict the probability of “running away”.
  • 10. How to predict? We have historical data and we have statistics
  • 11. Bank builds a model that gives a score to each customer “Developing set of equations or mathematical formulation to forecast future behaviors based on current or historical data.”
  • 12. Predictive Modeling: Let's try to understand predictive modeling. Predictive modeling: fitting a model to the data to predict the future. • Predicting the future, and it is so easy sometimes • Who is going to score more runs in IPL-2014? • That’s it, you predicted the future. • BTW, how did you predict? • Predicting the future based on historical data is nothing but predictive modeling
  • 13. Predictive Modeling: Let's try to understand predictive modeling. Predictive modeling: fitting a model to the data to predict the future. • Who is going to score more runs in IPL 2014? • Predicting the future… well, it is not that easy…
  • 14. Predictive Modeling: Horse Race Analogy. How to bet on the best horse in a horse race
  • 15. The Historical Data: win vs. loss record in the past 2 years
      • Long legs: 75% (horses with long legs won 75% of the time)
      • Breed A: 55%, Breed B: 15%, Others: 30%
      • T/L (tummy to length) ratio < 1/2: 75%
      • Gender: Male 68%
      • Head size: Small 10%, Medium 15%, Large 75%
      • Country: Africa 65%
  • 16. Given the historical data, which one of these two horses would you bet on?
                          Kalyan    Chethak
      Length of legs      150 cm    110 cm
      Breed               A         F
      T/L ratio           0.3       0.6
      Gender              Male      Female
      Head size           Large     Small
      Country             India     India
  • 17. Given the historical data, which one would you bet on… now?
                          Kalyan    Chethak
      Length of legs      110 cm    150 cm
      Breed               C         A
      T/L ratio           0.45      0.60
      Gender              Male      Female
      Head size           Small     Large
      Country             Africa    India
  • 18. Given the historical data, what about the best one in this lot?
      Horse       Length of legs   Breed   T/L ratio   Gender   Head size   Country
      Horse-1     109              C       0.1         Male     L           Africa
      Horse-2     114              A       0.8         Female   S           India
      Horse-3     134              B       0.5         Male     M           Aus
      Horse-4     130              A       1.0         Female   M           NZ
      Horse-5     149              F       0.3         Male     L           Africa
      Horse-6     120              K       0.3         Female   L           Africa
      Horse-7     104              L       0.3         Male     S           India
      Horse-8     117              B       0.6         Female   M           India
      Horse-9     115              C       0.7         Male     L           Aus
      Horse-10    135              A       0.9         Female   M           Africa
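
One rough way to pick "the best one in this lot" is to count, for each horse, how many historically favourable traits it has. This is only a sketch of the intuition, not a fitted model: every trait gets equal weight, and the 130 threshold for "long legs" is an assumption, since the slides do not define it. The names `HORSES`, `LONG_LEGS_MIN` and `favourable_traits` are made up for this example.

```python
# Horses from the ten-horse slide:
# (name, leg length, breed, T/L ratio, gender, head size, country)
HORSES = [
    ("Horse-1", 109, "C", 0.1, "Male", "L", "Africa"),
    ("Horse-2", 114, "A", 0.8, "Female", "S", "India"),
    ("Horse-3", 134, "B", 0.5, "Male", "M", "Aus"),
    ("Horse-4", 130, "A", 1.0, "Female", "M", "NZ"),
    ("Horse-5", 149, "F", 0.3, "Male", "L", "Africa"),
    ("Horse-6", 120, "K", 0.3, "Female", "L", "Africa"),
    ("Horse-7", 104, "L", 0.3, "Male", "S", "India"),
    ("Horse-8", 117, "B", 0.6, "Female", "M", "India"),
    ("Horse-9", 115, "C", 0.7, "Male", "L", "Aus"),
    ("Horse-10", 135, "A", 0.9, "Female", "M", "Africa"),
]

LONG_LEGS_MIN = 130  # assumption: the slides never say what counts as "long"


def favourable_traits(horse):
    """Count traits that won most often in the historical data."""
    _, legs, breed, tl_ratio, gender, head, country = horse
    checks = [
        legs >= LONG_LEGS_MIN,  # long legs
        breed == "A",           # most successful breed
        tl_ratio < 0.5,         # T/L ratio below 1/2
        gender == "Male",
        head == "L",            # large head
        country == "Africa",
    ]
    return sum(checks)


scores = {horse[0]: favourable_traits(horse) for horse in HORSES}
best = max(scores, key=scores.get)
print(best, scores[best])
```

A real model would replace the equal weights with weights estimated from data, which is what the credit risk steps later in the deck do.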
  • 19. Citi has a similar problem: who is going to run away with my money? • Given the historical data of customers, we want to predict the probability of bad • We have data on each customer: previous loans, previous payments, length of credit history, other credit cards and loans, job type, income band, etc. • We want to predict the probability of default
  • 20. Credit Risk Model Building: Four main steps
      1. Study historical data
         • What are the causes (customer characteristics)?
         • What are the effects (charge-off)?
      2. Identify the most impacting factors
      3. Find the exact impact of each factor (quantify it)
      4. Use these coefficients in the future
  • 21. The Historical Data of Customers
      • Contains all the information about customers
      • Contains information across more than 500 variables
      • A portion of the data is present in the application form
      • A portion of the data is available with the bank
      • A lot of the data is maintained in the bureau
      • Social Security number in the US
      • PAN number in India

      Attribute                                            Value
      SSN                                                  111259005
      Age                                                  27
      Number of dependents                                 2
      Number of current loans                              1
      Number of credit cards                               1
      Number of installments 30 days late in last 2 years  4
      Average utilization % in last 2 years                30%
      Time since accounts opened                           60 months
      Number of previous applications for credit card      2
      Bankrupt                                             NO
  • 22. The Historical Data of Customers: Types of variables
      Payment History
      • Account payment information on specific types of accounts (credit cards, retail accounts, installment loans, finance company accounts, mortgage, etc.)
      • Presence of adverse public records (bankruptcy, judgements, suits, liens, wage attachments, etc.), collection items, and/or delinquency (past due items)
      • Severity of delinquency (how long past due)
      • Amount past due on delinquent accounts or collection items
      • Time since (recency of) past due items (delinquency), adverse public records (if any), or collection items (if any)
      • Number of past due items on file
      • Number of accounts paid as agreed
  • 23. More variables
      Amounts Owed
      • Amount owing on accounts
      • Amount owing on specific types of accounts
      • Lack of a specific type of balance, in some cases
      • Number of accounts with balances
      • Proportion of credit lines used (proportion of balances to total credit limits on certain types of revolving accounts)
      • Proportion of installment loan amounts still owing (proportion of balance to original loan amount on certain types of installment loans)
      Length of Credit History
      • Time since accounts opened
      • Time since accounts opened, by specific type of account
      • Time since account activity
      New Credit
      • Number of recently opened accounts, and proportion of accounts that are recently opened, by type of account
      • Number of recent credit inquiries
      • Time since recent account opening(s), by type of account
      • Time since credit inquiry(s)
      • Re-establishment of positive credit history following past payment problems
  • 25. Identifying the most impacting factors: out of 500 attributes, which are the 20 important ones?
  • 26. Selecting the variables: how do you do that?
      Num_Enq    Good     Bad
      0          6400     5
      1          5800     69
      2          5445     134
      3          4500     250
      4          4070     375
      5          3726     470
      6          2879     650
      7          1893     876
      8          1236     987
      9          354      1298
      Total      36303    5114

      Num_companies    Good     Bad
      0                4858     594
      1                4291     602
      2                4410     577
      3                4738     529
      4                4506     707
      5                4976     796
      6                4723     770
      7                4407     583
      8                4157     825
      9                4970     964
      Total            46036    6947
  • 27. Bad Rate
      Num_Enq    Good     Bad     Bad Rate
      0          6400     5       0%
      1          5800     69      1%
      2          5445     134     3%
      3          4500     250     5%
      4          4070     375     7%
      5          3726     470     9%
      6          2879     650     13%
      7          1893     876     17%
      8          1236     987     19%
      9          354      1298    25%
      Total      36303    5114

      Num_companies    Good     Bad     Bad Rate
      0                4858     594     9%
      1                4291     602     9%
      2                4410     577     8%
      3                4738     529     8%
      4                4506     707     10%
      5                4976     796     11%
      6                4723     770     11%
      7                4407     583     8%
      8                4157     825     12%
      9                4970     964     14%
      Total            46036    6947
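
As a quick check on the Num_Enq figures, the sketch below recomputes the "Bad Rate" column. Working the numbers through, the percentages shown on the slide match each bin's share of all bad accounts (Bad / total Bad). The more usual within-bin rate, Bad / (Good + Bad), is computed alongside for comparison. The variable names are made up for this example.

```python
# Good/Bad counts per number of enquiries (Num_Enq 0..9), as on the slide.
good = [6400, 5800, 5445, 4500, 4070, 3726, 2879, 1893, 1236, 354]
bad = [5, 69, 134, 250, 375, 470, 650, 876, 987, 1298]

total_bad = sum(bad)

# Share of all bad accounts that fall in each bin (what the slide's column matches).
share_of_bad = [b / total_bad for b in bad]

# Within-bin bad rate: fraction of the bin's accounts that went bad.
bin_bad_rate = [b / (g + b) for g, b in zip(good, bad)]

for num_enq, (share, rate) in enumerate(zip(share_of_bad, bin_bad_rate)):
    print(f"Num_Enq={num_enq}: share of bad={share:.0%}, bad rate in bin={rate:.1%}")
```

Either way, the bad proportion rises steadily with the number of enquiries, which is what makes Num_Enq a useful variable. Num_companies shows no such trend.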
  • 28. Information value
      Utilization %   # of Good   # of Bad   %Good [X]   %Bad [Y]   %Good / %Bad   X - Y    WOE = Log(X/Y)   IV
      < 5             1850        150        29%         5%         6.31           0.25     0.80             0.20
      5 - 30          1600        400        25%         12%        2.05           0.13     0.31             0.04
      31 - 60         1200        600        19%         18%        1.02           0.00     0.01             0.00
      60 - 90         900         900        14%         28%        0.51           -0.14    -0.29            0.04
      >= 91           800         1200       13%         37%        0.34           -0.24    -0.47            0.11
      Total           6350        3250                                                                       0.39
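
The information value table can be reproduced with a few lines of code. This sketch uses the Good/Bad counts from the slide and a base-10 logarithm, since that is what the slide's WOE values correspond to (many references use the natural log instead, which rescales WOE and IV by a constant factor). The names `bins`, `woe` and `iv` are made up for this example.

```python
import math

# Utilization bins with (# of Good, # of Bad), as on the slide.
bins = {
    "< 5": (1850, 150),
    "5 - 30": (1600, 400),
    "31 - 60": (1200, 600),
    "60 - 90": (900, 900),
    ">= 91": (800, 1200),
}

total_good = sum(g for g, _ in bins.values())
total_bad = sum(b for _, b in bins.values())

woe = {}
iv = 0.0
for label, (g, b) in bins.items():
    x = g / total_good              # %Good [X]
    y = b / total_bad               # %Bad  [Y]
    woe[label] = math.log10(x / y)  # weight of evidence, base 10 as on the slide
    iv += (x - y) * woe[label]      # this bin's contribution to IV

for label, value in woe.items():
    print(f"{label:>8}: WOE = {value:+.2f}")
print(f"IV = {iv:.2f}")
```

A higher IV means the variable separates good from bad accounts more strongly, so variables can be ranked by IV when narrowing the 500 candidates down.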
  • 29.
  • 31. Model Building: a logistic regression model to predict the probability of default • Log-odds of bad = w1(Var1) + w2(Var2) + w3(Var3) + … + w20(Var20) • Logistic regression gives us those weights • Predicting the probability • Log-odds of bad = 0.13(number of cards) + 0.21(utilization) + … + 0.06(number of loan applications); the probability of bad is then 1 / (1 + e^(-log-odds)) • That’s it… we are done
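
In logistic regression the weighted sum of the variables is the log-odds of bad, and the logistic function converts it into a probability between 0 and 1. The sketch below uses only the three weights quoted on the slide; the other weights are elided there and are not filled in. The intercept, the scale of utilization (a fraction) and the two applicants are hypothetical values chosen for illustration.

```python
import math

# Weights quoted on the slide; the rest are elided there ("…") and left out.
WEIGHTS = {
    "number_of_cards": 0.13,
    "utilization": 0.21,
    "number_of_loan_applications": 0.06,
}
INTERCEPT = -3.0  # hypothetical, not from the slides


def log_odds(applicant):
    """Weighted sum of the applicant's attributes plus the intercept."""
    return INTERCEPT + sum(w * applicant[name] for name, w in WEIGHTS.items())


def probability_of_bad(applicant):
    """Logistic function applied to the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(applicant)))


# Hypothetical applicants.
low_risk = {"number_of_cards": 1, "utilization": 0.3, "number_of_loan_applications": 0}
high_risk = {"number_of_cards": 6, "utilization": 0.9, "number_of_loan_applications": 8}

print(f"low risk:  {probability_of_bad(low_risk):.3f}")
print(f"high risk: {probability_of_bad(high_risk):.3f}")
```

The weights themselves would come from fitting the model to historical good/bad outcomes, for example with a statistics package.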
  • 32. Credit Risk Model Building: Example. Attributes used in the model (with source)
      1. Age: Application form
      2. Monthly income: Application form
      3. Dependents: Application form
      4. Number of times 30 DPD in last two years: Bureau
      5. Debt ratio: Bureau/Bank
      6. Utilization: Bureau/Bank
      7. Number of loans: Bureau/Bank
  • 33. Model building and Implementation Data: http://www.kaggle.com/c/GiveMeSomeCredit Model building script: The final Model
  • 34. Actual Model Building Steps
      • Objective & portfolio identification
      • Customer score vs. account score
      • Exclusions
      • Observation point and/or window
      • Performance window
      • Bad definition
      • Segmentation
      • Validation plan and samples
      • Variable selection
      • Model building
      • Scoring and validation
      • Score implementation
  • 35. Marketing Example: predicting the response probability to a marketing campaign • Selling mobile phones: marketing campaign • Who should we target? • Consider historical data of mobile phone buyers • See their characteristics • Find the top impacting characteristics • Find the weight of each characteristic • Score the new population • Decide on the cut-off • Try to sell to people who score above the cut-off
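
The last three steps of the marketing example (score the new population, decide on a cut-off, target those above it) can be sketched as below. The customer ids, the scores and the 0.5 cut-off are hypothetical; in practice the scores would come from a fitted response model and the cut-off from campaign cost and expected response.

```python
def select_targets(scores, cutoff):
    """Return customer ids scoring above the cut-off, highest score first."""
    chosen = [(cid, score) for cid, score in scores.items() if score > cutoff]
    chosen.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in chosen]


# Hypothetical response scores for a new population.
scores = {"C001": 0.82, "C002": 0.15, "C003": 0.47, "C004": 0.66, "C005": 0.05}
CUTOFF = 0.5  # hypothetical cut-off

targets = select_targets(scores, CUTOFF)
print(targets)
```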
  • 36. Other Applications of Model Building • Fraud transactions scorecard – Fraud identification based on attributes like transaction amount, place, time, frequency of transactions etc., • Attrition modeling – Predicting employee attrition based on their characteristics