SIMPLE LINEAR
REGRESSION AND
CORRELATION ANALYSIS
Understanding and Calculation
UNDERSTANDING
SIMPLE LINEAR
REGRESSION
Understanding the Concepts Behind It
Simple Linear Regression Analysis
Simple linear regression analysis is a type of linear regression that
focuses on the relationship between TWO VARIABLES: one independent
variable and one dependent variable. The other type is Multiple Linear
Regression, which uses more than one independent variable.
Practice Problem
Let's say you're a researcher at
Pigcawayan National High School
(PNHS), and you have been tasked with
predicting the number of students who
will enroll in PNHS in the next
school year. The problem is that
you only have the number of
students enrolled in PNHS in each of
the last 10 years. How can you predict
it?
School Year No.   No. of Students Enrolled
1                 1340
2                 1270
3                 1406
4                 1004
5                 1273
6                 1567
7                 998
8                 1021
9                 1705
10                1186
Answer
We can predict the
number of enrollees in
PNHS in the next school
year by finding the mean
of our data.
Mean = 12,770 / 10 = 1277
Residuals / Errors
[Figure: scatter plot of No. of Students Enrolled in PNHS in the last 10 years against school year number, with each point's residual (error) from the mean of 1277: +63, -7, +129, -273, -4, +290, -279, -256, +428, -91. The positive residuals sum to +910 and the negative residuals sum to -910.]
School Year No.   Error   (Error)^2
1                 +63     3,969
2                 -7      49
3                 +129    16,641
4                 -273    74,529
5                 -4      16
6                 +290    84,100
7                 -279    77,841
8                 -256    65,536
9                 +428    183,184
10                -91     8,281
Total                     514,146
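The table above can be reproduced with a short Python sketch. It only assumes the ten enrollment counts from the practice problem; the variable names are illustrative and not part of the original slides.

```python
# Enrollment counts for school years 1..10, copied from the practice problem.
enrolled = [1340, 1270, 1406, 1004, 1273, 1567, 998, 1021, 1705, 1186]

# With no independent variable, the model is simply the mean.
mean_enrolled = sum(enrolled) / len(enrolled)

# Residual (error) = observed value - predicted value (here, the mean).
residuals = [y - mean_enrolled for y in enrolled]

# Sum of squared errors (SSE) for the mean-only model.
sse_mean = sum(e ** 2 for e in residuals)

print(f"mean = {mean_enrolled:.0f}")
print(f"sum of residuals = {sum(residuals):.0f}")
print(f"SSE = {sse_mean:,.0f}")
```

The residuals around a mean always cancel out, which is why they are squared before being summed.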
Sum of Squared Errors
(SSE)
Our sum of squared errors
(SSE) is 514,146, which is high:
the mean misses by roughly 227
students in a typical year
(the square root of 514,146 / 10).
The higher the value of
our SSE, the weaker our model
(the mean) is in predicting
the number of enrollees in
PNHS. To solve this, we need to
create a new line through our
data by introducing an
independent variable, such as
tuition fee.
This is the goal of Simple
Linear Regression, and of regression
in general: to find a line (a
regression line) that "fits" our
data better and makes the
residuals as small as possible. In
our example, however, we do not yet
have an independent variable, so
our only model is the mean, and it is
a fairly inaccurate predictor of the
number of enrollees in PNHS in
the next school year.
[Figure: the same plot of No. of Students Enrolled in PNHS in the last 10 years, with each point's residual from the mean.]
Very Important Things to Note
When working with simple linear regression
with TWO variables, we determine how
well the regression line "fits" the data by comparing it
to THIS model: the one where we pretend that the
second variable (the independent variable)
does not exist, which is simply the mean of the
dependent variable alone.
If our two-variable regression line looks like
the flat mean line in our example, what does the other
variable do to explain the dependent variable?
NOTHING.
Quick Review
◦ Simple linear regression is really a comparison of two models:
a) One where the independent variable does not exist.
b) One that uses the best-fit regression line.
◦ If there is only one variable, the best prediction of other values is the mean of
the dependent variable.
◦ The distance between the best-fit line and an observed value is called the residual
or error.
◦ The residuals are squared and summed to give the Sum of Squared Residuals /
Errors (SSE).
◦ Simple linear regression is designed to find the line that best fits our data,
that is, the line that minimizes the SSE.
UNDERSTANDING
CORRELATION
ANALYSIS
Understanding the Concepts Behind It
Correlation Analysis
Correlation Analysis is a statistical method that is used to discover if
there is a relationship between two variables/datasets, and how strong
that relationship may be.
The correlation coefficient has an upper boundary of +1 and a lower boundary of -1, and its
scale is independent of the scale of the variables themselves.
Correlation Caveats
◦ Before going crazy computing correlations, look at the
scatterplot of your data.
◦ Correlation is only applicable to LINEAR
relationships.
◦ Correlation is NOT Causation.
◦ Correlation strength does not necessarily mean the
correlation is statistically significant.
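Two of these points can be checked numerically. The sketch below uses made-up data (the `hours`, `scores`, and `xs` lists are hypothetical, not from the slides) and a small helper, `pearson_r`, written for illustration. It shows that changing the units of a variable leaves r unchanged, and that a perfect but non-linear relationship can still give an r of about 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / sqrt(ss_x * ss_y)

# Hypothetical data, made up for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r_hours = pearson_r(hours, scores)
# Re-expressing hours as minutes changes the scale of x but not r.
r_minutes = pearson_r([h * 60 for h in hours], scores)

# A perfect but non-linear relationship (y = x^2) on symmetric x values.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(xs, [x ** 2 for x in xs])

print(round(r_hours, 3), round(r_minutes, 3), round(r_curve, 3))
```

This is why the scatterplot comes first: a coefficient near 0 rules out a linear pattern, not every pattern.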
Correlation Coefficients (r)
Value of r         Qualitative Interpretation
±1                 Perfectly linear relationship
±0.81 to ±0.99     Very strong linear relationship
±0.61 to ±0.80     Strong linear relationship
±0.41 to ±0.60     Moderate linear relationship
±0.21 to ±0.40     Weak linear relationship
±0.01 to ±0.20     Very weak linear relationship
0                  No linear relationship
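The table can be turned into a small lookup. This is a minimal sketch: the function name `interpret_r` is hypothetical, and because the table lists bands to two decimals, values that fall between bands (for example 0.805) are assigned here by treating each band's lower edge as "greater than the previous band's upper limit", which is an assumption rather than something the slides specify.

```python
def interpret_r(r):
    """Map a correlation coefficient to the qualitative labels in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    strength = abs(r)
    if strength == 0.0:
        return "No linear relationship"
    if strength == 1.0:
        label = "Perfectly linear relationship"
    elif strength > 0.80:
        label = "Very strong linear relationship"
    elif strength > 0.60:
        label = "Strong linear relationship"
    elif strength > 0.40:
        label = "Moderate linear relationship"
    elif strength > 0.20:
        label = "Weak linear relationship"
    else:
        label = "Very weak linear relationship"
    # The sign of r gives the direction; the magnitude gives the strength.
    direction = "positive" if r > 0 else "negative"
    return f"{label} ({direction})"

print(interpret_r(0.5))
print(interpret_r(-0.9))
```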
[Figure: General Correlation Patterns (Linear). Example scatter plots with r near +1, near -1, and near 0.]
[Figure: Non-linear Correlation Patterns.]
CALCULATING
SIMPLE LINEAR
REGRESSION
Linear Function
Do you know this?
$$ y = mx + b $$
where m is the slope (rise/run), x is the random variable, and b is the y-intercept.
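In the world of statistics, the simple linear regression can be
given as:
$$ Y = \beta_0 + \beta_1 x + \varepsilon $$
where $\varepsilon$ is the error term. The algebraic form $y = b + mx$ corresponds to the fitted line:
$$ y = b_0 + b_1 x $$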
Least Squares Method
$$ y = mx + b $$
Formula for finding m:
$$ m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$
Formula for finding b:
$$ b = \frac{\sum y - m\sum x}{n} $$
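The two formulas translate almost line by line into code. The following is a minimal sketch; the function name `least_squares_fit` and the sample points are hypothetical and chosen so the answer is easy to check by hand (the points lie exactly on y = 2x + 1).

```python
def least_squares_fit(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope: (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: (Σy - m*Σx) / n
    b = (sum_y - m * sum_x) / n
    return m, b

# Hypothetical points lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
m, b = least_squares_fit(xs, ys)
print(m, b)
```

Note that the slope formula divides by zero when every x value is the same, so the function assumes at least two distinct x values.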
You realized that the mean
is a bad model to use for
prediction. So, you
went to the
principal's office and
asked for the
records of the tuition fees for
the past 10 school years.
This is what you gathered.
School Year No.   No. of Students Enrolled (Y)   Tuition Fees in Php (X)
1                 1,340                          1,010
2                 1,270                          1,240
3                 1,406                          1,000
4                 1,004                          1,305
5                 1,273                          1,205
6                 1,567                          995
7                 998                            1,405
8                 1,021                          1,310
9                 1,705                          1,005
10                1,186                          1,105
Given:
$$ \sum X = 11,580 \qquad \sum Y = 12,770 \qquad \sum XY = 14,501,305 \qquad \sum X^2 = 13,623,950 $$
No. of Students Enrolled (Y)   Tuition Fees in Php (X)   XY           X^2
1,340                          1,010                     1,353,400    1,020,100
1,270                          1,240                     1,574,800    1,537,600
1,406                          1,000                     1,406,000    1,000,000
1,004                          1,305                     1,310,220    1,703,025
1,273                          1,205                     1,533,965    1,452,025
1,567                          995                       1,559,165    990,025
998                            1,405                     1,402,190    1,974,025
1,021                          1,310                     1,337,510    1,716,100
1,705                          1,005                     1,713,525    1,010,025
1,186                          1,105                     1,310,530    1,221,025
Total: 12,770                  11,580                    14,501,305   13,623,950
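$$ m = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$
$$ m = \frac{10(14,501,305) - (11,580)(12,770)}{10(13,623,950) - (11,580)^2} $$
$$ m = \frac{145,013,050 - 147,876,600}{136,239,500 - 134,096,400} $$
$$ m = \frac{-2,863,550}{2,143,100} $$
$$ m \approx -1.336 $$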
$$ b = \frac{\sum y - m\sum x}{n} $$
$$ b = \frac{12,770 - (-1.336)(11,580)}{10} $$
$$ b = \frac{12,770 - (-15,470.88)}{10} $$
$$ b = \frac{28,240.88}{10} $$
$$ b = 2824.088 $$
$$ y = mx + b $$
$$ y = -1.336x + 2824.088 $$
Or
$$ y = 2824.088 - 1.336x $$
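As a check on the hand calculation, the sketch below refits the line from the ten (tuition, enrollment) pairs and compares its SSE with the SSE of the mean-only model. It keeps the slope unrounded when computing the intercept, so the intercept differs slightly in the decimals from a hand calculation that rounds m to -1.336 first. The tuition fee of 1,100 Php used for the prediction is a hypothetical input, not a figure from the slides.

```python
# Data from the table: X = tuition fees (Php), Y = number of students enrolled.
tuition = [1010, 1240, 1000, 1305, 1205, 995, 1405, 1310, 1005, 1105]
enrolled = [1340, 1270, 1406, 1004, 1273, 1567, 998, 1021, 1705, 1186]
n = len(tuition)

sum_x, sum_y = sum(tuition), sum(enrolled)
sum_xy = sum(x * y for x, y in zip(tuition, enrolled))
sum_x2 = sum(x * x for x in tuition)

# Least squares slope and intercept (slope left unrounded).
m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n

# SSE of the mean-only model versus SSE of the regression line.
mean_y = sum_y / n
sse_mean = sum((y - mean_y) ** 2 for y in enrolled)
sse_line = sum((y - (m * x + b)) ** 2 for x, y in zip(tuition, enrolled))

# Prediction for a hypothetical tuition fee of 1,100 Php.
predicted = m * 1100 + b

print(f"m = {m:.3f}, b = {b:.3f}")
print(f"SSE (mean only) = {sse_mean:,.0f}")
print(f"SSE (regression line) = {sse_line:,.0f}")
print(f"predicted enrollment at 1,100 Php = {predicted:.0f}")
```

The regression line's SSE comes out well below the mean-only SSE of 514,146, which is the comparison of two models described in the Quick Review.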
[Figure: No. of Students Enrolled in PNHS vs. Tuition Fees. Scatter plot with Tuition Fees (in Php) on the horizontal axis and No. of Students Enrolled on the vertical axis.]
CALCULATING
CORRELATION
ANALYSIS
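The correlation coefficient can be found by using the formula
based on the Simple Random Sample (SRS):
$$ r = \frac{SP_{xy}}{\sqrt{SS_x \, SS_y}} $$
Where:
$$ SS_x = \sum X^2 - \frac{\left(\sum X\right)^2}{n} $$
$$ SP_{xy} = \sum XY - \frac{\sum X \sum Y}{n} $$
$$ SS_y = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n} $$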
$$ SP_{xy} = \sum XY - \frac{\sum X \sum Y}{n} $$
$$ SP_{xy} = 14,501,305 - \frac{(11,580)(12,770)}{10} $$
$$ SP_{xy} = 14,501,305 - \frac{147,876,600}{10} $$
$$ SP_{xy} = 14,501,305 - 14,787,660 $$
$$ SP_{xy} = -286,355 $$
$$ SS_x = \sum X^2 - \frac{\left(\sum X\right)^2}{n} $$
$$ SS_x = 13,623,950 - \frac{(11,580)^2}{10} $$
$$ SS_x = 13,623,950 - \frac{134,096,400}{10} $$
$$ SS_x = 13,623,950 - 13,409,640 $$
$$ SS_x = 214,310 $$
$$ SS_y = \sum Y^2 - \frac{\left(\sum Y\right)^2}{n} $$
$$ SS_y = 16,821,436 - \frac{(12,770)^2}{10} $$
$$ SS_y = 16,821,436 - \frac{163,072,900}{10} $$
$$ SS_y = 16,821,436 - 16,307,290 $$
$$ SS_y = 514,146 $$
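$$ r = \frac{SP_{xy}}{\sqrt{SS_x \, SS_y}} $$
$$ r = \frac{-286,355}{\sqrt{(214,310)(514,146)}} $$
$$ r = \frac{-286,355}{\sqrt{110,186,629,260}} $$
$$ r = \frac{-286,355}{331,943.71} $$
$$ r \approx -0.86 $$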
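The same steps can be followed in Python as a cross-check. This sketch uses only the ten (tuition, enrollment) pairs from the table and the SP and SS formulas given earlier; the variable names are illustrative.

```python
from math import sqrt

# Data from the table: X = tuition fees (Php), Y = number of students enrolled.
tuition = [1010, 1240, 1000, 1305, 1205, 995, 1405, 1310, 1005, 1105]
enrolled = [1340, 1270, 1406, 1004, 1273, 1567, 998, 1021, 1705, 1186]
n = len(tuition)

sum_x, sum_y = sum(tuition), sum(enrolled)

# SPxy = ΣXY - (ΣX)(ΣY)/n
sp_xy = sum(x * y for x, y in zip(tuition, enrolled)) - sum_x * sum_y / n
# SSx = ΣX² - (ΣX)²/n
ss_x = sum(x * x for x in tuition) - sum_x ** 2 / n
# SSy = ΣY² - (ΣY)²/n
ss_y = sum(y * y for y in enrolled) - sum_y ** 2 / n

# r = SPxy / sqrt(SSx * SSy)
r = sp_xy / sqrt(ss_x * ss_y)

print(f"SPxy = {sp_xy:,.0f}, SSx = {ss_x:,.0f}, SSy = {ss_y:,.0f}")
print(f"r = {r:.2f}")
```

SSy here is the same quantity as the SSE of the mean-only model computed at the start, since both sum the squared deviations of enrollment from its mean.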
Our correlation
coefficient is -0.86, which
means that the number
of enrollees in PNHS
and the tuition fees have
a very strong negative
linear relationship: as tuition
fees rise, enrollment tends to fall.
[Figure: scatter plot of No. of Students Enrolled against Tuition Fees (in Php).]
THAT'S ALL, THANK YOU!
I hope you learned something today!
