Regression Basic
Maximum Likelihood Estimator(MLE)
Jinseob Kim
July 15, 2015
Jinseob Kim Regression Basic July 15, 2015 1 / 26
Introduction
Contents
1 Introduction
2 Regression Review
Basic linear regression
Key measures under MLE
3 Logistic Regression
4 Poisson Regression
Introduction
Simplifying the problem
Is Y continuous?
1 Truly continuous vs. pseudo-continuous (count data)
2 Continuous: normal distribution!!! → ordinary linear regression
3 Count: number of events, number of children, etc.: Poisson, gamma, negative binomial, and so on
Is Y categorical?
1 Two categories vs. three or more
2 Two categories: logistic
3 Three or more: probit, etc.
Y not independent? Repeated measures, multilevel models, etc.: excluded from this lecture
Introduction
Univariate vs. multivariate
1 How much association is there?
1 Is the association still there after adjusting for the effects of other variables?
Regression Review
Contents
1 Introduction
2 Regression Review
Basic linear regression
Key measures under MLE
3 Logistic Regression
4 Poisson Regression
Regression Review Basic linear regression
Remind
β estimation in linear regression
1 Ordinary Least Squares (OLS): semi-parametric
2 Maximum Likelihood Estimator (MLE): parametric
The estimation principle behind most regression analyses
Regression Review Basic linear regression
Least Squares
Minimize the sum of squared errors: no normality assumption on y is needed.
Figure: OLS Fitting
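As a toy illustration of the least squares idea (the deck's examples use R; this is a minimal Python sketch with made-up data):

```python
# Minimal sketch of ordinary least squares for simple linear regression:
# pick beta0, beta1 minimizing the sum of squared residuals. Note that
# no distributional assumption on y is needed. Data are made up.

def ols_fit(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # slope = Cov(x, y) / Var(x); the fitted line passes through (xbar, ybar)
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = ols_fit(x, y)  # roughly intercept 0.14, slope 1.96
```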
Regression Review Basic linear regression
Likelihood??
Likelihood vs. probability
Discrete: likelihood = probability. Rolling a die, the probability of getting a 1 is 1/6.
Continuous: likelihood != probability. Drawing a single number from 0∼1, the probability that it equals 0.7 is 0...
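The distinction on this slide can be made concrete with a small Python sketch (numbers chosen for illustration):

```python
import math

# Discrete: the pmf value IS a probability (fair die, P(roll = 1) = 1/6).
die_pmf = 1 / 6

# Continuous: the density value at a point is a likelihood, not a
# probability; P(X = 0.7 exactly) = 0 for X ~ Uniform(0, 1), even though
# the uniform density at 0.7 equals 1.
uniform_density_at_07 = 1.0   # f(x) = 1 on [0, 1]
prob_exact_point = 0.0        # a single point has probability zero

# A density can even exceed 1, which a probability never can:
# the N(0, sd = 0.1) density at its mean is about 3.99.
sd = 0.1
normal_density_at_mean = 1 / (sd * math.sqrt(2 * math.pi))
```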
Regression Review Basic linear regression
Maximum likelihood estimator (MLE)
Maximum likelihood estimator: suppose y1, · · · , yn are mutually independent.
1 Compute the likelihood of each observation.
2 Multiply them all to get the likelihood of the whole sample (they are independent).
3 Find the β that maximizes this likelihood.
That is, under the normality assumption, find the β that maximizes the chance of observing the data we actually have.
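The three steps above can be sketched for the simplest normal model, y_i ~ N(μ, 1), where the MLE of μ is the sample mean, matching the least squares answer. A Python sketch with made-up data and a grid search in place of calculus:

```python
import math

# Step 1-2: the log of the product of normal densities is the sum of
# log densities (independence). Step 3: maximize over candidate mu.
def log_lik(mu, ys, sigma=1.0):
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - mu) ** 2 / (2 * sigma ** 2) for y in ys)

ys = [4.2, 5.1, 3.8, 4.9, 5.0]
grid = [i / 1000 for i in range(3000, 7000)]   # candidate mu in [3, 7)
mu_hat = max(grid, key=lambda mu: log_lik(mu, ys))
# For this model the maximizer is the sample mean (4.6 here).
```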
Regression Review Basic linear regression
MLE: maximum likelihood estimator
Maximize the chance of the observed data: requires a distributional assumption on y.
Identical to the least squares estimator (for the normal linear model).
Regression Review MLE에서 주요 지표
LRT? Wald? Score?
Likelihood Ratio Test vs. Wald test vs. score test
1 Three ways to judge statistical significance.
2 Compare likelihoods vs. compare β estimates vs. compare slopes of the log-likelihood.
Regression Review MLE에서 주요 지표
Comparison
Figure: Comparison
Regression Review MLE에서 주요 지표
AIC
Let L be the likelihood of the fitted model.
1 AIC = −2 × log(L) + 2 × k
2 k: the number of explanatory variables (sex, age, income, ...)
3 The smaller, the better!!!
We want the model with the largest likelihood, but too many explanatory variables incur a penalty!!!
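A minimal sketch of the AIC trade-off, with made-up log-likelihoods and k: the bigger model has the higher likelihood yet loses once the 2k penalty is added.

```python
# AIC = -2 * log(L) + 2 * k, as on the slide. (In R's glm, k counts all
# estimated parameters, including the intercept.) Numbers are made up.
def aic(log_likelihood, k):
    return -2 * log_likelihood + 2 * k

aic_small = aic(log_likelihood=-120.0, k=3)   # -> 246.0
aic_big = aic(log_likelihood=-119.5, k=8)     # -> 255.0
best = "small" if aic_small < aic_big else "big"
```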
Regression Review MLE에서 주요 지표
Examples
## Loading required package: splines
##
## Call:
## glm(formula = nonacc ~ meanpm10 + meanso2 + meanno2 + meanco +
## maxo3 + meantemp + meanhumi + meanpress, data = mort)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -81.089 -15.398 -4.053 11.979 117.643
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 782.542027 37.581693 20.822 < 2e-16 ***
## meanpm10 0.047872 0.008661 5.528 3.29e-08 ***
## meanso2 -2.036635 0.087065 -23.392 < 2e-16 ***
## meanno2 1.758609 0.021575 81.513 < 2e-16 ***
## meanco -2.844671 0.078291 -36.335 < 2e-16 ***
## maxo3 -0.252572 0.013237 -19.081 < 2e-16 ***
## meantemp -0.373984 0.032410 -11.539 < 2e-16 ***
## meanhumi -0.202591 0.014763 -13.723 < 2e-16 ***
## meanpress -0.725117 0.036518 -19.856 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 530.2808)
##
## Null deviance: 13933228 on 17354 degrees of freedom
## Residual deviance: 9198250 on 17346 degrees of freedom
## (207 observations deleted due to missingness)
## AIC: 158137
##
Logistic Regression
Contents
1 Introduction
2 Regression Review
Basic linear regression
Key measures under MLE
3 Logistic Regression
4 Poisson Regression
Logistic Regression
Logistic function: MLE
Case-control study: Y is 0 or 1
Figure: Fitting Logistic Function
Logistic Regression
Model
Log(pi / (1 − pi)) = β0 + β1 · xi1
pi = P(Yi = 1) = exp(β0 + β1 · xi1) / (1 + exp(β0 + β1 · xi1))
P(Yi = 0) = 1 / (1 + exp(β0 + β1 · xi1))
P(Yi = yi) = (exp(β0 + β1 · xi1) / (1 + exp(β0 + β1 · xi1)))^yi × (1 / (1 + exp(β0 + β1 · xi1)))^(1−yi)
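The model above, sketched in Python with made-up coefficients:

```python
import math

# logit(p_i) = b0 + b1 * x_i1, so p_i = exp(eta) / (1 + exp(eta)) and
# P(Y_i = y_i) = p_i^y * (1 - p_i)^(1 - y). Coefficients are made up.
def p_of_x(x, b0, b1):
    eta = b0 + b1 * x
    return math.exp(eta) / (1 + math.exp(eta))

def prob_y(y, x, b0, b1):
    p = p_of_x(x, b0, b1)
    return (p ** y) * ((1 - p) ** (1 - y))

b0, b1 = -1.0, 0.5
p = p_of_x(2.0, b0, b1)   # P(Y = 1 | x = 2); eta = 0 here, so p = 0.5
# The two outcome probabilities sum to 1, as they must.
total = prob_y(1, 2.0, b0, b1) + prob_y(0, 2.0, b0, b1)
```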
Logistic Regression
Likelihood
Likelihood = ∏(i=1..n) P(Yi = yi) = ∏(i=1..n) (exp(β0 + β1 · xi1) / (1 + exp(β0 + β1 · xi1)))^yi × (1 / (1 + exp(β0 + β1 · xi1)))^(1−yi)
Each individual contributes a likelihood (the probability of that individual's observed outcome).
Multiplying them all together gives the Likelihood.
We then find the β that maximizes it.
Cases and controls each contribute their own likelihood factors (the two terms above).
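Maximizing the log of this product can be sketched with plain gradient ascent on toy data (R's glm actually uses iteratively reweighted least squares; this is only an illustration, with made-up data):

```python
import math

# Toy data, made up for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]

def sigmoid(eta):
    return 1 / (1 + math.exp(-eta))

def log_lik(b0, b1):
    # log of the product over independent observations = sum of logs
    return sum(y * math.log(sigmoid(b0 + b1 * x))
               + (1 - y) * math.log(1 - sigmoid(b0 + b1 * x))
               for x, y in zip(xs, ys))

b0, b1 = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    # gradient of the log-likelihood: sum_i (y_i - p_i) * (1, x_i)
    g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
    g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
    b0 += lr * g0
    b1 += lr * g1
# b1 ends up positive: higher x raises the fitted probability of Y = 1.
```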
Logistic Regression
Interpretation
Log(pi / (1 − pi)) = β0 + β1 · xi1
When x1 increases by one unit, Log(p / (1 − p)) increases by β1.
Equivalently, p / (1 − p) is multiplied by exp(β1).
Odds Ratio = exp(β1)
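A quick numeric check that the odds ratio per unit of x1 equals exp(β1), with made-up coefficients:

```python
import math

# Odds p / (1 - p) under the logistic model equal exp(b0 + b1 * x), so
# moving x up by one unit multiplies the odds by exp(b1) at any x.
def odds(x, b0, b1):
    p = math.exp(b0 + b1 * x) / (1 + math.exp(b0 + b1 * x))
    return p / (1 - p)

b0, b1 = -0.7, 0.4
odds_ratio = odds(3.0, b0, b1) / odds(2.0, b0, b1)
# odds_ratio equals exp(b1) regardless of where we evaluate it.
```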
Poisson Regression
Contents
1 Introduction
2 Regression Review
Basic linear regression
Key measures under MLE
3 Logistic Regression
4 Poisson Regression
Poisson Regression
Pseudo-continuous: can we use the normal distribution?
Event counts, death counts: natural numbers
If the event probability is moderate, assuming a normal distribution is acceptable.
But what about rare events??
Poisson Regression
Binomial & normal & Poisson distributions
Binomial: perform n trials of an event with occurrence probability p.
Normal: the binomial distribution as n goes to infinity.
Poisson: if n → ∞, p → 0, with np → λ, then
n! / ((n − k)! k!) × p^k × (1 − p)^(n−k) → e^(−λ) λ^k / k!
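The limit on this slide can be checked numerically: holding np = λ fixed while n grows, the binomial pmf approaches the Poisson pmf.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, k = 2.0, 3
# Same n*p = lam, increasing n: the gap to the Poisson pmf shrinks.
gap_small_n = abs(binom_pmf(k, 10, lam / 10) - poisson_pmf(k, lam))
gap_big_n = abs(binom_pmf(k, 10000, lam / 10000) - poisson_pmf(k, lam))
```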
Poisson Regression
Figure: Poisson distribution
Poisson Regression
Poisson regression model
log(E(Y | x)) = α + β x
That is, log(mean event count) is linear in x. Equivalently,
E(Y | x) = e^(α + β x)
RR = exp(β), i.e. Log(RR) = β
For the MLE estimation, see https://en.wikipedia.org/wiki/Poisson_regression
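A sketch of the log link with made-up coefficients, showing that a one-unit increase in x multiplies the expected count by exp(β) (the rate ratio):

```python
import math

# log E(Y | x) = alpha + beta * x, so the expected count is
# exp(alpha + beta * x). Coefficients are made up for illustration.
def expected_count(x, alpha, beta):
    return math.exp(alpha + beta * x)

alpha, beta = 0.5, 0.2
rate_ratio = expected_count(4.0, alpha, beta) / expected_count(3.0, alpha, beta)
# rate_ratio equals exp(beta) at any x, since the link is log-linear.
```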
Poisson Regression
Examples
##
## Call:
## glm(formula = nonacc ~ meanpm10 + meanso2 + meanno2 + meanco +
## maxo3 + meantemp + meanhumi + meanpress, family = poisson,
## data = mort)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -16.7854 -2.7891 -0.8845 1.6723 16.4589
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.160e+01 2.670e-01 80.90 <2e-16 ***
## meanpm10 1.318e-03 5.916e-05 22.28 <2e-16 ***
## meanso2 -4.389e-02 6.286e-04 -69.82 <2e-16 ***
## meanno2 4.168e-02 1.379e-04 302.27 <2e-16 ***
## meanco -8.808e-02 6.387e-04 -137.90 <2e-16 ***
## maxo3 -6.130e-03 9.672e-05 -63.38 <2e-16 ***
## meantemp -1.170e-02 2.388e-04 -48.99 <2e-16 ***
## meanhumi -4.194e-03 1.042e-04 -40.25 <2e-16 ***
## meanpress -1.748e-02 2.596e-04 -67.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 328676 on 17354 degrees of freedom
## Residual deviance: 210104 on 17346 degrees of freedom
## (207 observations deleted due to missingness)
## AIC: 300091
##
## Number of Fisher Scoring iterations: 5
Poisson Regression
END
Email : secondmath85@gmail.com
Office: (02)880-2473
H.P: 010-9192-5385