This document introduces new epidemiologic measures for multilevel studies: the median risk ratio, the median hazard ratio, and the median beta. It begins with an overview of intraclass correlation coefficients (ICC) and variance partition coefficients (VPC). It then derives formulas for the new measures from multilevel Poisson, Cox proportional hazards, and Gaussian models, building on the median odds ratio of the binomial (logistic) case. Examples using real family data, including the Minnesota Breast Cancer Study, show how to compute and interpret the median odds ratio, median risk ratio, and median hazard ratio. The document concludes by discussing what the new measures offer for count and survival data.
Contents
1 Introduction
ICC & VPC
Multilevel analysis: binomial case
This study
2 Formula
Brief review of median OR
Median RR, Median HR, Median Beta
3 Examples
Data
Count data
Cox proportional hazard model
4 Discussion
Jinseob Kim1
New Epidemiologic Measures in Multilevel Study:
1 Introduction
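Aims
1 Introduce methods for describing the effect of a group variable in multilevel studies.
2 Propose new measures that describe the effect of a group variable intuitively.
3 Show through examples how the measures are computed and interpreted in practice.
4 Discuss their significance as new epidemiologic measures.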
Example
Health survey conducted in 2000 in the county of Scania, Sweden[11]
1 10,723 persons, aged 18–80, living in 60 areas
2 Individual propensity to consult private physicians vs. area
3 Y: whether the person consulted a private physician in the past year: binomial
4 X: individual-level variables, area-level variables, area
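Effect of group variable
1 Repeated measures, random effects, multilevel, hierarchical GLM, GEE, GLMM...
2 Estimating a beta for the group variable is impractical: there are too many groups (50, 100, ...).
3 Even if estimated, the betas are hard to interpret (50 groups → 49 beta coefficients).
4 Summarize the effect of the group variable with a single number: V_group
5 How large is the variance? 0: the grouping is meaningless; the larger the variance, the more the grouping matters.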
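Intraclass correlation coefficient, variance partition coefficients
$$Y_i = X_i\beta + \mathrm{Group}_i + \varepsilon_i \tag{1}$$
$$\mathrm{ICC} = \frac{V_{\mathrm{Group}}}{V_Y} = \frac{V_{\mathrm{Group}}}{V_{\mathrm{Group}} + V_{\varepsilon}} \tag{2}$$
1 A measure of the effect of the group variable[1, 6].
2 0: the group variable is meaningless; 1: the group variable explains all of the variation in Y.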
ICC example
lmer(formula = TG ~ age + sex + BMI + (1 | FID), data = a)
Estimate Std. Error t value
(Intercept) -65.222107 35.8720093 -1.8181894
age 0.109564 0.3318413 0.3301699
sex -41.942137 11.3684264 -3.6893529
BMI 8.648601 1.2917159 6.6954362
Groups Name Std.Dev.
FID (Intercept) 39.356
Residual 72.007
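The output above stops at the variance components. As a minimal sketch (in Python, assuming only the two Std.Dev. values printed by lmer), equation (2) can be evaluated by squaring the standard deviations to obtain V_Group and V_ε:

```python
def icc_from_sds(sd_group: float, sd_resid: float) -> float:
    """Equation (2): ICC = V_Group / (V_Group + V_eps), variances = squared SDs."""
    v_group = sd_group ** 2  # FID random-intercept variance
    v_resid = sd_resid ** 2  # residual (individual-level) variance
    return v_group / (v_group + v_resid)


# Std.Dev. values copied from the lmer output above
icc = icc_from_sds(sd_group=39.356, sd_resid=72.007)
print(f"ICC = {icc:.3f}")  # prints "ICC = 0.230"
```

Under this fit, roughly 23% of the variance in TG would lie between families.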
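Different scales: what is the ICC?
$$\mathrm{Var}(Y_i) = p_i(1 - p_i) \tag{4}$$
$$\operatorname{logit}(p_i) = X_i\beta + \mathrm{Group}_i \tag{5}$$
Proportion scale vs. logistic scale[3]: the level-1 variance in (4) is on the proportion scale, while V_Group in (5) is on the logit scale, so the two cannot simply be added as in (2).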
Example: binomial case
glmer(formula = hyperTG ~ age + sex + BMI + (1 | FID), data = a,
family = binomial)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.65451749 1.48227814 -4.4893852 7.142904e-06
age 0.01052907 0.01206682 0.8725635 3.829010e-01
sex -1.48506920 0.60773433 -2.4436158 1.454090e-02
BMI 0.19131619 0.05022612 3.8090977 1.394749e-04
Groups Name Std.Dev.
FID (Intercept) 1.1163
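Solution
1 Linearization: logit → proportion
2 Simulation: proportion → logit
3 Latent variable: the level-1 variance is fixed at π²/3 on the logit scale
Each method yields only an approximation of the ICC and has its own computational issues[3, 15].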
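Of the three approaches, the latent-variable method is the simplest to compute: it treats the binary outcome as a thresholded continuous variable whose level-1 variance is that of the standard logistic distribution, π²/3 ≈ 3.29. A minimal Python sketch, assuming the FID Std.Dev. of 1.1163 from the glmer output above is the only input:

```python
import math


def icc_latent_logit(v_group: float) -> float:
    """Latent-variable ICC for a random-intercept logistic model."""
    level1_variance = math.pi ** 2 / 3  # variance of the standard logistic distribution
    return v_group / (v_group + level1_variance)


v_fid = 1.1163 ** 2  # glmer reports a standard deviation; square it to get V_Group
print(f"latent-scale ICC = {icc_latent_logit(v_fid):.3f}")  # prints "latent-scale ICC = 0.275"
```

This value is on the logit (latent) scale; the linearization and simulation methods would generally give a different number on the proportion scale, which is the interpretation problem raised above.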
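Median Odds Ratio (MOR)
Larsen et al. (2000, 2005)
If two groups are chosen at random, what is the typical (median) OR comparing the group with higher odds to the group with lower odds?[8, 7, 11]
$$\mathrm{MOR} = \exp\left(\sqrt{2V_{\mathrm{Group}}} \times \Phi^{-1}(0.75)\right) \approx \exp\left(0.95\sqrt{V_{\mathrm{Group}}}\right) \tag{6}$$
1 Ranges from 1 to ∞: 1 means no group effect; larger values mean a stronger group effect.
2 Can be computed from V_Group alone.
3 Interpreted on the OR scale, just like the ORs for age or sex.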
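The 0.95 in equation (6) is √2 × Φ⁻¹(0.75) rounded. A short Python check (standard library only; `statistics.NormalDist` supplies the normal quantile) compares the exact and approximate MOR for a few illustrative variances that are not taken from the slides:

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)  # Phi^{-1}(0.75), about 0.6745
K = math.sqrt(2) * Z75            # exact multiplier of sqrt(V_Group)


def mor_exact(v_group: float) -> float:
    return math.exp(math.sqrt(2 * v_group) * Z75)


def mor_approx(v_group: float) -> float:
    return math.exp(0.95 * math.sqrt(v_group))


print(f"sqrt(2) * Phi^-1(0.75) = {K:.4f}")  # prints "sqrt(2) * Phi^-1(0.75) = 0.9539"
for v in (0.1, 0.5, 1.0, 2.0):  # illustrative variances
    print(f"V={v}: exact={mor_exact(v):.3f}, approx={mor_approx(v):.3f}")
```

Because 0.95 is slightly below 0.9539, the approximation slightly understates the MOR, but the gap is negligible for typical group variances.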
Example: binomial case
glmer(formula = hyperTG ~ age + sex + BMI + (1 | FID), data = a,
family = binomial)
Groups Name Std.Dev.
FID (Intercept) 1.1163
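$$\mathrm{MOR} = \exp\left(\sqrt{2 \times 1.1163^2} \times 0.6745\right) \approx 2.90 \tag{7}$$
If two families are chosen at random, the median OR comparing the higher-odds family with the lower-odds family is about 2.90.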
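The arithmetic behind equation (7) can be reproduced as below. This is a Python sketch, not the author's R code; the only input is the FID Std.Dev. from the glmer output, which has to be squared to obtain V_Group before it enters the formula.

```python
import math
from statistics import NormalDist


def median_ratio(v_group: float) -> float:
    """exp(sqrt(2 * V) * Phi^{-1}(0.75)): the MOR for a logistic model."""
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745
    return math.exp(math.sqrt(2 * v_group) * z75)


sd_fid = 1.1163  # Std.Dev. of the FID random intercept (glmer output)
mor = median_ratio(sd_fid ** 2)
print(f"MOR = {mor:.2f}")  # prints "MOR = 2.90"
```

Forgetting to square the standard deviation is an easy slip here, because lme4 prints the random-effect Std.Dev. rather than the variance.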
What about count data or survival analysis?
Count data (rates, number of children, ...)
1 For the Poisson distribution, the ICC can be computed (similarly to the binomial case).
2 Gamma, negative binomial...? Interpretation issue: the 0–1 scale.
Cox proportional hazards model
1 There is no ICC concept: Y is a hazard function...
2 At most, V_group alone is reported; interpretation issue.
Aims
New measures for multilevel analysis.
1 Count data (Poisson, gamma, negative binomial...)[4, 14, 16]: Median Risk Ratio
2 Survival data (Cox proportional hazards): Median Hazard Ratio
3 Continuous data: Median Beta
These measures can be interpreted on the same scale as ordinary covariates, are simple to compute, and their confidence intervals are easy to obtain[5].
2 Formula
Multilevel logistic regression[8]
$$\operatorname{logit}[\Pr(Y_{ij}=1 \mid X_{ij}, G_j)] = \beta_0 + X_{ij}\beta_1 + G_j \tag{8}$$
(β0: intercept; β1: vector of fixed regression coefficients; Gj: random intercept, Gj ∼ N(0, Vg))
$$\mathrm{Odds}[\Pr(Y_{ij}=1 \mid X_{ij}, G_j)] = \exp(\beta_0)\exp(X_{ij}\beta_1)\exp(G_j) \tag{9}$$
$$\frac{\mathrm{Odds}[\Pr(Y_{ij}=1 \mid X, G_j)]}{\mathrm{Odds}[\Pr(Y_{ik}=1 \mid X, G_k)]} = \exp(G_j - G_k) \tag{10}$$
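Compare the group with higher odds against the group with lower odds:
$$\mathrm{OR} = \exp|G_j - G_k| \tag{11}$$
$$(G_j - G_k) \sim N(0, 2V_g) \tag{12}$$
Hence |Gj − Gk| is half-normal, with median √(2Vg) × Φ⁻¹(0.75), and because exp is monotone the median OR is exp of that median. So if two groups are drawn at random and the group with higher odds is compared with the group with lower odds, the median of the resulting OR is
$$\mathrm{MOR} = \exp\left(\sqrt{2V_g} \times \Phi^{-1}(0.75)\right) \approx \exp\left(0.95\sqrt{V_g}\right) \tag{13}$$
(Φ: cumulative distribution function (CDF) of the standard normal distribution; Φ⁻¹(0.75) ≈ 0.6745)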
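The step from (12) to (13) can be checked by simulation: draw pairs of group effects from N(0, Vg), form exp|Gj − Gk| for each pair, and take the median. This Python sketch assumes an illustrative Vg = 1.0 (not from the slides); the Monte Carlo median should agree with the closed form to within simulation error.

```python
import math
import random
from statistics import NormalDist, median


def mor_closed_form(v_g: float) -> float:
    """Equation (13) without the 0.95 approximation."""
    return math.exp(math.sqrt(2 * v_g) * NormalDist().inv_cdf(0.75))


def mor_monte_carlo(v_g: float, n_pairs: int = 200_000, seed: int = 1) -> float:
    """Median of exp|G_j - G_k| over randomly drawn pairs of groups."""
    rng = random.Random(seed)
    sd = math.sqrt(v_g)
    ratios = []
    for _ in range(n_pairs):
        g_j = rng.gauss(0.0, sd)
        g_k = rng.gauss(0.0, sd)
        # The higher-odds group always goes in the numerator, as in (11)
        ratios.append(math.exp(abs(g_j - g_k)))
    return median(ratios)


v_g = 1.0  # illustrative value
print(f"closed form: {mor_closed_form(v_g):.3f}")  # prints "closed form: 2.596"
print(f"simulated:   {mor_monte_carlo(v_g):.3f}")
```

The same simulation applies unchanged to the MRR and MHR below, since only the distribution of Gj − Gk enters the calculation.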
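Multilevel Poisson regression[9]
$$Y_{ij} \mid \lambda_{ij} \sim \mathrm{Pois}(\lambda_{ij}) \tag{14}$$
$$\ln(\lambda_{ij} \mid X_{ij}, G_j) = \beta_0 + X_{ij}\beta_1 + G_j \tag{15}$$
$$\frac{\mathrm{Risk}(\lambda_{ij} \mid X, G_j)}{\mathrm{Risk}(\lambda_{ik} \mid X, G_k)} = \exp(G_j - G_k) \tag{16}$$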
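Compare the group with higher risk against the group with lower risk:
$$\mathrm{RR} = \exp|G_j - G_k| \tag{17}$$
$$(G_j - G_k) \sim N(0, 2V_g) \tag{18}$$
Using this, if two groups are drawn at random and the group with higher risk is compared with the group with lower risk, the median of the resulting RR is
$$\mathrm{MRR} = \exp\left(\sqrt{2V_g} \times \Phi^{-1}(0.75)\right) \approx \exp\left(0.95\sqrt{V_g}\right) \tag{19}$$
(Φ: cumulative distribution function (CDF) of the standard normal distribution)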
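Multilevel Cox proportional hazards analysis[10]
$$\ln\left[\frac{h_{ij}(t \mid X_{ij}, G_j)}{h_0(t)}\right] = X_{ij}\beta_1 + G_j \tag{20}$$
(hij(t): hazard function of the ith individual in the jth group; h0(t): baseline hazard function, which absorbs the intercept)
$$\frac{h_{ij}(t \mid X, G_j)}{h_{ik}(t \mid X, G_k)} = \exp(G_j - G_k) \tag{21}$$
$$\mathrm{MHR} = \exp\left(\sqrt{2V_g} \times \Phi^{-1}(0.75)\right) \approx \exp\left(0.95\sqrt{V_g}\right) \tag{22}$$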
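Gaussian multilevel regression
$$Y_{ij} \sim N(\mu_{ij}, \sigma^2) \tag{23}$$
$$(\mu_{ij} \mid X_{ij}, G_j) = \beta_0 + X_{ij}\beta_1 + G_j \tag{24}$$
$$(\mu_{ij} \mid X, G_j) - (\mu_{ik} \mid X, G_k) = G_j - G_k \tag{25}$$
$$\text{Median Beta} = \sqrt{2V_g} \times \Phi^{-1}(0.75) \approx 0.95\sqrt{V_g} \tag{26}$$
(the median of |Gj − Gk|, expressed in the units of Y)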
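For the Gaussian case, equation (26) yields a quantity in the units of Y. As an illustration, this Python sketch plugs in the FID Std.Dev. (39.356) from the earlier lmer fit for TG; reading the result as the median absolute TG difference between two randomly chosen families with identical covariates is an interpretation of (25)–(26), not a number reported in the slides.

```python
import math
from statistics import NormalDist


def median_beta(v_g: float) -> float:
    """Equation (26): median of |G_j - G_k| when G ~ N(0, V_g)."""
    return math.sqrt(2 * v_g) * NormalDist().inv_cdf(0.75)


v_fid = 39.356 ** 2  # lmer prints the random-intercept Std.Dev.; square it
print(f"Median Beta = {median_beta(v_fid):.1f}")  # prints "Median Beta = 37.5"
```

Unlike the ICC, this number sits on the same scale as the fixed-effect betas (for example the BMI coefficient of about 8.65 per unit), which is what makes it comparable.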
3 Examples
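Minnesota Breast Cancer Study: kinship2 package in R[13]
1 3725 observations of 15 variables (females with non-missing data)
2 education: 1 = high school or less, 2 = beyond high school but below a college degree, 3 = college graduate or higher
3 marstat: 1 = married or cohabiting, 2 = widowed or divorced, 3 = never married
4 yob (year of birth): 1 = ≤1919, 2 = 1920–1939, 3 = 1940–1959, 4 = ≥1960
5 parity: number of children
6 cancer: 1 = breast cancer, 0 = censored
7 endage: age at last follow-up or at cancer onset
8 famid: family ID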
Variables          Model 1            Model 2            Model 3
(Intercept)        2.75 (2.67–2.83)   3.53 (3.35–3.71)   3.61 (3.35–3.9)
Education
  1                .                  1 (ref)            1 (ref)
  2                .                  0.84 (0.8–0.89)    0.89 (0.84–0.94)
  3                .                  0.67 (0.63–0.71)   0.75 (0.7–0.79)
Marriage
  1                .                  1 (ref)            1 (ref)
  2                .                  1.03 (0.98–1.08)   0.95 (0.9–1)
  3                .                  0.07 (0.05–0.11)   0.08 (0.05–0.13)
Year of birth
  ≤1919            .                  .                  1 (ref)
  1920–1939        .                  .                  1.13 (1.05–1.21)
  1940–1959        .                  .                  0.75 (0.7–0.81)
  ≥1960            .                  .                  0.52 (0.47–0.59)
V famid            0.03               0.02               0.01
Median RR          1.18               1.14               1.11
Table: Y: parity, Group: family ID, lme4 package in R. Values: RR (confidence interval); "." = not in the model.
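The Median RR row follows from the V famid row via equation (19). A Python sketch reproducing it, assuming the rounded variances shown in the table are the only inputs:

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)


def median_rate_ratio(v_g: float) -> float:
    """Equation (19): MRR = exp(sqrt(2 * V_g) * Phi^{-1}(0.75))."""
    return math.exp(math.sqrt(2 * v_g) * Z75)


v_famid = {"Model 1": 0.03, "Model 2": 0.02, "Model 3": 0.01}  # from the table
for model, v in v_famid.items():
    print(f"{model}: MRR = {median_rate_ratio(v):.2f}")
# prints 1.18, 1.14 and 1.10
```

Model 3 comes out as 1.10 rather than the tabulated 1.11, which is consistent with V famid having been rounded to two decimals before display (a variance near 0.012 would give 1.11).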
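Interpretation
1 The fixed effects are interpreted as ordinary RRs.
2 The variance attributable to family structure is 0.03, 0.02, and 0.01, respectively.
3 MRR: if two families are chosen at random, the median RR comparing the higher-rate family with the lower-rate family is 1.18, 1.14, and 1.11, respectively.
4 Does a family effect remain even after accounting for education, marital status, and period effects?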
Variables           Model 1   Model 2            Model 3            Model 4
I(parity > 0)TRUE   .         0.71 (0.66–0.77)   0.72 (0.65–0.79)   0.74 (0.67–0.81)
Education
  1                 .         .                  1 (ref)            1 (ref)
  2                 .         .                  1.32 (1.25–1.4)    1.24 (1.17–1.31)
  3                 .         .                  1.07 (0.99–1.14)   0.97 (0.9–1.04)
Marriage
  1                 .         .                  1 (ref)            1 (ref)
  2                 .         .                  1.03 (0.99–1.07)   1.15 (1.1–1.2)
  3                 .         .                  1.08 (0.71–1.64)   1.23 (0.81–1.88)
Year of birth
  ≤1919             .         .                  .                  1 (ref)
  1920–1939         .         .                  .                  1.41 (1.3–1.52)
  1940–1959         .         .                  .                  2.52 (2.21–2.87)
  ≥1960             .         .                  .                  1.5 (0.17–13.14)
V famid             0.18      0.18               0.18               0.17
Median HR           1.49      1.5                1.5                1.49
Table: Y: Breast cancer hazard, Group: family ID, coxme package in R. Values: HR (confidence interval); "." = not in the model.
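Because the MHR is a monotone function of V famid, equation (22) can also be inverted to check which variance a reported MHR implies. A Python sketch, assuming only the formula and the two MHR values in the table:

```python
import math
from statistics import NormalDist


def implied_variance(mhr: float) -> float:
    """Invert equation (22): V_g = (ln MHR / (sqrt(2) * Phi^{-1}(0.75)))^2."""
    k = math.sqrt(2) * NormalDist().inv_cdf(0.75)
    return (math.log(mhr) / k) ** 2


for mhr in (1.49, 1.5):
    print(f"MHR {mhr} -> V famid = {implied_variance(mhr):.3f}")
# MHR 1.49 -> 0.175, MHR 1.5 -> 0.181
```

Both values fall in the 0.17–0.18 range of the V famid row, so the Median HR row agrees with the tabulated variances up to rounding.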
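Interpretation
1 The fixed effects are interpreted as ordinary hazard ratios.
2 The variance attributable to family structure is about 0.17–0.18.
3 MHR: if two families are chosen at random, the median HR comparing the higher-hazard family with the lower-hazard family is about 1.5.
4 Does a familial effect persist consistently, regardless of parity, education, marital status, and period effects?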
4 Discussion
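Count and survival versions of the median OR
1 Poisson regression: the ICC can be computed, but what about negative binomial or gamma models?
2 Cox: the ICC concept is hard to apply; until now the variance of the group variable was simply reported and left at that.
3 MRR, MHR: interpretable on the same scale as other measures[2, 12].
4 Simple to compute: only the variance of the group variable is needed.
5 Confidence intervals are easier to obtain than for the ICC.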
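Conclusion
1 New measures that explain the effect of a group variable in multilevel analysis with count or survival data
2 Count data: an alternative to the ICC, on whichever scale one wants to interpret (proportion vs. RR)
3 Cox: possibly the best available way to explain the group effect?
4 The median beta can also serve as an alternative to the ICC.
5 Because they describe group-level effects in multilevel studies intuitively, these measures should help decision-making and communication.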
Packages
1 lme4, nlme, coxme... in R
2 Confidence intervals for MHR and MRR: computation is not yet straightforward.
3 Bayesian hierarchical models with OpenBUGS, JAGS, Stan...
4 R2OpenBUGS, BRugs, rjags, R2jags, rstan
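One simple route to an interval for the MRR or MHR: since exp(√(2Vg) × Φ⁻¹(0.75)) increases monotonically in Vg, the endpoints of any interval for Vg map directly onto an interval for the median ratio, and with posterior draws from a Bayesian fit each draw can be transformed before taking quantiles. The Python sketch below shows the endpoint mapping; the interval (0.10, 0.25) is hypothetical, not a result from the slides.

```python
import math
from statistics import NormalDist

Z75 = NormalDist().inv_cdf(0.75)


def median_ratio(v_g: float) -> float:
    """MOR / MRR / MHR as a function of the group-level variance V_g."""
    return math.exp(math.sqrt(2 * v_g) * Z75)


def median_ratio_interval(v_lower: float, v_upper: float) -> tuple[float, float]:
    """Map an interval for V_g to one for the median ratio (monotone transform)."""
    if not 0 <= v_lower <= v_upper:
        raise ValueError("need 0 <= v_lower <= v_upper")
    return median_ratio(v_lower), median_ratio(v_upper)


lo, hi = median_ratio_interval(0.10, 0.25)  # hypothetical interval for V famid
print(f"MRR interval: ({lo:.2f}, {hi:.2f})")  # prints "MRR interval: (1.35, 1.61)"
```

The harder part is a trustworthy interval for Vg itself. For lme4 fits, `confint()` with `method = "profile"` reports an interval for the random-effect standard deviation, which would need squaring first; the Bayesian tools listed above give posterior draws directly.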
References
[1] Bartko, J. J. (1966). The intraclass correlation coefficient as a measure of reliability. Psychological reports,
19(1):3–11.
[2] Bolker, B. M., Brooks, M. E., Clark, C. J., Geange, S. W., Poulsen, J. R., Stevens, M. H. H., and White,
J.-S. S. (2009). Generalized linear mixed models: a practical guide for ecology and evolution. Trends in ecology
& evolution, 24(3):127–135.
[3] Browne, W. J., Subramanian, S. V., Jones, K., and Goldstein, H. (2005). Variance partitioning in multilevel
logistic models that exhibit overdispersion. Journal of the Royal Statistical Society: Series A (Statistics in
Society), 168(3):599–613.
[4] Coxe, S., West, S. G., and Aiken, L. S. (2009). The analysis of count data: A gentle introduction to poisson
regression and its alternatives. Journal of personality assessment, 91(2):121–136.
[5] Do Ha, I. and Lee, Y. (2005). Multilevel mixed linear models for survival data. Lifetime data analysis,
11(1):131–142.
[6] Goldstein, H., Browne, W., and Rasbash, J. (2002). Partitioning variation in multilevel models. Understanding
Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 1(4):223–231.
[7] Larsen, K. and Merlo, J. (2005). Appropriate assessment of neighborhood effects on individual health:
integrating random and fixed effects in multilevel logistic regression. American journal of epidemiology,
161(1):81–88.
[8] Larsen, K., Petersen, J. H., Budtz-Jørgensen, E., and Endahl, L. (2000). Interpreting parameters in the logistic
regression model with random effects. Biometrics, 56(3):909–914.
[9] Lee, A. H., Wang, K., Scott, J. A., Yau, K. K., and McLachlan, G. J. (2006). Multi-level zero-inflated poisson
regression modelling of correlated count data with excess zeros. Statistical Methods in Medical Research,
15(1):47–61.
[10] Liu, L. and Huang, X. (2008). The use of gaussian quadrature for estimation in frailty proportional hazards
models. Statistics in medicine, 27(14):2665–2683.
[11] Merlo, J., Chaix, B., Ohlsson, H., Beckman, A., Johnell, K., Hjerpe, P., Råstam, L., and Larsen, K. (2006). A
brief conceptual tutorial of multilevel analysis in social epidemiology: using measures of clustering in multilevel
logistic regression to investigate contextual phenomena. Journal of Epidemiology and Community Health,
60(4):290–297.
[12] Therneau, T. (2012). Mixed effects cox models. R-package description. URL: http://cran.r-project.org/web/packages/coxme/vignettes/coxme.pdf.
[13] Therneau, T., Atkinson, E., Sinnwell, J., Schaid, D., and McDonnell, S. (2014). kinship2: Pedigree functions.
R package version 1.5.7.
[14] Ver Hoef, J. M. and Boveng, P. L. (2007). Quasi-poisson vs. negative binomial regression: how should we
model overdispersed count data? Ecology, 88(11):2766–2772.
[15] Vigre, H., Dohoo, I., Stryhn, H., and Busch, M. (2004). Intra-unit correlations in seroconversion to
Actinobacillus pleuropneumoniae and Mycoplasma hyopneumoniae at different levels in Danish multi-site pig
production facilities. Preventive veterinary medicine, 63(1-2):9–28.
[16] Winkelmann, R. and Zimmermann, K. F. (1995). Recent developments in count data modelling: theory and
application. Journal of economic surveys, 9(1):1–24.