Waterloo GLMM talk

1. Denitions Estimation Inference Challenges open questions References Generalized linear mixed models Ben Bolker McMaster University, Mathematics Statistics and Biology 26 September 2014

2. Denitions Estimation Inference Challenges open questions References Acknowledgments lme4: Doug Bates, Martin Mächler, Steve Walker Data: Josh Banta, Adrian Stier, Sea McKeon, David Julian, Jada-Simone White NSERC (Discovery) SHARCnet

3. Denitions Estimation Inference Challenges open questions References Outline 1 Examples and denitions 2 Estimation Overview Methods 3 Inference 4 Challenges open questions

5. Denitions Estimation Inference Challenges open questions References (Generalized) linear mixed models (G)LMMs: a statistical modeling framework incorporating: combinations of categorical and continuous predictors, and interactions (some) non-Normal responses (e.g. binomial, Poisson, and extensions) (some) nonlinearity (e.g. logistic, exponential, hyperbolic) non-independent (grouped) data

8. Denitions Estimation Inference Challenges open questions References } correlation nonlinearity random nonlinear least−squares linear regression ANOVA analysis of covariance multiple linear regression (non−normal errors) (nonlinearity) repeated−measures models; time series (ARIMA) } generalized (non−normal errors) linear models random smooth nonlinearity scaled variance effects nonlinearity nonlinear time series models thresholds; mixtures; compound distributions; etc. etc. etc. etc. effects correlation (nonlinearity) general linear models logistic regression binomial regression log−linear models GLMM mixed models generalized additive models quasilikelihood negative binomial models

9. Denitions Estimation Inference Challenges open questions References Coral protection from seastars (Culcita) by symbionts (McKeon et al., 2012) Number of predation events 2 2 1 2 2 1 none shrimp crabs both Symbionts 10 8 Number of blocks 0 6 4 2 1 0 0 0

10. Denitions Estimation Inference Challenges open questions References Environmental stress: Glycera cell survival (D. Julian unpubl.) Anoxia Anoxia Anoxia Copper Normoxia Normoxia Osm=12.8 Osm=22.4 133.3 66.6 33.3 0 0 0.03 0.1 0.32 H2S Normoxia Osm=32 0 0.03 0.1 0.32 Anoxia Normoxia Osm=41.6 Anoxia Normoxia Osm=51.2 0 0.03 0.1 0.32 Osm=12.8 0 0.03 0.1 0.32 Osm=22.4 Osm=32 0 0.03 0.1 0.32 Osm=41.6 133.3 66.6 33.3 0 Osm=51.2 1.0 0.8 0.6 0.4 0.2 0.0

11. Denitions Estimation Inference Challenges open questions References Arabidopsis response to fertilization herbivory (Banta et al., 2010) Log(1+fruit set) 5 4 3 2 1 0 nutrient : 1 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l ll ll ll l ll lll ll l l l ll ll l l lll l l l l l l l l l ll ll unclipped clipped nutrient : 8 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l ll l l ll l ll l l l l l l l l l l l l l l l l l l l ll l l ll l ll l l ll lll l lll ll l l l l ll llll l l unclipped clipped

12. Denitions Estimation Inference Challenges open questions References Coral demography (J.-S. White unpubl.) Before Experimental l l l l l l l l l l l l l l l l l l l l lll l ll l l l l l l l l l l l l l l ll l l l l l ll l l ll l l l l ll l l ll l l l l l l llll l l l l l l l l l l l l l l l l l l l lll ll l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l lll l l l l l l l l l l l l l l ll l l ll l l l l l l lll l l l ll l l l l l l l l l 1.00 0.75 0.50 0.25 0.00 l 0 10 20 30 40 50 0 10 20 30 40 50 Previous size (cm) Mortality probability Treatment l l Present Removed

13. Denitions Estimation Inference Challenges open questions References Technical denition Yi |{z} response conditional disztr}ib|u{tion Distr (g1(i ) | {z } inverse link function ; |{z} scale ) parameter

14. Denitions Estimation Inference Challenges open questions References Technical denition Yi |{z} response conditional disztr}ib|u{tion Distr (g1(i ) | {z } inverse link function ; |{z} scale ) parameter |{z} linear predictor = |X{z

15. } xed eects + Zb |{z} random eects

16. Denitions Estimation Inference Challenges open questions References Technical denition Yi |{z} response conditional disztr}ib|u{tion Distr (g1(i ) | {z } inverse link function ; |{z} scale ) parameter |{z} linear predictor = |X{z

17. } xed eects + Zb |{z} random eects b |{z} conditional modes MVN(0; () | {z } ) variance- covariance matrix

18. Denitions Estimation Inference Challenges open questions References What are random eects? A method for . . . accounting for among-individual, within-block correlation compromising between complete pooling (no among-block variance) and xed eects (innite among-block variance) handling levels selected at random from a larger population sharing information among levels (shrinkage estimation) estimating variability among levels allowing predictions for unmeasured levels

25. Denitions Estimation Inference Challenges open questions References Maximum likelihood estimation Best t is a compromise between two components (consistency of data with xed eects and conditional modes; consistency of random eect with RE distribution) Goodness-of-t integrates over conditional modes l l l l l l l l l l l l l l l l l l l l l ll l l 2 1 0 −1 −2 1 2 3 4 5 f y

26. Denitions Estimation Inference Challenges open questions References Shrinkage: Arabidopsis conditional modes 20 10 Mean fruit set 0 5 10 15 20 25 l l l l l l l l l l l l l l l l l l l l l l l l Genotype 1 0.1 0 l group mean shrinkage est.

27. Denitions Estimation Inference Challenges open questions References Estimation methods deterministic : various approximate integrals (Breslow, 2004) Penalized quasi-likelihood, Laplace, Gauss-Hermite quadrature, . . . exibility and speed vs. accuracy . . . stochastic (Monte Carlo): frequentist and Bayesian (Booth and Hobert, 1999; Ponciano et al., 2009; Sung, 2007) usually slower but exible and accurate

28. Denitions Estimation Inference Challenges open questions References Estimation: Culcita (McKeon et al., 2012) Log−odds of predation −6 −4 −2 0 2 Added symbiont Crab vs. Shrimp Symbiont l l l l l l l l l l l l l l l GLM (fixed) GLM (pooled) PQL Laplace AGQ

30. Denitions Estimation Inference Challenges open questions References Wald tests typical results of summary exact for ANOVA, regression: approximation for GLM(M)s fast approximation is sometimes awful (Hauck-Donner eect) parameter log−likelihood

31. Denitions Estimation Inference Challenges open questions References 2D proles for Culcita data 0 −2 −4 −6 −8 −2 −6 −4 −2 −4 −6 −8 Scatter Plot Matrix 2 4 6 8 101214 .sig01 0 −1 −2 −3 15 10 (Intercept) 5 0 10 15 0 1 2 3 tttcrabs −10 −4 −2 0 0 1 2 3 tttshrimp −10 0 1 2 3 tttboth −2 −4 −6 −8 −10 −12 0 1 2 3

32. Denitions Estimation Inference Challenges open questions References Likelihood ratio tests better than Wald, but still have to two problems: denominator degrees of freedom (when estimating scale) for GLMMs, distributions are approximate anyway (Bartlett corrections) Kenward-Roger correction? (Stroup, 2014) Prole condence intervals: expensive/fragile

33. Denitions Estimation Inference Challenges open questions References Parametric bootstrapping t null model to data simulate data from null model t null and working model, compute likelihood dierence repeat to estimate null distribution should be OK but ??? not well tested (assumes estimated parameters are suciently good)

34. Denitions Estimation Inference Challenges open questions References Parametric bootstrap results H2S Osm Cu True p value Inferred p value 0.08 0.06 0.04 0.02 0.02 0.06 0.02 0.06 0.08 0.06 0.04 0.02 Anoxia normal t(14) t(7)

35. Denitions Estimation Inference Challenges open questions References Bayesian inference If we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors post hoc Bayesian can work, but mode at zero causes problems

36. Denitions Estimation Inference Challenges open questions References Culcita condence intervals l l l l l l l l l l l l l l l tttshrimp tttcrabs tttboth (Intercept) block −15 −10 −5 0 5 10 15 Effect (log−odds of predation) CI l l l l Wald profile boot MCMC fun l glmer glmmADMB MCMCglmm

38. Denitions Estimation Inference Challenges open questions References On beyond R Julia: MixedModels package SAS: PROC MIXED, NLMIXED AS-REML Stata (GLLAMM, xtmelogit) AD Model Builder HLM, MLWiN

39. Denitions Estimation Inference Challenges open questions References Challenges Small/medium data: inference, singular ts (blme, MCMCglmm) Big data: speed! Worst case: large n, small N (e.g. telemetry/genomics) Model diagnosis Condence intervals accounting for uncertainty in variances See also: http://rpubs.com/bbolker/glmmchapter, https: //groups.nceas.ucsb.edu/non-linear-modeling/projects

40. Denitions Estimation Inference Challenges open questions References Spatial and temporal correlations Sometimes blocking takes care of non-independence ... but sometimes there is temporal or spatial correlation within blocks . . . also phylogenetic . . . (Ives and Zhu, 2006) G-side vs. R-side eects tricky to implement for GLMMs, but new possibilities on the horizon (Rousset and Ferdy, 2014; Rue et al., 2009)

41. Denitions Estimation Inference Challenges open questions References Next steps Complex random eects: regularization, model selection, penalized methods (lasso/fence) Flexible correlation and variance structures Flexible/nonparametric random eects distributions hybrid improved MCMC methods Reliable assessment of out-of-sample performance

42. Denitions Estimation Inference Challenges open questions References References Banta, J.A., Stevens, M.H.H., and Pigliucci, M., 2010. Oikos, 119(2):359369. ISSN 1600-0706. doi:10.1111/j.1600-0706.2009.17726.x. Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265285. doi:10.1111/1467-9868.00176. Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data, pages 122. Springer. ISBN 0387208623. Ives, A.R. and Zhu, J., 2006. Ecological Applications, 16(1):2032. McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):10951103. ISSN 0029-8549. doi:10.1007/s00442-012-2275-2. Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356362. ISSN 0012-9658. Rousset, F. and Ferdy, J.B., 2014. Ecography, page nono. ISSN 1600-0587. doi:10.1111/ecog.00566. Rue, H., Martino, S., and Chopin, N., 2009. Journal of the Royal Statistical Society, Series B, 71(2):319392. Stroup, W.W., 2014. Agronomy Journal, 106:117. doi:10.2134/agronj2013.0342. Sung, Y.J., 2007. The Annals of Statistics, 35(3):9901011. ISSN 0090-5364. doi:10.1214/009053606000001389.

Waterloo GLMM talk

Recommandé

Recommandé

Contenu connexe

En vedette

En vedette (9)

Similaire à Waterloo GLMM talk

Similaire à Waterloo GLMM talk (20)

Plus de Ben Bolker

Plus de Ben Bolker (13)

Dernier

Dernier (20)

Waterloo GLMM talk