Missing-data mechanisms (review)
• Ignorable
–MCAR (missing completely at random)
–MAR (missing at random)
• Non-ignorable
–MNAR (missing not at random)
• Ad hoc methods (e.g. complete case analysis) can give
biased estimates under any missingness mechanism
other than MCAR
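The bias claim for ad hoc methods can be checked with a small simulation. This is a minimal Python sketch under a hypothetical data-generating model (x ~ N(0, 1), y = 1 + 2x + e, so the true mean of y is 1) with arbitrarily chosen missingness probabilities; only y is ever missing, and the ad hoc estimate is the complete-case mean of y.

```python
import random
import statistics


def complete_case_mean(mechanism, n=20000, seed=1):
    """Mean of y among complete cases when y is deleted under `mechanism`.

    Hypothetical model: x ~ N(0, 1), y = 1 + 2x + e with e ~ N(0, 1),
    so the true mean of y is 1.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 1 + 2 * x + rng.gauss(0, 1)
        if mechanism == "MCAR":
            p_missing = 0.45                      # unrelated to any data
        elif mechanism == "MAR":
            p_missing = 0.8 if x > 0 else 0.1     # depends only on the observed x
        elif mechanism == "MNAR":
            p_missing = 0.8 if y > 1 else 0.1     # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_missing:
            kept.append(y)
    return statistics.fmean(kept)


for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, round(complete_case_mean(mechanism), 2))
```

The MCAR mean stays close to 1, while the MAR and MNAR means are pulled well below it. This concerns the marginal mean of y; because the MAR mechanism here depends only on x, a complete-case regression of y on x would not be biased by it.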
③: Example
• Statistical Methods
–We used multiple imputation to handle missing data.
To impute the missing data we constructed multiple
regression models including variables potentially
related to the fact that the data were missing and also
variables correlated with that outcome. We used Stata
(StataCorp, College Station, Texas, USA) [18] and
PROC MI in SAS (SAS Institute, Cary, NC, USA) to
obtain similar answers, and only the former are
presented.
BMJ, doi:10.1136/bmj.38441.620417.BF (published 23 May 2005)
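The idea in this quote, an imputation model that includes variables related to the missingness and to the outcome, can be sketched as follows. This is a minimal Python illustration, not the Stata or SAS PROC MI implementation: a single hypothetical predictor x both drives the (MAR) missingness of y and predicts y, and each imputation first draws the regression parameters from their posterior and then draws the imputed values (the normal linear-model method described by Rubin, 1987). The function name `impute_norm` and all numbers are assumptions.

```python
import math
import random
import statistics


def impute_norm(x, y, rng):
    """Return a copy of y with each None replaced by one proper imputation.

    Normal linear regression of y on a single predictor x: sigma^2 and the
    coefficients are drawn from their posterior (flat prior) before the
    imputed values are drawn, so parameter uncertainty is propagated.
    """
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    xbar = statistics.fmean(xi for xi, _ in obs)
    ybar = statistics.fmean(yi for _, yi in obs)
    sxx = sum((xi - xbar) ** 2 for xi, _ in obs)
    b_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in obs) / sxx
    ssr = sum((yi - ybar - b_hat * (xi - xbar)) ** 2 for xi, yi in obs)
    # sigma^2 | data ~ SSR / chi^2(n - 2); chi^2(k) is Gamma(shape=k/2, scale=2)
    sigma2 = ssr / rng.gammavariate((n - 2) / 2, 2)
    # With x centred, intercept and slope are independent given sigma^2
    a_star = rng.gauss(ybar, math.sqrt(sigma2 / n))
    b_star = rng.gauss(b_hat, math.sqrt(sigma2 / sxx))
    return [
        yi if yi is not None
        else a_star + b_star * (xi - xbar) + rng.gauss(0, math.sqrt(sigma2))
        for xi, yi in zip(x, y)
    ]


rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(2000)]
y_full = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]   # hypothetical; true mean of y is 1
# MAR: y is far more likely to be missing when x > 0
y = [yi if rng.random() >= (0.8 if xi > 0 else 0.1) else None
     for xi, yi in zip(x, y_full)]

cc_mean = statistics.fmean(yi for yi in y if yi is not None)
mi_mean = statistics.fmean(statistics.fmean(impute_norm(x, y, rng)) for _ in range(20))
print(f"complete cases: {cc_mean:.2f}, MI (m = 20): {mi_mean:.2f}")
```

Leaving x out of the imputation model would reproduce the complete-case bias, which is why the quoted methods stress including variables related to the missingness.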
③: Example
• Methods
–The imputation procedure uses all the known covariates
thought to be associated with the missingness
mechanism and cost, together with the
interrelationships between the cost components, to
help predict the values for the missing data. The
incomplete response variables were ~. The observed
covariates were ~.
–E.g. sex (dichotomous), age (continuous),
hospital cost (continuous; log-transformed)
Clinical Trials 2007; 4:154-161
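Why a skewed cost such as hospital cost is log-transformed before imputation can be seen from a small sketch. It assumes hypothetical log-normally distributed costs and a deliberately simplified, improper normal imputation that plugs in the fitted mean and SD: imputing on the raw scale produces impossible negative costs, while imputing on the log scale and exponentiating cannot.

```python
import math
import random
import statistics


def normal_draws(observed, k, rng):
    """Draw k imputations from a normal fitted to `observed`.

    Simplified for illustration: the mean and SD are plugged in rather than
    drawn from their posterior, so this is not a proper MI step.
    """
    mu, sd = statistics.fmean(observed), statistics.stdev(observed)
    return [rng.gauss(mu, sd) for _ in range(k)]


rng = random.Random(7)
costs = [math.exp(rng.gauss(8, 1)) for _ in range(500)]  # hypothetical skewed costs

raw_scale = normal_draws(costs, 1000, rng)
log_scale = [math.exp(v) for v in normal_draws([math.log(c) for c in costs], 1000, rng)]

print("negative imputations, raw scale:", sum(v < 0 for v in raw_scale))
print("negative imputations, log scale:", sum(v < 0 for v in log_scale))
```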
④ Handling non-normally distributed continuous
variables and categorical variables
• Methods
–The MI data augmentation procedure used here
assumes that the data have a multivariate normal
distribution. Suitable transformations were necessary
for this assumption to hold. (...) The continuous
variables for the non-zero values for the hospice cost,
non-QE cost and the GP cost components were highly
skewed. A scaled logit transformation, as suggested in
Schafer’s NORM program, was chosen to give normally
distributed and plausible values.
Clinical Trials 2007; 4:154-161
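A scaled logit maps a variable bounded on (a, b) to the whole real line via z = log((y − a)/(b − y)), so a normal imputation model can be used, and back-transformed imputations always stay inside the bounds. This Python sketch assumes analyst-chosen bounds (the 0 and 5000 below are hypothetical) and is not code from NORM. In the quoted study the transformation was applied only to the non-zero costs, so zero costs need separate handling.

```python
import math


def scaled_logit(y, lo, hi):
    """Map a value in the open interval (lo, hi) onto the whole real line."""
    p = (y - lo) / (hi - lo)
    return math.log(p / (1 - p))


def inv_scaled_logit(z, lo, hi):
    """Back-transform an imputed value; the result always lies in (lo, hi)."""
    return lo + (hi - lo) / (1 + math.exp(-z))


# Hypothetical bounds for a cost component; any real-valued imputation on the
# transformed scale maps back to a plausible cost inside the bounds.
lo, hi = 0.0, 5000.0
z = scaled_logit(1200.0, lo, hi)
print(round(z, 3), round(inv_scaled_logit(z, lo, hi), 6))
print([round(inv_scaled_logit(v, lo, hi), 1) for v in (-5.0, 0.0, 5.0)])
```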
④-1 To round or not to round?
• When categorical and similar variables are imputed
under a normal model, the imputations come out as
continuous (non-integer) values
Enders, “Applied Missing Data Analysis”, Guilford, 2010, p. 263
• For binary variables in particular, rounding is
considered unnecessary: rounding the imputed values
biases the parameter estimates (Allison, 2005, etc.)
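The rounding bias can be made concrete with a simplified calculation. Assuming, for illustration only, that a binary variable with P(1) = p is imputed without covariates from a normal distribution with the same mean and variance as the 0/1 variable, the unrounded imputations preserve p on average, while rounding at 0.5 does not. The function name `share_of_ones` is hypothetical.

```python
from statistics import NormalDist


def share_of_ones(p):
    """Expected share of 1s among imputations of a binary variable.

    Assumes the imputations are drawn from N(p, p(1 - p)), i.e. a normal
    model matching the mean and variance of a 0/1 variable with P(1) = p.
    """
    draws = NormalDist(mu=p, sigma=(p * (1 - p)) ** 0.5)
    unrounded = draws.mean           # mean of the raw imputations equals p
    rounded = 1 - draws.cdf(0.5)     # draws >= 0.5 are rounded up to 1
    return unrounded, rounded


for p in (0.05, 0.1, 0.3):
    unrounded, rounded = share_of_ones(p)
    print(f"p = {p}: unrounded {unrounded:.3f}, rounded {rounded:.3f}")
```

For a rare category (p = 0.05), rounding cuts the share of 1s among the imputations to about 0.02, whereas the unrounded values keep it at 0.05.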
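⑤ Reporting the imputation model and the number
of imputed datasets (m)
• Historically, 2-5 datasets were recommended
• However, as computing power has grown, 5-20
datasets have increasingly been recommended
• Reason: more datasets give more precise estimates
• The imputation model is discussed in ⑥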
→ Efficiency of MI
→ γ = rate of missing information
http://sites.stat.psu.edu/~jls/mifaq.html
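The efficiency of an estimate based on m imputations, relative to one based on infinitely many, is usually given as (1 + γ/m)^(-1) (Rubin, 1987), the formula in the FAQ cited on this slide. A short Python sketch that tabulates it for an arbitrary grid of m and γ values:

```python
def relative_efficiency(gamma, m):
    """Efficiency of an MI estimate from m imputations relative to m = infinity,
    given the fraction of missing information gamma: (1 + gamma / m) ** -1."""
    return 1 / (1 + gamma / m)


for m in (3, 5, 10, 20):
    row = [f"{relative_efficiency(g, m):.3f}" for g in (0.1, 0.3, 0.5, 0.7, 0.9)]
    print(f"m = {m:2d}:", " ".join(row))
```

For γ = 0.5 and m = 5 the efficiency is already about 0.91, which is why small m was once considered sufficient; larger m pays off mainly when γ is high.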
⑤: Example
• Method
–We used an extended hot deck multiple imputation
technique that modifies the predictive mean matching
method to impute item-level missing data. Rates of
item-level missing data were less than 2% for all
variables discussed in this article. The results across 5
imputed data sets were combined by averaging, and
SEs were adjusted to reflect both within-imputation
variability and between-imputation variability.
JAMA. 2002;288:2836-2845
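The predictive mean matching idea mentioned in this quote can be sketched in a few lines. This is a hypothetical, simplified Python illustration with one predictor and k = 5 donors, not the extended hot-deck procedure used in the study: it fits a regression on the observed cases and, for each missing case, borrows the observed value of a randomly chosen donor whose predicted mean is close.

```python
import random
import statistics


def pmm_impute(x, y, rng, k=5):
    """Predictive mean matching with a single predictor x (missing y is None).

    Each missing case receives the observed y of a donor drawn at random from
    the k observed cases whose predicted means are closest to its own, so
    every imputed value is a value that was actually observed.
    Simplification: OLS coefficients are used without a posterior draw.
    """
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xbar = statistics.fmean(xi for xi, _ in obs)
    ybar = statistics.fmean(yi for _, yi in obs)
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in obs)
             / sum((xi - xbar) ** 2 for xi, _ in obs))

    def predict(xi):
        return ybar + slope * (xi - xbar)

    donors = [(predict(xi), yi) for xi, yi in obs]
    completed = []
    for xi, yi in zip(x, y):
        if yi is None:
            target = predict(xi)
            nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
            yi = rng.choice(nearest)[1]
        completed.append(yi)
    return completed


rng = random.Random(3)
x = [rng.gauss(0, 1) for _ in range(300)]
# Hypothetical 1-5 item score related to x, with about 10% item-level missingness
y = [min(5, max(1, round(3 + xi + rng.gauss(0, 0.7)))) if rng.random() > 0.1 else None
     for xi in x]
completed = pmm_impute(x, y, rng)
print(sorted(set(completed)))
```

Because donors supply real observed values, imputations for a 1-5 item score stay within 1-5 without any rounding.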
⑦ The MAR assumption:
the importance of sensitivity analysis
• National Research Council (2010)
–Sensitivity analysis should be part of the primary
reporting of findings from clinical trials.
Examining sensitivity to the assumptions about
missing data mechanism should be a mandatory
component of reporting
• Even so, most published studies do not carry out
such sensitivity analyses (Sterne, 2009)
Methods for sensitivity analysis
• “...However, we know of no generally
available MI software package which can do
this” (Carpenter, 2007)
• δ-adjustment (van Buuren, 1999)
• Weighting approach (Carpenter, 2007)
• In practice, would re-running the imputation with a
different algorithm serve as a sensitivity analysis?
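A δ-adjustment sensitivity analysis in the spirit of van Buuren (1999) can be sketched as follows: impute under MAR, shift the imputed values by a chosen δ to represent a departure from MAR, re-estimate, and see how much the result moves across a range of δ. This Python version is a simplified, hypothetical illustration: the imputation model has no covariates and ignores parameter uncertainty, and the δ grid is arbitrary.

```python
import random
import statistics


def delta_adjusted_means(y, deltas, m, rng):
    """MI estimate of the mean of y for each delta in `deltas`.

    Missing values (None) are imputed m times from a normal fitted to the
    observed values, and delta is added to every imputed value. delta = 0
    corresponds to the MAR analysis; non-zero deltas encode MNAR departures
    (e.g. delta > 0: non-responders tend to have higher values).
    """
    observed = [v for v in y if v is not None]
    mu, sd = statistics.fmean(observed), statistics.stdev(observed)
    results = {}
    for delta in deltas:
        estimates = []
        for _ in range(m):
            completed = [v if v is not None else rng.gauss(mu, sd) + delta for v in y]
            estimates.append(statistics.fmean(completed))
        results[delta] = statistics.fmean(estimates)
    return results


rng = random.Random(11)
y = [rng.gauss(0, 1) if rng.random() > 0.3 else None for _ in range(1000)]  # hypothetical data
for delta, estimate in delta_adjusted_means(y, (-1.0, -0.5, 0.0, 0.5, 1.0), 20, rng).items():
    print(f"delta = {delta:+.1f}: mean = {estimate:.3f}")
```

If the substantive conclusion changes only for implausibly large δ, the MAR-based result can be regarded as robust to this kind of departure.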
⑦: Example
• Methods
–We carried out some sensitivity analysis using
alternative modeling strategies. When using the SRMI,
another modeling option is to treat income, education,
and age as continuous to capture the underlying ordering
of these variables. Their corresponding conditional
regression models are thus linear normal models. After
rounding the continuous imputations to the nearest
allowed integer values, the logistic regression analysis
results (not shown) are similar to those from the option
treating all variables as categorical. We also applied the
joint modeling strategy using a general location model.
Circulation: Cardiovascular Quality and Outcomes. 2010;3:98-105
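The rounding step in this quote (continuous imputations mapped to the nearest allowed integer before the logistic regression) can be sketched as a small helper. The function name and the 1-5 coding are hypothetical; this is not code from the cited study.

```python
def round_to_allowed(value, allowed):
    """Map a continuous imputation to the nearest allowed category code.

    Ties go to the code listed first in `allowed`.
    """
    return min(allowed, key=lambda code: abs(code - value))


# Hypothetical ordinal income coded 1..5, imputed under a linear normal model
allowed_codes = range(1, 6)
imputed = [0.3, 2.49, 2.51, 6.8]
print([round_to_allowed(v, allowed_codes) for v in imputed])  # → [1, 2, 3, 5]
```

Out-of-range imputations such as 6.8 are pulled back to the nearest valid code. Slide ④-1 cautions against rounding binary variables; for these multi-level ordinal variables, the study reports results similar to treating all variables as categorical.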
Template
• “The percentage of missing values across the nine
variables varied between 0 and 34%. In total 1601 out
of 3801 records (42%) were incomplete. Many girls
had no score because the nurse felt that the
measurement was “unnecessary”, or because the girl
did not give permission. Older girls had many more
missing data. We used multiple imputation (Rubin,
1987a) to create and analyze 40 multiply imputed
datasets. Methodologists currently regard multiple
imputation as a state-of-the-art technique because it
improves accuracy and statistical power relative to other
missing data techniques. (→ continued)
• → ...Incomplete variables were imputed under fully
conditional specification (Van Buuren et al., 2006).
Calculations were done in R 2.13.1 using the default
settings of the mice 2.12 package. [The parameters of interest] were estimated with
multiple regression applied to each imputed dataset
separately. These estimates and their standard errors
were combined using Rubin’s rules. For comparison,
we also performed the analysis on the subset of
complete cases.”
Stef van Buuren, “Flexible Imputation of Missing Data”, CRC Press, 2012, p. 254
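The final step of this template, combining per-dataset estimates and standard errors with Rubin's rules, can be sketched for a single scalar parameter. This is a minimal Python version, not the code behind `pool()` in mice: the pooled estimate is the mean of the m estimates, and the total variance adds the average within-imputation variance to (1 + 1/m) times the between-imputation variance. Degrees of freedom for confidence intervals are omitted, and the input numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool m completed-data estimates of a scalar with Rubin's rules.

    Returns (pooled estimate, total standard error, approximate fraction of
    missing information). Requires m >= 2.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)                   # pooled point estimate
    w = statistics.fmean(se ** 2 for se in std_errors)    # within-imputation variance
    b = statistics.variance(estimates)                    # between-imputation variance
    t = w + (1 + 1 / m) * b                               # total variance
    lam = (1 + 1 / m) * b / t                             # large-sample fraction of missing info
    return q_bar, math.sqrt(t), lam


# Hypothetical regression coefficients and SEs from m = 5 imputed datasets
est, se, lam = pool_rubin([1.2, 1.0, 1.1, 1.3, 0.9], [0.2] * 5)
print(f"estimate {est:.3f}, SE {se:.3f}, fraction missing info {lam:.2f}")
```

Here the pooled SE (about 0.265) is noticeably larger than the per-dataset SE of 0.2, because the spread between imputations is added to it.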
Reference books
• Enders, “Applied Missing Data Analysis”
Guilford; 2010
• Stef van Buuren, “Flexible Imputation of
Missing Data” CRC Press; 2012
http://www.stefvanbuuren.nl/
http://www.appliedmissingdata.com/
References
• A Burton et al. “Cost-effectiveness in clinical trials: using
multiple imputation to deal with incomplete cost data” Clin.
Trials 2007;4:154-161
• J Sterne et al. “Multiple imputation for missing data in
epidemiological and clinical research: potential and pitfalls”
BMJ 2009;338:b2393
• A Mackinnon “The use and reporting of multiple imputation
in medical research - a review” J Intern Med 2010;268:
586–593.
• JL Schafer, JW Graham “Missing Data: Our View of the
State of the Art” Psychological Methods 2002, Vol. 7, No. 2,
147–177
• A Marshall et al. “Combining estimates of interest in
prognostic modelling studies after multiple imputation:
current practice and guidelines” BMC Medical Research
Methodology 2009, 9:57
• L Collins “A comparison of inclusive and restrictive
strategies in modern missing data procedures” Psychological
Methods, Vol 6(4), Dec 2001, 330-351.
• Y He “Missing Data Analysis Using Multiple Imputation:
Getting to the Heart of the Matter” Circ Cardiovasc Qual
Outcomes. 2010;3:98-105
• S. van Buuren et al. “Multiple Imputation of Missing Blood
Pressure Covariates in Survival Analysis” Statist. Med. 18,
681-694 (1999)