class: center, middle, inverse, title-slide # Bootstrapping Clustered Data ## via a Weighted Laplace Approximation ### Daniel Flores-Agreda ### School of Public Health and Preventive Medicine ### ViCBiostat Work In Progress (WIP) Seminar 9/04/2020 --- class: center, middle # Joint Work ??? Dear Ms Hickey, Please find these slides for your review. This version includes the previously mentioned Simpsons memes. The planned special pre-booked session will follow very shortly (i.e. tomorrow, for example? tonight there's Tennis! *shrug) Kindest regards, Dan/Dani/Bear --- class: center, middle # _(Co)-Starring:_ --- class: center # _(Co)-Starring:_ .column[ ### Eva Cantoni <img src="https://www.unige.ch/gsem/files/cache/89437c652682ce133175f82b3238bb0a_f3835.jpg" width="350"> <img src="./img/gsem_en.png" width="150"> ] .column[ ### Stephane Heritier <img src="https://research.monash.edu/files-asset/248358259/Stephane_Roland_Heritier.jpg" width="150"> <img src="./img/monash-stacked-black-cmyk.png" width="150"> ] .column[ ### Rory Wolfe <img src="https://research.monash.edu/files-asset/248351697/profile+photo" width="150"> <img src="./img/monash-stacked-black-cmyk.png" width="150"> ] --- # Outline 1. **Clustered Data** 2. **Bootstrapping** Clustered Data 3. Bootstrapping Clustered Data via a **Weighted Laplace Approximation** 4. (Some) Properties 5. Discussion and Current Work --- class: inverse, center, middle <!-- Tell a story --> <!-- This is the journey: Elevator pitch, 30 seconds --> <!-- Signpost along the way --> <!-- Say where we are going and where we are up to in the story --> # Clustered Data --- class: center ### Data that can be **grouped** according to a **hierarchical** structure. <img src="./img/Charts-001.png" width="600"> ### _Highest_ level in the hierarchy: Independent **Clusters** ( `\(i\)` ) ??? A natural clustering factor in medical applications and longitudinal studies is the patient, from whom the different observations come. In general, the clustering factor can be other things, such as hospitals in multi-centre studies. More levels of clustering can be considered.
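--- ### Clustered data, in `R` (an illustrative sketch) A minimal sketch, not from the talk, of how such data typically look: patients are the independent clusters ( `\(i\)` ) and visits the within-cluster index ( `\(j\)` ); all names and values are made up.

```r
# Three patients (clusters), two visits each; y_ij is the outcome
dat <- data.frame(
  patient = rep(1:3, each = 2),   # cluster index i
  visit   = rep(1:2, times = 3),  # within-cluster index j
  y       = c(0, 1, 1, 1, 0, 0)   # outcome y_ij
)
dat
```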
--- class: center ### Modelling `\(\Rightarrow\)` **Mixed Models** (MM) .content-box-lightgrey[ `$$\text{Mixed Effects} = \textbf{Fixed Effects} + \color{blue}{\textbf{Random Effects}}$$` ] --- class: center ### Modelling `\(\Rightarrow\)` **Mixed Models** (MM) .content-box-lightgrey[ `$$\text{Mixed Effects} = \underbrace{\textbf{Fixed Effects}}_{\mathbf{X}_{i}\mathbf{\beta}} + \underbrace{\color{blue}{\textbf{Random Effects}}}_{\color{blue}{\mathbf{Z}_{i}\mathbf{D}_{\sigma}\mathbf{u}_{i}}}$$` ] -- .pull-left[ Generalized Linear Mixed Models (**GLMM**) .content-box-lightgrey[ `$${\scriptsize \underbrace{\text{link function}}_{g}\underbrace{\left[\mathbb{E}\text{xpected}\left(\textbf{Outcomes}\right)\right]}_{\left[\mathbb{E}\left( \mathbf{Y}_{i} \ \vert \ \mathbf{X}_{i} , \mathbf{u}_{i} \right)\right]} = \mathbf{X}_{i}\mathbf{\beta} + \color{blue}{\mathbf{Z}_{i}\mathbf{D}_{\sigma}\mathbf{u}_{i}}}$$` ] .black[Example]: Longitudinal **Mixed Logit** .content-box-lightgrey[ `$$\log \frac{p_{ij}}{1-p_{ij}} = \mathbf{x}_{ij}^{T}\mathbf{\beta} + \color{blue}{\sigma_{1} u_{i}}$$` ] ] -- .pull-right[ Accelerated Failure Time Models (**AFT**) .content-box-lightgrey[ `$${\scriptsize \log\underbrace{\left(\textbf{Duration}\right)}_{ \mathbf{T}_{i} } = \mathbf{X}_{i}\mathbf{\beta}+\color{blue}{\mathbf{Z}_{i}\mathbf{D}_{\sigma}\mathbf{u}_{i}}+\color{purple}{\phi\mathbf{\varepsilon}_i}}$$` ] .black[Example]: **AFT** with **Random Intercept** and **Right-Censoring** .content-box-lightgrey[ `\begin{align} \log t_{ij}&= \mathbf{x}_{ij}^T\mathbf{\beta} + \color{blue}{\sigma_1 u_i} + \phi\varepsilon_{ij}\\ Y_{ij} &= (1-\delta_{ij}) t_{ij}+\delta_{ij} C_{ij} \end{align}` `\(\delta_{ij}\)`, `\(C_{ij}\)`: censoring **indicators** and **times** ] ] ??? For parsimony, the effects of the different levels in this hierarchy are treated as random, which has the added advantage of modelling within-cluster correlations. Mixed models _link_ the expected outcome with a combination of Fixed Effects and Random Effects. When this combination is _linear_, we obtain the class of Generalized Linear Mixed Models.
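--- ### The Mixed Logit, fitted (an illustrative sketch) Not part of the proposal: a self-contained reference fit of the mixed logit above with `lme4`, whose default `glmer` fit uses the Laplace approximation; data, sample sizes and parameter values are made up.

```r
library(lme4)
set.seed(3)
n <- 50; ni <- 7                         # clusters and cluster size
dat <- data.frame(patient = rep(1:n, each = ni), x1 = rnorm(n * ni))
u <- rnorm(n)                            # random intercepts, sigma_1 = 1
# log(p/(1-p)) = beta_1 * x1 + sigma_1 * u_i
dat$y <- rbinom(n * ni, 1, plogis(0.5 * dat$x1 + u[dat$patient]))
fit <- glmer(y ~ x1 + (1 | patient), family = binomial, data = dat)
summary(fit)
```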
--- class: center, middle ### Likelihood Inference `\(\Rightarrow\)` **(Numerical) Integration** (of `\(\color{blue}{\mathbf{u}_{i}}\)` ) .content-box-lightgrey[ `$$\color{red}{\mathcal{L}_i\left(\mathbf{\theta}\right)} = \color{red}{f\left(\mathbf{y}_{i}\right)}$$` ] --- class: center, middle ### Likelihood Inference `\(\Rightarrow\)` **(Numerical) Integration** (of `\(\color{blue}{\mathbf{u}_{i}}\)` ) .content-box-lightgrey[ `$$\color{red}{\mathcal{L}_i\left(\mathbf{\theta}\right)} = \color{blue}{\int_{\mathbb{R}^q}} f\left(\mathbf{y}_i, \color{blue}{\mathbf{u}_i}\right) \mathrm{d} \color{blue}{\mathbf{u}_{i}}$$` ] ??? As with many problems featuring unobserved random effects, these need to be integrated out of the joint pdf of responses and random effects in order to construct the likelihood, --- class: center, middle ### Likelihood Inference `\(\Rightarrow\)` **(Numerical) Integration** (of `\(\color{blue}{\mathbf{u}_{i}}\)` ) .content-box-lightgrey[ `$$\color{red}{\mathcal{L}_i\left(\mathbf{\theta}\right)} = \int_{\mathbb{R}^q} \color{blue}{f\left(\mathbf{y}_i \vert \mathbf{u}_i\right) f\left(\mathbf{u}_i\right)} \mathrm{d} \mathbf{u}_{i}$$` ] ??? which is the product of the conditional pdf (or pmf) and the random-effect density, --- class: center, middle ### Likelihood Inference `\(\Rightarrow\)` **(Numerical) Integration** (of `\(\color{blue}{\mathbf{u}_{i}}\)` ) .content-box-lightgrey[ `$$\color{red}{\mathcal{L}_i\left(\mathbf{\theta}\right)} =\int_{\mathbb{R}^q} \color{red}{\exp} \left[ \color{blue}{\log f\left(\mathbf{y}_i \vert \mathbf{u}_i\right) + \log f\left(\mathbf{u}_i\right)} \right] \ \mathrm{d}\mathbf{u}_{i}$$` ] ??? which can conveniently be written as the exponential of a sum of log-densities. -- ### e.g. via the **Laplace Approximation** `\(\rightarrow \color{red}{\tilde{\mathcal{L}}\left(\mathbf{\theta}\right)}\)`. --- class: inverse, center, middle # Bootstrapping Clustered Data --- class: center, middle ### _Classical_ Bootstrap **does not respect the clustering structure** <img src="./img/Charts-002.png" width="600"> --- class: center, middle ### _Cluster_ Bootstrap **might be limited** by the **number of elements** at each level <img src="./img/Charts-003.png" width="800"> --- class: center, middle ### _Residual_ Bootstrap **struggles** with **random effects** and the **link function** -- .pull-left[ Easy(-ish) in **Gaussian LMM** .content-box-lightgrey[ `$$\mathbf{x}_{ij}^T\widehat{\mathbf{\beta}} + \widehat{\sigma}_1\color{red}{\widehat{u}_{i}^{{\star}}} + \widehat{\phi}\color{red}{\widehat{\varepsilon}_{i}^{{\star}}} \longrightarrow y_{ij}^{\color{red}{\star}}$$` ] <!-- <img src="https://media.giphy.com/media/4JSAfzWfvMN36r1HG1/giphy.gif" style="width: 200px" /> --> <img src="https://media.giphy.com/media/1Zp770guwpG5UHONlD/giphy.gif" style="width: 200px" /> ] -- .pull-right[ Not that clear **in other cases** .content-box-lightgrey[ `$$\mathbf{x}_{ij}^T \widehat{\mathbf{\beta}} + \widehat{\sigma}_1\color{red}{\widehat{u}_{i}^{{\star}}} \overset{\mathbf{???}}{\rightarrow} g(\mathbb{E}[y_{ij}^{\color{red}{\star}}]) \overset{\mathbf{???}}{\rightarrow} y_{ij}^{\color{red}{\star}}$$` ] <!-- <img src="https://media.giphy.com/media/eNTxLwTGW7E64/giphy.gif" style="width: 200px" /> --> <img src="https://media.giphy.com/media/l0HlRnAWXxn0MhKLK/giphy.gif" style="width: 200px" /> ] --- class: center, middle <!-- <img src="./img/meme-lisapresenting.png" width="600"> --> ### _Parametric_ Bootstrap is the only **feasible** and **convenient** alternative ??? At this point in the talk, it appears that the Parametric Bootstrap (an approach based on simulations) is the only feasible and convenient alternative. --- class: inverse, center, middle # Bootstrapping Clustered Data via a Weighted Laplace Approximation* ??? That said, it also helps us segue into the title of the talk. -- (*yup! Our proposal!) ??? What is it that we propose? --- class: dark, center, middle ## Random Weighted Laplace Bootstrap (RWLB) -- ### **General** scheme relying on **externally generated _Random Weights_** --- # RWLB Algorithm ### 1. **Generate _Random Weights_** `\({\color{red}{w_{i}^{*}}}\)` -- ### 2. **Apply** to `\(\color{blue}{f(\mathbf{y}_i,\mathbf{u}_i)}\)` -- ### 3. **Evaluate via the _Laplace Approximation_** `\({\color{red}{\tilde{\mathcal{L}}_{i}^{*}}}\)` -- ### 4. Optimize -- ... -- ### 5. Profit! --- # (Some) Details ### 1. **Generate _Random Weights_** `\({\color{red}{w_{i}^{*}}}\)` .content-box-lightgrey[ `$$\mathbb{E}\left[{\color{red}{w^{*}_{i}}}\right] = \mathbb{V}ar\left[{\color{red}{w^{*}_{i}}}\right]=1 \qquad \text{e.g. } \mathcal{E}(1)$$` ] --
### 2. **Apply** to `\(\color{blue}{f(\mathbf{y}_i,\mathbf{u}_i)}\)` .content-box-lightgrey[ `$$\mathcal{L}_i^{*} \left(\mathbf{\theta}\right) = \int_{\mathbb{R}^q} \exp \left\{- \left[ - {\color{red}{w_{i}^{*}}} \color{blue}{\log f\left(\mathbf{y}_i \vert \mathbf{u}_i\right)} - {\color{red}{w_{i}^{*}}} \color{blue}{\log f\left(\mathbf{u}_i\right)}\right] \right\} \mathrm{d}\mathbf{u}_{i}$$` ] -- ### 3. (4. and 5.) **Evaluate**: the `TMB` package (`C++` templates and `R`)
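--- ### Step 1, in `R` (an illustrative sketch) Drawing the cluster-level weights: `\(\mathcal{E}(1)\)` satisfies the moment conditions above; `n` and the seed are made up.

```r
# One random weight per cluster, with E[w*] = Var[w*] = 1
set.seed(1)
n <- 300                      # number of clusters (illustrative)
w_star <- rexp(n, rate = 1)   # Exponential(1) draws
c(mean = mean(w_star), var = var(w_star))  # both close to 1
```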
--- class: inverse, center, middle # (Some) Properties --- class: dark, center, middle # The Good ![](https://media.giphy.com/media/RXXTZtYKTubiU/giphy.gif) --- class: middle # Flexible -- ## Can be applied to **a wide variety of Mixed Models** -- ### Can deal with **Non-Gaussian Responses** and **Censored Data** ### Software **implementation is straightforward** --- ### Flexible .content-box-lightgrey[ `$$\mathcal{L}_i^{*} \left(\mathbf{\theta}\right) = \int_{\mathbb{R}^q} \exp \left\{- \left[ - {\color{red}{w_{i}^{*}}} \color{blue}{\log f\left(\mathbf{y}_i \vert \mathbf{u}_i\right)} - {\color{red}{w_{i}^{*}}} \color{blue}{\log f\left(\mathbf{u}_i\right)}\right] \right\} \mathrm{d}\mathbf{u}_{i}$$` ] -- .black[Example:] **Mixed Logit** .tothe-left[ `template.cpp`: application of `\(\color{red}{w_i^{*}}\)`

```c
...
for(int j=0;j<ngroups;j++){
*  res -= w[j]*dnorm(u[j], Type(0), sigma, 1);
}
for(int i=0;i<nobs;i++){
  j = group[i];
*  res -= w[j]*dbinom(y[i], Type(1), Type(1/(1+exp(-mu[i]-u[j]))), 1);
}
...
```

] -- .tothe-right[ `estimation.R`: Laplace Approximation and optimisation of `\(\mathcal{L}^{*}(\mathbf{\theta})\)`

```r
...
# Compilation
* compile("template.cpp")
# Loading into the environment
dyn.load(dynlib("template"))
# AD object creation
obj <- MakeADFun(data=data,
                 parameters=parameters,
                 random="u",
                 DLL="template")
# Optimisation
opt <- nlminb(obj$par, obj$fn, obj$gr)
...
```

]
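--- ### The full loop (an illustrative sketch) Putting steps 1–5 together around the `TMB` objects above; `data`, `parameters`, `data$group` and the reuse of `opt$par` are assumptions carried over from `template.cpp` and `estimation.R`.

```r
library(TMB)
B <- 1000
ngroups <- length(unique(data$group))  # clusters, as indexed in template.cpp
boot <- matrix(NA_real_, nrow = B, ncol = length(opt$par))
for (b in seq_len(B)) {
  # Step 1: fresh Exponential(1) weights, one per cluster
  data$w <- rexp(ngroups, rate = 1)
  # Steps 2-3: weighted objective, integrated via the Laplace approximation
  obj_b <- MakeADFun(data = data, parameters = parameters,
                     random = "u", DLL = "template", silent = TRUE)
  # Step 4: optimise
  opt_b <- nlminb(obj_b$par, obj_b$fn, obj_b$gr)
  # Step 5: store the replicate
  boot[b, ] <- opt_b$par
}
apply(boot, 2, sd)  # bootstrap standard errors
```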
--- class: middle # Correct -- ## **Good approximation** of the **Asymptotic Distribution** -- ### `\(\sqrt{n} \left(\widehat{\mathbf{\beta}}^{*} - \widehat{\mathbf{\beta}}\right)\)` is a **good approximation** of `\(\sqrt{n} \left(\widehat{\mathbf{\beta}} - \mathbf{\beta}\right)\)` -- ### `\(\sqrt{n} \left(\widehat{\mathbf{\sigma}}^{*} - \widehat{\mathbf{\sigma}}\right)\)` is a **good approximation** of `\(\sqrt{n} \left(\widehat{\mathbf{\sigma}} - \mathbf{\sigma}\right)\)`
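--- ### From replicates to inference (an illustrative sketch) One common use of the `\(B\)` replicates: percentile confidence intervals; `boot` is the matrix built in the loop sketched earlier.

```r
# 95% percentile intervals, one per parameter in opt$par
ci <- apply(boot, 2, quantile, probs = c(0.025, 0.975))
t(ci)  # one row per parameter: lower and upper bound
```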
--- class: dark, middle, center # Simulations -- # 1. **Mixed Logit** --- ### Simulation: **Mixed Logit** - **Parameters**: .black[4 Fixed Effects] ( `\(\beta_k\)` ) and .black[1 Random intercept] ( `\(\sigma_1\)` ) - **Number of Simulations**: 1000 with `\(B=1000\)` **Bootstrap Replicates** - **Studied methods**: .green[RWLB] vs .red[Parametric Bootstrap (PB)] - **Cluster size** ( `\(n_i\)` ): **7**, **14** and **21** - **Number of clusters** ( `\(n\)` ): **300** and **600** -- #### .black[Q-Q plots]: _Average_ **Bootstrap** replicates vs. **Empirical** Distributions .content-box-lightgrey[ `$$\sqrt{n}\left(\widehat{\beta}_{k}^{*}-\widehat{\beta}_{k}\right) \ \ \text{vs.} \ \ \sqrt{n}\left(\widehat{\beta}_{k} - \beta_{k}\right) \qquad \sqrt{n} \left(\widehat{\sigma}_1^{*} - \widehat{\sigma}_{1}\right) \ \ \text{vs.} \ \ \sqrt{n} \left(\widehat{\sigma}_{1} - \sigma_1\right)$$` ] ??? To illustrate these points, let me show you a comparison of the mean Bootstrap distribution vs. the empirical one, by means of Q-Q plots. --- class: center ### Fixed Effect Parameters `\(\beta_k\)` **Number of clusters** `\(n = 300\)` .pull-left[ **Cluster size** `\(n_i= 7\)` <img src="index_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> ] ??? As we said at the start of this section, the bootstrap approximates the asymptotic distribution very well, as shown by the alignment of the green lines over the identity line. This holds across settings. -- .pull-right[ **Cluster size** `\(n_i = 21\)` <img src="index_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> ] --- class: center ### Random Effect Variance `\(\sigma_1^{2}\)` **Cluster size** `\(n_{i} = 7\)` .pull-left[ **Number of Clusters:** `\(n = 300\)` <img src="index_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Number of Clusters:** `\(n = 600\)` <img src="index_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> ] --- class: center ### Random Effect Variance `\(\sigma_{1}^{2}\)` **Cluster size** `\(n_{i} = 14\)` .pull-left[ **Number of Clusters:** `\(n = 300\)` <img src="index_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Number of Clusters:** `\(n = 600\)` <img src="index_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> ] --- class: center ### Random Effect Variance `\(\sigma_{1}^{2}\)` **Cluster size** `\(n_{i} = 21\)` .pull-left[ **Number of Clusters:** `\(n = 300\)` <img src="index_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Number of Clusters:** `\(n = 600\)` <img src="index_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> ] --- class: center ### Coverage Ratios <img src="index_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> --- class: center ### Coverage Ratios <img src="index_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> --- class: center ### Coverage Ratios <img src="index_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />
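--- ### Coverage, computed (an illustrative sketch) The bookkeeping behind a coverage ratio: over many simulated datasets, check whether the percentile interval contains the true value. No model is fitted here; estimates and replicates are stand-ins.

```r
set.seed(4)
true_beta <- 0.5
covered <- replicate(1000, {
  beta_hat  <- rnorm(1, true_beta, 0.1)    # stand-in point estimate
  beta_star <- rnorm(1000, beta_hat, 0.1)  # stand-in bootstrap replicates
  ci <- quantile(beta_star, c(0.025, 0.975))
  ci[1] <= true_beta && true_beta <= ci[2]
})
mean(covered)  # should sit near the nominal 0.95
```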
--- class: dark, middle, center # Simulations -- # 2. **Log-Normal AFT** --- ### Simulation: **Log-Normal AFT** - **Parameters**: 4 Fixed Effects ( `\(\beta_k\)` ), 1 Random intercept ( `\(\sigma_1\)` ) and a Scale ( `\(\phi\)` ) - **Number of Simulations**: 1000 with `\(B=1000\)` **Bootstrap Replicates** - **Studied methods**: .green[RWLB] vs .pink[Asymptotic, based on the Laplace-Approximated Likelihood] - **Cluster sizes** ( `\(n_i\)` ): 4, 10, 20, 50 - **Number of clusters** ( `\(n\)` ): 20, 50, 100, 200 - **Degree of censoring**: 0% and 23% - **Signal-to-noise Ratio**: `\(\displaystyle \frac{\sigma_{1}}{\phi} = 1/0.5 = 2\)` -- #### .black[Box-Plots]: Small-sample Bias Correction of `\(\widehat{\sigma}_1\)` .content-box-lightgrey[ `$$\widehat{\sigma}_1^{\text{BC}} = \widehat{\sigma}_1^{\text{ML}} - \frac{1}{B}\sum_{b=1}^B\left[ \color{forestgreen}{\widehat{\sigma}^{*(b)}_{1}} - \widehat{\sigma}_1^{\text{ML}}\right]$$` ] --- class: center ### Boxplots of `\(\widehat{\sigma}_1\)` <img src="index_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> --- class: center ### Boxplots of `\(\widehat{\sigma}_1\)` <img src="index_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> --- class: center ### Boxplots of `\(\widehat{\phi}\)` <img src="index_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> --- class: center ### Boxplots of `\(\widehat{\phi}\)` <img src="index_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />
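--- ### The bias correction, computed (an illustrative sketch) The formula above in code; `sigma1_ml` and the replicate vector `sigma1_star` are stand-ins for the ML estimate and the `\(B\)` RWLB replicates.

```r
set.seed(5)
sigma1_ml   <- 0.90
sigma1_star <- rnorm(1000, mean = 0.80, sd = 0.05)  # stand-in replicates
bias_hat  <- mean(sigma1_star - sigma1_ml)  # (1/B) * sum of (sigma* - sigma_ML)
sigma1_bc <- sigma1_ml - bias_hat           # bias-corrected estimate
sigma1_bc
```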
--- class: dark, center, middle # The 'meh' ![](https://media.giphy.com/media/QC5jE83f3tBCg/giphy.gif) --- # Some Caveats
### Can be **unstable** in **extreme cases** -- - RWLB inherits the quality of the underlying estimation method, which suffers when: - **Cluster Size** is **too small**, and/or - the **Signal-to-noise** ratio is **weak** --- class: center ### Simulated AFT - `\(\widehat{\sigma}_1\)` : Same as before <img src="index_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> --- class: center ### Simulated AFT - `\(\widehat{\sigma}_1\)` : **s-t-n = 1** <img src="index_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" /> --- class: center ### Simulated AFT - `\(\widehat{\sigma}_1\)` : **s-t-n = 0.4** <img src="index_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" /> --- class: center ### Simulated AFT - `\(\widehat{\sigma}_1\)` : **s-t-n = 0.4, `\(n_{i}=2\)`** <img src="index_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" /> --- # Some Caveats
### **Variance** of some `\(\widehat{\mathbf{\sigma}}^{*}\)` **differs slightly** from the asymptotic variance. - Divergence **depends on the relationship between**: - the **Marginal Variance** of the Observations, `\(\mathbb{V}ar(\mathbf{Y}_i) = \mathbf{\Sigma}_{i}\)`, and - the **Contrasts of the Random Effects**, `\(\mathbf{Z}_{i}\)` -- - **In LMM**, this is _easy-ish_ to figure out (see the sketch on the next slide). - The relationship is **governed by** `\(\text{tr}(\mathbf{\Sigma}_{i}^{-1}\mathbf{Z}_{ir}\mathbf{Z}_{ir}^{T})\)` - **"Fuller" designs**, e.g. *Random Intercept* `\(\mathbf{Z}_{ir} = \mathbb{1}_{n_{i}}\)` and *Slope* `\(\mathbf{Z}_{ir} = \mathbf{x}^{(k)}_{i}\)`, seem less affected. - **"Sparser" designs**, e.g. the *Random Error* `\(\varepsilon\)` with `\(\mathbf{Z}_{i0} = \mathbf{I}_{n_{i}}\)`, seem to be **most divergent**. - Possible **fix**: increasing `\(n_i\)`.
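--- ### The trace, computed (an illustrative sketch) A small numerical check of the "fuller vs. sparser" contrast for a single cluster, under an assumed random-intercept LMM with `\(\sigma_1 = 1\)`, `\(\phi = 0.5\)` and `\(n_i = 5\)`; all values are made up.

```r
ni <- 5; sigma1 <- 1; phi <- 0.5
Z1 <- matrix(1, ni, 1)                               # random intercept: Z = 1
Sigma <- sigma1^2 * Z1 %*% t(Z1) + phi^2 * diag(ni)  # marginal variance of y_i
# "Fuller" design (random intercept): small trace (~0.95 here)
sum(diag(solve(Sigma) %*% Z1 %*% t(Z1)))
# "Sparser" design (random error, Z = I): much larger trace (~16.2 here)
sum(diag(solve(Sigma)))
```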
--- class: inverse, center, middle # Discussion and Current Work --- # Discussion ### RWLB has good properties as a bootstrapping method. -- - **Flexible**: it can be applied in many fields. - **Correct**: it provides good inference. -- ### RWLB also has a few drawbacks: - there are **settings** where it appears to fail, - and room for **improvements** to its estimation of variability. --- # Current Work #### 1. Data applications - Time-to-failure studies on patients undergoing dialysis under two different treatments. -- #### 2. Further software implementation of estimation methods for AFT models - A software tool allowing flexible estimation of AFT models for Clustered Data (different distributions for durations and Random Effects). - Bootstrapping via RWLB is intended as a feature. -- #### 3. Improvements - A REML-like estimation method and its effects on variance and bias. --- # References (and Associated Publications) F-A, Cantoni (2020), **Bootstrapping generalized linear mixed models via a weighted Laplace approximation**, - _Under Review_ F-A, Cantoni (2019), **Bootstrap estimation of uncertainty in prediction for generalized linear mixed models**, - _Computational Statistics & Data Analysis_, Volume 130, 2019, Pages 1-17,
[10.1016/j.csda.2018.08.006](https://doi.org/10.1016/j.csda.2018.08.006) F-A, Wolfe, Cantoni, Heritier (2020), **A flexible tool for inference of Accelerated Failure Time models with clustered data**, - Working Paper --- class: center, middle # Questions? --- class: center, middle # Remarks? --- class: center, middle # I'll stick around! --- class: center, middle # Thanks! Slides created via the `R` package [**xaringan**](https://github.com/yihui/xaringan). Research funded by the National Health and Medical Research Council (NHMRC) of Australia (APP 1128222). Travel Grant accorded by ViCBiostat and Monash University, Melbourne. --- class: inverse, center, middle # Appendix --- class: dark, center, middle # (Some) Theory --- # Comparison with GCB* **Generalised Cluster Bootstrap** for Gaussian LMM (Field, Pang, Welsh 2014) `$$\widehat{\mathbf{\theta}}^{*} = \underset{\mathbf{\theta}}{\mathrm{arg}} \left\{ \frac{1}{\sqrt{n}}\sum_{i=1}^{n} \color{red}{w_{i}^{*}} \Psi_{i}\left(\mathbf{\theta}\right) = \mathbf{0} \right\}$$` -- .tothe-left[ .content-box-lightgrey[ .black[Theorem:] Under regularity conditions on `\(\Psi_i\left(\mathbf{\theta}\right)\)` ensuring Asymptotic Normality, i.e. `$$\sqrt{n}\left(\widehat{\mathbf{\theta}}-\mathbf{\theta}\right) \overset{\mathcal{D}}{\underset{n\rightarrow \infty}{\longrightarrow}} \mathcal{N}\left( \mathbf{0},\mathbf{M}^{-1}\mathbf{Q}\mathbf{M}^{-1}\right)$$` for fixed `\(n_i\)`, then: .content-box-fuchsia[ .white[ `$$\sqrt{n}\left(\widehat{\mathbf{\theta}}^{*}- \widehat{\mathbf{\theta}}\right)\overset{\mathcal{D}}{\underset{n\rightarrow\infty}{\longrightarrow}}\mathcal{N}\left(\mathbf{0},\widehat{\mathbf{M}}^{-1}\widehat{\mathbf{Q}}\widehat{\mathbf{M}}^{-1}\right)$$`] ] ] ] -- .tothe-right[ `$$\widehat{\mathbf{M}}^{-1}\widehat{\mathbf{Q}}\widehat{\mathbf{M}}^{-1}$$` is the _Empirical_ **Sandwich Variance** <!-- <img src="https://media.giphy.com/media/jOuA5oDxeIWJi/giphy.gif" style="width: 150px" /> --> ] --- ### Elements of the **bread** ( `\(\mathbf{M}\)` ) and **patty** ( `\(\mathbf{Q}\)` ) matrices **depend on `\(\Psi_i\left(\mathbf{\theta}_0\right)\)`** or its **derivatives**: Let `\(k, l \in \{1, \dots, d\}\)`, where `\(d = p + s\)`, `\(\mathbf{M} = \left( M_{kl}\right)_{k, l=1}^{d}\)`, `\(\mathbf{Q} = \left( Q_{kl}\right)_{k, l=1}^{d}\)` **Asymptotic Variance** .content-box-lightgrey[ `\begin{align} -\frac{1}{n} \sum_{i=1}^n \mathbb{E} \left\{ {\color{blue}{\partial_{\theta_{l}}\Psi^{(k)}_i\left(\mathbf{\theta}_0\right)}} \right\} &\underset{n\rightarrow \infty}{\longrightarrow} M_{kl}\\ \frac{1}{n}\sum_{i=1}^n \mathbb{E} \left\{ {\color{blue}{\Psi^{(k)}_i\left(\mathbf{\theta}_0\right) \Psi^{(l)}_i\left(\mathbf{\theta}_0\right)}} \right\} &\underset{n\rightarrow \infty}{\longrightarrow} Q_{kl} \end{align}` ] --- ### Elements of the **bread** ( `\(\mathbf{M}\)` ) and **patty** ( `\(\mathbf{Q}\)` ) matrices **depend on `\(\Psi_i\left(\mathbf{\theta}_0\right)\)`** or its **derivatives**: Let `\(k, l \in \{1, \dots, d\}\)`, where `\(d = p + s\)`, `\(\mathbf{M} = \left( M_{kl}\right)_{k, l=1}^{d}\)`, `\(\mathbf{Q} = \left( Q_{kl}\right)_{k, l=1}^{d}\)` **GCB** version: .content-box-lightgrey[ `\begin{align} -\frac{1}{n} \sum_{i=1}^n \color{blue}{\mathbb{E}^{*}}\left[\color{red}{w_{i}^{*}}\right] \left\{ \partial_{\theta_{l}}\Psi^{(k)}_i\left(\mathbf{\theta}_0\right) \right\} &\underset{n\rightarrow \infty}{\longrightarrow} M_{kl} \end{align}` `\begin{align} \frac{1}{n} &\sum_{i=1}^n \color{blue}{\mathbb{V}ar^{*}} \left[ \color{red}{w_{i}^{*}}\right] \Psi^{(k)}_i(\widehat{\mathbf{\theta}}) \Psi^{(l)}_i(\widehat{\mathbf{\theta}}) \nonumber \\ &+ \frac{1}{n}\sum_{i=1}^n\sum_{j\neq i} \mathbb{C}ov \left[ \color{red}{w_{i}^{*}}, \color{red}{w_{j}^{*}}\right] \left\{ \Psi^{(k)}_i(\widehat{\mathbf{\theta}}) \Psi^{(l)}_j(\widehat{\mathbf{\theta}}) \right\} \underset{n\rightarrow \infty}{\longrightarrow} Q_{kl} \end{align}` ]
--- # GCB vs RWLB in _Random Effect_ LMM `$$\mathbf{y}_i = \mathbf{X}_{i}\mathbf{\beta} + \sum_{r=0}^{q} \sigma_r \mathbf{Z}_{ir}\mathbf{u}_{ir} \longrightarrow \begin{array}{l c l} \tilde{\Psi}^{\beta}_i\left(\mathbf{\theta}\right) &=&\mathbf{X}_i^T \mathbf{\Sigma}_i^{-1} \left(\mathbf{y}_i - \mathbf{X}_i\mathbf{\beta}\right), \\ \tilde{\Psi}^{\sigma^{2}_{r}}_i\left(\mathbf{\theta}\right) &=& -\frac{1}{2} \left( \mathbf{y}_i-\mathbf{X}_i\mathbf{\beta}\right)^T \mathbf{\Sigma}_i^{-1}\mathbf{Z}_{ir}\mathbf{Z}_{ir}^T\mathbf{\Sigma}_i^{-1} \left(\mathbf{y}_i-\mathbf{X}_i\mathbf{\beta}\right) + \frac{1}{2} \mathrm{tr}\left( \mathbf{\Sigma}_{i}^{-1} \mathbf{Z}_{ir} \mathbf{Z}_{ir}^T\right). \end{array}$$` -- .pull-left[ **GCB** `\begin{align} \tilde{\Psi}^{\beta}_i\left(\mathbf{\theta}\right) &=\color{red}{w_{i}^{*}}\mathbf{X}_i^T \mathbf{\Sigma}_i^{-1} \left(\mathbf{y}_i - \mathbf{X}_i\mathbf{\beta}\right), \\ \tilde{\Psi}^{\sigma^{2}_{r}}_i\left(\mathbf{\theta}\right) &= -\frac{\color{red}{w_{i}^{*}}}{2} \left( \mathbf{y}_i-\mathbf{X}_i\mathbf{\beta}\right)^T \mathbf{\Sigma}_i^{-1}\mathbf{Z}_{ir}\mathbf{Z}_{ir}^T\mathbf{\Sigma}_i^{-1} \left(\mathbf{y}_i-\mathbf{X}_i\mathbf{\beta}\right) \\ &+ \frac{\color{red}{w_{i}^{*}}}{2} \color{blue}{\mathrm{tr}\left( \mathbf{\Sigma}_{i}^{-1} \mathbf{Z}_{ir} \mathbf{Z}_{ir}^T\right).} \end{align}` ] .pull-right[ **RWLB** (after tedious derivation) .content-box-lightgrey[ `\begin{align} \tilde{\Psi}^{\beta}_i\left(\mathbf{\theta}\right) &=\color{red}{w_{i}^{*}}\mathbf{X}_i^T \mathbf{\Sigma}_i^{-1} \left(\mathbf{y}_i - \mathbf{X}_i\mathbf{\beta}\right), \\ \tilde{\Psi}^{\sigma^{2}_{r}}_i\left(\mathbf{\theta}\right) &= -\frac{\color{red}{w_{i}^{*}}}{2} \left( \mathbf{y}_i-\mathbf{X}_i\mathbf{\beta}\right)^T \mathbf{\Sigma}_i^{-1}\mathbf{Z}_{ir}\mathbf{Z}_{ir}^T\mathbf{\Sigma}_i^{-1} \left(\mathbf{y}_i-\mathbf{X}_i\mathbf{\beta}\right) \\ &+ \color{blue}{\frac{1}{2} \mathrm{tr}\left( \mathbf{\Sigma}_{i}^{-1} \mathbf{Z}_{ir} \mathbf{Z}_{ir}^T\right).} \end{align}` ] ]
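--- ### Why `\(\mathbb{E}[w^{*}_{i}]=\mathbb{V}ar[w^{*}_{i}]=1\)` matters Our summary of the limits above, assuming independent weights so that the covariance term vanishes: plugging the weight moments into the **GCB** version recovers the unweighted limits, `\begin{align} \mathbb{E}^{*}\left[w_{i}^{*}\right] = 1 \ &\Rightarrow \ -\frac{1}{n} \sum_{i=1}^n \left\{ \partial_{\theta_{l}}\Psi^{(k)}_i\left(\mathbf{\theta}_0\right) \right\} \underset{n\rightarrow \infty}{\longrightarrow} M_{kl}, \\ \mathbb{V}ar^{*}\left[w_{i}^{*}\right] = 1, \ \mathbb{C}ov\left[w_{i}^{*}, w_{j}^{*}\right] = 0 \ &\Rightarrow \ \frac{1}{n} \sum_{i=1}^n \Psi^{(k)}_i(\widehat{\mathbf{\theta}}) \Psi^{(l)}_i(\widehat{\mathbf{\theta}}) \underset{n\rightarrow \infty}{\longrightarrow} Q_{kl}, \end{align}` so that `\(\sqrt{n}(\widehat{\mathbf{\theta}}^{*}-\widehat{\mathbf{\theta}})\)` targets the **Sandwich Variance** `\(\mathbf{M}^{-1}\mathbf{Q}\mathbf{M}^{-1}\)` of the Theorem.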
--- class: dark, middle, center # Simulations -- # 3. **(Gaussian)** Linear Mixed Model (LMM) --- ### Simulation: **Linear Mixed Model** - **Parameters**: .black[4 Fixed Effects] ( `\(\beta_k\)` ), .black[Random intercept] ( `\(\sigma_1\)` ), .black[slope] ( `\(\sigma_2\)` ) and .black[Scale] ( `\(\phi\)` ) - **Number of Simulations**: 1000 with `\(B=1000\)` **Bootstrap Replicates** - **Studied methods**: .green[RWLB] vs .fuchsia[Parametric Bootstrap (PB)] - **Cluster size** ( `\(n_i\)` ): **4** and **12** - **Number of clusters** ( `\(n\)` ): **100** and **250** -- ### .black[Q-Q plots]: _Average_ **Bootstrap** replicates vs. **Empirical** Distributions .content-box-lightgrey[ `$$\sqrt{n} \left(\widehat{\sigma}_1^{*} - \widehat{\sigma}_{1}\right) \ \ \text{vs.} \ \ \sqrt{n} \left(\widehat{\sigma}_{1} - \sigma_1\right)$$` ] --- class: center ### Random **Intercept** Variance `\(\sigma_{1}^{2}\)` **Cluster size** `\(n_i=4\)` .pull-left[ **Number of Clusters:** `\(n=100\)` <img src="index_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Number of Clusters:** `\(n=250\)` <img src="index_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" /> ] --- class: center ### Random **Intercept** Variance `\(\sigma_1^{2}\)` **Cluster size** `\(n_i=12\)` .pull-left[ **Number of Clusters:** `\(n=100\)` <img src="index_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Number of Clusters:** `\(n=250\)` <img src="index_files/figure-html/unnamed-chunk-30-1.png" style="display: block; margin: auto;" /> ] --- class: center ### Random **Slope** Variance `\(\sigma_{2}^{2}\)` **Cluster size** `\(n_i=4\)` .pull-left[ **Number of Clusters:** `\(n=100\)` <img src="index_files/figure-html/unnamed-chunk-31-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Number of Clusters:** `\(n=250\)` <img src="index_files/figure-html/unnamed-chunk-32-1.png" style="display: block; margin: auto;" /> ] --- class: center ### Random **Slope** Variance `\(\sigma_{2}^{2}\)` **Cluster size** `\(n_i=12\)` .pull-left[ **Number of Clusters:** `\(n=100\)` <img src="index_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Number of Clusters:** `\(n=250\)` <img src="index_files/figure-html/unnamed-chunk-34-1.png" style="display: block; margin: auto;" /> ] --- class: center ### **Scale** `\(\phi\)` **Cluster size** `\(n_i=4\)` .pull-left[ **Number of Clusters:** `\(n=100\)` <img src="index_files/figure-html/unnamed-chunk-36-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Number of Clusters:** `\(n=250\)` <img src="index_files/figure-html/unnamed-chunk-37-1.png" style="display: block; margin: auto;" /> ] --- class: center ### **Scale** `\(\phi\)` **Cluster size** `\(n_i=12\)` .pull-left[ **Number of Clusters:** `\(n=100\)` <img src="index_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" /> ] .pull-right[ **Number of Clusters:** `\(n=250\)` <img src="index_files/figure-html/unnamed-chunk-39-1.png" style="display: block; margin: auto;" /> ]