Multilevel modelling

class: center, middle, inverse, title-slide

# Multilevel modelling
### 2022/04/26

---

# Multilevel data

There are *many* situations in psychology where we have *nested* data.

Typical cognitive experiments show participants many repeats of similar trials.

![](images/experiment_outlook.png)

---
# Multilevel data

Intervention studies are typically longitudinal - the same people are tested multiple times on the same outcome measure.

![](images/schools.png)

In this example, each pupil is a unit of observation.

But these pupils are not fully independent from each other - pupils who attend one school tend to be more similar to each other than they are to pupils who attend other schools.

Thus, *pupils* (Level 1) are nested in *schools* (Level 2).

---
# Multilevel data

![](images/longitudinal.png)

Other data may be *longitudinal*. For example, you may measure outcomes such as, for example, performance or attitudes on repeated occasions to see how they vary over time.

The measurements each week are the main unit of observation, but they are nested within subjects.

---
# Clustered data

.pull-left[
<img src="4-multilevel-modelling_files/figure-html/non-clust-1.png" width="504" />
]
.pull-right[
.large[
Data from nested designs like those we have just seen often have *clusters* of correlated observations.

Different people have different reaction speeds, or baseline attitudes; different schools have different teachers and different general environments.
]
]

---
# Clustered data

.pull-left[
<img src="4-multilevel-modelling_files/figure-html/corr-subj-1.png" width="504" />
]
.pull-right[
.large[
Data from nested designs like those we have just seen often have *clusters* of correlated observations.

Different people have different reaction speeds, or baseline attitudes; different schools have different teachers and different general environments.
]
]

---
class: center, middle, inverse
# The problem with nesting

---
# sleepstudy

.pull-left[

```r
head(sleepstudy, 12)
```

```
##    Reaction Days Subject
## 1  249.5600    0     308
## 2  258.7047    1     308
## 3  250.8006    2     308
## 4  321.4398    3     308
## 5  356.8519    4     308
## 6  414.6901    5     308
## 7  382.2038    6     308
## 8  290.1486    7     308
## 9  430.5853    8     308
## 10 466.3535    9     308
## 11 222.7339    0     309
## 12 205.2658    1     309
```
]
.pull-right[
The *sleepstudy* dataset contains data from a sleep deprivation experiment.

Over the course of ten days, subjects were only allowed to sleep for 3 hours each night.

Each day their reaction times on a variety of cognitive tasks were recorded.

This is a *nested*, multilevel design.

Each observation - average RT on a given day - is nested within a *subject*.
]

---
# sleepstudy

.pull-left[
<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-2-1.png" width="504" />
]
.pull-right[
We could simply fit a linear model to the whole dataset.

```r
basic_lm <- lm(Reaction ~ Days,
 data = sleepstudy)
basic_lm
```

```
## 
## Call:
## lm(formula = Reaction ~ Days, data = sleepstudy)
## 
## Coefficients:
## (Intercept)         Days  
##      251.41        10.47
```
]

---

.panelset[
.panel[.panel-name[Joint graph]
.pull-left[
<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-4-1.png" width="504" />
]
.pull-right[
.large[
But the data clearly has more structure than that!

Here each dot is coloured to show which participant contributed which data points.
]
]
]
.panel[.panel-name[Split panels]
.pull-left[
<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-5-1.png" width="504" />
]
.pull-right[
.large[
If we split the plot up to show each subject separately, we get more sense of the variability.

For example, Subject 308 shows a very strong effect of sleep deprivation on reaction time, while Subject 309 shows very little effect of sleep deprivation.
]
]
]
.panel[.panel-name[All the things]
.pull-left[
<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-6-1.png" width="504" />
]
.pull-right[
.large[
Our simple linear model ignores the fact that many of our observations are repeated measurements from each participant.

It assumes the effect is the same for everyone.

There are 18 participants in this study. Some of them are generally faster or slower than others; some of them show more effect of sleep deprivation than others.
]
]
]
]

---
# Simpson's paradox

.pull-left[
<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-7-1.png" width="504" />
]
.pull-right[
This data has a correlation coefficient of

**-0.5**

As V1 increases, V2 decreases!
]

???

This example is based on https://easystats.github.io/correlation/articles/multilevel.html

Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2019).
Methods and Algorithms for Correlation Analysis in R. Journal of Open
Source Software, 5(51), 2306. doi:10.21105/joss.02306

---
# Simpson's paradox

.pull-left[
<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-8-1.png" width="504" />
]
.pull-right[
.large[
But wait!

What is this?

There are five different groups of people?
]
]

???

This example is based on https://easystats.github.io/correlation/articles/multilevel.html

Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2019).
Methods and Algorithms for Correlation Analysis in R. Journal of Open
Source Software, 5(51), 2306. doi:10.21105/joss.02306

---
# Simpson's paradox

.pull-left[
<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-9-1.png" width="504" />
]
.pull-right[
Within each group, the correlation is the other way round - as V1 increase, V2 also increases!

This is known as **Simpson's paradox**, or the *ecological fallacy*.

The effect if grouping is ignored is the *reverse* of the effect in each individual group.
]

???

This example is based on https://easystats.github.io/correlation/articles/multilevel.html

Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2019).
Methods and Algorithms for Correlation Analysis in R. Journal of Open
Source Software, 5(51), 2306. doi:10.21105/joss.02306

---
class: inverse, middle, center
# Estimating multilevel models

---
# Multilevel models

Multilevel models allow us to account for the nested, correlated nature of the data, and explicitly model the variability between people.

You may also see them called:
- Hierarchical models
- Mixed-effects models
- Random-effects models
- Mixed models

---
# Multilevel models using lme4

The most important library for fitting this type of model is `lme4`.

A multilevel model can be fitted with the `lmer()` function.

```r
library(lme4)
multilev <- 
 lmer(Reaction ~ 1 + Days + (1 + Days | Subject), 
 data = sleepstudy)
```

lmer(Reaction ~ .blue[1 + Days] + (.red[1 + Days| Subject]),
     data = sleepstudy)

*.blue[Fixed] effects* are highlighted in .blue[blue].

*.red[Random] effects* are highlighted in .red[red].

---
# Fixed and random effects

.large[  
*Fixed effects* are the *population-average* effect: e.g. the *average* effect of days of sleep deprivation on reaction time.

*Random effects* are those that vary across the *sampling units*. e.g. the variation in average reaction time across people

They are *random* because the *sampling units* are randomly drawn from a wider *population*. e.g. the specific participants in an experiment are usually a random subset of all possible participants
]

---
class: inverse, middle, center
# Random intercepts

---
# Individual intercepts

.panelset[
.panel[.panel-name[Individual intercepts]
.pull-left[
<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-11-1.png" width="432" />
]
.pull-right[
The black line on this plot shows the overall **mean** reaction time. This is the *intercept* of the basic model.

Each coloured line on this plot shows an individual participant's **mean** reaction time.
]
]
.panel[.panel-name[Split by subject]
.pull-left[
<img src="4-multilevel-modelling_files/figure-html/indiv-ints-1.png" width="504" />
]
.pull-right[
If we look at the plots individually for each subject, we can see the individual intercepts a little more clearly.

Some people are faster on average than the overall mean, while others are slower.

A *random-intercept* model models that variability!

]
]
]

---
# Modelling random intercepts

Remember that in our basic model, the *intercept* represents the mean reaction time.

We can model the variability of the intercept better by including a *random effect* term - *(1 | Subject)*.

```r
int_only <- 
 lmer(Reaction ~ 1 + Days + (1 | Subject),
 data = sleepstudy) # Random intercept
```

This model is a *random-intercept* model - it captures how mean reaction times vary across subjects by finding their individual mean reaction times.

---

```r
summary(int_only)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ 1 + Days + (1 | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1786.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2257 -0.5529  0.0109  0.5188  4.2506 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 1378.2   37.12   
##  Residual              960.5   30.99   
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 251.4051     9.7467   25.79
## Days         10.4673     0.8042   13.02
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.371
```

---

```r
tab_model(basic_lm, int_only, dv.labels = c("Reaction time (ms)", "Reaction time (ms)"))
```

<table style="border-collapse:collapse; border:none;">
<tr>
<th style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; text-align:left; ">&nbsp;</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">Reaction time (ms)</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">Reaction time (ms)</th>
</tr>
<tr>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; text-align:left; ">Predictors</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">p</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; col7">p</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">(Intercept)</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">251.41</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">238.36&nbsp;&ndash;&nbsp;264.45</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">251.41</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">232.17&nbsp;&ndash;&nbsp;270.64</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">Days</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.47</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">8.02&nbsp;&ndash;&nbsp;12.91</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.47</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">8.88&nbsp;&ndash;&nbsp;12.05</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td colspan="7" style="font-weight:bold; text-align:left; padding-top:.8em;">Random Effects</td>
</tr>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&sigma;2</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">960.46</td>
</tr>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&tau;00</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">1378.18 Subject</td>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">N</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">18 Subject</td>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm; border-top:1px solid;">Observations</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">180</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">180</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">R2 / R2 adjusted</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.286 / 0.282</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.280 / 0.704</td>
</tr>

</table>

---

### Standard linear model

```
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 251.40510   6.610154 38.033169 2.156888e-87
## Days         10.46729   1.238195  8.453663 9.894096e-15
```

### Intercept only mixed-model

```
##              Estimate Std. Error  t value
## (Intercept) 251.40510  9.7467163 25.79383
## Days         10.46729  0.8042214 13.01543
```

The *standard errors* differ, which means the *t-values* differ.

The *intercept* variability increased, while the `Days` variability decreased!

---
# Random effects

.pull-left[
The *fixed* effects give us a measure of average performance and the overall effect of Days of sleep deprivation on RT.

```r
fixef(int_only)
```

```
## (Intercept)        Days 
##   251.40510    10.46729
```
]
.pull-right[
The *random* effects tell us how much variability there is *between-participants*.

```r
summary(int_only)$varcor
```

```
##  Groups   Name        Std.Dev.
##  Subject  (Intercept) 37.124  
##  Residual             30.991
```
]

---
#A quick look at the residuals

.panelset[
.panel[.panel-name[Standard model]
.pull-left[

```r
library(performance)
plot(check_normality(basic_lm), type = "qq")
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-19-1.png" width="504" />
]
.pull-right[
These residuals don't look great - our predictions at each end are quite poor, suggesting the model is systematically failing at modelling high or low reaction times.

This suggests there's some structure not being captured by the model.
]
]
.panel[.panel-name[Mixed model]
.pull-left[

```r
plot(check_normality(int_only), type = "qq")
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-20-1.png" width="504" />
]
.pull-right[
This model - the *random intercept* model - seems to be doing a better job than our basic linear model.

The points now lie almost entirely along the line.

This indicates a better correspondence between the model predictions and the actual data!
]
]
]

---
#A quick look at the residuals

.panelset[
.panel[.panel-name[Standard model]
.pull-left[

```r
plot(check_heteroscedasticity(basic_lm))
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-21-1.png" width="504" />
]
.pull-right[
This looks a little odd. There is clear grouping, in that it looks a lot like each persons data is grouped together.

There is a clear shift in the green line - as you go further to the right, the amount of variability is increases.
]
]
.panel[.panel-name[Mixed model]
.pull-left[

```r
plot(check_heteroscedasticity(int_only))
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-22-1.png" width="504" />
]
.pull-right[
This is better, but still shows some issues.

The dots seem to *curve* somewhat.

This suggests there is still something not quite right in our model.
]
]
]

---
class: inverse, middle, center
# Random slopes

---
# Individual slopes

.pull-left[
<img src="4-multilevel-modelling_files/figure-html/indiv-regs-1.png" width="504" />
]
.pull-right[
This plot now show individual plots for each participant with the individual effect of `Days` added.

The general trend is consistent, but it's clear that some participants have stronger effects than others.

And it looks a little like people who are generally fast responders show *less* effect of `Days` of sleep deprivation.
]

---
# Modelling random slopes

We can model how much the effect of `Days` varies between participants by adding  *random slopes* to our model - `(Days | Subject)`.

```r
random_slope <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), 
 data = sleepstudy)
```

Note that `Days` now appears twice.

The first time models the *population-average* effect of `Days`.

The second time models the *individual* effect of `Days`.

---

```r
summary(random_slope)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  251.405      6.825  36.838
## Days          10.467      1.546   6.771
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138
```

---

<table style="border-collapse:collapse; border:none;">
<tr>
<th style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; text-align:left; ">&nbsp;</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">Reaction times (ms)</th>
<th colspan="3" style="border-top: double; text-align:center; font-style:normal; font-weight:bold; padding:0.2cm; ">Reaction times (ms)</th>
</tr>
<tr>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; text-align:left; ">Predictors</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">p</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">Estimates</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; ">CI</td>
<td style=" text-align:center; border-bottom:1px solid; font-style:italic; font-weight:normal; col7">p</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">(Intercept)</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">251.41</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">232.17&nbsp;&ndash;&nbsp;270.64</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">251.41</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">237.94&nbsp;&ndash;&nbsp;264.87</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; ">Days</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.47</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">8.88&nbsp;&ndash;&nbsp;12.05</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">&lt;0.001</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">10.47</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; ">7.42&nbsp;&ndash;&nbsp;13.52</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:center; col7">&lt;0.001</td>
</tr>
<tr>
<td colspan="7" style="font-weight:bold; text-align:left; padding-top:.8em;">Random Effects</td>
</tr>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&sigma;2</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">960.46</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">654.94</td>
</tr>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&tau;00</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">1378.18 Subject</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">612.10 Subject</td>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&tau;11</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">35.07 Subject.Days</td>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&rho;01</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">&nbsp;</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.07 Subject</td>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">N</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">18 Subject</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">18 Subject</td>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm; border-top:1px solid;">Observations</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">180</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">180</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">Marginal R2 / Conditional R2</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.280 / 0.704</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.279 / 0.799</td>
</tr>

</table>

---
# Model comparisons

Is this model an improvement? Use `anova()` to check!

```r
anova(int_only, random_slope)
```

```
## refitting model(s) with ML (instead of REML)
```

```
## Data: sleepstudy
## Models:
## int_only: Reaction ~ 1 + Days + (1 | Subject)
## random_slope: Reaction ~ 1 + Days + (1 + Days | Subject)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## int_only        4 1802.1 1814.8 -897.04   1794.1                         
## random_slope    6 1763.9 1783.1 -875.97   1751.9 42.139  2  7.072e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

(it's significant, so yes!)

---
# A quick look at the residuals

.pull-left[

```r
plot(check_heteroscedasticity(random_slope))
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-27-1.png" width="504" />
]
.pull-right[
.large[
These residuals are the best of all so far.

A few points look suspiciously like outliers, but overall, there's little to suggest any particular problems with this model!
]
]
---

```r
check_model(random_slope)
```

---
class: middle, center, inverse
# Multiple random effects

---
# The "language as fixed-effect" fallacy

A common circumstance in psychological research is that we have more than one random effect.

For example, in language experiments, subject often need to read a many different words; these may be words from different categories, or vary in other ways.

These words themselves are random samples, but many researchers treat them as being *fixed*.

[Clark, 1973](https://www.sciencedirect.com/science/article/pii/S0022537173800143)

---
# The `politeness` study

Winter and Grawunder (2012) looked at the relationship between vocal pitch and the level of politeness of a sentence.

Participants were asked to imagine how they would respond to a variety of scenarios when talking politely or informally.

```r
politeness <- read_csv("data/politeness_data.csv")
head(politeness)
```

```
## # A tibble: 6 x 5
## subject gender scenario attitude frequency
## <chr> <chr> <dbl> <chr> <dbl>
## 1 F1 F 1 pol 213.
## 2 F1 F 1 inf 204.
## 3 F1 F 2 pol 285.
## 4 F1 F 2 inf 260.
## 5 F1 F 3 pol 204.
## 6 F1 F 3 inf 287.
```

---
# The `politeness` study

In the `politeness` study, there are *two* distinct groupings:

1) Subjects repeat the same task (imaging a scenario) over and over again

2) Individual scenarios are repeated by different subjects

Thus there are *two* possible sources of correlated data - we'd expect responses to particular scenarios to be fairly consistent across subjects, and responses by individual subjects to be fairly consistent across items,

---

.panelset[
.panel[.panel-name[All data]

```
## Warning: Removed 1 rows containing missing values (geom_point).
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-30-1.png" width="576" style="display: block; margin: auto;" />
]
.panel[.panel-name[Subject]

```
## Warning: Removed 1 rows containing missing values (geom_point).
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-31-1.png" width="576" style="display: block; margin: auto;" />
]
.panel[.panel-name[Scenario]

```
## Warning: Removed 1 rows containing missing values (geom_point).
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-32-1.png" width="576" style="display: block; margin: auto;" />
]
.panel[.panel-name[Both]

```
## Warning: Removed 1 rows containing missing values (geom_point).
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-33-1.png" width="576" style="display: block; margin: auto;" />
]
]

---
# Variability between subjects

.pull-left[

```r
boxplot(frequency ~ subject,
        data = politeness)
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-34-1.png" width="504" />
]
.pull-right[
Individual participants vary in their baseline vocal frequency.

Male participants typically have lower frequency voices than female participants.
]

---
# Variability between scenarios

.pull-left[

```r
boxplot(frequency ~ scenario,
        data = politeness)
```

<img src="4-multilevel-modelling_files/figure-html/unnamed-chunk-35-1.png" width="504" />
]
.pull-right[
There seems to be some variability across scenarios.

Scenario 7 seems consistently lower than scenario 4, for example.

But there does seem to be less variability than across participants.

]

---
# Multiple random effects

We can model *both* of these sources of variability simultaneously by adding *multiple random effects*.

```r
full_mod <- lmer(frequency ~ attitude + (1|subject) + (1|scenario),
 data = politeness)
```

Whereas before we only added `(1|subject)`, here we also add `(1|scenario)`.

This models separate intercepts for each subject and each scenario, allowing for, for example, high-pitched individuals or scenarios that typically elicit low-pitched responses.

---

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: frequency ~ attitude + (1 | subject) + (1 | scenario)
##    Data: politeness
## 
## REML criterion at convergence: 793.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2006 -0.5817 -0.0639  0.5625  3.4385 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  scenario (Intercept)  219     14.80   
##  subject  (Intercept) 4015     63.36   
##  Residual              646     25.42   
## Number of obs: 83, groups:  scenario, 7; subject, 6
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  202.588     26.754   7.572
## attitudepol  -19.695      5.585  -3.527
## 
## Correlation of Fixed Effects:
##             (Intr)
## attitudepol -0.103
```

---

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&tau;00 scenario</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">218.98</td>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">&tau;00 subject</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">4014.54</td>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">N subject</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">6</td>

<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">N scenario</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">7</td>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm; border-top:1px solid;">Observations</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left; border-top:1px solid;" colspan="3">83</td>
</tr>
<tr>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; text-align:left; padding-top:0.1cm; padding-bottom:0.1cm;">Marginal R2 / Conditional R2</td>
<td style=" padding:0.2cm; text-align:left; vertical-align:top; padding-top:0.1cm; padding-bottom:0.1cm; text-align:left;" colspan="3">0.020 / 0.870</td>
</tr>

</table>

---
class: center, middle, inverse
# Some final words and references

---
# Generalized linear mixed effects models

As discussed last week, there are many types of data for which a linear model is *inappropriate*.

Fortunately, we can fit **generalized linear mixed effects models** too!

`glmer(DV ~ IV1 + IV2 + (IV1 | random_factor), family = binomial(), data = your_data)`

---
# Additional reading

[Complete vs Partial vs no pooling](https://www.tjmahr.com/plotting-partial-pooling-in-mixed-effects-models/)

[An introduction to mixed models](https://gkhajduk.github.io/2017-03-09-mixed-models/)

[Keep it Maximal](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3881361/)

[Generalizing over encounters: statistical and theoretical considerations](https://psyarxiv.com/mcrzu/)

[Understanding mixed-effects models through data simulation](https://journals.sagepub.com/doi/full/10.1177/2515245920965119)