Preliminary lmer models

Aims

This vignette demonstrates and explains our reasoning during the preliminary data analysis. Its aim is to explore the role of ‘site’ in conidial dispersal.

library("ChickpeaAscoDispersal")
library("tidyverse")
library("lme4")

From these data we hope to answer the following question.

Did experimental location (site) affect conidial dispersal?

In this experiment there are a number of factors which may influence conidia spread. These include:

Experimental location (site).
The time at which a spread event occurred (SpEv).
- The factor SpEv is nested within site.
- The SpEv factor may capture variation between spread events, such as differences in weather and climate variables.
Wind speed during the spread event.
Wind direction during the spread event.
Distance at which the trap plants were placed from the Ascochyta-infested plots (distance).
The bearing along which the trap plants were placed relative to the infested plots (transect).
How the spread event was initiated, with sprinkler irrigation or rainfall.
The quantity of rainfall.

Due to the lack of replicated pots at some of the distances, we will ignore transect as a factor. We know wind direction will influence our results, and we will need to accept that it adds variation for which we may not be able to account statistically.

I will start by using lmer() to analyse the mean number of lesions per plant at each distance. The reps at each distance are defined by ‘pot’; each pot contains three to five chickpea plants. The factor distance is fitted as a continuous variable.

Site is a categorical variable explaining the trial location. Each site may have experienced a different number of spread events, defined by the term SpEv. Rainfall is required for conidia to disperse from the infected focus, and each ‘spread event’ constitutes either an overhead irrigation event or a natural rainfall event.

The first models I will look at ask:

What is the estimated mean number of lesions per plant at each distance from the focus, given that the conditions of each spread event are nested within each site and the distance the spores travel is dependent on the spread event at each site?
What is the estimated mean number of lesions per plant at each distance from the focus, given that distance is dependent on the conditions of each spread event?
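
The nesting in the first question is written in lme4 as `site / SpEv`, which is shorthand for one grouping term for site plus one for each spread event within a site. The sketch below illustrates this on simulated stand-in data (not the trial data). The object names `sim` and `fit_nested` are hypothetical, and random intercepts only are used to keep the toy fit stable. Spread-event labels are deliberately reused across sites to show why the nesting matters.

```r
library(lme4)

# Simulated stand-in data (NOT the trial data): 3 sites, 2 spread events each,
# with the labels "ev1"/"ev2" reused at every site
set.seed(1)
sim <- expand.grid(
  site = c("A", "B", "C"),
  SpEv = c("ev1", "ev2"),
  distance = c(0, 10, 25, 50, 75),
  pot = 1:4
)
site_shift <- rnorm(3, sd = 0.5)
sim$m_lesions <- 2 - 0.02 * sim$distance +
  site_shift[as.integer(sim$site)] +
  rnorm(nrow(sim), sd = 0.8)

# Nested shorthand: expands to (1 | site) + (1 | site:SpEv)
fit_nested <- lmer(m_lesions ~ distance + (1 | site / SpEv), data = sim)

# Number of levels per grouping term: 3 sites and 3 x 2 = 6 site-by-event groups
ngrps(fit_nested)

# Without the nesting, reused labels collapse to only 2 spread-event groups
nlevels(sim$SpEv)
```

If every spread event already carries a unique label, as the six SpEv groups in the model output suggest, `(distance | SpEv)` on its own identifies the same event-level groups as the nested term.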

dat <-
  left_join(lesion_counts, summary_weather, by = c("site", "rep"))

mod1 <-
  lmer(m_lesions ~ distance + (distance | site / SpEv),
       data = dat)

cat("mod1: ")

## mod1:

formula(mod1)

## m_lesions ~ distance + (distance | site/SpEv)

summary(mod1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: m_lesions ~ distance + (distance | site/SpEv)
##    Data: dat
## 
## REML criterion at convergence: 911.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3523 -0.6015 -0.0971  0.4908  5.8015 
## 
## Random effects:
##  Groups    Name        Variance  Std.Dev. Corr 
##  SpEv:site (Intercept) 0.5350421 0.73147       
##            distance    0.0001785 0.01336  -1.00
##  site      (Intercept) 0.2071781 0.45517       
##            distance    0.7845495 0.88575  0.09 
##  Residual              0.7695721 0.87725       
## Number of obs: 334, groups:  SpEv:site, 6; site, 3
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  1.88614    0.41445   4.551
## distance    -0.02603    0.51142  -0.051
## 
## Correlation of Fixed Effects:
##          (Intr)
## distance 0.045 
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Let’s examine the model without ‘site’ to test whether it is a worse fit.

mod2 <-
  lmer(m_lesions ~ distance +
         (distance | SpEv),
       data = dat)

## boundary (singular) fit: see help('isSingular')

cat("mod2: ")

## mod2:

formula(mod2)

## m_lesions ~ distance + (distance | SpEv)

# Compare models
anova(mod1, mod2)

## refitting model(s) with ML (instead of REML)

## Data: dat
## Models:
## mod2: m_lesions ~ distance + (distance | SpEv)
## mod1: m_lesions ~ distance + (distance | site/SpEv)
##      npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## mod2    6 889.71 912.58 -438.85   877.71                     
## mod1    9 895.45 929.75 -438.72   877.45 0.2606  3     0.9673

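A comparison of the two models shows that dropping site does not significantly worsen the fit (likelihood ratio test: Chisq = 0.2606, Df = 3, p = 0.9673), and mod2 has the lower AIC (889.71 versus 895.45) and BIC (912.58 versus 929.75). Following a reductive approach, we should remove site from the model. Note that both models report a singular fit, so the random-effect estimates should be interpreted with caution.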
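
As a rough check on how the comparison table is built, the AIC values and the likelihood ratio test can be reproduced from the reported deviances and parameter counts. This is a minimal sketch in base R; the numbers are copied from the rounded `anova()` output above, so the statistic matches only to rounding, and the variable names are illustrative.

```r
# Values copied from the anova(mod1, mod2) table (ML refits)
npar <- c(mod2 = 6, mod1 = 9)
dev  <- c(mod2 = 877.71, mod1 = 877.45)

# AIC = deviance + 2 * number of parameters
aic <- dev + 2 * npar

# Likelihood ratio statistic: drop in deviance from the simpler to the larger model
lrt_stat <- dev[["mod2"]] - dev[["mod1"]]
lrt_df   <- npar[["mod1"]] - npar[["mod2"]]
p_value  <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

round(aic, 2)
round(c(Chisq = lrt_stat, Df = lrt_df, p = p_value), 4)
```

Because variance components are tested at the boundary of their parameter space, the chi-squared reference distribution tends to be conservative for this kind of comparison. With a p-value this large, that does not change the conclusion.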

summary(mod2)

## Linear mixed model fit by REML ['lmerMod']
## Formula: m_lesions ~ distance + (distance | SpEv)
##    Data: dat
## 
## REML criterion at convergence: 889.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3652 -0.5628 -0.1261  0.4583  5.7863 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr 
##  SpEv     (Intercept) 0.9534636 0.97645       
##           distance    0.0002697 0.01642  -1.00
##  Residual             0.7698005 0.87738       
## Number of obs: 334, groups:  SpEv, 6
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  1.963345   0.406414   4.831
## distance    -0.027052   0.006974  -3.879
## 
## Correlation of Fixed Effects:
##          (Intr)
## distance -0.986
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
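
The singular-fit message and the estimated correlation of -1.00 between the random intercept and slope suggest that the data do not support the full random-effects structure. The sketch below shows one way such a fit might be inspected and compared with a simpler candidate. It uses simulated stand-in data (not the trial data), and `sim`, `fit_full` and `fit_int` are hypothetical names.

```r
library(lme4)

# Simulated stand-in data (NOT the trial data): 6 spread events
set.seed(42)
sim <- expand.grid(
  SpEv = paste0("ev", 1:6),
  distance = c(0, 10, 25, 50, 75),
  pot = 1:4
)
ev_shift <- rnorm(6, sd = 1)
sim$m_lesions <- 2 - 0.02 * sim$distance +
  ev_shift[as.integer(sim$SpEv)] +
  rnorm(nrow(sim), sd = 0.8)

# Random intercept and slope per spread event, as in mod2
fit_full <- lmer(m_lesions ~ distance + (distance | SpEv), data = sim)

# TRUE when a variance is estimated at (or near) zero or a correlation at +/-1
isSingular(fit_full)
VarCorr(fit_full)

# One simpler candidate structure: random intercept only
fit_int <- lmer(m_lesions ~ distance + (1 | SpEv), data = sim)
isSingular(fit_int)
```

Whether a reduced structure is appropriate for the real data is a modelling decision. The sketch only shows the checks, not a recommended final model.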

We can also note that the mean number of lesions per plant decreases as distance increases (the fixed-effect estimate for distance is -0.027), and that the variance increases.

From here we should continue with a generalised additive model (GAM), which can handle non-linear terms better than a linear model.
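
As an illustration of what that next step might look like, here is a minimal sketch using the mgcv package with a smooth term for distance and a random effect for spread event. It runs on simulated stand-in data (not the trial data). The basis dimension `k = 4`, the default Gaussian family and the object names are assumptions made for the example, not the model used in the analysis.

```r
library(mgcv)

# Simulated stand-in data (NOT the trial data) with a non-linear decline
set.seed(7)
sim <- expand.grid(
  SpEv = factor(paste0("ev", 1:6)),
  distance = c(0, 10, 25, 50, 75),
  pot = 1:4
)
ev_shift <- rnorm(6, sd = 0.5)
sim$m_lesions <- 3 * exp(-sim$distance / 20) +
  ev_shift[as.integer(sim$SpEv)] +
  rnorm(nrow(sim), sd = 0.5)

# Smooth effect of distance plus a random intercept for each spread event
gam_fit <- gam(
  m_lesions ~ s(distance, k = 4) + s(SpEv, bs = "re"),
  data = sim,
  method = "REML"
)

summary(gam_fit)
```

The basis dimension must not exceed the number of unique distances, which is why a small `k` is used here.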

P. Melloy

2024-05-04
