Covariates in LM

Data

# RCBD example: 4 gens, 3 blocks, 12 observations
data <- read.csv("../data/example_1.csv") |>
  mutate(gen = as.factor(gen), block = as.factor(block))
head(data)

  block gen yield
1     1  g1   7.4
2     2  g1   6.5
3     3  g1   5.6
4     1  g2   9.8
5     2  g2   6.8
6     3  g2   6.2

# A numeric covariate derived from blocks (for illustration only)
data$covariate <- as.numeric(data$block)
data$covariate_s <- scale(data$covariate, scale = FALSE)

Two models: raw vs centered covariate

mod_1 <- lm(formula = yield ~ 1 + covariate + gen, data = data)
mod_2 <- lm(formula = yield ~ 1 + covariate_s + gen, data = data)

tibble(
  term = names(coef(mod_1)),
  coef_1 = unname(coef(mod_1)), se_1 = sqrt(diag(vcov(mod_1))),
  coef_2 = unname(coef(mod_2)), se_2 = sqrt(diag(vcov(mod_2)))
) |>
  mutate_if(is.numeric, round, 2) |>
  gt() |>
  fmt_number(columns = c(coef_1, coef_2), decimals = 3) |>
  tab_header(title = "Coefficient estimates and SEs")

Coefficient estimates and SEs
term	coef_1	se_1	coef_2	se_2
(Intercept)	8.600	0.63	6.500	0.40
covariate	−1.050	0.24	−1.050	0.24
geng2	1.100	0.57	1.100	0.57
geng3	0.100	0.57	0.100	0.57
geng4	1.800	0.57	1.800	0.57

Estimated marginal means (EMMs) for genotypes

Marginal means with model 1

mm_1 <- emmeans(mod_1, ~gen)
L_1 <- mm_1@linfct
rownames(L_1) <- paste0("gen_", 1:4)
print(L_1)

      (Intercept) covariate geng2 geng3 geng4
gen_1           1         2     0     0     0
gen_2           1         2     1     0     0
gen_3           1         2     0     1     0
gen_4           1         2     0     0     1

Marginal means with model 2

mm_2 <- emmeans(mod_2, ~gen)
L_2 <- mm_2@linfct
rownames(L_2) <- paste0("gen_", 1:4)
print(L_2)

      (Intercept) covariate_s geng2 geng3 geng4
gen_1           1           0     0     0     0
gen_2           1           0     1     0     0
gen_3           1           0     0     1     0
gen_4           1           0     0     0     1

We can clearly see the differences in the construction of the \(L\) matrix in order to get proper estimations of the marginal means. In mod_1, the covariate column equals the mean of the uncentered covariate (here, 2) for each gen row. In mod_2, the covariate column is 0 because we centered it.

C_1 <- mm_1@V
print(round(C_1, 2))

            (Intercept) covariate geng2 geng3 geng4
(Intercept)        0.40     -0.12 -0.16 -0.16 -0.16
covariate         -0.12      0.06  0.00  0.00  0.00
geng2             -0.16      0.00  0.32  0.16  0.16
geng3             -0.16      0.00  0.16  0.32  0.16
geng4             -0.16      0.00  0.16  0.16  0.32

C_2 <- mm_2@V
print(round(C_2, 2))

            (Intercept) covariate_s geng2 geng3 geng4
(Intercept)        0.16        0.00 -0.16 -0.16 -0.16
covariate_s        0.00        0.06  0.00  0.00  0.00
geng2             -0.16        0.00  0.32  0.16  0.16
geng3             -0.16        0.00  0.16  0.32  0.16
geng4             -0.16        0.00  0.16  0.16  0.32

The variance–covariance matrix of the model coefficients is also influenced by the scaling of the covariate, as we observed earlier through the standard errors. When the covariate is centered, the covariance between the intercept and the covariate becomes zero, reflecting the removal of linear dependence between the intercept and the covariate.

Relationship	Before centering	After centering
Intercept–covariate	Correlated	Uncorrelated (0)
Covariate–genotypes	0	0
Genotype–genotype	unchanged	unchanged

EMM_1 <- L_1 %*% mm_1@bhat
var_EMM_1 <- L_1 %*% C_1 %*% t(L_1)
se_EMM_1 <- sqrt(diag(var_EMM_1))
data.frame(
  Gen = rownames(EMM_1),
  EMM_1 = EMM_1, 
  var_EMM_1 = diag(var_EMM_1),
  se_EMM_1
) |> gt()

Gen	EMM_1	var_EMM_1	se_EMM_1
gen_1	6.5	0.16	0.4
gen_2	7.6	0.16	0.4
gen_3	6.6	0.16	0.4
gen_4	8.3	0.16	0.4

EMM_2 <- L_2 %*% mm_2@bhat
var_EMM_2 <- L_2 %*% C_2 %*% t(L_2)
se_EMM_2 <- sqrt(diag(var_EMM_2))
data.frame(
  Gen = rownames(EMM_2),
  EMM_2 = EMM_2, 
  var_EMM_2 = diag(var_EMM_2),
  se_EMM_2
) |> gt()

Gen	EMM_2	var_EMM_2	se_EMM_2
gen_1	6.5	0.16	0.4
gen_2	7.6	0.16	0.4
gen_3	6.6	0.16	0.4
gen_4	8.3	0.16	0.4

This vignette outlines practical considerations for incorporating covariates in linear models, including the impact of centering on the intercept, implications for standard errors, and consequences for the estimation of marginal means.