12 Advanced models for differential abundance

Generalized linear models (GLMs) are the basis for advanced testing of differential abundance in sequencing data. They are needed because sequencing data consist of discrete, typically skewed read counts that deviate from symmetric, continuous, Gaussian assumptions in many ways.

12.2 Generalized linear models: a brief overview

Let us briefly discuss the ideas underlying generalized linear models.

The generalized linear model (GLM) allows a richer family of probability distributions to describe the data. Intuitively speaking, GLMs allow the modeling of nonlinear, nonsymmetric, and non-Gaussian associations. A GLM consists of three elements:

  • A probability distribution for the data (from the exponential family)

  • A linear predictor targeting the mean, or expectation: \(Xb\)

  • A link function g such that \(E(Y) = \mu = g^{-1}(Xb)\).
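The three elements can be inspected directly in R. The following is a minimal sketch using the family object from base R's `stats` package; the design matrix `X` and coefficient vector `b` are made-up values for illustration only.

```r
# The three GLM elements for a Poisson model, using base R's family object.
fam <- poisson(link = "log")

# 1. Probability distribution: Poisson, whose variance equals its mean.
mu <- c(0.5, 2, 8)
fam$variance(mu)        # variance function V(mu) = mu

# 2. Linear predictor Xb: a hypothetical design matrix and coefficients.
X <- cbind(intercept = 1, x = c(-1, 0, 1))
b <- c(1, 0.5)
eta <- drop(X %*% b)

# 3. Link function g and its inverse: E(Y) = g^{-1}(Xb).
mu_hat <- fam$linkinv(eta)           # exp(eta) for the log link
all.equal(fam$linkfun(mu_hat), eta)  # g undoes g^{-1}
```

Swapping `poisson(link = "log")` for another family, such as `binomial(link = "logit")`, changes the distribution and the link while the linear predictor stays the same.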

Let us fit a Poisson model with a (natural) log link to demonstrate how generalized linear models can be fitted in R. We model the abundance (read counts) assuming that the data are Poisson distributed and that the logarithm of the mean, or expectation, is given by a linear model. For further examples in R, see the statmethods website.
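A fit of this kind might look as follows. This is a sketch on simulated counts, not the data behind the output shown in this section: the vector `abundance` and its parameters are assumptions, so the numbers it produces will differ from those in the text.

```r
# Simulated read counts standing in for one taxon's abundance (hypothetical data).
set.seed(1)
abundance <- rpois(n = 200, lambda = 8)

# Intercept-only Poisson GLM with log link: log(E[abundance]) = b0.
fit <- glm(abundance ~ 1, family = poisson(link = "log"))

# Coefficient table: estimate, standard error, z value, and p-value.
coef(summary(fit))
```

Covariates would enter on the right-hand side of the formula, for instance `abundance ~ group` for a hypothetical grouping factor `group`.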

Investigate the model output:

|             | Estimate | Std. Error | z value  | Pr(>\|z\|) |
|-------------|----------|------------|----------|------------|
| (Intercept) | 2.09286  | 0.02357    | 88.79275 | 0          |

Note the link between the mean of the data and the estimated coefficient (\(\mu = e^{Xb}\)): exponentiating the intercept gives the same value as the mean.

## [1] 8.108108
## (Intercept) 
##    8.108108
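Why the two numbers agree can be seen from the Poisson log-likelihood. The short derivation below assumes, as the single (Intercept) row in the output suggests, that the model has an intercept only, so that \(Xb = b_0\) for every sample; \(y_1, \dots, y_n\) denote the observed counts.

\[
\ell(b_0) = \sum_{i=1}^{n} \left( y_i b_0 - e^{b_0} - \log y_i! \right),
\qquad
\frac{\partial \ell}{\partial b_0} = \sum_{i=1}^{n} \left( y_i - e^{b_0} \right) = 0
\;\Longrightarrow\;
\hat{\mu} = e^{\hat{b}_0} = \frac{1}{n} \sum_{i=1}^{n} y_i .
\]

With the estimate reported above, \(e^{2.09286} \approx 8.108\), which is the mean shown in the output.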

12.3 DESeq2: differential abundance testing for sequencing data

12.3.1 Fitting DESeq2

[DESeq2 analysis](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8) accommodates the particular properties of sequencing data: it models read counts with negative binomial GLMs rather than assuming Gaussian data.

| baseMean | log2FoldChange | lfcSE   | stat     | pvalue  | padj    | taxon                          |
|----------|----------------|---------|----------|---------|---------|--------------------------------|
| 29.20535 | 1.91205        | 0.13432 | 14.23457 | 0.00000 | 0.00000 | Clostridium difficile et rel.  |
| 51.65152 | 3.04116        | 0.28687 | 10.60107 | 0.00000 | 0.00000 | Mitsuokella multiacida et rel. |
| 12.39749 | 1.83825        | 0.18531 | 9.91994  | 0.00000 | 0.00000 | Klebisiella pneumoniae et rel. |
| 44.16494 | 1.78333        | 0.23072 | 7.72937  | 0.00000 | 0.00000 | Megasphaera elsdenii et rel.   |
| 66.93783 | 1.68345        | 0.25330 | 6.64609  | 0.00000 | 0.00000 | Escherichia coli et rel.       |
| 3.63459  | 1.53142        | 0.23140 | 6.61792  | 0.00000 | 0.00000 | Weissella et rel.              |
| 5.74035  | 3.07334        | 0.47848 | 6.42308  | 0.00000 | 0.00000 | Serratia                       |
| 0.42171  | 1.70079        | 0.47147 | 3.60743  | 0.00031 | 0.00075 | Moraxellaceae                  |
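A results table with these columns can be produced along the following lines. This is a minimal sketch that assumes the Bioconductor package DESeq2 is installed. It runs on simulated counts from the package's `makeExampleDESeqDataSet()` helper, not on the data behind the table above, and the `taxon` column is simply filled from the simulated row names.

```r
# Requires the Bioconductor package DESeq2.
library(DESeq2)

# Simulated count data from DESeq2's own helper (a hypothetical stand-in for
# real taxa-by-sample counts); it carries a two-level factor `condition`.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 12, betaSD = 1)

# Fit the negative binomial GLMs and test the `condition` effect.
dds <- DESeq(dds)
res <- results(dds)

# Collect the results in a data frame, ordered by adjusted p-value.
df <- as.data.frame(res)
df$taxon <- rownames(df)
df <- df[order(df$padj), ]
head(df)
```

For real data, the `DESeqDataSet` would instead be built from an observed count matrix and sample metadata, for example with `DESeqDataSetFromMatrix()`, with the design formula naming the grouping variable of interest.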