Microbiome data analysis with microViz

David Barnett

Part 3: PCA & Differential Abundance

Datasets

shao19
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 819 taxa and 1644 samples ]
sample_data() Sample Data:       [ 1644 samples by 11 sample variables ]
tax_table()   Taxonomy Table:    [ 819 taxa by 6 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 819 tips and 818 internal nodes ]
mice
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 3229 taxa and 520 samples ]
sample_data() Sample Data:       [ 520 samples by 11 sample variables ]
tax_table()   Taxonomy Table:    [ 3229 taxa by 7 taxonomic ranks ]
ibd
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 36349 taxa and 91 samples ]
sample_data() Sample Data:       [ 91 samples by 15 sample variables ]
tax_table()   Taxonomy Table:    [ 36349 taxa by 6 taxonomic ranks ]

More Dissimilarity

  • Ecological & Phylogenetic dissimilarities / distances:
    • Binary Jaccard
    • Bray-Curtis
    • UniFrac (unweighted/weighted/generalized)
    • Others…
  • Euclidean distances
    • Widely used in data science
    • Useful for microbiome abundance data??

Euclidean distances

Pythagoras: \(c = \sqrt{a^2 + b^2}\)

\[d\left(p, q\right) = \sqrt{\sum _{i=1}^{n_{taxa}} \left( p_{i}-q_{i}\right)^2 }\]

Euclidean distance between samples p and q

dist(pq, method = "euclidean")
  p
q 5
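
As a minimal sketch, a distance of 5 like the one above can be reproduced with a 3-4-5 right triangle. The two-taxon abundances in `pq` are made up for illustration and are not the values behind the original slide.

```r
# Hypothetical abundances of two taxa in samples p and q (illustrative values)
pq <- rbind(
  p = c(taxon_1 = 1, taxon_2 = 2),
  q = c(taxon_1 = 4, taxon_2 = 6)
)

# Built-in Euclidean distance between the rows (samples)
d_builtin <- dist(pq, method = "euclidean")

# The same distance from the formula:
# square root of the summed squared per-taxon differences
d_manual <- sqrt(sum((pq["p", ] - pq["q", ])^2))

print(d_builtin)
print(d_manual)
```

With more taxa the calculation is identical: each taxon simply adds one more squared difference to the sum.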

Naïve Euclidean PCoA

  • Sensitive to sparsity
  • Excessive emphasis on high-abundance taxa
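
A small sketch of the second point, using invented counts: when one taxon is far more abundant than the others, its squared difference makes up almost all of the squared Euclidean distance, so the rare taxa barely affect the result.

```r
# Hypothetical counts: rows are samples, columns are taxa (illustrative values)
counts <- rbind(
  s1 = c(dominant = 9000, rare_a = 10, rare_b = 0),
  s2 = c(dominant = 5000, rare_a = 0,  rare_b = 40)
)

# Squared difference per taxon between the two samples
sq_diff <- (counts["s1", ] - counts["s2", ])^2

# Share of the squared Euclidean distance contributed by each taxon
share <- sq_diff / sum(sq_diff)
print(round(share, 4))
```

Here the dominant taxon accounts for more than 99.9% of the squared distance, even though both rare taxa switch between present and absent.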

The PCoA looks weird!

Taxon Abundance transformations!

  • Already used:
    • Compositional (proportions)
    • Binary (Presence or Absence)
  • Log transformations
    • Log2, Log10, natural log?
  • Centered Log Ratio - CLR
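
The transformations listed above can be sketched in base R on a toy count table. The counts are invented, and the "halfmin" pseudocount is assumed here to mean half of the smallest non-zero value in the table; the definition used in the original plots may differ.

```r
# Hypothetical counts: rows are samples, columns are taxa (illustrative values)
counts <- rbind(
  s1 = c(taxon_a = 90, taxon_b = 10, taxon_c = 0),
  s2 = c(taxon_a = 20, taxon_b = 20, taxon_c = 160)
)

# Compositional: divide each sample's counts by its total, so rows sum to 1
props <- counts / rowSums(counts)

# Binary: presence (1) or absence (0)
binary <- (counts > 0) * 1

# log10(x + 1): a pseudocount of 1 keeps zeros defined
log10_plus1 <- log10(counts + 1)

# log10(x + halfmin): pseudocount is half the smallest non-zero value
# (assumed definition of "halfmin" for this sketch)
halfmin <- min(counts[counts > 0]) / 2
log10_halfmin <- log10(counts + halfmin)

print(props)
print(binary)
print(round(log10_plus1, 2))
print(round(log10_halfmin, 2))
```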

Transformations - none

Transformations - none - all data!

Transformations - Compositions

Transformations - Binary

Transformations - log10(x+1)

Transformations - log10(x + "halfmin")

Log10-transformed Euclidean PCoA

Transformations - Centered Log Ratio

Centered Log Ratio transformation

  • What?
    • Calculate the geometric mean of the taxon abundances in each sample
    • Divide each taxon's abundance by the geometric mean of its sample
    • Take the natural log of the result
  • Why?
    • The compositional data problem
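
The three "What?" steps can be sketched in a few lines of base R. The counts below are invented and contain no zeros; real data with zeros needs them replaced first (for example with a small pseudocount), because the log of zero is undefined. The helper name `clr` is only for this illustration.

```r
# Hypothetical counts: rows are samples, columns are taxa
# (illustrative values, deliberately without zeros)
counts <- rbind(
  s1 = c(taxon_a = 80, taxon_b = 15, taxon_c = 5),
  s2 = c(taxon_a = 10, taxon_b = 30, taxon_c = 60)
)

# Centered log ratio for one sample: log(x / geometric_mean(x))
clr <- function(x) {
  geometric_mean <- exp(mean(log(x)))
  log(x / geometric_mean)
}

# Apply to every sample; t() restores the samples-by-taxa layout
clr_counts <- t(apply(counts, 1, clr))

# CLR depends only on ratios, so proportions give the same values as counts
clr_props <- t(apply(counts / rowSums(counts), 1, clr))

print(round(clr_counts, 3))
print(round(rowSums(clr_counts), 10))
```

Each sample's CLR values sum to zero, and the result is unchanged by rescaling a sample, which is why the transformation suits compositional data.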

Compositional data problem

Gloor 2017 - Microbiome Datasets Are Compositional: And This Is Not Optional

Aitchison PCoA (CLR + Euclidean)

PCA - where’s the “o” gone?

Principal Components Analysis (not Co-ordinates)

  • Similar aim to PCoA (and name!)
  • But doesn’t use distances!
  • Rotation of original dimensions (taxa)
  • PCoA on Euclidean distances gives results equivalent to PCA
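
The equivalence in the last point can be checked numerically. This sketch uses random numbers as a stand-in for transformed abundances, `cmdscale()` for classical PCoA and `prcomp()` for PCA; the axes may differ in sign, so absolute values are compared.

```r
# Hypothetical transformed abundances: 6 samples by 4 taxa (random values)
set.seed(1)
x <- matrix(
  rnorm(24), nrow = 6,
  dimnames = list(paste0("s", 1:6), paste0("taxon_", 1:4))
)

# PCoA (classical multidimensional scaling) on Euclidean distances
pcoa_scores <- cmdscale(dist(x, method = "euclidean"), k = 2)

# PCA on the same data: centered, not scaled
pca_scores <- prcomp(x, center = TRUE, scale. = FALSE)$x[, 1:2]

# Axes can be flipped in sign, so compare absolute sample coordinates
max_abs_diff <- max(abs(abs(pcoa_scores) - abs(pca_scores)))
print(max_abs_diff)
```

The largest difference is at the level of floating-point rounding, so both methods place the samples identically on the first two axes.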

Principal Components Analysis - PCA

PCA provides (taxa) loadings!
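
As a sketch of what loadings are, `prcomp()` returns them in its `rotation` element: one column per principal component, one row per taxon. The data are random stand-ins for transformed abundances.

```r
# Hypothetical transformed abundances: 6 samples by 4 taxa (random values)
set.seed(2)
x <- matrix(
  rnorm(24), nrow = 6,
  dimnames = list(paste0("s", 1:6), paste0("taxon_", 1:4))
)

pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Loadings: taxa in rows, principal components in columns
loadings <- pca$rotation
print(round(loadings[, 1:2], 3))

# Sample scores are the centered data projected onto the loadings
centered <- scale(x, center = TRUE, scale = FALSE)
scores_from_loadings <- centered %*% loadings
```

Taxa with large absolute loadings on an axis are the ones that drive sample separation along it, which a distance-based PCoA cannot report directly.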

PCA-sorted bar chart (iris plot)

Differential Abundance

  • Like differential expression analysis in RNA-seq
  • But applied to microbial taxon abundances

Differential Abundance - aim

  • Is experimental condition X associated with relative abundance of bacterial genus Y?
  • Is environmental exposure X associated with …?
  • Is host characteristic X associated with… ?

Differential Abundance - methods

  • Simple methods: Wilcoxon tests, Spearman corr., Linear Reg.
  • CoDa methods: ALDEx2, ANCOM-II, ANCOM-BC
  • RNA seq. methods: DESeq2, limma, edgeR
  • Other: LEfSe, Beta-binomial reg., ZINB reg., ZIGDM reg., …!
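
As an illustration of the simplest option in the list, here is a Wilcoxon rank-sum test for a single taxon between two groups. The abundances and group labels are invented; in practice the test would be repeated for every taxon of interest.

```r
# Hypothetical relative abundances of one genus in ten samples
abundance <- c(0.02, 0.05, 0.01, 0.04, 0.03, 0.20, 0.15, 0.30, 0.25, 0.18)

# Hypothetical grouping: first five samples control, last five treated
group <- factor(rep(c("control", "treated"), each = 5))

# Wilcoxon rank-sum test: do the two groups differ in location?
wilcox_result <- wilcox.test(abundance ~ group)
print(wilcox_result$p.value)
```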

Differential Abundance - which rank?

Differential Abundance with microViz

Keep it simple (-ish)

  • Filter out rare taxa strictly (keep only taxa with prevalence above 5%, or use a higher threshold)

  • Linear regression models

    • Log2 transform of taxon proportions
    • (approach borrowed from Maaslin2)
  • Model every taxon, at all ranks!

    • Visualize on a cladogram tree
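
The filtering and modelling steps above can be sketched in base R rather than with microViz's own functions. Everything here is an assumption made for illustration: the counts, the two-group design, the 25% prevalence threshold (chosen because the toy table has only 8 samples), and replacing zeros with half the smallest non-zero proportion before the log2 transform.

```r
# Hypothetical counts: 8 samples by 4 taxa (illustrative values)
counts <- cbind(
  taxon_a    = c(50, 60, 55, 65, 20, 25, 15, 30),
  taxon_b    = c(30, 25, 35, 20, 60, 55, 70, 50),
  taxon_c    = c(20, 15, 10, 15, 20, 20, 15, 20),
  taxon_rare = c(0, 0, 3, 0, 0, 0, 0, 0)
)
group <- factor(rep(c("control", "treated"), each = 4))

# Proportions per sample, computed before any taxa are dropped
props <- counts / rowSums(counts)

# Prevalence: fraction of samples in which each taxon is detected
prevalence <- colMeans(counts > 0)
min_prevalence <- 0.25  # assumed threshold for this small example
props_kept <- props[, prevalence > min_prevalence, drop = FALSE]

# log2 transform, with zeros replaced by half the smallest non-zero proportion
halfmin <- min(props_kept[props_kept > 0]) / 2
log2_props <- props_kept
log2_props[log2_props == 0] <- halfmin
log2_props <- log2(log2_props)

# One linear model per taxon: log2 proportion ~ group
fit_one <- function(y) {
  coefs <- summary(lm(y ~ group))$coefficients
  c(
    estimate = coefs["grouptreated", "Estimate"],
    p_value  = coefs["grouptreated", "Pr(>|t|)"]
  )
}
results <- as.data.frame(t(apply(log2_props, 2, fit_one)))
print(results)
```

Each estimate is the difference in mean log2 proportion between the groups, so an estimate of 1 corresponds to a doubling of relative abundance in the treated group.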

Differential Abundance

Your turn

See Gloor 2017 Microbiome Datasets Are Compositional: And This Is Not Optional

  • Try out differential abundance testing and creating taxonomic association tree plots

Nearing 2022 is a great introduction to the differential abundance problem: https://www.nature.com/articles/s41467-022-28034-z

  • This is the last topic - thank you for your attention and effort :)