Overview

📦 microViz is an R package for analysis and visualization of microbiome sequencing data.

🔨 microViz functions are intended to be easy to use and flexible.

🔬 microViz extends and complements popular microbial ecology packages like phyloseq, vegan, & microbiome.

Learn more

📎 This website is the best place for documentation and examples: https://david-barnett.github.io/microViz/

Installation

You can install the latest available microViz package version using the following instructions.

# Installing from github requires the devtools package
install.packages("devtools") 

# To install the latest "released" version of this package
devtools::install_github("david-barnett/microViz@0.9.1") # check 0.9.1 is the latest release

# To install the very latest version:
devtools::install_github("david-barnett/microViz")
# If you encounter a bug please try the latest version & let me know if the bug persists!

# If the Bioconductor dependencies don't automatically install you can install
# them yourself like this:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") 
BiocManager::install(c("phyloseq", "microbiome", "ComplexHeatmap"))

💻 Windows users - will need to have RTools installed so that your computer can build this package (follow instructions here: http://jtleek.com/modules/01_DataScientistToolbox/02_10_rtools/)

🍎 macOS users - might need to install xquartz to make the heatmaps work (to do this with homebrew, run the following command in your mac’s Terminal: brew install --cask xquartz

📦 I highly recommend using renv for managing your R package installations across multiple projects.

🐳 Alternatively, for Docker users an image with the main branch installed is available at: https://hub.docker.com/r/barnettdavid/microviz-rocker-verse

📅 microViz is tested to work with R version 4 on Windows, MacOS, and Ubuntu 18 and 20. R version 3.6.* should probably work, but I don’t formally test this.

Interactive ordination exploration

microViz provides a Shiny app for an easy way to start exploring your microbiome data: all you need is a phyloseq object.

# example data from corncob package
pseq <- corncob::ibd_phylo %>%
  tax_fix() %>%
  phyloseq_validate()
ord_explore(pseq) # gif generated with microViz version 0.7.4 (plays at 1.75x speed)

Example analyses

library(phyloseq)
library(dplyr)
#> Warning: package 'dplyr' was built under R version 4.1.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
# get some example data
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use as constraints or conditions
dietswap <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  )
# add a couple of missing values to show how microViz handles missing data
sample_data(dietswap)$african[c(3, 4)] <- NA

Looking at your data

You have quite a few samples in your phyloseq object, and would like to visualise their compositions. Perhaps these example data differ participant nationality?

dietswap %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 15, other_name = "Other",
    taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
    palette = distinct_palette(n = 15, add = "grey90"),
    merge_other = FALSE, bar_outline_colour = "darkgrey"
  ) +
  coord_flip() +
  facet_wrap("nationality", nrow = 1, scales = "free") +
  labs(x = NULL, y = NULL) +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
#> Registered S3 method overwritten by 'seriation':
#>   method         from 
#>   reorder.hclust vegan

htmp <- dietswap %>%
  ps_mutate(nationality = as.character(nationality)) %>%
  tax_transform("log2", add = 1, chain = TRUE) %>%
  comp_heatmap(
    taxa = tax_top(dietswap, n = 30), grid_col = NA, name = "Log2p",
    taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
    colors = heat_palette(palette = viridis::turbo(11)),
    row_names_side = "left", row_dend_side = "right", sample_side = "bottom",
    sample_anno = sampleAnnotation(
      Nationality = anno_sample_cat(
        var = "nationality", col = c(AAM = "grey35", AFR = "grey85"),
        box_col = NA, legend_title = "Nationality", size = grid::unit(4, "mm")
      )
    )
  )

ComplexHeatmap::draw(
  object = htmp, annotation_legend_list = attr(htmp, "AnnoLegends"),
  merge_legends = TRUE
)

Example ordination plot workflow

Ordination methods can also help you to visualise if overall microbial ecosystem composition differs markedly between groups, e.g. BMI.

Here is one option as an example:

  1. Filter out rare taxa (e.g. remove Genera not present in at least 10% of samples) - use tax_filter()
  2. Aggregate the taxa into bacterial families (for example) - use tax_agg()
  3. Transform the microbial data with the centre-log-ratio transformation - use tax_transform()
  4. Perform PCA with the clr-transformed features (equivalent to aitchison distance PCoA) - use ord_calc()
  5. Plot the first 2 axes of this PCA ordination, colouring samples by group and adding taxon loading arrows to visualise which taxa generally differ across your samples - use ord_plot()
  6. Customise the theme of the ggplot as you like and/or add features like ellipses or annotations
# perform ordination
unconstrained_aitchison_pca <- dietswap %>%
  tax_filter(min_prevalence = 0.1, tax_level = "Genus") %>%
  tax_agg("Family") %>%
  tax_transform("clr") %>%
  ord_calc()
#> Proportional min_prevalence given: 0.1 --> min 23/222 samples.
# ord_calc will automatically infer you want a "PCA" here
# specify explicitly with method = "PCA", or you can pick another method

# create plot
pca_plot <- unconstrained_aitchison_pca %>%
  ord_plot(
    plot_taxa = 1:6, colour = "bmi_group", size = 1.5,
    tax_vec_length = 0.325,
    tax_lab_style = tax_lab_style(max_angle = 90, aspect_ratio = 0.5),
    auto_caption = 8
  )

# customise plot
customised_plot <- pca_plot +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group), size = 0.3) +
  scale_colour_brewer(palette = "Set1") +
  theme(legend.position = "bottom") +
  coord_fixed(ratio = 0.5, clip = "off") # makes rotated labels align correctly

# show plot
customised_plot

PERMANOVA

You visualised your ordinated data in the plot above. Below you can see how to perform a PERMANOVA to test the significance of BMI’s association with overall microbial composition. This example uses the Family-level aitchison distance to correspond with the plot above.

# calculate distances
aitchison_dists <- dietswap %>%
  tax_filter(min_prevalence = 0.1) %>%
  tax_transform("identity", rank = "Family") %>%
  dist_calc("aitchison")
#> Proportional min_prevalence given: 0.1 --> min 23/222 samples.

# the more permutations you request, the longer it takes
# but also the more stable and precise your p-values become
aitchison_perm <- aitchison_dists %>% 
  dist_permanova(
    seed = 1234, # for set.seed to ensure reproducibility of random process
    n_processes = 1, n_perms = 99, # you should use at least 999!
    variables = "bmi_group"
  )
#> 2022-04-28 12:21:50 - Starting PERMANOVA with 99 perms with 1 processes
#> 2022-04-28 12:21:50 - Finished PERMANOVA

# view the permanova results
perm_get(aitchison_perm) %>% as.data.frame()
#>            Df  SumOfSqs         R2        F Pr(>F)
#> bmi_group   2  104.0678 0.04177157 4.773379   0.01
#> Residual  219 2387.2862 0.95822843       NA     NA
#> Total     221 2491.3540 1.00000000       NA     NA

# view the info stored about the distance calculation
info_get(aitchison_perm)
#> ps_extra info:
#> tax_agg = Family 
#> tax_transform = identity 
#> tax_scale = NA 
#> distMethod = aitchison 
#> ordMethod = NA 
#> constraints = NA 
#> conditions = NA

Constrained partial ordination

You could visualise the effect of the (numeric/logical) variables in your permanova directly using the ord_plot function with constraints (and conditions).

perm2 <- aitchison_dists %>% 
  dist_permanova(variables = c("weight", "african", "sex"), seed = 321)
#> Dropping samples with missings: 2
#> 2022-04-28 12:21:50 - Starting PERMANOVA with 999 perms with 1 processes
#> 2022-04-28 12:21:53 - Finished PERMANOVA

We’ll visualise the effect of nationality and bodyweight on sample composition, after first removing the effect of sex.

perm2 %>%
  ord_calc(constraints = c("weight", "african"), conditions = "female") %>%
  ord_plot(
    colour = "nationality", size = 2.5, alpha = 0.35,
    auto_caption = 7,
    constraint_vec_length = 1,
    constraint_vec_style = vec_constraint(size = 1.5, colour = "grey15"),
    constraint_lab_style = constraint_lab_style(
      max_angle = 90, size = 3, aspect_ratio = 0.35, colour = "black"
    )
  ) +
  stat_ellipse(aes(colour = nationality), size = 0.2) +
  scale_color_brewer(palette = "Set1") +
  coord_fixed(ratio = 0.35, clip = "off") +
  theme(legend.position = c(0.9, 0.1), legend.background = element_rect())
#> 
#> Centering (mean) and scaling (sd) the constraints and conditions:
#>  weight
#>  african
#>  female

Correlation Heatmaps

microViz heatmaps are powered by ComplexHeatmap and annotated with taxa prevalence and/or abundance.

# set up the data with numerical variables and filter to top taxa
psq <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  ) %>%
  tax_filter(
    tax_level = "Genus", min_prevalence = 1 / 10, min_sample_abundance = 1 / 10
  ) %>%
  tax_transform("identity", rank = "Genus")
#> Proportional min_prevalence given: 0.1 --> min 23/222 samples.

# randomly select 30 taxa from the 50 most abundant taxa (just for an example)
set.seed(123)
taxa <- sample(tax_top(psq, n = 50), size = 30)
# actually draw the heatmap
cor_heatmap(
  data = psq, taxa = taxa,
  taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
  tax_anno = taxAnnotation(
    Prev. = anno_tax_prev(undetected = 50),
    Log2 = anno_tax_box(undetected = 50, trans = "log2", zero_replace = 1)
  )
)

Citation

😇 If you find any part of microViz useful to your work, please consider citing the JOSS article:

Barnett et al., (2021). microViz: an R package for microbiome data visualization and statistics. Journal of Open Source Software, 6(63), 3201, https://doi.org/10.21105/joss.03201

Contributing

Bug reports, questions, suggestions for new features, and other contributions are all welcome. Feel free to create a GitHub Issue or write on the Discussions page. Alternatively you could also contact me (David) on Twitter @_david_barnett_ .

This project is released with a Contributor Code of Conduct and by participating in this project you agree to abide by its terms.

Session info

sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.1252 
#> [2] LC_CTYPE=English_United States.1252   
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.3.5       dplyr_1.0.8         phyloseq_1.38.0    
#> [4] microViz_0.9.0.9009
#> 
#> loaded via a namespace (and not attached):
#>   [1] ggtext_0.1.1           nlme_3.1-155           matrixStats_0.61.0    
#>   [4] bitops_1.0-7           RColorBrewer_1.1-3     doParallel_1.0.17     
#>   [7] GenomeInfoDb_1.30.1    tools_4.1.1            utf8_1.2.2            
#>  [10] R6_2.5.1               vegan_2.5-7            DBI_1.1.2             
#>  [13] BiocGenerics_0.40.0    mgcv_1.8-38            colorspace_2.0-3      
#>  [16] GetoptLong_1.0.5       permute_0.9-7          rhdf5filters_1.6.0    
#>  [19] ade4_1.7-18            withr_2.5.0            tidyselect_1.1.2      
#>  [22] gridExtra_2.3          compiler_4.1.1         microbiome_1.16.0     
#>  [25] cli_3.2.0              Biobase_2.54.0         Cairo_1.5-15          
#>  [28] TSP_1.2-0              xml2_1.3.3             labeling_0.4.2        
#>  [31] scales_1.2.0           stringr_1.4.0          digest_0.6.29         
#>  [34] rmarkdown_2.13         XVector_0.34.0         pkgconfig_2.0.3       
#>  [37] htmltools_0.5.2        fastmap_1.1.0          highr_0.9             
#>  [40] GlobalOptions_0.1.2    rlang_1.0.2            rstudioapi_0.13       
#>  [43] shape_1.4.6            generics_0.1.2         farver_2.1.0          
#>  [46] jsonlite_1.8.0         RCurl_1.98-1.6         magrittr_2.0.2        
#>  [49] GenomeInfoDbData_1.2.7 biomformat_1.22.0      Matrix_1.4-0          
#>  [52] Rcpp_1.0.8.3           munsell_0.5.0          S4Vectors_0.32.3      
#>  [55] Rhdf5lib_1.16.0        fansi_1.0.2            ape_5.6-2             
#>  [58] viridis_0.6.2          lifecycle_1.0.1        stringi_1.7.6         
#>  [61] yaml_2.3.5             MASS_7.3-55            zlibbioc_1.40.0       
#>  [64] rhdf5_2.38.1           Rtsne_0.15             plyr_1.8.7            
#>  [67] grid_4.1.1             parallel_4.1.1         crayon_1.5.1          
#>  [70] lattice_0.20-45        Biostrings_2.62.0      splines_4.1.1         
#>  [73] gridtext_0.1.4         multtest_2.50.0        circlize_0.4.14       
#>  [76] magick_2.7.3           ComplexHeatmap_2.10.0  knitr_1.37            
#>  [79] pillar_1.7.0           igraph_1.2.11          rjson_0.2.21          
#>  [82] markdown_1.1           reshape2_1.4.4         codetools_0.2-18      
#>  [85] stats4_4.1.1           glue_1.6.1             corncob_0.2.0         
#>  [88] evaluate_0.14          data.table_1.14.2      png_0.1-7             
#>  [91] vctrs_0.3.8            foreach_1.5.2          gtable_0.3.0          
#>  [94] purrr_0.3.4            tidyr_1.2.0            clue_0.3-60           
#>  [97] assertthat_0.2.1       xfun_0.30              survival_3.2-13       
#> [100] viridisLite_0.4.0      seriation_1.3.5        tibble_3.1.6          
#> [103] iterators_1.0.14       registry_0.5-1         IRanges_2.28.0        
#> [106] cluster_2.1.2          ellipsis_0.3.2