Overview

📦 microViz is an R package for analysis and visualization of microbiome sequencing data.

🔨 microViz functions are intended to be beginner-friendly but flexible.

🔬 microViz extends or complements popular microbial ecology packages, including phyloseq, vegan, & microbiome.

Learn more

📎 This website is the best place for documentation and examples: https://david-barnett.github.io/microViz/

Installation

microViz is not (yet) available from CRAN. You can install microViz from R Universe, or from GitHub.

I recommend you first install the Bioconductor dependencies using the code below.

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("phyloseq", "microbiome", "ComplexHeatmap"), update = FALSE)

Installation of microViz from R Universe

install.packages(
  "microViz",
  repos = c(davidbarnett = "https://david-barnett.r-universe.dev", getOption("repos"))
)

I also recommend you install the following suggested CRAN packages.

install.packages("ggtext") # for rotated labels on ord_plot() 
install.packages("ggraph") # for taxatree_plots()
install.packages("DT") # for tax_fix_interactive()
install.packages("corncob") # for beta binomial models in tax_model()

Installation of microViz from GitHub

# Installing from GitHub requires the remotes package
install.packages("remotes")
# Windows users will also need to have RTools installed! http://jtleek.com/modules/01_DataScientistToolbox/02_10_rtools/

# To install the latest version:
remotes::install_github("david-barnett/microViz")

# To install a specific "release" version of this package, e.g. an old version 
remotes::install_github("david-barnett/microViz@0.12.0") 

Installation notes

🍎 macOS users - might need to install xquartz to make the heatmaps work (to do this with homebrew, run the following command in your mac’s Terminal: brew install --cask xquartz

📦 I highly recommend using renv for managing your R package installations across multiple projects.

🐳 For Docker users an image with the main branch installed is available at: https://hub.docker.com/r/barnettdavid/microviz-rocker-verse

📅 microViz is tested to work with R version 4.* on Windows, MacOS, and Ubuntu 20. R version 3.6.* should probably work, but I don’t fully test this.

Interactive ordination exploration

library(microViz)
#> microViz version 0.12.1 - Copyright (C) 2021-2024 David Barnett
#> ! Website: https://david-barnett.github.io/microViz
#> ✔ Useful?  For citation details, run: `citation("microViz")`
#> ✖ Silence? `suppressPackageStartupMessages(library(microViz))`

microViz provides a Shiny app for an easy way to start exploring your microbiome data: all you need is a phyloseq object.

# example data from corncob package
pseq <- microViz::ibd %>%
  tax_fix() %>%
  phyloseq_validate()
ord_explore(pseq) # gif generated with microViz version 0.7.4 (plays at 1.75x speed)

Example analyses (on HITChip data)

# get some example data
data("dietswap", package = "microbiome")

# create a couple of numerical variables to use as constraints or conditions
dietswap <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  )
# add a couple of missing values to show how microViz handles missing data
sample_data(dietswap)$african[c(3, 4)] <- NA

Looking at your data

You have quite a few samples in your phyloseq object, and would like to visualize their compositions. Perhaps these example data differ by participant nationality?

dietswap %>%
  comp_barplot(
    tax_level = "Genus", n_taxa = 15, other_name = "Other",
    taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
    palette = distinct_palette(n = 15, add = "grey90"),
    merge_other = FALSE, bar_outline_colour = "darkgrey"
  ) +
  coord_flip() +
  facet_wrap("nationality", nrow = 1, scales = "free") +
  labs(x = NULL, y = NULL) +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
#> Registered S3 method overwritten by 'seriation':
#>   method         from 
#>   reorder.hclust vegan

htmp <- dietswap %>%
  ps_mutate(nationality = as.character(nationality)) %>%
  tax_transform("log2", add = 1, chain = TRUE) %>%
  comp_heatmap(
    taxa = tax_top(dietswap, n = 30), grid_col = NA, name = "Log2p",
    taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
    colors = heat_palette(palette = viridis::turbo(11)),
    row_names_side = "left", row_dend_side = "right", sample_side = "bottom",
    sample_anno = sampleAnnotation(
      Nationality = anno_sample_cat(
        var = "nationality", col = c(AAM = "grey35", AFR = "grey85"),
        box_col = NA, legend_title = "Nationality", size = grid::unit(4, "mm")
      )
    )
  )

ComplexHeatmap::draw(
  object = htmp, annotation_legend_list = attr(htmp, "AnnoLegends"),
  merge_legends = TRUE
)

Example ordination plot workflow

Ordination methods can also help you to visualize if overall microbial ecosystem composition differs markedly between groups, e.g. BMI.

Here is one option as an example:

  1. Aggregate the taxa into bacterial families (for example) - use tax_agg()
  2. Transform the microbial data with the centered-log-ratio transformation - use tax_transform()
  3. Perform PCA with the clr-transformed features (equivalent to Aitchison distance PCoA) - use ord_calc()
  4. Plot the first 2 axes of this PCA ordination, colouring samples by group and adding taxon loading arrows to visualize which taxa generally differ across your samples - use ord_plot()
  5. Customise the theme of the ggplot as you like and/or add features like ellipses or annotations
# perform ordination
unconstrained_aitchison_pca <- dietswap %>%
  tax_agg("Family") %>%
  tax_transform("clr") %>%
  ord_calc()
# ord_calc will automatically infer you want a "PCA" here
# specify explicitly with method = "PCA", or you can pick another method

# create plot
pca_plot <- unconstrained_aitchison_pca %>%
  ord_plot(
    plot_taxa = 1:6, colour = "bmi_group", size = 1.5,
    tax_vec_length = 0.325,
    tax_lab_style = tax_lab_style(max_angle = 90, aspect_ratio = 1),
    auto_caption = 8
  )

# customise plot
customised_plot <- pca_plot +
  stat_ellipse(aes(linetype = bmi_group, colour = bmi_group), linewidth = 0.3) + # linewidth not size, since ggplot 3.4.0
  scale_colour_brewer(palette = "Set1") +
  theme(legend.position = "bottom") +
  coord_fixed(ratio = 1, clip = "off") # makes rotated labels align correctly

# show plot
customised_plot

PERMANOVA

You visualised your ordinated data in the plot above. Below you can see how to perform a PERMANOVA to test the significance of BMI’s association with overall microbial composition. This example uses the Family-level Aitchison distance to correspond with the plot above.

# calculate distances
aitchison_dists <- dietswap %>%
  tax_transform("identity", rank = "Family") %>%
  dist_calc("aitchison")

# the more permutations you request, the longer it takes
# but also the more stable and precise your p-values become
aitchison_perm <- aitchison_dists %>%
  dist_permanova(
    seed = 1234, # for set.seed to ensure reproducibility of random process
    n_processes = 1, n_perms = 99, # you should use at least 999!
    variables = "bmi_group"
  )
#> 2024-04-03 14:06:14.500859 - Starting PERMANOVA with 99 perms with 1 processes
#> 2024-04-03 14:06:14.55802 - Finished PERMANOVA

# view the permanova results
perm_get(aitchison_perm) %>% as.data.frame()
#>            Df SumOfSqs         R2        F Pr(>F)
#> bmi_group   2  109.170 0.04104336 4.686602   0.01
#> Residual  219 2550.700 0.95895664       NA     NA
#> Total     221 2659.869 1.00000000       NA     NA

# view the info stored about the distance calculation
info_get(aitchison_perm)
#> psExtra info:
#> tax_agg = "Family" tax_trans = "identity" dist_method = "aitchison"

Constrained partial ordination

You could visualise the effect of the (numeric/logical) variables in your permanova directly using the ord_plot function with constraints (and conditions).

perm2 <- aitchison_dists %>%
  dist_permanova(variables = c("weight", "african", "sex"), seed = 321)
#> Dropping samples with missings: 2
#> 2024-04-03 14:06:14.574135 - Starting PERMANOVA with 999 perms with 1 processes
#> 2024-04-03 14:06:16.223648 - Finished PERMANOVA

We’ll visualise the effect of nationality and bodyweight on sample composition, after first removing the effect of sex.

perm2 %>%
  ord_calc(constraints = c("weight", "african"), conditions = "female") %>%
  ord_plot(
    colour = "nationality", size = 2.5, alpha = 0.35,
    auto_caption = 7,
    constraint_vec_length = 1,
    constraint_vec_style = vec_constraint(1.5, colour = "grey15"),
    constraint_lab_style = constraint_lab_style(
      max_angle = 90, size = 3, aspect_ratio = 0.8, colour = "black"
    )
  ) +
  stat_ellipse(aes(colour = nationality), linewidth = 0.2) + # linewidth not size since ggplot 3.4.0
  scale_color_brewer(palette = "Set1") +
  coord_fixed(ratio = 0.8, clip = "off", xlim = c(-4, 4)) +
  theme(legend.position = c(0.9, 0.1), legend.background = element_rect())
#> 
#> Centering (mean) and scaling (sd) the constraints and/or conditions:
#>  weight
#>  african
#>  female
#> Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
#> 3.5.0.
#> ℹ Please use the `legend.position.inside` argument of `theme()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

Correlation Heatmaps

microViz heatmaps are powered by ComplexHeatmap and annotated with taxa prevalence and/or abundance.

# set up the data with numerical variables and filter to top taxa
psq <- dietswap %>%
  ps_mutate(
    weight = recode(bmi_group, obese = 3, overweight = 2, lean = 1),
    female = if_else(sex == "female", true = 1, false = 0),
    african = if_else(nationality == "AFR", true = 1, false = 0)
  ) %>%
  tax_filter(
    tax_level = "Genus", min_prevalence = 1 / 10, min_sample_abundance = 1 / 10
  ) %>%
  tax_transform("identity", rank = "Genus")
#> Proportional min_prevalence given: 0.1 --> min 23/222 samples.

# randomly select 30 taxa from the 50 most abundant taxa (just for an example)
set.seed(123)
taxa <- sample(tax_top(psq, n = 50), size = 30)
# actually draw the heatmap
cor_heatmap(
  data = psq, taxa = taxa,
  taxon_renamer = function(x) stringr::str_remove(x, " [ae]t rel."),
  tax_anno = taxAnnotation(
    Prev. = anno_tax_prev(undetected = 50),
    Log2 = anno_tax_box(undetected = 50, trans = "log2", zero_replace = 1)
  )
)

Citation

😇 If you find any part of microViz useful to your work, please consider citing the JOSS article:

Barnett et al., (2021). microViz: an R package for microbiome data visualization and statistics. Journal of Open Source Software, 6(63), 3201, https://doi.org/10.21105/joss.03201

Contributing

Bug reports, questions, suggestions for new features, and other contributions are all welcome. Feel free to create a GitHub Issue or write on the Discussions page.

This project is released with a Contributor Code of Conduct and by participating in this project you agree to abide by its terms.

Session info

sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Sonoma 14.4.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Amsterdam
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] ggplot2_3.5.0   dplyr_1.1.4     phyloseq_1.46.0 microViz_0.12.1
#> [5] testthat_3.2.1  devtools_2.4.5  usethis_2.2.3  
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3      shape_1.4.6.1           rstudioapi_0.16.0      
#>   [4] jsonlite_1.8.8          magrittr_2.0.3          farver_2.1.1           
#>   [7] rmarkdown_2.26          GlobalOptions_0.1.2     fs_1.6.3               
#>  [10] zlibbioc_1.48.0         vctrs_0.6.5             multtest_2.58.0        
#>  [13] memoise_2.0.1           Cairo_1.6-2             RCurl_1.98-1.14        
#>  [16] htmltools_0.5.8         curl_5.2.1              Rhdf5lib_1.24.2        
#>  [19] rhdf5_2.46.1            htmlwidgets_1.6.4       plyr_1.8.9             
#>  [22] cachem_1.0.8            commonmark_1.9.1        igraph_2.0.3           
#>  [25] mime_0.12               lifecycle_1.0.4         iterators_1.0.14       
#>  [28] pkgconfig_2.0.3         Matrix_1.6-5            R6_2.5.1               
#>  [31] fastmap_1.1.1           clue_0.3-65             GenomeInfoDbData_1.2.11
#>  [34] shiny_1.8.1.1           digest_0.6.35           selectr_0.4-2          
#>  [37] colorspace_2.1-0        S4Vectors_0.40.2        ps_1.7.6               
#>  [40] pkgload_1.3.4           seriation_1.5.4         vegan_2.6-4            
#>  [43] labeling_0.4.3          fansi_1.0.6             httr_1.4.7             
#>  [46] mgcv_1.9-1              compiler_4.3.2          remotes_2.5.0          
#>  [49] doParallel_1.0.17       withr_3.0.0             viridis_0.6.5          
#>  [52] pkgbuild_1.4.4          highr_0.10              MASS_7.3-60.0.1        
#>  [55] sessioninfo_1.2.2       rjson_0.2.21            biomformat_1.30.0      
#>  [58] permute_0.9-7           tools_4.3.2             chromote_0.2.0         
#>  [61] ape_5.7-1               httpuv_1.6.15           glue_1.7.0             
#>  [64] nlme_3.1-164            rhdf5filters_1.14.1     promises_1.2.1         
#>  [67] gridtext_0.1.5          grid_4.3.2              Rtsne_0.17             
#>  [70] cluster_2.1.6           reshape2_1.4.4          ade4_1.7-22            
#>  [73] generics_0.1.3          gtable_0.3.4            microbiome_1.24.0      
#>  [76] ca_0.71.1               tidyr_1.3.1             websocket_1.4.1        
#>  [79] data.table_1.15.4       xml2_1.3.6              utf8_1.2.4             
#>  [82] XVector_0.42.0          BiocGenerics_0.48.1     markdown_1.12          
#>  [85] foreach_1.5.2           pillar_1.9.0            stringr_1.5.1          
#>  [88] later_1.3.2             circlize_0.4.16         splines_4.3.2          
#>  [91] ggtext_0.1.2            lattice_0.22-6          survival_3.5-8         
#>  [94] tidyselect_1.2.1        registry_0.5-1          ComplexHeatmap_2.18.0  
#>  [97] Biostrings_2.70.2       miniUI_0.1.1.1          knitr_1.45             
#> [100] gridExtra_2.3           IRanges_2.36.0          stats4_4.3.2           
#> [103] xfun_0.43               Biobase_2.62.0          matrixStats_1.2.0      
#> [106] brio_1.1.4              stringi_1.8.3           yaml_2.3.8             
#> [109] evaluate_0.23           codetools_0.2-20        tibble_3.2.1           
#> [112] cli_3.6.2               xtable_1.8-4            munsell_0.5.1          
#> [115] processx_3.8.4          Rcpp_1.0.12             GenomeInfoDb_1.38.7    
#> [118] png_0.1-8               parallel_4.3.2          ellipsis_0.3.2         
#> [121] profvis_0.3.8           urlchecker_1.0.1        bitops_1.0-7           
#> [124] viridisLite_0.4.2       scales_1.3.0            purrr_1.0.2            
#> [127] crayon_1.5.2            GetoptLong_1.0.5        rlang_1.1.3            
#> [130] TSP_1.2-4               rvest_1.0.4