Fixing your tax_table

Quick start / TLDR

Use tax_fix() on your phyloseq data with default arguments to repair most tax_table problems (missing or uninformative values). If you still encounter errors using e.g. tax_agg, try using the Shiny app tax_fix_interactive() to help you generate tax_fix code that will fix your particular tax_table problems.

library(phyloseq)
suppressPackageStartupMessages(library(microViz))

Intro

This article explains some of the common problems that can occur in your phyloseq tax_table, and that might cause errors in functions such as tax_agg. You can fix these problems with the help of tax_fix and tax_fix_interactive.

Fixing problems

Let’s look at some example data, originally from the corncob package and accessed here as microViz::ibd:

pseq <- microViz::ibd
pseq
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 36349 taxa and 91 samples ]
#> sample_data() Sample Data:       [ 91 samples by 15 sample variables ]
#> tax_table()   Taxonomy Table:    [ 36349 taxa by 7 taxonomic ranks ]

The Species rank appears to be blank for many entries. This is a problem you may well encounter in your data: unique sequences or OTUs often cannot be annotated at lower taxonomic ranks.

tax_table(pseq)[40:54, 4:7] # highest 3 ranks not shown, to save space
#> Taxonomy Table:     [15 taxa by 4 taxonomic ranks]:
#>        Order               Family               Genus                  Species
#> OTU.40 "Enterobacteriales" "Enterobacteriaceae" "Escherichia/Shigella" ""     
#> OTU.41 "Coriobacteriales"  "Coriobacteriaceae"  "Gordonibacter"        ""     
#> OTU.42 "Clostridiales"     "Ruminococcaceae"    "Faecalibacterium"     ""     
#> OTU.43 "Clostridiales"     "Ruminococcaceae"    ""                     ""     
#> OTU.44 "Bacteroidales"     "Prevotellaceae"     "Prevotella"           ""     
#> OTU.45 "Bacteroidales"     "Bacteroidaceae"     "Bacteroides"          ""     
#> OTU.46 "Enterobacteriales" "Enterobacteriaceae" "Klebsiella"           ""     
#> OTU.47 "Bacteroidales"     "Prevotellaceae"     "Prevotella"           ""     
#> OTU.48 "Clostridiales"     "Lachnospiraceae"    "Blautia"              ""     
#> OTU.49 "Clostridiales"     "Ruminococcaceae"    "Faecalibacterium"     ""     
#> OTU.50 "Enterobacteriales" "Enterobacteriaceae" "Escherichia/Shigella" ""     
#> OTU.51 "Bacteroidales"     "Prevotellaceae"     "Prevotella"           ""     
#> OTU.52 "Clostridiales"     "Lachnospiraceae"    "Clostridium_XlVa"     ""     
#> OTU.53 "Bacteroidales"     "Prevotellaceae"     "Prevotella"           ""     
#> OTU.54 "Clostridiales"     "Lachnospiraceae"    ""                     ""

If we try to aggregate at Genus or Family rank level, we discover that blank values at these ranks prevent taxonomic aggregation. This is because, for example, it looks like OTU.43 and OTU.54 share the same (empty) Genus name, “”, despite being different at a higher rank, Family.

# tax_agg(pseq, rank = "Family") # this fails, and sends (helpful) messages about taxa problems
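
The clash described above can be reproduced without any packages. The following is a minimal base R sketch using a toy character matrix whose two rows are hypothetical copies of OTU.43 and OTU.54. It only illustrates why a shared blank Genus is ambiguous; it is not how tax_agg itself performs its checks.

```r
# Toy taxonomy table with hypothetical rows modelled on OTU.43 and OTU.54
tt <- matrix(
  c(
    "Clostridiales", "Ruminococcaceae", "",
    "Clostridiales", "Lachnospiraceae", ""
  ),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("OTU.43", "OTU.54"), c("Order", "Family", "Genus"))
)

# Both OTUs carry the same (blank) Genus label ...
same_genus <- tt["OTU.43", "Genus"] == tt["OTU.54", "Genus"]

# ... but that single label sits under two different Families
families_under_blank <- unique(tt[tt[, "Genus"] == "", "Family"])

same_genus
length(families_under_blank)
```

Grouping by Genus name alone would therefore merge taxa that belong to different Families, which is why aggregation has to stop.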

So we should run tax_fix first, which will fix most problems with default settings, allowing taxa to be aggregated successfully (at any rank). If you still have errors when using tax_agg after tax_fix, carefully read the error and accompanying messages. Often you can copy suggested tax_fix code from the tax_agg error. You should generally also have a look around your tax_table for other uninformative values, using tax_fix_interactive.

pseq %>%
  tax_fix() %>%
  tax_agg(rank = "Family")
#> psExtra object - a phyloseq object with extra slots:
#> 
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 59 taxa and 91 samples ]
#> sample_data() Sample Data:       [ 91 samples by 15 sample variables ]
#> tax_table()   Taxonomy Table:    [ 59 taxa by 5 taxonomic ranks ]
#> 
#> psExtra info:
#> tax_agg = "Family"

What does tax_fix do?

tax_fix searches all the ranks of the phyloseq object tax_table for:

short values, like “g__”, “”, “ ”, etc. (any with fewer characters than min_length)
common, longer but uninformative values like “unknown” (see full list at ?tax_fix)
NAs

tax_fix replaces these values with the name from the nearest higher taxonomic rank that is informative, followed by the name of that rank, e.g. an “unknown” Family within the Order Clostridiales will be renamed “Clostridiales Order”, as seen below.

pseq %>%
  tax_fix(min_length = 4) %>%
  tax_agg("Family") %>%
  # ps_get() %>% # needed in older versions of microViz (< 0.10.0)
  tax_table() %>%
  .[1:8, 3:5] # removes the first 2 ranks and shows only first 8 rows for nice printing
#> Taxonomy Table:     [8 taxa by 3 taxonomic ranks]:
#>                       Class           Order             Family                 
#> Ruminococcaceae       "Clostridia"    "Clostridiales"   "Ruminococcaceae"      
#> Bacteroidaceae        "Bacteroidia"   "Bacteroidales"   "Bacteroidaceae"       
#> Prevotellaceae        "Bacteroidia"   "Bacteroidales"   "Prevotellaceae"       
#> Lachnospiraceae       "Clostridia"    "Clostridiales"   "Lachnospiraceae"      
#> Veillonellaceae       "Negativicutes" "Selenomonadales" "Veillonellaceae"      
#> Clostridiales Order   "Clostridia"    "Clostridiales"   "Clostridiales Order"  
#> Peptostreptococcaceae "Clostridia"    "Clostridiales"   "Peptostreptococcaceae"
#> Porphyromonadaceae    "Bacteroidia"   "Bacteroidales"   "Porphyromonadaceae"
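
To make the replacement rule more concrete, here is a rough base R sketch of the same idea applied to a toy character matrix. The helper fill_from_higher_rank() and the toy table are hypothetical and written only for this illustration; this is not the microViz implementation, and it ignores details that tax_fix handles, such as rows with no informative values at all.

```r
# Hypothetical helper: replace uninformative entries with the nearest
# informative higher-rank value plus the name of that rank
fill_from_higher_rank <- function(tt, min_length = 4, unknowns = c("unknown")) {
  for (i in seq_len(nrow(tt))) {
    last_good <- NA_character_ # nearest informative label seen so far in this row
    for (j in seq_len(ncol(tt))) {
      value <- tt[i, j]
      uninformative <- is.na(value) || nchar(value) < min_length || value %in% unknowns
      if (!uninformative) {
        # remember this value together with its rank name, e.g. "Clostridiales Order"
        last_good <- paste(value, colnames(tt)[j])
      } else if (!is.na(last_good)) {
        tt[i, j] <- last_good
      }
    }
  }
  tt
}

# Toy table: ranks must be ordered from highest (left) to lowest (right)
tt <- matrix(
  c(
    "Clostridiales", "unknown", "",
    "Clostridiales", "Lachnospiraceae", "g__",
    "Bacteroidales", "Prevotellaceae", "Prevotella"
  ),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("OTU.", 1:3), c("Order", "Family", "Genus"))
)

fixed <- fill_from_higher_rank(tt)
fixed
```

In this sketch the first row ends up with “Clostridiales Order” at both Family and Genus, the second row gets “Lachnospiraceae Family” as its Genus, and the fully classified third row is left unchanged.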

Interactive solutions

You can use tax_fix_interactive() to explore your data’s tax_table visually, and interactively find and fix problematic entries. You can then copy your automagically personalised tax_fix code from tax_fix_interactive’s output, to paste into your script. Below is a screen capture video of tax_fix_interactive in action, using some other artificially mangled example data (see details at ?tax_fix_interactive).

tax_fix_interactive(example_data)
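
The code you copy out of the app is an ordinary tax_fix() call with its arguments spelled out. As a rough illustration of what such a call can look like, the sketch below applies tax_fix to the ibd data with an explicit unknowns vector. The values in unknowns are hypothetical choices made for this example ("Incertae Sedis" in particular is not known to occur in this dataset). Note that, per ?tax_fix, the built-in list of common unknowns is what you get with the default unknowns = NA, so a custom vector should include every value you want replaced.

```r
library(phyloseq)
library(microViz)

pseq <- microViz::ibd

# Argument values are illustrative assumptions, not recommendations
pseq_fixed <- pseq %>%
  tax_fix(
    min_length = 4,
    unknowns = c("unknown", "unclassified", "Incertae Sedis"),
    sep = " ", anon_unique = TRUE, suffix_rank = "classified"
  )

# tax_fix only edits the tax_table, so the number of taxa is unchanged
ntaxa(pseq_fixed)
```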

Abundance filtering as a solution

Sequences that are unclassified at fairly high ranks, e.g. Class, are often very low abundance (or possibly represent sequencing errors or chimeras), provided your data come from an environment that is typically well represented in reference databases. So if you are struggling with what to do with unclassified taxa, consider whether you can simply remove them first using tax_filter(). Fairly relaxed filtering criteria may be enough, such as a min_prevalence of 2 samples or a min_total_abundance of 1000 reads. Keep the tax_level argument as NA, so that no aggregation is attempted before filtering.
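
A minimal sketch of this filtering step, assuming the ibd example data used earlier and a prevalence threshold of 2 samples (the threshold is only an example; choose criteria that suit your own data):

```r
library(phyloseq)
library(microViz)

pseq <- microViz::ibd

# Keep only taxa detected in at least 2 samples; tax_level = NA means the
# filter is applied to the taxa as they are, with no aggregation first
pseq_filtered <- pseq %>%
  tax_filter(min_prevalence = 2, tax_level = NA)

# Compare the number of taxa before and after filtering
ntaxa(pseq)
ntaxa(pseq_filtered)
```

You can then run tax_fix() on the filtered object before aggregating, as in the earlier examples.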

Alternatives

microbiome::aggregate_taxa() also solves some tax_table problems. For example, where multiple distinct genera converge again to the same species name, like “” or “s__”, it makes unique taxa names by pasting together all of the rank names. However, this can produce some very long names, which need to be manually shortened before use in plots. It also doesn’t replace names like “s__” if they only occur once. Moreover, when creating ordination plots with microViz, only tax_agg() will record the aggregation level for provenance tracking and automated plot captioning.

Session info

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.0 (2025-04-11)
#>  os       Ubuntu 24.04.2 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en
#>  collate  C.UTF-8
#>  ctype    C.UTF-8
#>  tz       UTC
#>  date     2025-04-14
#>  pandoc   3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown)
#>  quarto   NA
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package          * version date (UTC) lib source
#>  ade4               1.7-23  2025-02-14 [1] RSPM
#>  ape                5.8-1   2024-12-16 [1] RSPM
#>  Biobase            2.67.0  2024-10-29 [1] Bioconduc~
#>  BiocGenerics       0.53.6  2025-01-27 [1] Bioconduc~
#>  biomformat         1.35.0  2024-10-29 [1] Bioconduc~
#>  Biostrings         2.75.4  2025-02-21 [1] Bioconduc~
#>  bslib              0.9.0   2025-01-30 [1] RSPM
#>  cachem             1.1.0   2024-05-16 [1] RSPM
#>  cli                3.6.4   2025-02-13 [1] RSPM
#>  cluster            2.1.8.1 2025-03-12 [3] CRAN (R 4.5.0)
#>  codetools          0.2-20  2024-03-31 [3] CRAN (R 4.5.0)
#>  colorspace         2.1-1   2024-07-26 [1] RSPM
#>  crayon             1.5.3   2024-06-20 [1] RSPM
#>  data.table         1.17.0  2025-02-22 [1] RSPM
#>  desc               1.4.3   2023-12-10 [1] RSPM
#>  devtools           2.4.5   2022-10-11 [1] RSPM
#>  digest             0.6.37  2024-08-19 [1] RSPM
#>  dplyr              1.1.4   2023-11-17 [1] RSPM
#>  ellipsis           0.3.2   2021-04-29 [1] RSPM
#>  evaluate           1.0.3   2025-01-10 [1] RSPM
#>  fastmap            1.2.0   2024-05-15 [1] RSPM
#>  foreach            1.5.2   2022-02-02 [1] RSPM
#>  fs                 1.6.5   2024-10-30 [1] RSPM
#>  generics           0.1.3   2022-07-05 [1] RSPM
#>  GenomeInfoDb       1.43.4  2025-01-24 [1] Bioconduc~
#>  GenomeInfoDbData   1.2.14  2025-04-14 [1] Bioconductor
#>  ggplot2            3.5.2   2025-04-09 [1] RSPM
#>  glue               1.8.0   2024-09-30 [1] RSPM
#>  gtable             0.3.6   2024-10-25 [1] RSPM
#>  htmltools          0.5.8.1 2024-04-04 [1] RSPM
#>  htmlwidgets        1.6.4   2023-12-06 [1] RSPM
#>  httpuv             1.6.15  2024-03-26 [1] RSPM
#>  httr               1.4.7   2023-08-15 [1] RSPM
#>  igraph             2.1.4   2025-01-23 [1] RSPM
#>  IRanges            2.41.3  2025-02-12 [1] Bioconduc~
#>  iterators          1.0.14  2022-02-05 [1] RSPM
#>  jquerylib          0.1.4   2021-04-26 [1] RSPM
#>  jsonlite           2.0.0   2025-03-27 [1] RSPM
#>  knitr              1.50    2025-03-16 [1] RSPM
#>  later              1.4.2   2025-04-08 [1] RSPM
#>  lattice            0.22-6  2024-03-20 [3] CRAN (R 4.5.0)
#>  lifecycle          1.0.4   2023-11-07 [1] RSPM
#>  magrittr           2.0.3   2022-03-30 [1] RSPM
#>  MASS               7.3-65  2025-02-28 [3] CRAN (R 4.5.0)
#>  Matrix             1.7-3   2025-03-11 [3] CRAN (R 4.5.0)
#>  memoise            2.0.1   2021-11-26 [1] RSPM
#>  mgcv               1.9-1   2023-12-21 [3] CRAN (R 4.5.0)
#>  microViz         * 0.12.7  2025-04-14 [1] local
#>  mime               0.13    2025-03-17 [1] RSPM
#>  miniUI             0.1.1.1 2018-05-18 [1] RSPM
#>  multtest           2.63.0  2024-10-29 [1] Bioconduc~
#>  munsell            0.5.1   2024-04-01 [1] RSPM
#>  nlme               3.1-168 2025-03-31 [3] CRAN (R 4.5.0)
#>  permute            0.9-7   2022-01-27 [1] RSPM
#>  phyloseq         * 1.51.0  2025-01-23 [1] Bioconduc~
#>  pillar             1.10.2  2025-04-05 [1] RSPM
#>  pkgbuild           1.4.7   2025-03-24 [1] RSPM
#>  pkgconfig          2.0.3   2019-09-22 [1] RSPM
#>  pkgdown            2.1.1   2024-09-17 [1] RSPM
#>  pkgload            1.4.0   2024-06-28 [1] RSPM
#>  plyr               1.8.9   2023-10-02 [1] RSPM
#>  profvis            0.4.0   2024-09-20 [1] RSPM
#>  promises           1.3.2   2024-11-28 [1] RSPM
#>  purrr              1.0.4   2025-02-05 [1] RSPM
#>  R6                 2.6.1   2025-02-15 [1] RSPM
#>  ragg               1.4.0   2025-04-10 [1] RSPM
#>  Rcpp               1.0.14  2025-01-12 [1] RSPM
#>  remotes            2.5.0   2024-03-17 [1] RSPM
#>  reshape2           1.4.4   2020-04-09 [1] RSPM
#>  rhdf5              2.51.2  2025-01-08 [1] Bioconduc~
#>  rhdf5filters       1.19.2  2025-03-05 [1] Bioconduc~
#>  Rhdf5lib           1.29.2  2025-03-24 [1] Bioconduc~
#>  rlang              1.1.5   2025-01-17 [1] RSPM
#>  rmarkdown          2.29    2024-11-04 [1] RSPM
#>  S4Vectors          0.45.4  2025-02-11 [1] Bioconduc~
#>  sass               0.4.9   2024-03-15 [1] RSPM
#>  scales             1.3.0   2023-11-28 [1] RSPM
#>  sessioninfo        1.2.3   2025-02-05 [1] RSPM
#>  shiny              1.10.0  2024-12-14 [1] RSPM
#>  stringi            1.8.7   2025-03-27 [1] RSPM
#>  stringr            1.5.1   2023-11-14 [1] RSPM
#>  survival           3.8-3   2024-12-17 [3] CRAN (R 4.5.0)
#>  systemfonts        1.2.2   2025-04-04 [1] RSPM
#>  textshaping        1.0.0   2025-01-20 [1] RSPM
#>  tibble             3.2.1   2023-03-20 [1] RSPM
#>  tidyselect         1.2.1   2024-03-11 [1] RSPM
#>  UCSC.utils         1.3.1   2025-01-15 [1] Bioconduc~
#>  urlchecker         1.0.1   2021-11-30 [1] RSPM
#>  usethis            3.1.0   2024-11-26 [1] RSPM
#>  vctrs              0.6.5   2023-12-01 [1] RSPM
#>  vegan              2.6-10  2025-01-29 [1] RSPM
#>  withr              3.0.2   2024-10-28 [1] RSPM
#>  xfun               0.52    2025-04-02 [1] RSPM
#>  xtable             1.8-4   2019-04-21 [1] RSPM
#>  XVector            0.47.2  2025-01-08 [1] Bioconduc~
#>  yaml               2.3.10  2024-07-26 [1] RSPM
#> 
#>  [1] /home/runner/work/_temp/Library
#>  [2] /opt/R/4.5.0/lib/R/site-library
#>  [3] /opt/R/4.5.0/lib/R/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────