Removes taxa (from all samples) that do not meet a given criterion or combination of criteria. If a value for min_prevalence, min_total_abundance or min_sample_abundance is 1 or greater, then it is treated as an absolute minimum number of samples/reads. If <1, it is treated as proportion of all samples/reads. This function is designed to work with counts. otu_table must contain counts particularly if you want to set a non-zero value for min_total_abundance.
Usage
tax_filter(
ps,
min_prevalence = 1,
prev_detection_threshold = 1,
min_total_abundance = 0,
min_sample_abundance = 0,
tax_level = NA,
names_only = FALSE,
use_counts = TRUE,
undetected = NULL,
verbose = TRUE
)
Arguments
- ps
phyloseq or psExtra (ideally with count data available)
- min_prevalence
number or proportion of samples that a taxon must be present in
- prev_detection_threshold
min required counts (or value) for a taxon to be considered present in that sample (or set undetected arg)
- min_total_abundance
minimum total readcount of a taxon, summed across all samples (can be proportion of all counts)
- min_sample_abundance
taxa must have at least this many reads in one or more samples
- tax_level
if given, aggregates data at named taxonomic rank before filtering, but returns phyloseq at the ORIGINAL level of aggregation!
- names_only
if names_only is true return only names of taxa, not the phyloseq
- use_counts
expect count data in phyloseq otu_table? default is TRUE
- undetected
e.g. 0, value at (or below) which a taxon is considered not present in that sample. If set, this overrides prev_detection_threshold.
- verbose
message about proportional prevalence calculations?
Examples
data("dietswap", package = "microbiome")
# Dropping rare and low abundance taxa #
# Filter at unique taxa level, keeping only those with a prevalence of 10% or more
# and at least 10 thousand reads when summed across all samples.
# Then aggregate to Family level taxonomy.
dietswap %>%
tax_filter(min_prevalence = 0.1, min_total_abundance = 10000) %>%
tax_agg("Family")
#> Proportional min_prevalence given: 0.1 --> min 23/222 samples.
#> psExtra object - a phyloseq object with extra slots:
#>
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 9 taxa and 222 samples ]
#> sample_data() Sample Data: [ 222 samples by 8 sample variables ]
#> tax_table() Taxonomy Table: [ 9 taxa by 2 taxonomic ranks ]
#>
#> psExtra info:
#> tax_agg = "Family"
# Keeping ubiquitous families #
# keep only families that have at least 1000 counts present in 90% of samples
# then aggregate the remaining taxa at 'Genus' level
dietswap %>%
tax_filter(
tax_level = "Family", min_prevalence = 0.90,
prev_detection_threshold = 1000
) %>%
tax_agg("Genus")
#> Proportional min_prevalence given: 0.9 --> min 200/222 samples.
#> psExtra object - a phyloseq object with extra slots:
#>
#> phyloseq-class experiment-level object
#> otu_table() OTU Table: [ 28 taxa and 222 samples ]
#> sample_data() Sample Data: [ 222 samples by 8 sample variables ]
#> tax_table() Taxonomy Table: [ 28 taxa by 3 taxonomic ranks ]
#>
#> psExtra info:
#> tax_agg = "Genus"