Skip to contents

Performs differential abundance analysis using LinDA (for count/proportion data) or standard linear models (for other data types). Automatically detects whether the predictor is categorical or continuous.

Usage

generate_taxa_test_single(
  data.obj,
  time.var = NULL,
  t.level = NULL,
  group.var,
  ref.level = NULL,
  adj.vars = NULL,
  prev.filter = 0,
  abund.filter = 0,
  feature.level,
  feature.dat.type = c("count", "proportion", "other"),
  ...
)

Arguments

data.obj

A MicrobiomeStat data object, which is a list containing at minimum the following components:

  • feature.tab: A matrix of feature abundances (taxa/genes as rows, samples as columns)

  • meta.dat: A data frame of sample metadata (samples as rows)

Optional components include:

  • feature.ann: A matrix/data frame of feature annotations (e.g., taxonomy)

  • tree: A phylogenetic tree object (class "phylo")

  • feature.agg.list: Pre-aggregated feature tables by taxonomy

Data objects can be created using converters like mStat_convert_phyloseq_to_data_obj or importers like mStat_import_qiime2_as_data_obj.

time.var

Character string specifying the column name in meta.dat containing the time variable. Required for longitudinal and paired analyses. Should be a factor or character with meaningful time point labels.

t.level

Character string specifying the time level/value to subset data to. Default NULL does not subset data.

group.var

Character string specifying the column name in meta.dat containing the grouping variable (e.g., treatment, condition, phenotype). Used for between-group comparisons.

ref.level

Character string specifying the reference level for categorical group.var. If NULL, the first level alphabetically is used. Ignored for continuous variables.

adj.vars

Character vector specifying column names in meta.dat to be used as covariates for adjustment in statistical models. These variables will be included as fixed effects.

prev.filter

Numeric value between 0 and 1. Features with prevalence (proportion of non-zero samples) below this threshold will be excluded from analysis. Default is usually 0 (no filtering).

abund.filter

Numeric value. Features with mean abundance below this threshold will be excluded from analysis. Default is usually 0 (no filtering).

feature.level

Character vector specifying the taxonomic or annotation level(s) for analysis. Should match column names in feature.ann, such as "Phylum", "Family", "Genus", etc. Use "original" to analyze at the original feature level without aggregation.

feature.dat.type

Character string specifying the data type of feature.tab. One of:

  • "count": Raw count data (will be normalized)

  • "proportion": Relative abundance data (should sum to 1 per sample)

  • "other": Pre-transformed data (no transformation applied)

...

Additional arguments passed to the linda function.

Value

A nested list structure where:

  • First level: Named by feature.level (e.g., "Phylum", "Genus")

  • Second level: Named by comparisons

    • For categorical group.var: One element per non-reference level, named as "Level vs Reference (Reference)" (e.g., "Treatment vs Control (Reference)")

    • For continuous group.var: One element named by the variable name (e.g., "age")

  • Each element is a data.frame with the following columns:

    • Variable: Feature/taxon name

    • Coefficient: Log2 fold change (for categorical) or slope coefficient (for continuous)

    • SE: Standard error of the coefficient

    • P.Value: Raw p-value from the statistical test

    • Adjusted.P.Value: FDR-adjusted p-value using Benjamini-Hochberg method

    • Mean.Abundance: Mean abundance of the feature across all samples

    • Prevalence: Proportion of samples where the feature is present (non-zero)

Details

## Statistical Methods

The function automatically selects the appropriate statistical method based on feature.dat.type:

**For "count" or "proportion" data:** Uses LinDA (Linear models for Differential Abundance), which:

  • Handles compositional nature of microbiome data

  • Performs internal normalization (no pre-normalization needed)

  • Handles zero-inflation via pseudo-count or imputation

  • Corrects bias from centered log-ratio transformation

**For "other" data:** Uses standard linear regression models, suitable for:

  • Pre-transformed data (e.g., log-transformed, CLR-transformed)

  • Data where compositional adjustments are not needed

## Variable Type Handling

The function automatically detects the type of group.var:

**Categorical variables (factor or character):**

  • Uses dummy coding with k-1 coefficients for k levels

  • First level (alphabetically) becomes the reference

  • Each coefficient compares a level to the reference

  • Example: For treatment groups A, B, C, output includes "B vs A (Reference)" and "C vs A (Reference)"

**Continuous variables (numeric or integer):**

  • Uses single slope coefficient

  • Tests for linear association with abundance

  • Coefficient represents effect per unit increase

  • Example: For age, coefficient shows abundance change per year

## Analysis Workflow

1. **Data Subsetting**: If time.var and t.level provided, subsets to that timepoint

2. **Feature Aggregation**: Aggregates features to specified taxonomic level if not "original"

3. **Filtering**: Applies prevalence and abundance filters

4. **Statistical Testing**: Runs LinDA or linear models depending on feature.dat.type

5. **Results Compilation**: Extracts coefficients, standard errors, p-values, and calculates FDR-adjusted p-values using Benjamini-Hochberg method

6. **Metadata Addition**: Adds mean abundance and prevalence for each feature

The function supports covariate adjustment via adj.vars and allows taxonomic aggregation at multiple levels for customized analyses.

Examples

if (FALSE) { # \dontrun{
data(peerj32.obj)
test.list <- generate_taxa_test_single(
    data.obj = peerj32.obj,
    time.var = "time",
    t.level = "2",
    group.var = "group",
    adj.vars = "sex",
    feature.dat.type = "count",
    feature.level = c("Phylum","Genus","Family"),
    prev.filter = 0.1,
    abund.filter = 0.0001,
)
plot.list <- generate_taxa_volcano_single(
    data.obj = peerj32.obj,
    group.var = "group",
    test.list = test.list,
    feature.sig.level = 0.1,
    feature.mt.method = "none"
)
} # }