Longitudinal Taxa Association Test — generate_taxa_association_test

Tests associations between taxa abundances and a grouping variable in longitudinal microbiome data using LinDA mixed-effects models.

Usage

generate_taxa_association_test_long(
  data.obj,
  subject.var,
  group.var,
  adj.vars = NULL,
  prev.filter = 0,
  abund.filter = 0,
  feature.level,
  feature.dat.type = c("count", "proportion", "other"),
  ...
)

Arguments

data.obj

A MicrobiomeStat data object, which is a list containing at minimum the following components:

feature.tab: A matrix of feature abundances (taxa/genes as rows, samples as columns)
meta.dat: A data frame of sample metadata (samples as rows)

Optional components include:

feature.ann: A matrix/data frame of feature annotations (e.g., taxonomy)
tree: A phylogenetic tree object (class "phylo")
feature.agg.list: Pre-aggregated feature tables by taxonomy

Data objects can be created using converters like mStat_convert_phyloseq_to_data_obj or importers like mStat_import_qiime2_as_data_obj.

subject.var

Character string specifying the column name in meta.dat that uniquely identifies each subject or sample unit. Required for longitudinal and paired designs to track repeated measurements.

group.var

Character string specifying the column name in meta.dat containing the grouping variable (e.g., treatment, condition, phenotype). Used for between-group comparisons.

adj.vars

Character vector specifying column names in meta.dat to be used as covariates for adjustment in statistical models. These variables will be included as fixed effects.

prev.filter

Numeric value between 0 and 1. Features with prevalence (proportion of non-zero samples) below this threshold will be excluded from analysis. Default is usually 0 (no filtering).

abund.filter

Numeric value. Features with mean abundance below this threshold will be excluded from analysis. Default is usually 0 (no filtering).

feature.level

Character vector specifying the taxonomic or annotation level(s) for analysis. Should match column names in feature.ann, such as "Phylum", "Family", "Genus", etc. Use "original" to analyze at the original feature level without aggregation.

feature.dat.type

Character string specifying the data type of feature.tab. One of:

"count": Raw count data (will be normalized)
"proportion": Relative abundance data (should sum to 1 per sample)
"other": Pre-transformed data (no transformation applied)

...

Additional arguments passed to underlying functions.

Value

A nested list structure where:

First level: Named by feature.level (e.g., "Phylum", "Genus")
Second level: Named by tested comparisons between groups (e.g., "Level vs Reference (Reference)" for categorical variables, or variable name for continuous variables)
Each element is a data.frame with the following columns:
- Variable: Feature/taxon name
- Coefficient: Log2 fold change (categorical) or slope (continuous)
- SE: Standard error of the coefficient
- P.Value: Raw p-value from LinDA's statistical test
- Adjusted.P.Value: FDR-adjusted p-value (Benjamini-Hochberg)
- Mean.Abundance: Mean abundance across all samples
- Prevalence: Proportion of samples where feature is present (non-zero)

Analysis uses LinDA mixed-effects models to test associations between taxa abundances and the grouping variable, accounting for subject-level random effects.

Details

Based on whether adj.vars is NULL, the formula tests:

- When adj.vars is NOT NULL: - Tests group.var and adj.vars main effects. - Adjusted for adj.vars.

- When adj.vars is NULL: - Tests group.var main effects only. - Unadjusted analysis.

Subject variability is accounted for through random effects.

Examples

if (FALSE) { # \dontrun{
# Example 1: Generate taxa association tests and volcano plots for the ecam dataset
data("ecam.obj")
test.list <- generate_taxa_association_test_long(
  data.obj = ecam.obj,
  subject.var = "studyid",
  group.var = "delivery",
  feature.level = c("Phylum", "Class"),
  feature.dat.type = c("count")
)

volcano_plots_ecam <- generate_taxa_volcano_single(
  data.obj = ecam.obj,
  group.var = "delivery",
  test.list = test.list,
  feature.sig.level = 0.1,
  feature.mt.method = "fdr"
)


# Example 2: Generate taxa association tests and volcano plots for a subset of the T2D dataset
data("subset_T2D.obj")
test.list_T2D <- generate_taxa_association_test_long(
  data.obj = subset_T2D.obj,
  subject.var = "subject_id",
  feature.level = "Genus",
  group.var = "subject_race",
  feature.dat.type = c("count"),
  prev.filter = 0.1,
  abund.filter = 0.001
)

volcano_plots_T2D <- generate_taxa_volcano_single(
  data.obj = subset_T2D.obj,
  group.var = "subject_race",
  test.list = test.list_T2D,
  feature.sig.level = 0.1,
  feature.mt.method = "none"
)
} # }