Generates a comprehensive PDF/HTML report for longitudinal microbiome analysis including alpha diversity, beta diversity, and taxonomic composition over time.

Usage

mStat_generate_report_long(
  data.obj,
  group.var,
  vis.adj.vars = NULL,
  test.adj.vars = NULL,
  strata.var = NULL,
  subject.var,
  time.var,
  t0.level = NULL,
  ts.levels = NULL,
  depth = NULL,
  alpha.obj = NULL,
  alpha.name = c("shannon", "observed_species"),
  dist.obj = NULL,
  dist.name = c("BC", "Jaccard"),
  pc.obj = NULL,
  prev.filter = 0.1,
  abund.filter = 1e-04,
  bar.area.feature.no = 30,
  heatmap.feature.no = 30,
  vis.feature.level = NULL,
  test.feature.level = NULL,
  feature.dat.type = c("count", "proportion", "other"),
  feature.analysis.rarafy = TRUE,
  feature.change.func = "relative change",
  feature.mt.method = "none",
  feature.sig.level = 0.1,
  feature.box.axis.transform = c("sqrt"),
  base.size = 16,
  theme.choice = "bw",
  custom.theme = NULL,
  palette = NULL,
  pdf = TRUE,
  file.ann = NULL,
  pdf.wid = 11,
  pdf.hei = 8.5,
  output.file,
  output.format = c("pdf", "html")
)

Arguments

data.obj

A MicrobiomeStat data object, which is a list containing at minimum the following components:

  • feature.tab: A matrix of feature abundances (taxa/genes as rows, samples as columns)

  • meta.dat: A data frame of sample metadata (samples as rows)

Optional components include:

  • feature.ann: A matrix/data frame of feature annotations (e.g., taxonomy)

  • tree: A phylogenetic tree object (class "phylo")

  • feature.agg.list: Pre-aggregated feature tables by taxonomy

Data objects can be created using converters like mStat_convert_phyloseq_to_data_obj or importers like mStat_import_qiime2_as_data_obj.
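
As a rough illustration of the structure described above, a minimal data object can be assembled by hand in base R. The taxon, sample, and column names below are made up for the example, and the package's converters may add components or run checks that this sketch omits.

```r
# Hypothetical toy data: 3 taxa (rows) x 4 samples (columns)
feature.tab <- matrix(
  c(10, 0, 5, 3,
     2, 8, 0, 1,
     0, 4, 6, 9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_A", "taxon_B", "taxon_C"),
                  c("S1", "S2", "S3", "S4"))
)

# Sample metadata: one row per sample, row names matching the columns above
meta.dat <- data.frame(
  subject = c("P1", "P1", "P2", "P2"),
  visit   = c(1, 2, 1, 2),
  group   = c("A", "A", "B", "B"),
  row.names = colnames(feature.tab)
)

# The two required components, bundled as a plain list
data.obj <- list(feature.tab = feature.tab, meta.dat = meta.dat)

# Sanity check: sample IDs must line up between the two components
stopifnot(identical(colnames(data.obj$feature.tab),
                    rownames(data.obj$meta.dat)))
```

Keeping sample IDs as the column names of feature.tab and the row names of meta.dat makes mismatches easy to detect before any analysis is run.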

group.var

Character string specifying the column name in meta.dat containing the grouping variable (e.g., treatment, condition, phenotype). Used for between-group comparisons.

vis.adj.vars

Character vector of covariate names to visualize in plots.

test.adj.vars

Character vector of covariate names for statistical adjustment.

strata.var

Character string specifying the column name in meta.dat for stratification. When provided, analyses and visualizations will be performed separately within each stratum (e.g., by site, batch, or sex).

subject.var

Character string specifying the column name in meta.dat that uniquely identifies each subject or sample unit. Required for longitudinal and paired designs to track repeated measurements.

time.var

Character string specifying the column name in meta.dat containing the time variable. Required for longitudinal and paired analyses. Should be a factor or character with meaningful time point labels.

t0.level

Character or numeric value specifying the baseline time point for longitudinal or paired analyses. Should match a value in the time.var column.

ts.levels

Character vector specifying the follow-up time points for longitudinal or paired analyses. Should match values in the time.var column.
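
If you prefer to set t0.level and ts.levels explicitly instead of passing NULL, one possible approach is to derive them from the time column. The sketch below assumes a numeric time variable in a hypothetical metadata frame; it does not reproduce what the function does internally when these arguments are NULL.

```r
# Hypothetical metadata with an unsorted numeric time column
meta.dat <- data.frame(
  subject = rep(c("P1", "P2"), each = 3),
  visit   = rep(c(3, 1, 2), times = 2)
)

# Distinct time points in increasing order
time.points <- sort(unique(meta.dat$visit))

t0.level  <- time.points[1]    # earliest time point as the baseline
ts.levels <- time.points[-1]   # all later time points as follow-ups
```

Both values are taken directly from the time column, so they are guaranteed to match values in time.var.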

depth

Numeric value or NULL. Rarefaction depth for diversity calculations. If NULL, uses minimum sample depth or no rarefaction.

alpha.obj

A list containing pre-calculated alpha diversity indices. If NULL and alpha diversity is needed, it will be calculated automatically. Names should match the alpha.name parameter (e.g., "shannon", "simpson"). See mStat_calculate_alpha_diversity.

alpha.name

Character vector specifying which alpha diversity indices to analyze. Options include:

  • "shannon": Shannon diversity index

  • "simpson": Simpson diversity index

  • "observed_species": Observed species richness

  • "chao1": Chao1 richness estimator

  • "ace": ACE richness estimator

  • "pielou": Pielou's evenness
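
For orientation, the two default indices can be computed for a single sample in a few lines of base R. This is a textbook sketch on invented counts, not the code used by mStat_calculate_alpha_diversity, which may rarefy the data first or use a different log base.

```r
# Hypothetical counts for one sample
counts <- c(taxon_A = 10, taxon_B = 10, taxon_C = 0, taxon_D = 0)

# Shannon index: -sum(p * log(p)) over the taxa that are present
shannon_index <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

# Observed species: number of taxa with a non-zero count
observed_species <- function(x) sum(x > 0)

shannon.value  <- shannon_index(counts)     # equals log(2) for two equally abundant taxa
observed.value <- observed_species(counts)  # 2 taxa present
```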

dist.obj

A list of pre-calculated distance matrices. If NULL and distances are needed, they will be calculated automatically. List names should match dist.name (e.g., "BC" for Bray-Curtis). See mStat_calculate_beta_diversity.

dist.name

Character vector specifying which distance metrics to use. Options depend on available methods:

  • "BC": Bray-Curtis dissimilarity

  • "Jaccard": Jaccard distance

  • "UniFrac": Unweighted UniFrac (requires tree)

  • "GUniFrac": Generalized UniFrac (requires tree)

  • "WUniFrac": Weighted UniFrac (requires tree)

  • "JS": Jensen-Shannon divergence
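
To make the two default metrics concrete, here is a base-R sketch of Bray-Curtis dissimilarity and a presence/absence Jaccard distance between two samples. The abundance vectors are invented, and the presence/absence form of Jaccard is an assumption of this example; mStat_calculate_beta_diversity may use a different variant.

```r
# Hypothetical abundance vectors for two samples over the same taxa
x <- c(taxon_A = 1, taxon_B = 2, taxon_C = 0)
y <- c(taxon_A = 1, taxon_B = 0, taxon_C = 3)

# Bray-Curtis: sum of absolute differences over the total abundance
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Jaccard on presence/absence: 1 - |shared taxa| / |taxa in either sample|
jaccard_binary <- function(a, b) {
  present.a <- a > 0
  present.b <- b > 0
  1 - sum(present.a & present.b) / sum(present.a | present.b)
}

bc.value      <- bray_curtis(x, y)     # 5/7
jaccard.value <- jaccard_binary(x, y)  # 2/3
```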

pc.obj

Pre-calculated PCoA/PCA results from mStat_calculate_PC. If NULL, computed automatically.

prev.filter

Numeric value between 0 and 1. Features with prevalence (proportion of non-zero samples) below this threshold will be excluded from analysis. Default is 0.1; set to 0 for no filtering.

abund.filter

Numeric value. Features with mean abundance below this threshold will be excluded from analysis. Default is 1e-04; set to 0 for no filtering.
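
The two filters can be pictured with a small base-R sketch. It assumes that mean abundance is computed on per-sample proportions and uses an exaggerated prevalence cutoff of 0.5 so that the toy data actually loses a feature; neither choice is taken from the package source.

```r
# Hypothetical count table: 3 taxa (rows) x 4 samples (columns)
counts <- matrix(
  c(10, 0, 5, 3,
     0, 0, 0, 1,
     4, 4, 6, 9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_A", "taxon_B", "taxon_C"), paste0("S", 1:4))
)

# Convert each sample (column) to proportions
props <- sweep(counts, 2, colSums(counts), "/")

prevalence <- rowMeans(counts > 0)  # share of samples where the taxon is present
mean.abund <- rowMeans(props)       # average relative abundance across samples

# Illustrative thresholds (the function's defaults are 0.1 and 1e-04)
prev.cutoff  <- 0.5
abund.cutoff <- 1e-04

# Keep features that meet both thresholds
keep     <- prevalence >= prev.cutoff & mean.abund >= abund.cutoff
filtered <- counts[keep, , drop = FALSE]
```

Here taxon_B appears in only one of four samples (prevalence 0.25), so it is dropped by the prevalence filter even though its mean abundance is above the abundance cutoff.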

bar.area.feature.no

Number of top features to show in bar/area plots (default 30).

heatmap.feature.no

Number of top features to show in heatmaps (default 30).

vis.feature.level

Taxonomic level(s) for visualization.

test.feature.level

Taxonomic level(s) for statistical testing.

feature.dat.type

Character string specifying the data type of feature.tab. One of:

  • "count": Raw count data (will be normalized)

  • "proportion": Relative abundance data (should sum to 1 per sample)

  • "other": Pre-transformed data (no transformation applied)

feature.analysis.rarafy

Logical, whether to rarefy data for feature analysis (default TRUE).

feature.change.func

Method for calculating change: "relative change", "absolute change", or "log fold change".
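
The three options can be sketched as a small helper. The formulas below are assumptions made for illustration (in particular the symmetric form of relative change and the pseudocount in the log fold change); the helper name feature_change and its arguments are hypothetical, and the package source should be consulted for the exact definitions.

```r
# Hypothetical helper: change between a baseline value and a follow-up value
feature_change <- function(before, after,
                           method = c("relative change",
                                      "absolute change",
                                      "log fold change"),
                           pseudocount = 1e-6) {
  method <- match.arg(method)
  switch(method,
    # Plain difference
    "absolute change" = after - before,
    # Assumed symmetric form, bounded in [-1, 1]; defined as 0 when both values are 0
    "relative change" = ifelse(after + before == 0, 0,
                               (after - before) / (after + before)),
    # Assumed log2 ratio, with a pseudocount to avoid log2(0)
    "log fold change" = log2(after + pseudocount) - log2(before + pseudocount)
  )
}

abs.change <- feature_change(0.1, 0.3, "absolute change")  # 0.3 - 0.1
rel.change <- feature_change(0.1, 0.3, "relative change")  # 0.2 / 0.4
lfc        <- feature_change(0.25, 0.5, "log fold change", pseudocount = 0)  # log2(2)
```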

feature.mt.method

Multiple testing correction: "fdr" or "none" (default "none").
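
The practical difference between the two settings can be seen with stats::p.adjust from base R. The p-values below are invented, and the sketch only shows how a correction interacts with a significance threshold such as feature.sig.level; it is not the function's internal code.

```r
# Hypothetical per-feature p-values
p.values <- c(taxon_A = 0.001, taxon_B = 0.04, taxon_C = 0.09, taxon_D = 0.6)
feature.sig.level <- 0.1

# "none" leaves p-values unchanged; "fdr" applies Benjamini-Hochberg
p.none <- p.adjust(p.values, method = "none")
p.fdr  <- p.adjust(p.values, method = "fdr")

# Features that would be highlighted under each setting
sig.none <- names(p.values)[p.none <= feature.sig.level]
sig.fdr  <- names(p.values)[p.fdr  <= feature.sig.level]
```

With these values, taxon_C passes the threshold without correction but not after FDR adjustment, where its adjusted p-value rises to 0.12.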

feature.sig.level

Significance threshold for highlighting features (default 0.1).

feature.box.axis.transform

Y-axis transformation for boxplots: "identity", "sqrt", or "log".

base.size

Numeric value specifying the base font size for plot text elements. Default is 16.

theme.choice

Character string specifying the ggplot2 theme to use. Options include:

  • "bw": Black and white theme (theme_bw)

  • "classic": Classic theme (theme_classic)

  • "minimal": Minimal theme (theme_minimal)

  • "prism": GraphPad Prism-like theme

  • "nature": Nature journal style

  • "light": Light theme (theme_light)

Can also use a custom ggplot2 theme object via custom.theme.

custom.theme

A custom ggplot2 theme object to override theme.choice. Should be created using ggplot2::theme() or a complete theme function.

palette

Character vector of colors or a named palette for the plot. If NULL, uses default MicrobiomeStat color scheme. Can be:

  • A vector of color codes (e.g., c("#E41A1C", "#377EB8"))

  • A palette name recognized by the plotting function

pdf

Logical. If TRUE, saves the plot(s) to PDF file(s) in the current working directory. Default is TRUE.

file.ann

Character string for additional annotation to append to output filenames. Useful for distinguishing multiple outputs.

pdf.wid

Numeric value specifying the width of PDF output in inches. Default is 11.

pdf.hei

Numeric value specifying the height of PDF output in inches. Default is 8.5.

output.file

Output report filename (required).

output.format

Output format: "pdf" or "html".

Value

A report in PDF or HTML format (as set by output.format), written to output.file and containing:

- Summary statistics table: Sample size, covariates, time points

- Alpha diversity boxplots: Boxplots colored by time and groups

- Alpha diversity spaghetti plots: Line plots of trajectories

- Alpha diversity trends: Tables with LME model results

- Alpha volatility: Tables with LM model results

- Beta diversity PCoA plots: Ordination plots colored by time and groups

- Beta distance boxplots: Boxplots colored by time and groups

- Beta spaghetti plots: Individual line plots

- Beta diversity trends: Tables with LME results on distances

- Beta volatility: Tables with LM results on distances

- Feature area plots: Stacked area plots showing composition

- Feature heatmaps: Heatmaps colored by relative abundance

- Feature volcano plots: With trend/volatility significance

- Feature boxplots: Distribution by time and groups

- Feature spaghetti plots: Individual line plots

Details

This function generates a comprehensive longitudinal microbiome report with:

1. Summary Statistics - Table with sample size, number of timepoints, covariates

2. Alpha Diversity Analysis

  - Boxplots of alpha diversity vs time, colored by groups

  - Spaghetti plots of alpha trajectories for each subject

  - Linear mixed effects model results for alpha diversity trend

  - Linear model results for alpha diversity volatility

3. Beta Diversity Analysis

  - PCoA plots colored by time points and groups

  - Boxplots of PCoA coordinate 1 vs time, colored by groups

  - Spaghetti plots of distance from baseline vs time

  - Linear mixed effects models for beta diversity distance trend

  - Linear models for beta diversity volatility

4. Taxonomic Composition Analysis

  - Stacked area plots of average composition by time and groups

  - Heatmaps of relative abundance colored from low (blue) to high (red)

  - Volcano plots highlighting significant taxa in trend and volatility

  - Boxplots of significant taxa by time and groups

  - Spaghetti plots for significant taxa vs time
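
The volatility results listed above summarize how much a measurement fluctuates within each subject. As a minimal sketch, the example below assumes volatility is the mean absolute change per unit time between consecutive visits; that definition, the toy data, and the helper name volatility_one are assumptions of this example, not a description of the package's implementation.

```r
# Hypothetical long-format data: one Shannon value per subject and time point
long.dat <- data.frame(
  subject = rep(c("P1", "P2"), each = 3),
  time    = rep(c(0, 1, 3), times = 2),
  shannon = c(2.0, 2.4, 2.0, 3.0, 3.0, 3.2)
)

# Assumed definition: mean of |change in value| / (change in time)
# over consecutive visits of one subject
volatility_one <- function(d) {
  d <- d[order(d$time), ]
  mean(abs(diff(d$shannon)) / diff(d$time))
}

# One volatility value per subject
volatility <- sapply(split(long.dat, long.dat$subject), volatility_one)
```

A per-subject summary like this has one value per subject, which is why it can be compared across groups with an ordinary linear model, whereas the trend analysis needs a mixed effects model to handle the repeated measurements.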

Author

Chen Yang

Examples

if (FALSE) { # \dontrun{
data(subset_T2D.obj)
mStat_generate_report_long(
  data.obj = subset_T2D.obj,
  group.var = "subject_race",
  strata.var = NULL,
  test.adj.vars = NULL,
  vis.adj.vars = NULL,
  subject.var = "subject_id",
  time.var = "visit_number_num",
  t0.level = NULL,
  ts.levels = NULL,
  alpha.obj = NULL,
  alpha.name = c("shannon","observed_species"),
  dist.obj = NULL,
  dist.name = c("BC",'Jaccard'),
  pc.obj = NULL,
  feature.mt.method = "none",
  feature.sig.level = 0.3,
  vis.feature.level = c("Family","Genus"),
  test.feature.level = c("Family"),
  feature.change.func = "relative change",
  feature.dat.type = "count",
  prev.filter = 0.1,
  abund.filter = 1e-4,
  bar.area.feature.no = 40,
  heatmap.feature.no = 40,
  feature.box.axis.transform = "sqrt",
  theme.choice = "bw",
  base.size = 20,
  output.file = "/Users/apple/Research/MicrobiomeStat/result/per_time_test_report.pdf",
  output.format = "pdf"
)

mStat_generate_report_long(
  data.obj = subset_T2D.obj,
  group.var = "subject_race",
  strata.var = NULL,
  test.adj.vars = NULL,
  vis.adj.vars = NULL,
  subject.var = "subject_id",
  time.var = "visit_number_num",
  t0.level = NULL,
  ts.levels = NULL,
  alpha.obj = NULL,
  alpha.name = c("shannon","observed_species"),
  dist.obj = NULL,
  dist.name = c("BC",'Jaccard'),
  pc.obj = NULL,
  feature.mt.method = "none",
  feature.sig.level = 0.3,
  vis.feature.level = c("Family","Genus"),
  test.feature.level = c("Family"),
  feature.change.func = "relative change",
  feature.dat.type = "count",
  prev.filter = 0.1,
  abund.filter = 1e-4,
  bar.area.feature.no = 40,
  heatmap.feature.no = 40,
  feature.box.axis.transform = "sqrt",
  theme.choice = "bw",
  base.size = 20,
  output.file = "/Users/apple/Research/MicrobiomeStat/result/per_time_test_report.html",
  output.format = "html"
)
data(ecam.obj)
mStat_generate_report_long(
  data.obj = ecam.obj,
  group.var = "antiexposedall",
  strata.var = NULL,
  test.adj.vars = "delivery",
  vis.adj.vars = "delivery",
  subject.var = "subject.id",
  time.var = "month_num",
  t0.level = NULL,
  ts.levels = NULL,
  alpha.obj = NULL,
  alpha.name = c("shannon","observed_species"),
  dist.obj = NULL,
  dist.name = c("BC",'Jaccard'),
  pc.obj = NULL,
  feature.mt.method = "none",
  feature.sig.level = 0.3,
  vis.feature.level = c("Family","Genus"),
  test.feature.level = c("Family"),
  feature.change.func = "relative change",
  feature.dat.type = "proportion",
  prev.filter = 0.1,
  abund.filter = 1e-4,
  bar.area.feature.no = 40,
  heatmap.feature.no = 40,
  feature.box.axis.transform = "sqrt",
  theme.choice = "bw",
  base.size = 20,
  output.file = "/Users/apple/Research/MicrobiomeStat/result/ecam_obj_report_long.pdf"
)

mStat_generate_report_long(
  data.obj = ecam.obj,
  group.var = "antiexposedall",
  strata.var = NULL,
  test.adj.vars = "delivery",
  vis.adj.vars = "delivery",
  subject.var = "subject.id",
  time.var = "month_num",
  t0.level = NULL,
  ts.levels = NULL,
  alpha.obj = NULL,
  alpha.name = c("shannon","observed_species"),
  dist.obj = NULL,
  dist.name = c("BC",'Jaccard'),
  pc.obj = NULL,
  feature.mt.method = "none",
  feature.sig.level = 0.3,
  vis.feature.level = c("Family","Genus"),
  test.feature.level = c("Family"),
  feature.change.func = "relative change",
  feature.dat.type = "proportion",
  prev.filter = 0.1,
  abund.filter = 1e-4,
  bar.area.feature.no = 40,
  heatmap.feature.no = 40,
  feature.box.axis.transform = "sqrt",
  theme.choice = "bw",
  base.size = 20,
  output.file = "/Users/apple/Research/MicrobiomeStat/result/ecam_obj_report_long.html",
  output.format = "html"
)
} # }