Interactive consensus building for cell type annotation
Source: R/consensus_annotation.R
interactive_consensus_annotation.Rd
This function implements an interactive voting and discussion mechanism in which multiple LLMs collaborate to reach a consensus on cell type annotations, with additional discussion devoted to clusters with low agreement. The process includes:
Initial voting by all LLMs
Identification of controversial clusters
Detailed discussion for controversial clusters
Final summary by a designated LLM (default: Claude)
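The identification step can be illustrated with a small sketch. It assumes that the consensus proportion is the share of models voting for the most common label and that entropy is the Shannon entropy (in bits) of the vote distribution. The helper `vote_metrics()` and the example votes are hypothetical, and the function's own calculation may differ.

```r
# Hypothetical votes from five models for a single cluster.
votes <- c("T cell", "T cell", "T cell", "NK cell", "T cell")

# Illustrative helper (not part of the documented API): summarises agreement.
vote_metrics <- function(votes) {
  # Relative frequency of each distinct label among the votes.
  p <- as.numeric(table(votes)) / length(votes)
  list(
    consensus_proportion = max(p),           # share of the most common label
    entropy              = -sum(p * log2(p)) # Shannon entropy, base 2 (assumed)
  )
}

m <- vote_metrics(votes)

# Compare against the documented defaults. Treating entropy above the
# threshold as a sign of controversy is an assumption of this sketch.
below_consensus <- m$consensus_proportion < 0.7
above_entropy   <- m$entropy > 1
```

With these votes the consensus proportion is 0.8 and the entropy is about 0.72, so neither default threshold is crossed. How the function combines the two thresholds is not stated in this page.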
Usage
interactive_consensus_annotation(
  input,
  tissue_name = NULL,
  models = c("claude-sonnet-4-20250514", "claude-3-7-sonnet-20250219",
    "claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022", "gemini-2.0-flash",
    "gemini-1.5-pro", "qwen-max-2025-01-25", "gpt-4o", "grok-3-latest"),
  api_keys,
  top_gene_count = 10,
  controversy_threshold = 0.7,
  entropy_threshold = 1,
  max_discussion_rounds = 3,
  consensus_check_model = NULL,
  log_dir = "logs",
  cache_dir = NULL,
  use_cache = TRUE,
  base_urls = NULL,
  clusters_to_analyze = NULL,
  force_rerun = FALSE
)
Arguments
- input
Either a data frame returned by Seurat's FindAllMarkers() function containing differential gene expression results (it must have the columns 'cluster', 'gene', and 'avg_log2FC'), or a list in which each element has a 'genes' field containing the marker genes for one cluster. Cluster IDs must be numeric and start from 0.
- tissue_name
Character string specifying the tissue type for context-aware cell type annotation. If NULL, generic cell type annotation will be performed.
- models
Character vector of model names to use for consensus annotation. At least two models are required. Supports models from OpenAI, Anthropic, DeepSeek, Google, Alibaba, Stepfun, Zhipu, MiniMax, X.AI, and OpenRouter.
- api_keys
Named list of API keys. Can use provider names as keys (e.g., "openai", "anthropic") or model names as keys (e.g., "gpt-4o").
- top_gene_count
Integer specifying the number of top marker genes to use for annotation per cluster (default: 10).
- controversy_threshold
Numeric value between 0 and 1 giving the consensus proportion threshold. Clusters whose consensus proportion falls below this threshold are considered controversial (default: 0.7).
- entropy_threshold
Numeric entropy threshold used when identifying controversial clusters. Higher entropy indicates more disagreement among models (default: 1.0).
- max_discussion_rounds
Integer specifying maximum number of discussion rounds for controversial clusters (default: 3).
- consensus_check_model
Character string specifying which model to use for consensus checking. If NULL, uses the first model from the models list.
- log_dir
Character string specifying directory for log files (default: "logs").
- cache_dir
Character string or NULL specifying the cache directory for storing results. NULL uses the system cache, "local" uses the current directory, and "temp" uses a temporary directory; any other string is treated as a custom path.
- use_cache
Logical indicating whether to use caching (default: TRUE).
- base_urls
Named list or character string specifying custom API base URLs. Useful for proxies or alternative endpoints. If NULL, uses official endpoints.
- clusters_to_analyze
Character or numeric vector specifying which clusters to analyze. If NULL (default), all clusters are analyzed.
- force_rerun
Logical indicating whether to force a rerun of all specified clusters, ignoring the cache. This setting only affects discussions of controversial clusters (default: FALSE).
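As a rough illustration of how the arguments above fit together, the following sketch builds both accepted input shapes and a set of call arguments. The marker values, the tissue name, the environment variable names, and the choice to name list elements by cluster ID are all assumptions made for the example. The call itself is left commented out because it would contact the model providers.

```r
# Hypothetical marker table with the three required columns
# (same shape as Seurat's FindAllMarkers() output; the values are made up).
markers_df <- data.frame(
  cluster    = c(0, 0, 0, 1, 1, 1),
  gene       = c("CD3D", "CD3E", "IL7R", "MS4A1", "CD79A", "CD79B"),
  avg_log2FC = c(2.1, 1.8, 1.2, 2.5, 2.2, 1.9),
  stringsAsFactors = FALSE
)

# Alternative list input: one element per cluster, each with a 'genes' field.
# Naming the elements by cluster ID is an assumption of this sketch.
markers_list <- lapply(
  split(markers_df$gene, markers_df$cluster),
  function(g) list(genes = g)
)

# API keys keyed by provider name. The environment variable names are
# placeholders, not names the function requires.
api_keys <- list(
  anthropic = Sys.getenv("ANTHROPIC_API_KEY"),
  openai    = Sys.getenv("OPENAI_API_KEY")
)

# Arguments collected in a list so they can be inspected before any API call.
call_args <- list(
  input                 = markers_df,
  tissue_name           = "human PBMC",  # hypothetical tissue label
  models                = c("claude-sonnet-4-20250514", "gpt-4o"),
  api_keys              = api_keys,
  top_gene_count        = 10,
  controversy_threshold = 0.7,
  entropy_threshold     = 1,
  cache_dir             = "temp",
  clusters_to_analyze   = c(0, 1)
)

# Not run here: this would send requests to the configured providers.
# result <- do.call(interactive_consensus_annotation, call_args)
```

Keeping the arguments in a list makes it easy to swap `markers_df` for `markers_list`, or to restrict `clusters_to_analyze`, without rewriting the call.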