
Interactive consensus building for cell type annotation
Source: R/consensus_annotation.R
interactive_consensus_annotation.Rd
This function implements an interactive voting and discussion mechanism where multiple LLMs collaborate to reach a consensus on cell type annotations, particularly focusing on clusters with low agreement. The process includes:
Initial voting by all LLMs
Identification of controversial clusters
Detailed discussion for controversial clusters
Final summary by a designated LLM (default: Claude)
Usage
interactive_consensus_annotation(
  input,
  tissue_name = NULL,
  models = c("claude-3-7-sonnet-20250219", "claude-3-5-sonnet-latest",
    "claude-3-5-haiku-latest", "gemini-2.0-flash", "gemini-1.5-pro",
    "qwen-max-2025-01-25", "gpt-4o", "grok-3-latest"),
  api_keys,
  top_gene_count = 10,
  controversy_threshold = 0.7,
  entropy_threshold = 1,
  max_discussion_rounds = 3,
  consensus_check_model = NULL,
  log_dir = "logs",
  cache_dir = "consensus_cache",
  use_cache = TRUE
)
Arguments
- input
One of the following:
A data frame from Seurat's FindAllMarkers() function containing differential gene expression results (must have columns: 'cluster', 'gene', and 'avg_log2FC'). The function will select the top genes based on avg_log2FC for each cluster.
A list where each element has a 'genes' field containing marker genes for a cluster. This can be in one of these formats:
Named with cluster IDs: list("0" = list(genes = c(...)), "1" = list(genes = c(...)))
Named with cell type names: list(t_cells = list(genes = c(...)), b_cells = list(genes = c(...)))
Unnamed list: list(list(genes = c(...)), list(genes = c(...)))
For both input types, if cluster IDs are numeric and start from 1, they will be automatically converted to 0-based indexing (e.g., cluster 1 becomes cluster 0) for consistency.
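As a rough illustration of the two input forms, the sketch below turns a small marker table into the list format by keeping the top genes per cluster ranked by avg_log2FC, then shifts 1-based cluster IDs to 0-based. The helper name `top_genes_per_cluster` and the toy data are hypothetical, and the re-indexing line only reflects one reading of the rule described above; the function's internal implementation may differ.

```r
# Toy marker table with the three columns the function requires.
markers <- data.frame(
  cluster    = c(1, 1, 1, 2, 2, 2),
  gene       = c("CD3D", "CD3E", "IL7R", "MS4A1", "CD79A", "CD19"),
  avg_log2FC = c(2.1, 3.0, 1.2, 2.5, 1.9, 3.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): keep the top n genes
# per cluster, ranked by decreasing avg_log2FC.
top_genes_per_cluster <- function(df, n = 10) {
  by_cluster <- split(df, df$cluster)
  lapply(by_cluster, function(d) {
    d <- d[order(d$avg_log2FC, decreasing = TRUE), ]
    list(genes = head(d$gene, n))
  })
}

gene_list <- top_genes_per_cluster(markers, n = 2)

# Assumed re-indexing rule: numeric IDs that start at 1 are shifted to start at 0.
ids <- as.numeric(names(gene_list))
if (min(ids) == 1) names(gene_list) <- as.character(ids - 1)

str(gene_list)
```

The resulting `gene_list` has the same shape as the "Named with cluster IDs" format listed above.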
- tissue_name
Optional tissue name (default: NULL)
- models
Vector of model names to participate in the discussion. Supported models:
OpenAI: 'gpt-4o', 'o1'
Anthropic: 'claude-3-7-sonnet-20250219', 'claude-3-5-sonnet-latest', 'claude-3-5-haiku-latest', 'claude-3-opus'
DeepSeek: 'deepseek-chat', 'deepseek-reasoner'
Google: 'gemini-2.0-flash', 'gemini-2.0-flash-exp', 'gemini-1.5-pro', 'gemini-1.5-flash'
Alibaba: 'qwen-max-2025-01-25'
Stepfun: 'step-2-16k', 'step-2-mini', 'step-1-8k'
Zhipu: 'glm-4-plus', 'glm-3-turbo'
MiniMax: 'minimax-text-01'
X.AI: 'grok-3-latest', 'grok-3', 'grok-3-fast', 'grok-3-fast-latest', 'grok-3-mini', 'grok-3-mini-latest', 'grok-3-mini-fast', 'grok-3-mini-fast-latest'
OpenRouter: Provides access to models from multiple providers through a single API. Format: 'provider/model-name'
OpenAI models: 'openai/gpt-4o', 'openai/gpt-4o-mini', 'openai/gpt-4-turbo', 'openai/gpt-4', 'openai/gpt-3.5-turbo'
Anthropic models: 'anthropic/claude-3-7-sonnet-20250219', 'anthropic/claude-3-5-sonnet-latest', 'anthropic/claude-3-5-haiku-latest', 'anthropic/claude-3-opus'
Meta models: 'meta-llama/llama-3-70b-instruct', 'meta-llama/llama-3-8b-instruct', 'meta-llama/llama-2-70b-chat'
Google models: 'google/gemini-2.5-pro-preview-03-25', 'google/gemini-1.5-pro-latest', 'google/gemini-1.5-flash'
Mistral models: 'mistralai/mistral-large', 'mistralai/mistral-medium', 'mistralai/mistral-small'
Other models: 'microsoft/mai-ds-r1', 'perplexity/sonar-small-chat', 'cohere/command-r', 'deepseek/deepseek-chat', 'thudm/glm-z1-32b'
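To make the 'provider/model-name' convention concrete, here is a minimal sketch that splits such an identifier into its two parts. The helper `parse_openrouter_id` is hypothetical and is not taken from the package; it only illustrates how the format can be distinguished from a plain model name.

```r
# Hypothetical helper (not part of the package): split an OpenRouter-style
# identifier of the form "provider/model-name" into provider and model.
parse_openrouter_id <- function(model_id) {
  parts <- strsplit(model_id, "/", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    # No slash: treat the input as a plain model name with no provider prefix.
    return(list(provider = NA_character_, model = model_id))
  }
  list(provider = parts[1], model = paste(parts[-1], collapse = "/"))
}

routed <- parse_openrouter_id("anthropic/claude-3-opus")
direct <- parse_openrouter_id("gpt-4o")
```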
- api_keys
Named list of API keys. Can be provided in two formats:
With provider names as keys:
list("openai" = "sk-...", "anthropic" = "sk-ant-...", "openrouter" = "sk-or-...")
With model names as keys:
list("gpt-4o" = "sk-...", "claude-3-opus" = "sk-ant-...")
The system first tries to find the API key using the provider name. If not found, it then tries using the model name. Example:
api_keys <- list(
  "openai" = Sys.getenv("OPENAI_API_KEY"),
  "anthropic" = Sys.getenv("ANTHROPIC_API_KEY"),
  "openrouter" = Sys.getenv("OPENROUTER_API_KEY"),
  "claude-3-opus" = "sk-ant-api03-specific-key-for-opus"
)
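The lookup order described above (provider name first, then model name) could be sketched as follows. The function `find_api_key` is hypothetical, the provider is passed in explicitly because the package's own model-to-provider mapping is not shown here, and the key strings are placeholders.

```r
# Hypothetical lookup (not the package's internal code): try the provider
# name first, then fall back to the model name.
find_api_key <- function(model, provider, api_keys) {
  key <- api_keys[[provider]]
  if (is.null(key) || !nzchar(key)) {
    key <- api_keys[[model]]
  }
  key
}

# Placeholder keys for illustration only.
demo_keys <- list(
  "openai" = "sk-provider-key",
  "claude-3-opus" = "sk-ant-model-key"
)

key_by_provider <- find_api_key("gpt-4o", "openai", demo_keys)
key_by_model    <- find_api_key("claude-3-opus", "anthropic", demo_keys)
key_missing     <- find_api_key("gemini-1.5-pro", "gemini", demo_keys)
```

Under this reading, a model-named key is consulted only when no key is stored under the provider name.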
- top_gene_count
Number of top differentially expressed genes to use per cluster (default: 10)
- controversy_threshold
Consensus proportion threshold (default: 0.7). Clusters whose consensus proportion falls below this value are marked as controversial.
- entropy_threshold
Entropy threshold for identifying controversial clusters (default: 1.0)
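The two thresholds can be illustrated with a small calculation on one cluster's votes. This is a sketch under stated assumptions, not the package's implementation: it assumes the consensus proportion is the share of models backing the most common label, that entropy is Shannon entropy in bits over the label distribution, that labels are compared by exact string match, and that a cluster is flagged when either threshold is crossed. The helper `vote_stats` is hypothetical.

```r
# Votes from five hypothetical models for a single cluster.
votes <- c("T cell", "T cell", "T cell", "NK cell", "T cell")

# Hypothetical helper: share of the most common label and the Shannon
# entropy (base 2) of the label distribution.
vote_stats <- function(votes) {
  p <- as.numeric(table(votes)) / length(votes)
  list(
    consensus_proportion = max(p),
    entropy = -sum(p * log2(p))
  )
}

stats <- vote_stats(votes)

# Assumed decision rule: flag the cluster if agreement is too low
# or the vote distribution is too spread out.
is_controversial <- stats$consensus_proportion < 0.7 || stats$entropy > 1
```

With four of five models agreeing, the consensus proportion is 0.8 and the entropy is about 0.72 bits, so this cluster would not be flagged under the default thresholds.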
- max_discussion_rounds
Maximum number of discussion rounds for controversial clusters (default: 3)
- consensus_check_model
Model to use for consensus checking (default: NULL)
- log_dir
Directory for storing logs (default: "logs")
- cache_dir
Directory for storing cached results (default: "consensus_cache")
- use_cache
Whether to use cached results (default: TRUE)