Interactive consensus building for cell type annotation — interactive_consensus

This function implements an interactive voting and discussion mechanism where multiple LLMs collaborate to reach a consensus on cell type annotations, particularly focusing on clusters with low agreement. The process includes:

Initial voting by all LLMs
Identification of controversial clusters
Detailed discussion for controversial clusters
Final summary by a designated LLM (default: Claude)

Initial voting by all LLMs
Identification of controversial clusters
Detailed discussion for controversial clusters
Final summary by a designated LLM (default: Claude)

Usage

interactive_consensus_annotation(
  input,
  tissue_name = NULL,
  models = c("claude-sonnet-4-20250514", "claude-3-7-sonnet-20250219",
    "claude-3-5-sonnet-20241022", "claude-3-5-haiku-20241022", "gemini-2.0-flash",
    "gemini-1.5-pro", "qwen-max-2025-01-25", "gpt-4o", "grok-3-latest"),
  api_keys,
  top_gene_count = 10,
  controversy_threshold = 0.7,
  entropy_threshold = 1,
  max_discussion_rounds = 3,
  consensus_check_model = NULL,
  log_dir = "logs",
  cache_dir = "consensus_cache",
  use_cache = TRUE,
  base_urls = NULL,
  clusters_to_analyze = NULL,
  force_rerun = FALSE
)

Arguments

input

One of the following:

A data frame from Seurat's FindAllMarkers() function containing differential gene expression results (must have columns: 'cluster', 'gene', and 'avg_log2FC'). The function will select the top genes based on avg_log2FC for each cluster.
A list where each element has a 'genes' field containing marker genes for a cluster. This can be in one of these formats:
- Named with cluster IDs: list("0" = list(genes = c(...)), "1" = list(genes = c(...)))
- Named with cell type names: list(t_cells = list(genes = c(...)), b_cells = list(genes = c(...)))
- Unnamed list: list(list(genes = c(...)), list(genes = c(...)))
For both input types, if cluster IDs are numeric and start from 1, they will be automatically converted to 0-based indexing (e.g., cluster 1 becomes cluster 0) for consistency.

tissue_name

Optional input of tissue name

models

Vector of model names to participate in the discussion. Supported models:

OpenAI: 'gpt-4o', 'gpt-4o-mini', 'gpt-4.1', 'gpt-4.1-mini', 'gpt-4.1-nano', 'gpt-4-turbo', 'gpt-3.5-turbo', 'o1', 'o1-mini', 'o1-preview', 'o1-pro'
Anthropic: 'claude-sonnet-4-20250514', 'claude-opus-4-20250514', 'claude-3-7-sonnet-20250219', 'claude-3-5-sonnet-20241022', 'claude-3-5-haiku-20241022', 'claude-3-opus-20240229'
DeepSeek: 'deepseek-chat', 'deepseek-r1', 'deepseek-r1-zero', 'deepseek-reasoner'
Google: 'gemini-2.5-pro', 'gemini-2.5-flash', 'gemini-2.0-flash', 'gemini-2.0-flash-lite', 'gemini-1.5-pro-latest', 'gemini-1.5-flash-latest', 'gemini-1.5-flash-8b'
Alibaba: 'qwen-max-2025-01-25', 'qwen3-72b'
Stepfun: 'step-2-16k', 'step-2-mini', 'step-1-8k'
Zhipu: 'glm-4-plus', 'glm-3-turbo'
MiniMax: 'minimax-text-01'
X.AI: 'grok-3-latest', 'grok-3', 'grok-3-fast', 'grok-3-fast-latest', 'grok-3-mini', 'grok-3-mini-latest', 'grok-3-mini-fast', 'grok-3-mini-fast-latest'
OpenRouter: Provides access to models from multiple providers through a single API. Format: 'provider/model-name'
- OpenAI models: 'openai/gpt-4o', 'openai/gpt-4o-mini', 'openai/gpt-4-turbo', 'openai/gpt-4', 'openai/gpt-3.5-turbo'
- Anthropic models: 'anthropic/claude-sonnet-4', 'anthropic/claude-opus-4', 'anthropic/claude-3.7-sonnet', 'anthropic/claude-3.5-sonnet', 'anthropic/claude-3.5-haiku', 'anthropic/claude-3-opus'
- Meta models: 'meta-llama/llama-3-70b-instruct', 'meta-llama/llama-3-8b-instruct', 'meta-llama/llama-2-70b-chat'
- Google models: 'google/gemini-2.5-pro', 'google/gemini-2.5-flash', 'google/gemini-2.0-flash', 'google/gemini-1.5-pro-latest', 'google/gemini-1.5-flash'
- Mistral models: 'mistralai/mistral-large', 'mistralai/mistral-medium', 'mistralai/mistral-small'
- Other models: 'microsoft/mai-ds-r1', 'perplexity/sonar-small-chat', 'cohere/command-r', 'deepseek/deepseek-chat', 'thudm/glm-z1-32b'

api_keys

Named list of API keys. Can be provided in two formats:

With provider names as keys: list("openai" = "sk-...", "anthropic" = "sk-ant-...", "openrouter" = "sk-or-...")
With model names as keys: list("gpt-4o" = "sk-...", "claude-3-opus" = "sk-ant-...")

The system first tries to find the API key using the provider name. If not found, it then tries using the model name. Example:

api_keys <- list(
  "openai" = Sys.getenv("OPENAI_API_KEY"),
  "anthropic" = Sys.getenv("ANTHROPIC_API_KEY"),
  "openrouter" = Sys.getenv("OPENROUTER_API_KEY"),
  "claude-3-opus" = "sk-ant-api03-specific-key-for-opus"
)

top_gene_count

Number of top differential genes to use

controversy_threshold

Consensus proportion threshold (default: 0.7). Clusters with consensus proportion below this value will be marked as controversial

entropy_threshold

Entropy threshold for identifying controversial clusters (default: 1.0)

max_discussion_rounds

Maximum number of discussion rounds for controversial clusters (default: 3)

consensus_check_model

Model to use for consensus checking

log_dir

Directory for storing logs

cache_dir

Directory for storing cache

use_cache

Whether to use cached results

base_urls

Optional custom base URLs for API endpoints. Can be:

A single character string: Applied to all providers (e.g., "https://api.proxy.com/v1")
A named list: Provider-specific URLs (e.g., list(openai = "https://openai-proxy.com/v1", anthropic = "https://anthropic-proxy.com/v1")). This is useful for:
- Chinese users accessing international APIs through proxies
- Enterprise users with internal API gateways
- Development/testing with local or alternative endpoints If NULL (default), uses official API endpoints for each provider.

clusters_to_analyze

Optional vector of cluster IDs to analyze. If NULL (default), all clusters in the input will be analyzed. Must be character or numeric values that match the cluster IDs in your input. Examples:

For numeric clusters: c(0, 2, 5) or c("0", "2", "5")
This is useful when you want to focus on specific clusters without filtering the input data
Non-existent cluster IDs will be ignored with a warning

force_rerun

Logical. If TRUE, ignore cached results and force re-analysis of all specified clusters. Useful when you want to re-analyze clusters with different context or for subtype identification. Default is FALSE. Note: This parameter only affects the discussion phase for controversial clusters.

Value

A list containing consensus results, logs, and annotations