This function implements an interactive voting and discussion mechanism where multiple LLMs collaborate to reach a consensus on cell type annotations, particularly focusing on clusters with low agreement. The process includes:

  1. Initial voting by all LLMs

  2. Identification of controversial clusters

  3. Detailed discussion for controversial clusters

  4. Final summary by a designated LLM (default: Claude)

Usage

interactive_consensus_annotation(
  input,
  tissue_name = NULL,
  models = c("claude-3-7-sonnet-20250219", "claude-3-5-sonnet-latest",
    "claude-3-5-haiku-latest", "gemini-2.0-flash", "gemini-1.5-pro",
    "qwen-max-2025-01-25", "gpt-4o", "grok-3-latest"),
  api_keys,
  top_gene_count = 10,
  controversy_threshold = 0.7,
  entropy_threshold = 1,
  max_discussion_rounds = 3,
  consensus_check_model = NULL,
  log_dir = "logs",
  cache_dir = "consensus_cache",
  use_cache = TRUE
)

Arguments

input

One of the following:

  • A data frame from Seurat's FindAllMarkers() function containing differential gene expression results (must have columns: 'cluster', 'gene', and 'avg_log2FC'). The function will select the top genes based on avg_log2FC for each cluster.

  • A list where each element has a 'genes' field containing marker genes for a cluster. This can be in one of these formats:

    • Named with cluster IDs: list("0" = list(genes = c(...)), "1" = list(genes = c(...)))

    • Named with cell type names: list(t_cells = list(genes = c(...)), b_cells = list(genes = c(...)))

    • Unnamed list: list(list(genes = c(...)), list(genes = c(...)))

For both input types, numeric cluster IDs that start from 1 are automatically converted to 0-based indexing (e.g., cluster 1 becomes cluster 0) for consistency.
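The two input shapes and the 1-based to 0-based conversion can be illustrated with a minimal base R sketch. The helpers `top_genes_per_cluster()` and `to_zero_based()` are hypothetical names introduced here for illustration and are not part of the package; the toy data frame only mimics the three required columns of FindAllMarkers() output, and the function's internal selection logic may differ.

```r
# Toy stand-in for FindAllMarkers() output; only the three required columns.
markers <- data.frame(
  cluster    = c(1, 1, 1, 2, 2, 2),
  gene       = c("CD3E", "CD3D", "IL7R", "MS4A1", "CD79A", "CD19"),
  avg_log2FC = c(2.1, 3.0, 1.2, 2.5, 1.9, 3.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n genes with the highest avg_log2FC per cluster
# and return them in the list format (one 'genes' field per cluster).
top_genes_per_cluster <- function(df, n = 10) {
  by_cluster <- split(df, df$cluster)
  lapply(by_cluster, function(d) {
    d <- d[order(d$avg_log2FC, decreasing = TRUE), ]
    list(genes = head(d$gene, n))
  })
}

# Hypothetical helper: shift numeric IDs that start at 1 to 0-based IDs;
# non-numeric names (e.g. cell type names) are returned unchanged.
to_zero_based <- function(ids) {
  num <- suppressWarnings(as.numeric(ids))
  if (!anyNA(num) && min(num) == 1) as.character(num - 1) else ids
}

input_list <- top_genes_per_cluster(markers, n = 2)
names(input_list) <- to_zero_based(names(input_list))
str(input_list)
```

The resulting `input_list` has the same shape as the "Named with cluster IDs" list format described above.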

tissue_name

Optional name of the tissue the cells come from (default: NULL)

models

Vector of model names to participate in the discussion. Supported models:

  • OpenAI: 'gpt-4o', 'o1'

  • Anthropic: 'claude-3-7-sonnet-20250219', 'claude-3-5-sonnet-latest', 'claude-3-5-haiku-latest', 'claude-3-opus'

  • DeepSeek: 'deepseek-chat', 'deepseek-reasoner'

  • Google: 'gemini-2.0-flash', 'gemini-2.0-flash-exp', 'gemini-1.5-pro', 'gemini-1.5-flash'

  • Alibaba: 'qwen-max-2025-01-25'

  • Stepfun: 'step-2-16k', 'step-2-mini', 'step-1-8k'

  • Zhipu: 'glm-4-plus', 'glm-3-turbo'

  • MiniMax: 'minimax-text-01'

  • X.AI: 'grok-3-latest', 'grok-3', 'grok-3-fast', 'grok-3-fast-latest', 'grok-3-mini', 'grok-3-mini-latest', 'grok-3-mini-fast', 'grok-3-mini-fast-latest'

  • OpenRouter: Provides access to models from multiple providers through a single API. Format: 'provider/model-name'

    • OpenAI models: 'openai/gpt-4o', 'openai/gpt-4o-mini', 'openai/gpt-4-turbo', 'openai/gpt-4', 'openai/gpt-3.5-turbo'

    • Anthropic models: 'anthropic/claude-3-7-sonnet-20250219', 'anthropic/claude-3-5-sonnet-latest', 'anthropic/claude-3-5-haiku-latest', 'anthropic/claude-3-opus'

    • Meta models: 'meta-llama/llama-3-70b-instruct', 'meta-llama/llama-3-8b-instruct', 'meta-llama/llama-2-70b-chat'

    • Google models: 'google/gemini-2.5-pro-preview-03-25', 'google/gemini-1.5-pro-latest', 'google/gemini-1.5-flash'

    • Mistral models: 'mistralai/mistral-large', 'mistralai/mistral-medium', 'mistralai/mistral-small'

    • Other models: 'microsoft/mai-ds-r1', 'perplexity/sonar-small-chat', 'cohere/command-r', 'deepseek/deepseek-chat', 'thudm/glm-z1-32b'

api_keys

Named list of API keys. Can be provided in two formats:

  1. With provider names as keys: list("openai" = "sk-...", "anthropic" = "sk-ant-...", "openrouter" = "sk-or-...")

  2. With model names as keys: list("gpt-4o" = "sk-...", "claude-3-opus" = "sk-ant-...")

The system first tries to find the API key using the provider name. If not found, it then tries using the model name. Example:

api_keys <- list(
  "openai" = Sys.getenv("OPENAI_API_KEY"),
  "anthropic" = Sys.getenv("ANTHROPIC_API_KEY"),
  "openrouter" = Sys.getenv("OPENROUTER_API_KEY"),
  "claude-3-opus" = "sk-ant-api03-specific-key-for-opus"
)
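The lookup order described above (provider name first, then model name) can be sketched as follows. `lookup_api_key()` is a hypothetical helper written for illustration, not the package's internal function; here the provider is passed in explicitly, and treating an empty string (what `Sys.getenv()` returns for an unset variable) as a missing key is an assumption of this sketch.

```r
# Hypothetical helper, not part of the package API.
# Tries the provider name first, then falls back to the model name.
lookup_api_key <- function(api_keys, provider, model) {
  key <- api_keys[[provider]]
  # Assumption: an empty string counts as "no key".
  if (is.null(key) || !nzchar(key)) key <- api_keys[[model]]
  if (is.null(key) || !nzchar(key)) return(NULL)
  key
}

# Placeholder keys for demonstration only.
demo_keys <- list(
  "openai" = "sk-demo-openai",
  "claude-3-opus" = "sk-demo-opus"
)

k1 <- lookup_api_key(demo_keys, "openai", "gpt-4o")            # found via provider name
k2 <- lookup_api_key(demo_keys, "anthropic", "claude-3-opus")  # found via model name
k3 <- lookup_api_key(demo_keys, "google", "gemini-1.5-pro")    # NULL: no matching entry
```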

top_gene_count

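Number of top differential genes to use per cluster (default: 10). When input is a FindAllMarkers() data frame, genes are ranked by avg_log2FC.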

controversy_threshold

Consensus proportion threshold (default: 0.7). Clusters whose consensus proportion falls below this value are marked as controversial.

entropy_threshold

Entropy threshold for identifying controversial clusters (default: 1.0)
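To make the two thresholds concrete, here is a minimal sketch of how agreement among model votes could be scored for one cluster. It assumes that the consensus proportion is the share of the most common vote, that entropy is Shannon entropy in bits, and that a cluster is flagged when either threshold is crossed (with high entropy meaning disagreement); none of these details are stated on this page, so the package's actual computation may differ. `vote_metrics()` and `is_controversial()` are hypothetical names.

```r
# Hypothetical helper: agreement metrics for one cluster's votes.
vote_metrics <- function(votes) {
  p <- as.numeric(table(votes)) / length(votes)  # vote share of each label
  list(
    consensus_proportion = max(p),       # share of the most common label
    entropy = -sum(p * log2(p))          # Shannon entropy in bits (assumed base 2)
  )
}

# Assumed rule: controversial if agreement is too low OR entropy is too high.
is_controversial <- function(m, controversy_threshold = 0.7, entropy_threshold = 1) {
  m$consensus_proportion < controversy_threshold || m$entropy > entropy_threshold
}

agree <- vote_metrics(c("T cell", "T cell", "T cell", "NK cell"))
split_vote <- vote_metrics(c("T cell", "NK cell", "B cell", "T cell"))

is_controversial(agree)       # 3 of 4 models agree
is_controversial(split_vote)  # only 2 of 4 models agree
```

Under these assumptions, three matching votes out of four give a consensus proportion of 0.75, which stays above the default threshold, while a 2/1/1 split gives 0.5 and would be sent to discussion.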

max_discussion_rounds

Maximum number of discussion rounds for controversial clusters (default: 3)

consensus_check_model

Model to use for consensus checking (default: NULL)

log_dir

Directory for storing logs (default: "logs")

cache_dir

Directory for storing cached results (default: "consensus_cache")

use_cache

Whether to use cached results (default: TRUE)

Value

A list containing consensus results, logs, and annotations
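A call might look like the sketch below, which uses only the arguments documented on this page. The marker genes and the tissue name are illustrative values, and the call is guarded so the snippet does nothing unless the function is available in the session and both API keys are set. The structure of the returned list beyond what is stated above is not assumed here, so the result is only inspected with `str()`.

```r
# Illustrative marker lists in the "named with cluster IDs" format.
marker_list <- list(
  "0" = list(genes = c("CD3D", "CD3E", "IL7R")),
  "1" = list(genes = c("MS4A1", "CD79A", "CD19"))
)

# Keys are read from environment variables, keyed by provider name.
api_keys <- list(
  "openai" = Sys.getenv("OPENAI_API_KEY"),
  "anthropic" = Sys.getenv("ANTHROPIC_API_KEY")
)

# Guard: run only if both keys are set and the function is available.
have_keys <- all(nzchar(unlist(api_keys)))
if (have_keys && exists("interactive_consensus_annotation")) {
  result <- interactive_consensus_annotation(
    input = marker_list,
    tissue_name = "human PBMC",  # illustrative value
    models = c("gpt-4o", "claude-3-5-sonnet-latest"),
    api_keys = api_keys,
    controversy_threshold = 0.7,
    entropy_threshold = 1,
    max_discussion_rounds = 3
  )
  str(result, max.level = 1)  # inspect the top-level elements of the result
}
```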