Skip to contents

mLLMCelltype logo

Multi-LLM Consensus Framework for Cell Type Annotation in scRNA-seq Data

Tweet Stars Forks Discord

LicenseLast CommitIssuesReleasebioRxiv

Multi-LLM Consensus Architecture for Cell Type Annotation in scRNA-seq Data

mLLMCelltype is an R package that leverages various large language models (LLMs) for automated cell type annotation in single-cell RNA sequencing data. The package implements a multi-LLM consensus architecture where multiple LLMs collaborate through structured deliberation to provide more reliable annotations than any single model could achieve alone.

Key Features

  • Multi-LLM Consensus Mechanism: Harnesses collective intelligence from diverse LLMs to overcome single-model limitations and biases
  • Structured Deliberation Process: For controversial clusters, LLMs engage in collaborative discussion across multiple rounds, evaluating evidence and refining annotations together
  • Uncertainty Quantification: Explicitly quantifies annotation uncertainty through consensus proportion and Shannon entropy
  • No Reference Dataset Required: Does not rely on pre-existing reference datasets, can annotate various tissues and species
  • Support for Multiple LLM Providers:
    • OpenAI (GPT-4o, GPT-4.1, GPT-3.5-Turbo, O1 series)
    • Anthropic (Claude 3.7 Sonnet, Claude 3.5 Sonnet/Haiku/Opus)
    • Google (Gemini 2.5 Pro, Gemini 2.0, Gemini 1.5 series)
    • X.AI (Grok-3, Grok-3 Fast, Grok-3 Mini series)
    • DeepSeek (DeepSeek Chat, DeepSeek Reasoner)
    • Qwen (Qwen Max)
    • Zhipu (GLM-4 Plus, GLM-3 Turbo)
    • MiniMax (MiniMax Text)
    • Stepfun (Step-2, Step-1 series)
    • OpenRouter (access to Meta Llama, Mistral, Microsoft, Perplexity, Cohere, and more)
  • Seamless Integration with Seurat: Can directly use Seurat’s FindAllMarkers() output as input

Quick Start

# Install the package
devtools::install_github("cafferychen777/mLLMCelltype", subdir = "R")

# Load the package
library(mLLMCelltype)

# Set API keys
Sys.setenv(ANTHROPIC_API_KEY = "your-anthropic-api-key")
Sys.setenv(OPENAI_API_KEY = "your-openai-api-key")
Sys.setenv(GEMINI_API_KEY = "your-gemini-api-key")

# Use multiple models for annotation
models <- c(
  "claude-3-7-sonnet-20250219",
  "gpt-4o",
  "gemini-2.5-pro"
)

# Run multi-model annotation
results <- list()
for (model in models) {
  provider <- get_provider(model)
  api_key <- switch(provider,
                   "anthropic" = Sys.getenv("ANTHROPIC_API_KEY"),
                   "openai" = Sys.getenv("OPENAI_API_KEY"),
                   "gemini" = Sys.getenv("GEMINI_API_KEY"))

  results[[model]] <- annotate_cell_types(
    input = pbmc_markers,
    tissue_name = "human PBMC",
    model = model,
    api_key = api_key
  )
}

# Create consensus using interactive consensus annotation
api_keys <- list(
  anthropic = Sys.getenv("ANTHROPIC_API_KEY"),
  openai = Sys.getenv("OPENAI_API_KEY"),
  gemini = Sys.getenv("GEMINI_API_KEY")
)

consensus_results <- interactive_consensus_annotation(
  input = pbmc_markers,
  tissue_name = "human PBMC",
  models = models,  # Use the models defined above
  api_keys = api_keys,
  controversy_threshold = 0.7,
  entropy_threshold = 1.0,
  max_discussion_rounds = 3,
  consensus_check_model = "claude-3-7-sonnet-20250219"
)

# Print consensus results summary
print_consensus_summary(consensus_results)

Visualization

mLLMCelltype Visualization

Citation

If you use mLLMCelltype in your research, please cite our paper:

@article{Yang2025.04.10.647852,
  author = {Chen Yang and Xianyang Zhang and Jun Chen},
  title = {Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data},
  elocation-id = {2025.04.10.647852},
  year = {2025},
  doi = {10.1101/2025.04.10.647852},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/04/17/2025.04.10.647852},
  journal = {bioRxiv}
}

You can also cite this in plain text format:

Yang, C., Zhang, X., & Chen, J. (2025). Large Language Model Consensus Substantially Improves the Cell Type Annotation Accuracy for scRNA-seq Data. bioRxiv. https://doi.org/10.1101/2025.04.10.647852

Learn More

Please check our documentation to learn more about mLLMCelltype.