ChatSpatial API Reference

Complete reference for all ChatSpatial MCP tools, parameters, and data models.

Overview

ChatSpatial provides 16 MCP tools for spatial transcriptomics analysis. Each tool follows the Model Context Protocol specification with:

  • JSON Schema validation for all inputs and outputs
  • Structured error handling with detailed error messages
  • Type-safe parameters with automatic validation
  • Return types covering images, data, and metadata

Tool Categories

| Category | Tools | Description |
| --- | --- | --- |
| Data Management | load_data, preprocess_data | Data loading, QC, and preprocessing |
| Cell Annotation | annotate_cell_types | 7 annotation methods with reference data support |
| Spatial Analysis | analyze_spatial_data, identify_spatial_domains, register_spatial_data | Comprehensive spatial pattern analysis, domain identification, and registration |
| Gene Analysis | find_spatial_genes, find_markers, analyze_enrichment | Spatial variable genes, differential expression, and enrichment |
| Cell Communication | analyze_cell_communication | Ligand-receptor interaction analysis |
| Deconvolution | deconvolve_data | Cell type proportion estimation |
| Integration | integrate_samples | Multi-modal and batch integration |
| Trajectory | analyze_velocity_data, analyze_trajectory_data | RNA velocity analysis and trajectory inference |
| Visualization | visualize_data | 20 plot types with MCP image outputs |

Quick Reference

Essential Tools

# Data loading and preprocessing
load_data(data_path="data.h5ad", name="dataset")
preprocess_data(data_id="dataset", normalize_total=True, log1p=True)

# Core analysis
identify_spatial_domains(data_id="dataset", method="spagcn")
annotate_cell_types(data_id="dataset", method="tangram")
analyze_cell_communication(data_id="dataset", method="liana")
analyze_enrichment(data_id="dataset", method="spatial_enrichmap")

# Advanced spatial analysis
register_spatial_data(source_id="section1", target_id="section2")
analyze_spatial_data(data_id="dataset", params={"analysis_type": "geary", "genes": ["gene"]})

# Visualization
visualize_data(data_id="dataset", params={"plot_type": "spatial_domains"})

Parameter Patterns

All tools follow consistent parameter patterns:

  • data_path: Path to data file (load_data only)
  • data_type: Data format specification (load_data only)
  • data_id: Required string identifier for loaded datasets
  • method: Analysis method selection with fallbacks
  • *_key: Keys for accessing data layers (e.g., spatial_key, batch_key)
  • use_*: Boolean flags for optional features
  • n_*: Numeric parameters (neighbors, components, etc.)
  • *_threshold: Filtering and significance thresholds

Data Management

load_data

Load spatial transcriptomics data from various formats.

Signature:

load_data(
    data_path: str,
    data_type: str = "auto", 
    name: Optional[str] = None,
    context: Context = None
) -> SpatialDataset

Supported Formats:

  • H5AD: AnnData format with spatial coordinates
  • CSV: Expression matrix with separate coordinate file
  • H5/HDF5: Hierarchical data format
  • 10x Visium: Space Ranger outputs
  • Zarr: Cloud-optimized arrays

Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| data_path | str | - | Path to the data file or directory |
| data_type | str | "auto" | Type of spatial data (auto, 10x_visium, slide_seq, merfish, seqfish, other, h5ad). If "auto", will try to determine the type from the file extension or directory structure |
| name | Optional[str] | None | Optional name for the dataset |
| context | Context | None | Optional MCP context for logging |
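
The "auto" setting is described as inferring the type from the file extension or directory structure. The sketch below illustrates one way such inference could work. The helper `guess_data_type` and its rules are hypothetical and are not taken from ChatSpatial; it assumes that a `spatial/` subdirectory (as found in Space Ranger output) signals 10x Visium data.

```python
from pathlib import Path


def guess_data_type(data_path: str) -> str:
    """Hypothetical helper: guess a data_type value from a path.

    Illustrative only; ChatSpatial's real detection rules may differ.
    """
    path = Path(data_path)
    if path.is_dir():
        # Assumption: Space Ranger output directories contain a "spatial" subfolder.
        if (path / "spatial").is_dir():
            return "10x_visium"
        return "other"
    if path.suffix.lower() == ".h5ad":
        return "h5ad"
    return "other"


print(guess_data_type("data/mouse_brain_visium.h5ad"))
```

When the guess is ambiguous, passing an explicit data_type to load_data avoids relying on any inference.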

Example:

result = load_data(
    data_path="data/mouse_brain_visium.h5ad",
    data_type="auto",  # or "other" for generic h5ad files
    name="mouse_brain"
)
print(f"Loaded dataset: {result.id}")

preprocess_data

Preprocessing pipeline for spatial transcriptomics data.

Signature:

preprocess_data(
    data_id: str,
    min_genes: int = 200,
    min_cells: int = 3,
    normalize_total: bool = True,
    log1p: bool = True,
    highly_variable_genes: bool = True,
    n_top_genes: int = 2000,
    pca: bool = True,
    neighbors: bool = True,
    clustering: bool = True,
    umap: bool = True
) -> PreprocessingResult

Features:

  • Quality control filtering
  • Normalization and scaling
  • Highly variable gene selection
  • Dimensionality reduction (PCA, UMAP)
  • Neighbor graph construction
  • Leiden clustering
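
To make the normalize_total and log1p flags concrete, here is a minimal pure-Python sketch of what these two steps conventionally compute: scaling each spot's counts to a common total, then applying log(1 + x). The function names and the target_sum value are assumptions for illustration; the target that preprocess_data actually uses is not stated here.

```python
import math


def normalize_total(counts, target_sum=10_000.0):
    """Scale each row (one spot or cell) so its counts sum to target_sum."""
    normalized = []
    for row in counts:
        total = sum(row)
        # All-zero rows stay zero; this also avoids dividing by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in row])
    return normalized


def log1p_transform(matrix):
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(value) for value in row] for row in matrix]


counts = [[1, 3, 0], [10, 0, 10]]  # two spots, three genes
normalized = normalize_total(counts, target_sum=100.0)
logged = log1p_transform(normalized)
print(normalized)  # [[25.0, 75.0, 0.0], [50.0, 0.0, 50.0]]
```

After normalization both spots sum to the same total, so differences in sequencing depth no longer dominate downstream comparisons.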

Cell Annotation

annotate_cell_types

Cell type annotation with multiple methods.

Signature:

annotate_cell_types(
    data_id: str,
    method: str = "tangram",
    reference_data_id: Optional[str] = None,
    marker_genes: Optional[Dict] = None,
    confidence_threshold: float = 0.5
) -> AnnotationResult

Available Methods:

| Method | Description | Requirements |
| --- | --- | --- |
| tangram | Spatial mapping with reference | Single-cell reference data |
| sctype | Automated cell type identification | Tissue type specification |
| cell2location | Probabilistic deconvolution | Reference signatures |
| scanvi | Semi-supervised annotation | Reference data with labels |
| cellassign | Probabilistic assignment | Marker gene matrix |
| mllmcelltype | Multi-modal LLM classifier | Pre-trained model |

Example:

# Reference-based annotation with Tangram
result = annotate_cell_types(
    data_id="spatial_dataset",
    method="tangram",
    reference_data_id="reference_scRNA_dataset"
)

# CellAssign with custom marker genes
markers = {
    "T_cells": ["CD3D", "CD3E", "CD3G"],
    # CD20 is the protein encoded by MS4A1, so only the gene symbol is listed
    "B_cells": ["CD19", "CD79A", "MS4A1"],
    "Macrophages": ["CD68", "CD163", "CSF1R"]
}

result = annotate_cell_types(
    data_id="dataset",
    method="cellassign",
    marker_genes=markers
)

Spatial Analysis

identify_spatial_domains

Identify spatial domains and tissue architecture.

Signature:

identify_spatial_domains(
    data_id: str,
    method: str = "spagcn",
    n_clusters: Optional[int] = None,
    resolution: float = 1.0,
    spatial_key: str = "spatial"
) -> SpatialDomainResult

Available Methods:

| Method | Description | Use Case |
| --- | --- | --- |
| spagcn | Graph convolutional networks | General spatial domains |
| stagate | Graph attention auto-encoder | Complex tissue architecture |
| leiden | Community detection | Quick clustering |
| louvain | Modularity optimization | Alternative clustering |

analyze_spatial_data

Spatial pattern analysis.

Signature:

analyze_spatial_data(
    data_id: str,
    analysis_type: str = "autocorrelation",
    genes: Optional[List[str]] = None,
    method: str = "moran"
) -> SpatialStatisticsResult

Analysis Types:

  • autocorrelation: Spatial autocorrelation (Moran’s I, Geary’s C)
  • hotspots: Hotspot detection (Getis-Ord Gi*)
  • patterns: Spatial expression patterns
  • neighborhoods: Neighborhood analysis
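
For readers unfamiliar with the autocorrelation statistics named above, the following sketch computes Moran's I and Geary's C from their textbook definitions on a toy example. It is not ChatSpatial's implementation: the function names are hypothetical, and the binary adjacency weights over four spots in a row are an assumption chosen to keep the arithmetic easy to follow.

```python
def morans_i(values, weights):
    """Global Moran's I: (N / W) * sum_ij w_ij d_i d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def gearys_c(values, weights):
    """Geary's C: ((N - 1) / (2 W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    w_sum = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    den = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * w_sum)) * (num / den)


# Four spots in a row; each spot neighbors the adjacent ones (binary weights).
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
expression = [1.0, 2.0, 3.0, 4.0]  # a smooth gradient along the row

print(f"Moran's I: {morans_i(expression, weights):.3f}")
print(f"Geary's C: {gearys_c(expression, weights):.3f}")
```

A smooth gradient gives a positive Moran's I and a Geary's C below 1, both of which indicate positive spatial autocorrelation.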

register_spatial_data

Register and align spatial transcriptomics data across multiple tissue sections.

Signature:

register_spatial_data(
    source_id: str,
    target_id: str,
    method: str = "paste",
    landmarks: Optional[List[Dict[str, Any]]] = None
) -> Dict[str, Any]

Available Methods:

| Method | Description | Use Case |
| --- | --- | --- |
| paste | PASTE algorithm for spatial alignment | Multi-slice integration |

Features:

  • Cross-section spatial alignment
  • Transformation matrix computation
  • Landmark-guided registration
  • Batch correction integration
  • Quality metrics for alignment assessment

Example:

# Register consecutive tissue sections
result = register_spatial_data(
    source_id="section_1",
    target_id="section_2", 
    method="paste"
)

print("Registration successful with transformation matrix")
print(f"Alignment quality score: {result['alignment_score']:.3f}")

analyze_spatial_data (Enhanced)

Unified spatial statistics analysis with support for 12 different analysis types.

Signature:

analyze_spatial_data(
    data_id: str,
    params: Dict[str, Any]
) -> SpatialStatisticsResult

Available Analysis Types:

| Analysis Type | Description | Key Parameters |
| --- | --- | --- |
| moran | Global Moran's I spatial autocorrelation | genes, moran_n_perms |
| local_moran | Local Moran's I (LISA) for hotspot detection | genes, n_neighbors |
| geary | Geary's C spatial autocorrelation | genes, moran_n_perms |
| getis_ord | Getis-Ord Gi* hot/cold spot analysis | genes, n_neighbors |
| neighborhood | Neighborhood enrichment analysis | cluster_key, n_neighbors |
| co_occurrence | Cell type co-occurrence patterns | cluster_key, n_neighbors |
| ripley | Ripley's K/L point pattern analysis | cluster_key |
| centrality | Graph centrality measures | cluster_key |
| bivariate_moran | Bivariate spatial correlation | gene_pairs |
| join_count | Join count for categorical data | cluster_key |
| network_properties | Spatial network analysis | cluster_key |
| spatial_centrality | Spatial-specific centrality | cluster_key |

New Unified Gene Selection:

The genes parameter now provides unified gene selection across all relevant analysis types:

# Example: Local Moran's I analysis
result = analyze_spatial_data(
    data_id="tissue_dataset",
    params={
        "analysis_type": "local_moran",
        "genes": ["CD8A", "FOXP3"],  # Unified parameter
        "n_neighbors": 6
    }
)

# Example: Geary's C analysis  
result = analyze_spatial_data(
    data_id="tissue_dataset",
    params={
        "analysis_type": "geary", 
        "genes": ["GAPDH", "MKI67"],  # Same unified parameter
        "n_neighbors": 8
    }
)

Gene Analysis

find_spatial_genes

Identify spatially variable genes using multiple methods.

Signature:

find_spatial_genes(
    data_id: str,
    method: str = "sparkx",
    n_genes: int = 1000,
    alpha: float = 0.05
) -> SpatialVariableGenesResult

Available Methods:

| Method | Description | Strengths |
| --- | --- | --- |
| sparkx | SPARK-X non-parametric method | Fast, accurate |
| spatialde | Gaussian process models | Variable patterns |

find_markers

Find marker genes for cell types or spatial domains.

Signature:

find_markers(
    data_id: str,
    groupby: str = "cell_type",
    method: str = "wilcoxon",
    n_genes: int = 100,
    logfc_threshold: float = 0.25
) -> DifferentialExpressionResult

analyze_enrichment

Perform gene set enrichment analysis on spatial transcriptomics data.

Signature:

analyze_enrichment(
    data_id: str,
    method: str = "spatial_enrichmap",
    gene_sets: Optional[Union[List[str], Dict[str, List[str]]]] = None,
    gene_set_database: str = "GO_Biological_Process",
    spatial_key: str = "spatial",
    n_neighbors: int = 6,
    smoothing: bool = True,
    min_genes: int = 10
) -> EnrichmentResult

Available Methods:

| Method | Description | Use Case |
| --- | --- | --- |
| spatial_enrichmap | Spatially-aware enrichment mapping | Spatial pathway analysis |
| pathway_gsea | Gene Set Enrichment Analysis | Ranked gene lists |
| pathway_ora | Over-representation analysis | Discrete gene sets |
| pathway_enrichr | Enrichr web service integration | Online databases |
| pathway_ssgsea | Single-sample GSEA | Sample-level enrichment |

Features:

  • Spatial awareness for tissue-specific pathways
  • Multiple database support (GO, KEGG, Reactome)
  • Custom gene set analysis
  • Spatial smoothing and covariate correction
  • Statistical significance testing with FDR correction
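
The FDR correction mentioned in the last item can be illustrated with the Benjamini-Hochberg procedure, sketched below in plain Python. Which FDR procedure analyze_enrichment applies is not stated here, so Benjamini-Hochberg is an assumption; the function name and the pathway p-values are made up for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four pathways.
raw = {"pathway_a": 0.01, "pathway_b": 0.04, "pathway_c": 0.03, "pathway_d": 0.20}
adjusted = benjamini_hochberg(list(raw.values()))

for name, q_value in zip(raw, adjusted):
    flag = "significant" if q_value < 0.05 else "not significant"
    print(f"{name}: q={q_value:.4f} ({flag})")
```

In this toy input three raw p-values fall below 0.05, but only one pathway remains significant after adjustment, which is the effect the correction is meant to have.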

Example:

# Custom gene set enrichment
custom_pathways = {
    "Neuronal_Signaling": ["SNAP25", "SYN1", "GRIN1", "GRIA1"],
    "Glial_Function": ["GFAP", "AQP4", "S100B", "ALDH1L1"]
}

result = analyze_enrichment(
    data_id="brain_dataset",
    method="spatial_enrichmap",
    gene_sets=custom_pathways,
    smoothing=True,
    n_neighbors=8
)

print(f"Found {result.n_significant} significant pathways")

Cell Communication

analyze_cell_communication

Analyze cell-cell communication using ligand-receptor interactions.

Signature:

analyze_cell_communication(
    data_id: str,
    method: str = "liana",
    groupby: str = "cell_type",
    spatial_mode: str = "global",
    database: str = "consensus"
) -> CellCommunicationResult

Available Methods:

| Method | Description | Features |
| --- | --- | --- |
| liana | Comprehensive LR analysis | Multiple databases, spatial modes |
| cellphonedb | Statistical interaction testing | Permutation testing |
| cellchat_liana | CellChat via LIANA | Pathway analysis |

Spatial Modes:

  • global: Cell type-level interactions
  • local: Spatially-aware interactions
  • bivariate: Pairwise spatial analysis

Deconvolution

deconvolve_data

Estimate cell type proportions in spatial transcriptomics data.

Signature:

deconvolve_data(
    data_id: str,
    method: str = "cell2location",
    reference_data_id: Optional[str] = None,
    n_factors: int = 50
) -> DeconvolutionResult

Available Methods:

| Method | Description | Requirements |
| --- | --- | --- |
| cell2location | Bayesian deconvolution | Reference single-cell data |
| stereoscope | Probabilistic deconvolution | Reference signatures |
| rctd | Robust cell type decomposition | Reference profiles |

Full documentation will be added in future versions

Integration

integrate_samples

Integrate multiple spatial transcriptomics datasets.

Signature:

integrate_samples(
    data_ids: List[str],
    method: str = "harmony",
    batch_key: str = "batch"
) -> IntegrationResult

Available Methods:

| Method | Description | Use Case |
| --- | --- | --- |
| harmony | Harmony batch correction | Simple batch effects |
| scvi | Variational integration | Complex batch effects |
| combat | ComBat batch correction | Gene expression normalization |

Note: The following Harmony parameters are hardcoded in the implementation and are not exposed through integrate_samples:

  • sigma=0.1 (width of the soft k-means clusters)
  • max_iter_harmony=10 (maximum number of Harmony iterations)
  • nclust=None (automatic cluster number detection)
  • verbose=True (progress display enabled)

Full documentation will be added in future versions

Trajectory

analyze_velocity_data

Analyze RNA velocity to understand cellular dynamics.

Signature:

analyze_velocity_data(
    data_id: str,
    method: str = "scvelo",
    mode: str = "dynamical"
) -> VelocityResult

Available Methods:

| Method | Description | Features |
| --- | --- | --- |
| scvelo | Standard RNA velocity analysis | Stochastic, deterministic, and dynamical models |
| velovi | Deep learning RNA velocity | More accurate velocity with uncertainty quantification (requires scvi-tools) |
| sirv | Reference-based velocity | Transfer velocity from reference dataset (not yet implemented) |

analyze_trajectory_data

Infer cellular trajectories and pseudotime from spatial data.

Signature:

analyze_trajectory_data(
    data_id: str,
    method: str = "cellrank",
    spatial_weight: float = 0.5
) -> TrajectoryResult

Available Methods:

| Method | Description | Features |
| --- | --- | --- |
| dpt | Diffusion pseudotime | Classic pseudotime inference (no velocity needed) |
| palantir | Probabilistic trajectory inference | Branch probability analysis (no velocity needed) |
| cellrank | RNA velocity-based trajectory inference | Fate mapping and terminal states (requires velocity) |

Important Note: VELOVI is a velocity computation method (see analyze_velocity_data above), not a trajectory inference method. After computing velocity with VELOVI, use CellRank, Palantir, or DPT for trajectory inference.

Full documentation will be added in future versions

Visualization

visualize_data

Create visualizations with MCP image outputs.

Signature:

visualize_data(
    data_id: str,
    params: VisualizationParameters
) -> Image

Key Parameters in VisualizationParameters:

  • plot_type: str = "spatial" (visualization type)
  • feature: Optional[Union[str, List[str]]] = None (gene/column to visualize)
  • colormap: str = "viridis" (color scheme)
  • figure_size: Optional[Tuple[int, int]] = None (width, height)
  • dpi: int = 100 (resolution)
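
As a way to see these fields and defaults together, the sketch below mirrors them in a stand-in dataclass. `VisualizationParametersSketch` is hypothetical and is not the real VisualizationParameters class, which may define more fields or validate values differently.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class VisualizationParametersSketch:
    """Stand-in mirroring the key fields listed above; not the real class."""

    plot_type: str = "spatial"
    feature: Optional[Union[str, List[str]]] = None
    colormap: str = "viridis"
    figure_size: Optional[Tuple[int, int]] = None
    dpi: int = 100


# Override only what differs from the defaults.
params = VisualizationParametersSketch(plot_type="spatial_domains", dpi=150)
print(asdict(params))
```

Converting to a plain dict with asdict gives a JSON-friendly payload, which is the shape an MCP tool call would typically carry.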

Plot Types (20 Total):

| Type | Description | Use Case |
| --- | --- | --- |
| spatial | Spatial gene expression | Gene visualization |
| spatial_domains | Spatial domain overlay | Domain identification |
| umap | UMAP embedding | Dimensionality reduction |
| heatmap | Expression heatmap | Multi-gene comparison |
| violin | Distribution plots | Expression distributions |
| deconvolution | Cell type proportion maps | Deconvolution results |
| cell_communication | Communication networks | Interaction visualization |
| multi_gene | Multi-gene spatial panels | Gene comparison |
| lr_pairs | Ligand-receptor pairs | LR interaction analysis |
| gene_correlation | Gene correlation analysis | Co-expression patterns |
| rna_velocity | RNA velocity plots | Trajectory inference |
| trajectory | Developmental trajectories | Pseudotime analysis |
| spatial_analysis | Spatial statistics (6 subtypes) | Pattern analysis |
| spatial_enrichment | Spatial enrichment maps | Functional enrichment |
| pathway_enrichment | Pathway enrichment plots | GSEA visualization |
| spatial_interaction | Cell-cell interactions | Spatial communication |
| batch_integration | Batch integration quality | Batch correction QC |

MCP Integration:

All visualizations return MCP Image objects for direct display in LLM agents like Claude Desktop.

Error Handling

Error Types

ChatSpatial reports failures using the following error types:

| Error Type | Description | Common Causes |
| --- | --- | --- |
| ValidationError | Invalid parameters or data format | Wrong parameter types, out-of-range values |
| DataError | Missing data or incompatible datasets | Missing required columns, incompatible data structures |
| MethodError | Algorithm-specific failures | Method not applicable to data type |
| ResourceError | Memory or computation limits | Insufficient memory, timeout exceeded |
| SystemError | File I/O or environment issues | File not found, permission denied |

Error Response Format

{
  "error": {
    "code": 1001,
    "message": "Invalid parameter: n_clusters must be positive",
    "type": "ValidationError",
    "details": {"parameter": "n_clusters", "value": -1},
    "suggestions": ["Use n_clusters > 0", "Set n_clusters=None for auto"]
  }
}
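
A client can turn this structure into a readable message with a few lines of Python. The sketch below parses the example response shown above; `summarize_error` is a hypothetical helper, and it assumes only the fields that appear in that example.

```python
import json

# The example error response from above, as a client might receive it.
response_text = """
{
  "error": {
    "code": 1001,
    "message": "Invalid parameter: n_clusters must be positive",
    "type": "ValidationError",
    "details": {"parameter": "n_clusters", "value": -1},
    "suggestions": ["Use n_clusters > 0", "Set n_clusters=None for auto"]
  }
}
"""


def summarize_error(payload: dict) -> str:
    """Format an error payload as a headline followed by one hint per suggestion."""
    err = payload["error"]
    lines = [f"{err['type']} ({err['code']}): {err['message']}"]
    # Suggestions are treated as optional so a missing list does not raise.
    for suggestion in err.get("suggestions", []):
        lines.append(f"  hint: {suggestion}")
    return "\n".join(lines)


payload = json.loads(response_text)
print(summarize_error(payload))
```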

Usage Examples

Chaining Analysis

# Complete workflow  
result = load_data(data_path="data.h5ad", name="sample")
preprocess_data(data_id=result.id)
identify_spatial_domains(data_id=result.id, method="spagcn")
annotate_cell_types(data_id=result.id, method="tangram", reference_data_id="ref")
analyze_cell_communication(data_id=result.id, method="liana")
analyze_enrichment(data_id=result.id, method="spatial_enrichmap")
visualize_data(data_id=result.id, params={"plot_type": "spatial_domains"})

Parameter Optimization

# Test multiple resolutions
for res in [0.5, 1.0, 1.5, 2.0]:
    identify_spatial_domains(
        data_id="sample",
        resolution=res,
        method="spagcn"
    )

Batch Processing

# Process multiple samples
sample_files = ["sample1.h5ad", "sample2.h5ad", "sample3.h5ad"]
for sample_file in sample_files:
    result = load_data(data_path=f"data/{sample_file}", name=sample_file.replace(".h5ad", ""))
    preprocess_data(data_id=result.id)
    identify_spatial_domains(data_id=result.id)

# Register multiple tissue sections
sections = ["section_1", "section_2", "section_3"]
for i in range(len(sections)-1):
    register_spatial_data(
        source_id=sections[i+1], 
        target_id=sections[i],
        method="paste"
    )
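
The index arithmetic in the registration loop can also be written with zip, which makes the source/target pairing explicit. This is a sketch of the pairing logic only: `consecutive_pairs` is a hypothetical helper, and the print call stands in for the register_spatial_data call.

```python
def consecutive_pairs(sections):
    """Return (source, target) pairs that align each section to its predecessor."""
    return [(source, target) for target, source in zip(sections, sections[1:])]


sections = ["section_1", "section_2", "section_3"]
for source_id, target_id in consecutive_pairs(sections):
    # In a real workflow this is where register_spatial_data would be called.
    print(f"register source={source_id} onto target={target_id}")
```

With fewer than two sections the helper returns an empty list, so the loop simply does nothing.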
