Advanced Analysis

This tutorial covers advanced multi-modal spatial transcriptomics analysis workflows using ChatSpatial. Learn how to combine multiple analysis methods and handle complex datasets.

Overview

Advanced analysis in ChatSpatial involves:

  • Multi-modal data integration
  • Complex analytical workflows
  • Cross-technology comparisons
  • Advanced visualization techniques
  • Performance optimization strategies

Prerequisites

  • Complete the Basic Workflow tutorial
  • Understand spatial transcriptomics technologies
  • Familiarity with Python and data analysis concepts

Multi-Modal Analysis Workflow

1. Loading Multiple Datasets

# Load different spatial datasets
visium_result = load_data(
    data_path="data/mouse_brain_visium.h5ad",
    name="visium_brain",
    data_type="10x_visium"
)

merfish_result = load_data(
    data_path="data/mouse_brain_merfish.h5ad",
    name="merfish_brain", 
    data_type="merfish"
)

2. Cross-Platform Data Integration

# Preprocess both datasets with consistent parameters
for data_id in [visium_result.id, merfish_result.id]:
    preprocess_data(
        data_id=data_id,
        normalization="log",  # Use "log" or "pearson_residuals" (sct requires use_scvi_preprocessing=True)
        filter_genes=True,
        min_cells=5
    )

3. Comparative Cell Type Annotation

Use multiple annotation methods to cross-validate results:

# Method 1: Reference-based annotation
annotate_cell_types(
    data_id=visium_result.id,
    method="tangram",
    reference_path="reference/mouse_brain_sc.h5ad"
)

# Method 2: Marker-based annotation
annotate_cell_types(
    data_id=visium_result.id,
    method="sctype",
)

# Method 3: Probabilistic deconvolution
annotate_cell_types(
    data_id=visium_result.id,
    method="cell2location",
    reference_path="reference/mouse_brain_sc.h5ad"
)

4. Advanced Spatial Domain Analysis

Compare spatial organization across technologies:

# High-resolution method for MERFISH
identify_spatial_domains(
    data_id=merfish_result.id,
    method="leiden", 
    n_domains=15,
    resolution=0.5
)

# Graph-based method for Visium
identify_spatial_domains(
    data_id=visium_result.id,
    method="spagcn",
    n_domains=10,
    resolution=1.0
)

Advanced Visualization Strategies

Multi-Panel Comparative Plots

# Create comprehensive visualization
visualize_data(
    data_id=visium_result.id,
    plot_type="spatial_domains",
    save_path="results/visium_domains.pdf"
)

visualize_data(
    data_id=merfish_result.id, 
    plot_type="spatial_domains",
    save_path="results/merfish_domains.pdf"
)

Custom Gene Expression Analysis

# Identify technology-specific spatial genes
spatial_genes_visium = identify_spatial_genes(
    data_id=visium_result.id,
    method="spatialde",
    n_genes=100
)

spatial_genes_merfish = identify_spatial_genes(
    data_id=merfish_result.id,
    method="spatialde", 
    n_genes=100
)

# Visualize shared spatial genes
shared_genes = list(set(spatial_genes_visium.genes) & set(spatial_genes_merfish.genes))

visualize_data(
    data_id=visium_result.id,
    plot_type="gene_expression",
    genes=shared_genes[:6],
    save_path="results/shared_spatial_genes.pdf"
)

Advanced Cell Communication Analysis

Multi-Method Communication Analysis

# LIANA+ comprehensive analysis
comm_liana = analyze_cell_communication(
    data_id=visium_result.id,
    method="liana",
    cell_type_column="cell_type"
)

# CellChat pathway analysis
comm_cellchat = analyze_cell_communication(
    data_id=visium_result.id,
    method="cellchat",
    cell_type_column="cell_type"
)

# Cross-validate results
visualize_data(
    data_id=visium_result.id,
    plot_type="cell_communication",
    save_path="results/communication_networks.pdf"
)

Spatial Communication Patterns

# Analyze distance-dependent communication
analyze_spatial_data(
    data_id=visium_result.id,
    analysis_type="spatial_communication",
    max_distance=100.0
)

Performance Optimization

Memory-Efficient Processing

# Process large datasets in chunks
def process_large_dataset(data_path, chunk_size=5000):
    # Load with subsampling
    result = load_data(
        data_path=data_path,
        name="large_dataset",
        subsample=chunk_size
    )
    
    # Efficient preprocessing
    preprocess_data(
        data_id=result.id,
        normalization_method="pearson_residuals",  # Memory efficient
        filter_genes=True
    )
    
    return result

Batch Processing Workflows

# Process multiple samples efficiently
sample_paths = [
    "data/sample_1.h5ad",
    "data/sample_2.h5ad", 
    "data/sample_3.h5ad"
]

results = []
for i, path in enumerate(sample_paths):
    result = load_data(path, f"sample_{i+1}")
    preprocess_data(result.id)
    annotate_cell_types(result.id, method="sctype")
    results.append(result)

Quality Control and Validation

Cross-Method Validation

# Validate spatial domains with multiple methods
methods = ["spagcn", "stagate", "leiden"]
domain_results = {}

for method in methods:
    domain_results[method] = identify_spatial_domains(
        data_id=visium_result.id,
        method=method,
        n_domains=8
    )

# Compare consistency across methods
analyze_spatial_data(
    data_id=visium_result.id,
    analysis_type="domain_consistency",
    methods=methods
)

Statistical Validation

# Marker gene validation
markers = find_markers(
    data_id=visium_result.id,
    group_by="spatial_domains",
    method="wilcoxon"
)

# Spatial autocorrelation analysis
analyze_spatial_data(
    data_id=visium_result.id,
    analysis_type="spatial_autocorrelation",
    genes=markers.genes[:20]
)

Advanced Use Cases

1. Developmental Trajectory Analysis

# Identify developmental trajectories in spatial context
analyze_spatial_data(
    data_id=visium_result.id,
    analysis_type="trajectory_inference",
    method="slingshot",
    root_cell_type="stem_cells"
)

# Visualize trajectories
visualize_data(
    data_id=visium_result.id,
    plot_type="trajectory",
    save_path="results/developmental_trajectories.pdf"
)

2. Disease vs Healthy Comparison

# Load disease and control samples
disease_data = load_data("data/disease_sample.h5ad", "disease")
control_data = load_data("data/control_sample.h5ad", "control")

# Comparative analysis
for data_id in [disease_data.id, control_data.id]:
    preprocess_data(data_id)
    identify_spatial_domains(data_id, method="spagcn")
    
# Find disease-specific patterns
analyze_spatial_data(
    data_id=disease_data.id,
    analysis_type="differential_spatial",
    reference_id=control_data.id
)

3. Time-Series Spatial Analysis

# Load time course data
timepoints = ["t0", "t6", "t12", "t24"]
time_data = {}

for tp in timepoints:
    time_data[tp] = load_data(f"data/timecourse_{tp}.h5ad", tp)
    preprocess_data(time_data[tp].id)

# Analyze temporal changes
analyze_spatial_data(
    data_ids=[data.id for data in time_data.values()],
    analysis_type="temporal_dynamics",
    timepoints=timepoints
)

Best Practices

1. Parameter Selection

  • Resolution tuning: Start with default parameters, then fine-tune
  • Method selection: Consider data characteristics and biological questions
  • Validation: Always cross-validate with multiple approaches

2. Computational Efficiency

  • Memory management: Monitor memory usage for large datasets
  • Parallel processing: Utilize multiple cores when available
  • Caching: Save intermediate results to avoid recomputation

3. Result Interpretation

  • Biological validation: Confirm results with known biology
  • Statistical testing: Apply appropriate statistical tests
  • Visualization: Use multiple visualization approaches

Troubleshooting

Common Issues

  1. Memory errors with large datasets
    • Use subsampling or chunked processing
    • Increase available RAM
    • Use more memory-efficient algorithms
  2. Method convergence failures
    • Try different initialization parameters
    • Use alternative methods
    • Check data quality
  3. Inconsistent results across methods
    • Validate with known markers
    • Check preprocessing consistency
    • Consider method-specific biases

Performance Tips

  • Use SSD storage for large datasets
  • Optimize Python environment (conda/mamba)
  • Consider GPU acceleration where available
  • Monitor system resources during analysis

Next Steps

This advanced tutorial provides the foundation for sophisticated spatial transcriptomics analysis workflows. Combine these techniques based on your specific research questions and data characteristics.