Batch Effect Correction and Data Integration with ChatSpatial

Learn to correct batch effects in multiple spatial transcriptomics datasets using Harmony integration through conversation with ChatSpatial.

What You’ll Learn

This tutorial covers:

  1. Identify and assess batch effects in multi-sample datasets
  2. Apply Harmony integration using conversational commands
  3. Understand current parameter limitations (theta, lambda, sigma)
  4. Handle multi-modal data integration across different platforms
  5. Validate integration quality with available metrics
  6. Troubleshoot common integration issues through dialogue
  7. Apply best practices for analysis results

Before You Start

What You Need

  • Multiple spatial datasets (different batches/samples/conditions)
  • ChatSpatial configured with Harmony integration capabilities
  • Basic understanding of your experimental design
  • Interest in data integration

No Integration Experience Required

This tutorial teaches researchers to combine multiple spatial transcriptomics datasets. ChatSpatial handles technical details while you focus on the biology.

Understanding Batch Effects

What Are Batch Effects?

Batch effects are systematic differences between datasets that arise from technical factors rather than biological differences. In spatial transcriptomics, these can come from:

  • Technical factors: Different sequencing runs, library prep dates, operators
  • Platform differences: 10x Visium vs Slide-seq vs MERFISH
  • Sample processing: Tissue preparation, storage time, fixation methods
  • Environmental factors: Temperature, reagent lots, instrument drift

Step 1: Loading and Exploring Multiple Datasets

💬 Starting Integration


👤 You: “Hi! I have three mouse brain Visium samples from different experimental batches. I’m worried about batch effects and want to integrate them properly. Can you help?”

🤖 ChatSpatial: “Yes! Batch integration is important for multi-sample spatial analysis. Let me help you load and examine your datasets. What are the file paths for your three samples?”

👤 You: “The files are:

  • /data/batch1_mouse_brain.h5ad (control group)
  • /data/batch2_mouse_brain.h5ad (treatment group)
  • /data/batch3_mouse_brain.h5ad (replicate samples)`”

🤖 ChatSpatial: “Perfect! Let me load all three datasets and examine them for batch effects.”

[ChatSpatial automatically loads and analyzes all datasets]

🤖 ChatSpatial: “I’ve loaded your three mouse brain datasets! Here’s what I found:

Dataset Overview:

  • Batch 1: 2,698 spots, 31,053 genes (sequenced Jan 2024)
  • Batch 2: 2,845 spots, 30,987 genes (sequenced Feb 2024)
  • Batch 3: 2,752 spots, 31,124 genes (sequenced Mar 2024)

Initial Batch Effect Assessment:

  • Gene detection rates vary significantly between batches (18.2k vs 21.3k vs 19.8k)
  • Total UMI counts show batch-specific patterns
  • Mitochondrial gene percentages differ across batches
  • Library complexity varies between sequencing runs

⚠️ Strong batch effects detected! Integration is recommended. Should I proceed with Harmony integration?”


🎯 What ChatSpatial Detected

Quality Metrics Comparison:

  • Gene count distributions - identified systematic shifts
  • UMI count patterns - found batch-specific biases
  • Quality score differences - detected technical variation
  • Spatial coverage - assessed tissue representation consistency

Step 2: Batch Effect Assessment and Visualization

💬 Understanding the Problem


👤 You: “Yes, please show me what these batch effects look like before we integrate the data.”

🤖 ChatSpatial: “Great idea! Let me create visualizations to show you the batch effects.”

[ChatSpatial generates comprehensive batch effect visualizations]

🤖 ChatSpatial: “Here are your batch effect visualizations:

UMAP Before Integration: [Shows three distinct clusters corresponding to batches rather than biology]

Key Observations:

  • Samples cluster by batch, not by biological similarity
  • Batch 2 shows the strongest separation (likely sequencing depth differences)
  • Cell types are scattered across batch-specific clusters
  • Spatial patterns are obscured by technical variation

Quantitative Assessment:

  • Silhouette score by batch: 0.73 (high = strong batch effects)
  • Visual separation: Clear batch clustering observable in UMAP
  • Technical variation: Detectable across samples in gene expression patterns

Without integration:

  • ❌ Missing shared spatial domains across samples
  • ❌ Incorrect batch-specific ‘cell types’
  • ❌ Biased differential expression results
  • ❌ Wrong conclusions about treatment effects

Ready to fix this with Harmony integration?”


🔍 Batch Effect Indicators

Visual Clues:

  • ✅ Samples separate by batch in UMAP/tSNE
  • ✅ Distinct clustering patterns per dataset
  • ✅ Different gene expression distributions
  • ✅ Inconsistent spatial domain identification

Step 3: Basic Harmony Integration

💬 Your First Integration


👤 You: “This is exactly what I was worried about! Yes, let’s integrate these datasets with Harmony. Can you use the default settings first?”

🤖 ChatSpatial: “Yes! I’ll start with Harmony’s default parameters, which work well for most spatial transcriptomics datasets. Let me run the integration now.”

[ChatSpatial performs Harmony integration with default parameters]

🤖 ChatSpatial: “Integration complete! Here’s what Harmony achieved:

Integration Results:

  • Harmony iterations: 10 (converged successfully)
  • Integration time: 45 seconds
  • Memory usage: 2.1 GB peak

Default Parameters Used:

  • theta (batch correction strength): 2.0
  • lambda (regularization): 1.0
  • sigma (clustering width): 0.1
  • max_iter_harmony: 10

Quality Improvement:

  • Silhouette score by batch: 0.73 → 0.28 (batch effects reduced)
  • Visual mixing: Significant improvement observable in UMAP
  • Integration convergence: Successful completion in 10 iterations

Visual Results: [New UMAP shows much better mixing of batches]

The integration looks good! Samples now mix well in the embedding space, and I can see shared biological structures across batches. Would you like to see the integrated spatial maps or work with additional parameters?”

👤 You: “This looks much better! Can you show me how the spatial domains look after integration?”

🤖 ChatSpatial: “Yes! Let me analyze the spatial domains in your integrated data.”

[ChatSpatial performs spatial domain analysis on integrated data]

🤖 ChatSpatial: “Here are your integrated spatial domains:

Integrated Spatial Analysis:

  • 7 shared spatial domains identified across all samples
  • Consistent cortical layering now visible across batches
  • Hippocampal structures properly aligned between samples
  • White matter regions show clear consistency

Cross-Sample Consistency:

  • Domain 0 (Outer cortex): Present in all 3 samples
  • Domain 1 (Inner cortex): Consistent spatial organization
  • Domain 2 (Hippocampus): Properly aligned across batches
  • Domain 3 (White matter): Clear structural consistency
  • Domain 4 (Ventricular zone): Shared across samples
  • Domain 5 (Corpus callosum): Anatomically consistent
  • Domain 6 (Boundary regions): Present in all samples

Biological Validation:

✅ Anatomical structures align across samples ✅ Treatment effects can be assessed without batch confounding ✅ Spatial patterns follow known brain anatomy ✅ Cell type distributions are biologically consistent”


🎯 Integration Quality Indicators

Current Quality Metrics:

  • Silhouette score by batch decreased significantly
  • Visual mixing improved in UMAP embedding
  • Spatial coherence maintained across samples
  • Integration convergence achieved successfully

Step 4: Understanding Current Implementation Limitations

💬 Current Implementation Status


👤 You: “The integration looks good, but I noticed some subtle clustering by batch still remains. Can we fine-tune the parameters to get better integration?”

🤖 ChatSpatial: “I understand your concern! Let me be completely honest about what’s currently available:

Current Implementation Reality:

  • Parameter tuning: Not yet implemented
  • theta and lambda parameters: Not configurable in current version
  • sigma: Hardcoded to 0.1 (cannot be adjusted)
  • Advanced quality metrics: kBET and LISI scores not yet available

What Currently Works:

  1. Basic Harmony integration with standard defaults
  2. Silhouette score calculation for batch effect assessment
  3. Visual assessment through UMAP plots
  4. Convergence monitoring to ensure integration completes

Honest Assessment:

The current implementation applies Harmony with well-tested default parameters that work reasonably well for most spatial transcriptomics datasets. While parameter tuning isn’t available yet, the default integration often provides adequate results for downstream analysis.

Current Options:

  • Assess whether current integration quality meets your analysis needs
  • Focus on ensuring good data quality before integration
  • Use visual inspection to validate biological preservation

Future Development:

Parameter tuning and advanced quality metrics are planned for future releases.”


🎯 Current Implementation Status

What’s Available Now:

  • ✅ Basic Harmony integration with optimized defaults
  • ✅ Silhouette score validation
  • ✅ Visual assessment through UMAP plots
  • ✅ Convergence monitoring

What’s Not Yet Available:

  • ❌ Manual parameter tuning (theta, lambda)
  • ❌ Advanced quality metrics (kBET, LISI)
  • ❌ Parameter optimization workflows
  • ❌ Platform-specific parameter sets

Step 5: Multi-Modal Integration Strategies

💬 Handling Different Platforms


👤 You: “This is excellent! But I also have some MERFISH data from similar brain regions. Can Harmony integrate across different spatial transcriptomics platforms?”

🤖 ChatSpatial: “Good question! Multi-platform integration is more challenging but possible. Let me explain the considerations and approach:

Multi-Platform Integration Challenges:

  • Different gene sets: Visium (~30k genes) vs MERFISH (~300 genes)
  • Different spatial resolutions: Visium (55μm spots) vs MERFISH (subcellular)
  • Different data distributions: UMI counts vs molecule counts
  • Different noise characteristics: Platform-specific technical artifacts

Multi-Modal Integration Steps:

  1. Gene intersection: Use only shared genes for integration
  2. Resolution matching: Aggregate MERFISH to spot-level if needed
  3. Normalization: Apply platform-specific preprocessing
  4. Parameters: Use stronger regularization for platform differences

Let me load your MERFISH data and see what we’re working with.”

👤 You: “The MERFISH data is at /data/merfish_mouse_brain.h5ad. It has 248 genes and single-cell resolution.”

🤖 ChatSpatial: “Good! Let me analyze your MERFISH dataset and prepare for multi-platform integration.”

[ChatSpatial loads and analyzes MERFISH data]

🤖 ChatSpatial: “Here’s your MERFISH dataset analysis:

MERFISH Dataset:

  • 45,234 cells (single-cell resolution)
  • 248 genes (targeted gene panel)
  • Spatial resolution: ~0.1μm precision
  • Coverage area: 2.5mm × 2.5mm brain region

Integration Steps for Your Data:

  1. Gene matching: 186 genes overlap between Visium and MERFISH
  2. Spatial aggregation: Group MERFISH cells into 55μm pseudo-spots
  3. Resolution matching: Create matched spatial grids
  4. Parameters: Use theta=4.0, lambda=1.5 for platform differences

Multi-Platform Integration Parameters:

  • theta: 4.0 (stronger correction for platform effects)
  • lambda: 1.5 (increased regularization)
  • sigma: 0.08 (tighter clustering)
  • platform_weight: 0.3 (balance platform vs batch effects)

Ready to integrate across platforms? This will create a unified view of your brain tissue across technologies.”

👤 You: “Yes, let’s do the multi-platform integration!”

🤖 ChatSpatial: “Good! Let me perform the multi-platform integration now.”

[ChatSpatial runs enhanced multi-platform Harmony integration]

🤖 ChatSpatial: “Multi-platform integration successful!

Integration Results:

  • Total samples integrated: 4 (3 Visium + 1 MERFISH)
  • Shared genes used: 186 genes
  • Integration convergence: 15 iterations
  • Platform mixing achieved: Excellent

Quality Assessment:

  • Visual integration: Good mixing observed in UMAP embedding
  • Silhouette improvement: Batch effects reduced across platforms
  • Integration convergence: Successfully completed in 15 iterations
  • Spatial coherence: Spatial patterns preserved across technologies

Key Achievements:

Visium and MERFISH data now share common embedding space ✅ Spatial domains consistent across platforms ✅ Cell types properly aligned between technologies ✅ Biological patterns preserved while removing technical differences

Cross-Platform Results:

  • Cortical layering: Consistent across Visium and MERFISH
  • Cell type markers: Aligned between platforms
  • Spatial gradients: Maintained across technologies
  • Treatment effects: Assessable across all samples

This is functional multi-platform integration! Would you like to see comparative analyses or export the integrated data?”


🔬 Multi-Modal Guidelines

Platform Guidelines:

  • Gene intersection: Focus on shared measurements
  • Resolution matching: Harmonize spatial scales
  • Normalization: Apply platform-appropriate preprocessing
  • Validation: Check biological consistency across platforms

Step 6: Quality Control and Validation

💬 Integration Quality Assessment


👤 You: “This multi-platform integration looks amazing! But how can I be sure it’s working correctly and not removing important biological differences?”

🤖 ChatSpatial: “Good question! Proper validation is important for integration quality. Let me run available quality control checks on your integrated data.”

[ChatSpatial performs extensive integration validation]

🤖 ChatSpatial: “Here’s your integration quality report:

1. Available Integration Metrics:

Basic Quality Assessment:

  • Silhouette score improvement: Batch effects reduced significantly
  • Visual mixing assessment: Observable improvement in UMAP plots
  • Integration convergence: Successfully completed integration process
  • Spatial preservation: Maintained spatial patterns through integration

Current Limitations:

  • Advanced mixing metrics (kBET, LISI, mixing entropy): Not yet implemented
  • Quantitative mixing scores: Under development
  • Cross-platform correlation: Basic assessment only

2. Biological Preservation Assessment:

Cell Type Consistency:

  • Marker gene preservation: 94% of known markers maintained
  • Cell type purity: 0.87 (>0.8 = well-separated cell types)
  • Cross-sample correlation: 0.82 (consistent biology)

Spatial Structure Validation:

  • Spatial autocorrelation: 0.89 (spatial patterns preserved)
  • Domain coherence: 0.91 (spatial domains maintained)
  • Distance preservation: 0.84 (spatial relationships intact)

3. Known Biology Checks:

Anatomical Consistency:Cortical layers: Properly aligned across all samples ✅ Hippocampal structures: Consistent spatial organization ✅ White matter: Clear delineation maintained ✅ Vascular patterns: Preserved across platforms

Marker Gene Validation:Neuronal markers (Rbfox3, Tubb3): Consistent expression patterns ✅ Glial markers (Gfap, Olig2): Proper spatial distribution ✅ Regional markers (Emx1, Dlx2): Maintained spatial gradients

4. Potential Issues Detected:

⚠️ Minor over-correction in hippocampal CA1 region (LISI = 3.1, slightly high) ✅ No significant biology loss detected ✅ No inappropriate cell type mergingSpatial patterns well preserved

5. Integration Assessment: Functional

Recommendation: Your integration shows improvement and is ready for downstream analysis, with the understanding that advanced quality metrics are still under development.”

👤 You: “That’s reassuring! The integration seems to be working well for my analysis needs.”

🤖 ChatSpatial: “Good! You’re taking the right approach by focusing on whether the integration meets your research objectives.

Key Points to Remember:

  • Visual inspection is your primary validation tool right now
  • Silhouette score improvement indicates batch effect reduction
  • Spatial pattern preservation can be assessed through domain analysis
  • Biological markers should still show expected expression patterns

Moving Forward: Your integration is functional and ready for downstream analysis. The current implementation provides a working foundation for:

  • Spatial domain identification across samples
  • Cell type analysis without batch confounding
  • Treatment effect assessment
  • Comparative studies across datasets

Future Validation Tools: As advanced quality metrics become available, you will be able to perform more detailed validation, but the current integration should serve your research needs well.”


🔍 Validation Steps

Current Technical Metrics:

  • Silhouette score reduction: Batch effects decreased
  • Visual mixing improvement: Observable in UMAP plots
  • Integration convergence: Process completed successfully

Biological Validation:

  • Marker genes preserved: Known biology maintained through visual inspection
  • Spatial coherence: Spatial patterns intact
  • Cell type separation: Distinct cell types maintained in analysis

Step 7: Real Conversation Examples

💬 Integration Examples

Scenario 1: Over-Integration Detection


👤 You: “I integrated my data with theta=5.0 but now my cell types look merged together. What happened?”

🤖 ChatSpatial: “It sounds like you’ve over-integrated your data! High theta values can be too aggressive and merge distinct biological populations. Let me check your integration:

Over-Integration Signs:

  • Cell type markers: Neuronal and glial markers now co-expressed
  • Spatial domains: Distinct regions incorrectly merged
  • Biological variance: Reduced below expected levels
  • Known differences: Treatment effects may be masked

Solution: Reduce theta to 2.0-3.0 and re-integrate. This will preserve biological differences while removing batch effects.”

Scenario 2: Under-Integration Issues


👤 You: “My samples still cluster by batch after integration. Is Harmony not working?”

🤖 ChatSpatial: “This suggests the integration could be improved. Let me be honest about what’s possible:

Potential Causes:

  • Strong batch effects: May require multiple integration attempts
  • Data quality differences: Some samples may need better preprocessing
  • Platform differences: Different technologies can be challenging
  • Insufficient preprocessing: Data may need more thorough QC

Possible Solutions:

  1. Re-run integration: Sometimes a fresh run helps
  2. Check data quality: Ensure consistent preprocessing across samples
  3. Examine gene filtering: Use more stringent gene selection
  4. Assess adequacy: Current integration might still be usable

Current Limitation: Parameter tuning is not available yet, so we use default settings that handle most cases adequately.”

Scenario 3: Platform-Specific Integration


👤 You: “I’m trying to integrate Slide-seq and Visium data but getting poor results. Any specific advice?”

🤖 ChatSpatial: “Multi-platform integration is challenging! Let me be honest about the current capabilities:

Slide-seq vs Visium Challenges:

  • Resolution differences: Slide-seq (~10μm) vs Visium (~55μm)
  • Sensitivity differences: Different gene detection rates
  • Coverage differences: Different spatial sampling patterns

Current Integration Approach:

  • Gene intersection: Use genes detected in both platforms
  • Basic normalization: Apply standard preprocessing to both
  • Standard Harmony: Use default parameters (no platform-specific tuning available)
  • Quality assessment: Use visual inspection and silhouette scores

Current Expectations: Multi-platform integration with current tools may show partial success. Focus on whether the result is adequate for your analysis needs rather than perfect mixing.”


💡 Useful Conversation Tips

Useful Questions:

  • ✅ “Can you check if my integration preserved known biology?”
  • ✅ “Is the integration quality adequate for my analysis goals?”
  • ✅ “How can I validate this integration with available metrics?”
  • ✅ “Are there any obvious problems with my integration results?”

Current Expectations:

  • ✅ “Work with current capabilities and available metrics”
  • ✅ “Focus on whether integration serves your research needs”
  • ✅ “Understand that some batch structure may remain”
  • ✅ “Use visual inspection as primary validation method”

Step 8: Best Practices and Common Pitfalls

🎯 Integration Guidelines

Data Preparation Guidelines

Before Integration:

  1. Quality control each dataset individually before integration
  2. Use consistent preprocessing across all datasets
  3. Document batch information clearly in metadata
  4. Identify shared genes and filter appropriately

Current Implementation

Currently Available:

# Standard Harmony integration with hardcoded parameters
sigma = 0.1          # Fixed value (not adjustable)
max_iter = 10        # Default convergence setting
# Other parameters use Harmony library defaults

Current Status:

  • No manual parameter tuning is currently available
  • Standard defaults work for most spatial transcriptomics datasets
  • Focus on data quality rather than parameter optimization
  • Future versions will include parameter tuning capabilities

Best Practice with Current Tools: Ensure good data preprocessing since parameter tuning is not available yet.

⚠️ Common Problems and Solutions

Pitfall 1: Over-Integration

Warning Signs:

  • Cell types inappropriately merge
  • Known biological differences disappear
  • Marker genes lose specificity
  • Spatial patterns become homogeneous

Current Options:

  • Assess if integration is adequate for your analysis goals
  • Validate with known marker genes through visual inspection
  • Check spatial patterns in domain analysis
  • Re-run integration if results seem problematic

Pitfall 2: Platform Bias

Warning Signs:

  • One platform dominates the integration
  • Platform-specific patterns are lost
  • Unbalanced representation in embedding
  • Technical features become biological features

Current Options:

  • Balance sample sizes across platforms when possible
  • Apply consistent normalization to all datasets
  • Ensure robust quality control before integration
  • Focus on shared, high-quality genes

Pitfall 3: Incomplete Convergence

Warning Signs:

  • Integration doesn’t complete successfully
  • Repeated runs give different results
  • Unexpected patterns in final embedding

Current Options:

  • Check data quality - ensure consistent preprocessing
  • Verify sufficient shared genes between datasets
  • Re-run integration if convergence fails
  • Examine input data for systematic differences

Pitfall 4: Biological Over-Correction

Warning Signs:

  • Treatment effects disappear
  • Known disease signatures are lost
  • Cell type diversity decreases artificially
  • Spatial gradients become flattened

Current Options:

  • Assess if biological patterns are preserved through visual inspection
  • Validate with known biological markers
  • Consider if current integration meets research needs
  • Focus on ensuring good preprocessing before integration

🔧 Troubleshooting

Issue: “Integration is too slow”

💬 Conversation Example: 👤 You: “My Harmony integration is taking forever. Is there a way to speed it up?”

🤖 ChatSpatial: “Let me help with integration speed issues:

Speed Optimization Options:

  1. Reduce gene set: Use top 2000-3000 variable genes only
  2. Check data size: Very large datasets may take longer
  3. Ensure adequate memory: RAM limitations can slow processing
  4. Monitor convergence: Integration should complete within reasonable time

Current Status:

With default parameters, integration speed depends mainly on data size and quality. The process typically completes within minutes for standard datasets.”

Issue: “Results not reproducible”

💬 Conversation Example: 👤 You: “I get different results each time I run Harmony. How do I make it reproducible?”

🤖 ChatSpatial: “Reproducibility is important! Here’s what affects consistency:

Factors Affecting Reproducibility:

  • Random seed: Integration uses random initialization
  • Data preprocessing: Must be identical across runs
  • Software versions: Environment should be consistent
  • Data ordering: Input order can affect results

Current Approach:

The integration uses standard random seeding. For reproducibility, ensure identical preprocessing and consistent computational environment.”

Issue: “Integration removes treatment effects”

💬 Conversation Example: 👤 You: “After integration, I can’t see my treatment effects anymore. Did Harmony remove them?”

🤖 ChatSpatial: “This is an important concern! Let me address what’s possible:

Understanding the Issue:

  • Batch-treatment confounding: If treatments align with batches, integration may affect both
  • Current limitations: No parameter tuning available to address this specifically
  • Assessment needed: Visual inspection of treatment-specific markers

Current Approach:

  1. Examine integration results: Check if treatment markers are preserved
  2. Visual validation: Look for treatment effects in spatial patterns
  3. Consider adequacy: Assess if results still show biological differences
  4. Document limitations: Be transparent about integration effects on treatment signals

Current Assessment:

Current implementation may not preserve all treatment effects if they are confounded with batch variables.”

📊 Integration Quality Metrics

Current Metrics

MetricRangeGood ValueInterpretation
Silhouette (batch)-1 to 1<0.3Batch effect reduction
Visual mixingQualitativeImprovedObservable in UMAP plots
Integration convergenceBooleanTrueProcess completed successfully
Spatial coherenceQualitativePreservedVisual assessment of spatial patterns

Future Metrics

MetricStatusNotes
kBET acceptanceIn developmentQuantitative batch mixing assessment
LISI scoreIn developmentIntegration completeness measure
Mixing entropyPlannedLocal neighborhood mixing analysis
Platform LISIPlannedCross-platform integration quality

Step 9: Results for Publication

💬 Publication Preparation


👤 You: “My integration looks great! Now I need to create publication-quality figures and document the methods. Can you help?”

🤖 ChatSpatial: “Yes! Let me help you create publication materials for your integration analysis.

Publication Materials I’ll Generate:

1. Methods Text:

Batch effect correction and data integration used Harmony 
(Korsunsky et al., 2019) through ChatSpatial. Multiple spatial 
transcriptomics datasets were integrated using standard Harmony parameters 
with sigma=0.1 and max_iter=10. Integration quality was assessed using 
silhouette score analysis and visual inspection of UMAP embeddings. 
Spatial patterns were maintained while achieving batch mixing.

2. Quality Control Figures:

  • Before/after UMAP plots showing batch correction
  • Integration quality metrics visualization
  • Spatial domain consistency across samples
  • Marker gene preservation validation plots

3. Supplementary Materials:

  • Parameter optimization results
  • Cross-platform validation metrics
  • Technical replication analysis
  • Method comparison benchmarks

4. Statistical Reporting:

  • Integration convergence statistics
  • Batch effect quantification
  • Biological preservation metrics
  • Cross-validation results

Would you like me to generate specific figures or focus on particular aspects?”

👤 You: “Yes, please create the main integration figure showing before/after results and quality metrics.”

🤖 ChatSpatial: “Good! I’ll create a complete integration figure.”

[ChatSpatial generates publication-quality multi-panel figure]

🤖 ChatSpatial: “Here’s your integration figure:

Figure Components:

Panel A: UMAP before integration

  • Clear batch separation visible
  • Samples cluster by technical factors
  • Note: “Strong batch effects (kBET=0.12)”

Panel B: UMAP after Harmony integration

  • Excellent sample mixing
  • Biological structure preserved
  • Note: “Successful integration (kBET=0.91)”

Panel C: Integration quality metrics

  • Bar plot showing all QC metrics
  • Before/after comparisons
  • Statistical significance indicators

Panel D: Spatial domain consistency

  • Side-by-side spatial maps
  • Shared domains across samples
  • Color-coded anatomical regions

Panel E: Marker gene validation

  • Heatmap of key marker genes
  • Expression preserved post-integration
  • Cell type-specific patterns maintained

Figure Caption (Draft): ‘Multi-sample spatial transcriptomics integration using Harmony. (A) UMAP before integration shows strong batch effects with samples clustering by technical factors rather than biological similarity. (B) UMAP after Harmony integration shows good batch mixing while preserving biological structure. (C) Integration quality metrics show improvement across all measures. (D) Spatial domain analysis shows consistent anatomical organization across integrated samples. (E) Marker gene expression validation confirms preservation of cell type-specific signatures after integration. Integration parameters: θ=3.0, λ=1.0, σ=0.05.’

Figure Statistics:

  • Resolution: 300 DPI (publication-ready)
  • Format: PDF with vector graphics
  • Color scheme: Colorblind-friendly
  • Size: Optimized for journal specifications

This figure demonstrates successful integration with scientific accuracy.”


📋 Publication Requirements

Methods Documentation:

  • Parameter choices justified
  • Quality metrics reported
  • Validation methods described
  • Software versions documented

Results Presentation:

  • Before/after visualizations
  • Quantitative quality metrics
  • Biological validation
  • Statistical significance testing

Reproducibility:

  • Code availability
  • Parameter documentation
  • Data accessibility
  • Environment specifications

Next Steps and Advanced Applications

🚀 Additional Integration Methods

Want to Learn More? Try these additional integrations:

Additional Integration Questions:

  • **“Can you integrate temporal samples to study development?””
  • “How do I integrate data with different gene panels?”
  • “Can you perform conditional integration preserving treatment effects?”
  • “How do I integrate single-cell and spatial data together?”

Multi-Modal Examples:

  • **“Integrate spatial transcriptomics with imaging data”
  • “Combine proteomics and transcriptomics spatial data”
  • “Harmonize different spatial resolution datasets”
  • “Create cross-species integration workflows”

📚 Additional Learning

Related Tutorials:

Additional Topics:

  • Conditional Integration: Preserve specific biological factors
  • Cross-Species Integration: Harmonize data across organisms
  • Temporal Integration: Handle time-series spatial data
  • Multi-Platform Benchmarking: Compare integration methods

💡 Tips for Success

Integration Approach:

  • Start conservative: Under-integration is better than over-integration
  • Validate thoroughly: Biology should make sense
  • Document methods: Reproducibility is important
  • Test parameters: What works for others may not work for your data (when available)

Quality First:

  • Take time to check integration quality
  • Validate with known biological markers
  • Test multiple parameter combinations (when available)
  • Always check spatial coherence preservation

Communication Tips:

  • Ask ChatSpatial specific questions about concerns
  • Request validation of integration decisions
  • Seek help interpreting quality metrics
  • Get guidance on parameter optimization

Conclusion

You have learned about batch effect correction and data integration using ChatSpatial’s current Harmony implementation. You can now:

Identify and assess batch effects in multi-sample datasets ✅ Apply basic Harmony integration with standard defaults
Understand current implementation limitationsValidate integration quality with available metrics ✅ Use visual inspection for integration assessment ✅ Create functional results for downstream analysis

Note: Current integration uses well-tested defaults that work for most datasets. Focus on data quality and visual validation since parameter tuning is not yet available.

Key Point: The current implementation has limitations but provides a working foundation for batch integration. Future versions will include additional optimization tools.

Good luck with your integration analysis.