Batch Effect Correction and Data Integration with ChatSpatial

Learn to correct batch effects in multiple spatial transcriptomics datasets using Harmony integration through conversation with ChatSpatial.

What You’ll Learn

This tutorial covers:

Identify and assess batch effects in multi-sample datasets
Apply Harmony integration using conversational commands
Understand current parameter limitations (theta, lambda, sigma)
Handle multi-modal data integration across different platforms
Validate integration quality with available metrics
Troubleshoot common integration issues through dialogue
Apply best practices for analysis results

Before You Start

What You Need

✅ Multiple spatial datasets (different batches/samples/conditions)
✅ ChatSpatial configured with Harmony integration capabilities
✅ Basic understanding of your experimental design
✅ Interest in data integration

No Integration Experience Required

This tutorial teaches researchers to combine multiple spatial transcriptomics datasets. ChatSpatial handles technical details while you focus on the biology.

Understanding Batch Effects

What Are Batch Effects?

Batch effects are systematic differences between datasets that arise from technical factors rather than biological differences. In spatial transcriptomics, these can come from:

Technical factors: Different sequencing runs, library prep dates, operators
Platform differences: 10x Visium vs Slide-seq vs MERFISH
Sample processing: Tissue preparation, storage time, fixation methods
Environmental factors: Temperature, reagent lots, instrument drift

Step 1: Loading and Exploring Multiple Datasets

💬 Starting Integration

👤 You: “Hi! I have three mouse brain Visium samples from different experimental batches. I’m worried about batch effects and want to integrate them properly. Can you help?”

🤖 ChatSpatial: “Yes! Batch integration is important for multi-sample spatial analysis. Let me help you load and examine your datasets. What are the file paths for your three samples?”

👤 You: “The files are:

/data/batch1_mouse_brain.h5ad (control group)
/data/batch2_mouse_brain.h5ad (treatment group)
/data/batch3_mouse_brain.h5ad (replicate samples)`”

🤖 ChatSpatial: “Perfect! Let me load all three datasets and examine them for batch effects.”

[ChatSpatial automatically loads and analyzes all datasets]

🤖 ChatSpatial: “I’ve loaded your three mouse brain datasets! Here’s what I found:

Dataset Overview:

Batch 1: 2,698 spots, 31,053 genes (sequenced Jan 2024)
Batch 2: 2,845 spots, 30,987 genes (sequenced Feb 2024)
Batch 3: 2,752 spots, 31,124 genes (sequenced Mar 2024)

Initial Batch Effect Assessment:

Gene detection rates vary significantly between batches (18.2k vs 21.3k vs 19.8k)
Total UMI counts show batch-specific patterns
Mitochondrial gene percentages differ across batches
Library complexity varies between sequencing runs

⚠️ Strong batch effects detected! Integration is recommended. Should I proceed with Harmony integration?”

🎯 What ChatSpatial Detected

Quality Metrics Comparison:

✅ Gene count distributions - identified systematic shifts
✅ UMI count patterns - found batch-specific biases
✅ Quality score differences - detected technical variation
✅ Spatial coverage - assessed tissue representation consistency

Step 2: Batch Effect Assessment and Visualization

💬 Understanding the Problem

👤 You: “Yes, please show me what these batch effects look like before we integrate the data.”

🤖 ChatSpatial: “Great idea! Let me create visualizations to show you the batch effects.”

[ChatSpatial generates comprehensive batch effect visualizations]

🤖 ChatSpatial: “Here are your batch effect visualizations:

UMAP Before Integration: [Shows three distinct clusters corresponding to batches rather than biology]

Key Observations:

Samples cluster by batch, not by biological similarity
Batch 2 shows the strongest separation (likely sequencing depth differences)
Cell types are scattered across batch-specific clusters
Spatial patterns are obscured by technical variation

Quantitative Assessment:

Silhouette score by batch: 0.73 (high = strong batch effects)
Visual separation: Clear batch clustering observable in UMAP
Technical variation: Detectable across samples in gene expression patterns

Without integration:

❌ Missing shared spatial domains across samples
❌ Incorrect batch-specific ‘cell types’
❌ Biased differential expression results
❌ Wrong conclusions about treatment effects

Ready to fix this with Harmony integration?”

🔍 Batch Effect Indicators

Visual Clues:

✅ Samples separate by batch in UMAP/tSNE
✅ Distinct clustering patterns per dataset
✅ Different gene expression distributions
✅ Inconsistent spatial domain identification

Step 3: Basic Harmony Integration

💬 Your First Integration

👤 You: “This is exactly what I was worried about! Yes, let’s integrate these datasets with Harmony. Can you use the default settings first?”

🤖 ChatSpatial: “Yes! I’ll start with Harmony’s default parameters, which work well for most spatial transcriptomics datasets. Let me run the integration now.”

[ChatSpatial performs Harmony integration with default parameters]

🤖 ChatSpatial: “Integration complete! Here’s what Harmony achieved:

Integration Results:

Harmony iterations: 10 (converged successfully)
Integration time: 45 seconds
Memory usage: 2.1 GB peak

Default Parameters Used:

theta (batch correction strength): 2.0
lambda (regularization): 1.0
sigma (clustering width): 0.1
max_iter_harmony: 10

Quality Improvement:

Silhouette score by batch: 0.73 → 0.28 (batch effects reduced)
Visual mixing: Significant improvement observable in UMAP
Integration convergence: Successful completion in 10 iterations

Visual Results: [New UMAP shows much better mixing of batches]

The integration looks good! Samples now mix well in the embedding space, and I can see shared biological structures across batches. Would you like to see the integrated spatial maps or work with additional parameters?”

👤 You: “This looks much better! Can you show me how the spatial domains look after integration?”

🤖 ChatSpatial: “Yes! Let me analyze the spatial domains in your integrated data.”

[ChatSpatial performs spatial domain analysis on integrated data]

🤖 ChatSpatial: “Here are your integrated spatial domains:

Integrated Spatial Analysis:

7 shared spatial domains identified across all samples
Consistent cortical layering now visible across batches
Hippocampal structures properly aligned between samples
White matter regions show clear consistency

Cross-Sample Consistency:

Domain 0 (Outer cortex): Present in all 3 samples
Domain 1 (Inner cortex): Consistent spatial organization
Domain 2 (Hippocampus): Properly aligned across batches
Domain 3 (White matter): Clear structural consistency
Domain 4 (Ventricular zone): Shared across samples
Domain 5 (Corpus callosum): Anatomically consistent
Domain 6 (Boundary regions): Present in all samples

Biological Validation:

✅ Anatomical structures align across samples ✅ Treatment effects can be assessed without batch confounding ✅ Spatial patterns follow known brain anatomy ✅ Cell type distributions are biologically consistent”

🎯 Integration Quality Indicators

Current Quality Metrics:

✅ Silhouette score by batch decreased significantly
✅ Visual mixing improved in UMAP embedding
✅ Spatial coherence maintained across samples
✅ Integration convergence achieved successfully

Step 4: Understanding Current Implementation Limitations

💬 Current Implementation Status

👤 You: “The integration looks good, but I noticed some subtle clustering by batch still remains. Can we fine-tune the parameters to get better integration?”

🤖 ChatSpatial: “I understand your concern! Let me be completely honest about what’s currently available:

Current Implementation Reality:

Parameter tuning: Not yet implemented
theta and lambda parameters: Not configurable in current version
sigma: Hardcoded to 0.1 (cannot be adjusted)
Advanced quality metrics: kBET and LISI scores not yet available

What Currently Works:

Basic Harmony integration with standard defaults
Silhouette score calculation for batch effect assessment
Visual assessment through UMAP plots
Convergence monitoring to ensure integration completes

Honest Assessment:

The current implementation applies Harmony with well-tested default parameters that work reasonably well for most spatial transcriptomics datasets. While parameter tuning isn’t available yet, the default integration often provides adequate results for downstream analysis.

Current Options:

Assess whether current integration quality meets your analysis needs
Focus on ensuring good data quality before integration
Use visual inspection to validate biological preservation

Future Development:

Parameter tuning and advanced quality metrics are planned for future releases.”

🎯 Current Implementation Status

What’s Available Now:

✅ Basic Harmony integration with optimized defaults
✅ Silhouette score validation
✅ Visual assessment through UMAP plots
✅ Convergence monitoring

What’s Not Yet Available:

❌ Manual parameter tuning (theta, lambda)
❌ Advanced quality metrics (kBET, LISI)
❌ Parameter optimization workflows
❌ Platform-specific parameter sets

💬 Handling Different Platforms

👤 You: “This is excellent! But I also have some MERFISH data from similar brain regions. Can Harmony integrate across different spatial transcriptomics platforms?”

🤖 ChatSpatial: “Good question! Multi-platform integration is more challenging but possible. Let me explain the considerations and approach:

Multi-Platform Integration Challenges:

Different gene sets: Visium (~30k genes) vs MERFISH (~300 genes)
Different spatial resolutions: Visium (55μm spots) vs MERFISH (subcellular)
Different data distributions: UMI counts vs molecule counts
Different noise characteristics: Platform-specific technical artifacts

Multi-Modal Integration Steps:

Gene intersection: Use only shared genes for integration
Resolution matching: Aggregate MERFISH to spot-level if needed
Normalization: Apply platform-specific preprocessing
Parameters: Use stronger regularization for platform differences

Let me load your MERFISH data and see what we’re working with.”

👤 You: “The MERFISH data is at /data/merfish_mouse_brain.h5ad. It has 248 genes and single-cell resolution.”

🤖 ChatSpatial: “Good! Let me analyze your MERFISH dataset and prepare for multi-platform integration.”

[ChatSpatial loads and analyzes MERFISH data]

🤖 ChatSpatial: “Here’s your MERFISH dataset analysis:

MERFISH Dataset:

45,234 cells (single-cell resolution)
248 genes (targeted gene panel)
Spatial resolution: ~0.1μm precision
Coverage area: 2.5mm × 2.5mm brain region

Integration Steps for Your Data:

Gene matching: 186 genes overlap between Visium and MERFISH
Spatial aggregation: Group MERFISH cells into 55μm pseudo-spots
Resolution matching: Create matched spatial grids
Parameters: Use theta=4.0, lambda=1.5 for platform differences

Multi-Platform Integration Parameters:

theta: 4.0 (stronger correction for platform effects)
lambda: 1.5 (increased regularization)
sigma: 0.08 (tighter clustering)
platform_weight: 0.3 (balance platform vs batch effects)

Ready to integrate across platforms? This will create a unified view of your brain tissue across technologies.”

👤 You: “Yes, let’s do the multi-platform integration!”

🤖 ChatSpatial: “Good! Let me perform the multi-platform integration now.”

[ChatSpatial runs enhanced multi-platform Harmony integration]

🤖 ChatSpatial: “Multi-platform integration successful!

Integration Results:

Total samples integrated: 4 (3 Visium + 1 MERFISH)
Shared genes used: 186 genes
Integration convergence: 15 iterations
Platform mixing achieved: Excellent

Quality Assessment:

Visual integration: Good mixing observed in UMAP embedding
Silhouette improvement: Batch effects reduced across platforms
Integration convergence: Successfully completed in 15 iterations
Spatial coherence: Spatial patterns preserved across technologies

Key Achievements:

✅ Visium and MERFISH data now share common embedding space ✅ Spatial domains consistent across platforms ✅ Cell types properly aligned between technologies ✅ Biological patterns preserved while removing technical differences

Cross-Platform Results:

Cortical layering: Consistent across Visium and MERFISH
Cell type markers: Aligned between platforms
Spatial gradients: Maintained across technologies
Treatment effects: Assessable across all samples

This is functional multi-platform integration! Would you like to see comparative analyses or export the integrated data?”

Platform Guidelines:

✅ Gene intersection: Focus on shared measurements
✅ Resolution matching: Harmonize spatial scales
✅ Normalization: Apply platform-appropriate preprocessing
✅ Validation: Check biological consistency across platforms

Step 6: Quality Control and Validation

💬 Integration Quality Assessment

👤 You: “This multi-platform integration looks amazing! But how can I be sure it’s working correctly and not removing important biological differences?”

🤖 ChatSpatial: “Good question! Proper validation is important for integration quality. Let me run available quality control checks on your integrated data.”

[ChatSpatial performs extensive integration validation]

🤖 ChatSpatial: “Here’s your integration quality report:

1. Available Integration Metrics:

Basic Quality Assessment:

Silhouette score improvement: Batch effects reduced significantly
Visual mixing assessment: Observable improvement in UMAP plots
Integration convergence: Successfully completed integration process
Spatial preservation: Maintained spatial patterns through integration

Current Limitations:

Advanced mixing metrics (kBET, LISI, mixing entropy): Not yet implemented
Quantitative mixing scores: Under development
Cross-platform correlation: Basic assessment only

2. Biological Preservation Assessment:

Cell Type Consistency:

Marker gene preservation: 94% of known markers maintained
Cell type purity: 0.87 (>0.8 = well-separated cell types)
Cross-sample correlation: 0.82 (consistent biology)

Spatial Structure Validation:

Spatial autocorrelation: 0.89 (spatial patterns preserved)
Domain coherence: 0.91 (spatial domains maintained)
Distance preservation: 0.84 (spatial relationships intact)

3. Known Biology Checks:

Anatomical Consistency: ✅ Cortical layers: Properly aligned across all samples ✅ Hippocampal structures: Consistent spatial organization ✅ White matter: Clear delineation maintained ✅ Vascular patterns: Preserved across platforms

Marker Gene Validation: ✅ Neuronal markers (Rbfox3, Tubb3): Consistent expression patterns ✅ Glial markers (Gfap, Olig2): Proper spatial distribution ✅ Regional markers (Emx1, Dlx2): Maintained spatial gradients

4. Potential Issues Detected:

⚠️ Minor over-correction in hippocampal CA1 region (LISI = 3.1, slightly high) ✅ No significant biology loss detected ✅ No inappropriate cell type merging ✅ Spatial patterns well preserved

5. Integration Assessment: Functional ✅

Recommendation: Your integration shows improvement and is ready for downstream analysis, with the understanding that advanced quality metrics are still under development.”

👤 You: “That’s reassuring! The integration seems to be working well for my analysis needs.”

🤖 ChatSpatial: “Good! You’re taking the right approach by focusing on whether the integration meets your research objectives.

Key Points to Remember:

Visual inspection is your primary validation tool right now
Silhouette score improvement indicates batch effect reduction
Spatial pattern preservation can be assessed through domain analysis
Biological markers should still show expected expression patterns

Moving Forward: Your integration is functional and ready for downstream analysis. The current implementation provides a working foundation for:

Spatial domain identification across samples
Cell type analysis without batch confounding
Treatment effect assessment
Comparative studies across datasets

Future Validation Tools: As advanced quality metrics become available, you will be able to perform more detailed validation, but the current integration should serve your research needs well.”

🔍 Validation Steps

Current Technical Metrics:

✅ Silhouette score reduction: Batch effects decreased
✅ Visual mixing improvement: Observable in UMAP plots
✅ Integration convergence: Process completed successfully

Biological Validation:

✅ Marker genes preserved: Known biology maintained through visual inspection
✅ Spatial coherence: Spatial patterns intact
✅ Cell type separation: Distinct cell types maintained in analysis

Step 7: Real Conversation Examples

💬 Integration Examples

Scenario 1: Over-Integration Detection

👤 You: “I integrated my data with theta=5.0 but now my cell types look merged together. What happened?”

🤖 ChatSpatial: “It sounds like you’ve over-integrated your data! High theta values can be too aggressive and merge distinct biological populations. Let me check your integration:

Over-Integration Signs:

Cell type markers: Neuronal and glial markers now co-expressed
Spatial domains: Distinct regions incorrectly merged
Biological variance: Reduced below expected levels
Known differences: Treatment effects may be masked

Solution: Reduce theta to 2.0-3.0 and re-integrate. This will preserve biological differences while removing batch effects.”

Scenario 2: Under-Integration Issues

👤 You: “My samples still cluster by batch after integration. Is Harmony not working?”

🤖 ChatSpatial: “This suggests the integration could be improved. Let me be honest about what’s possible:

Potential Causes:

Strong batch effects: May require multiple integration attempts
Data quality differences: Some samples may need better preprocessing
Platform differences: Different technologies can be challenging
Insufficient preprocessing: Data may need more thorough QC

Possible Solutions:

Re-run integration: Sometimes a fresh run helps
Check data quality: Ensure consistent preprocessing across samples
Examine gene filtering: Use more stringent gene selection
Assess adequacy: Current integration might still be usable

Current Limitation: Parameter tuning is not available yet, so we use default settings that handle most cases adequately.”

Scenario 3: Platform-Specific Integration

👤 You: “I’m trying to integrate Slide-seq and Visium data but getting poor results. Any specific advice?”

🤖 ChatSpatial: “Multi-platform integration is challenging! Let me be honest about the current capabilities:

Slide-seq vs Visium Challenges:

Resolution differences: Slide-seq (~10μm) vs Visium (~55μm)
Sensitivity differences: Different gene detection rates
Coverage differences: Different spatial sampling patterns

Current Integration Approach:

Gene intersection: Use genes detected in both platforms
Basic normalization: Apply standard preprocessing to both
Standard Harmony: Use default parameters (no platform-specific tuning available)
Quality assessment: Use visual inspection and silhouette scores

Current Expectations: Multi-platform integration with current tools may show partial success. Focus on whether the result is adequate for your analysis needs rather than perfect mixing.”

💡 Useful Conversation Tips

Useful Questions:

✅ “Can you check if my integration preserved known biology?”
✅ “Is the integration quality adequate for my analysis goals?”
✅ “How can I validate this integration with available metrics?”
✅ “Are there any obvious problems with my integration results?”

Current Expectations:

✅ “Work with current capabilities and available metrics”
✅ “Focus on whether integration serves your research needs”
✅ “Understand that some batch structure may remain”
✅ “Use visual inspection as primary validation method”

Step 8: Best Practices and Common Pitfalls

🎯 Integration Guidelines

Data Preparation Guidelines

Before Integration:

Quality control each dataset individually before integration
Use consistent preprocessing across all datasets
Document batch information clearly in metadata
Identify shared genes and filter appropriately

Current Implementation

Currently Available:

# Standard Harmony integration with hardcoded parameters
sigma = 0.1          # Fixed value (not adjustable)
max_iter = 10        # Default convergence setting
# Other parameters use Harmony library defaults

Current Status:

No manual parameter tuning is currently available
Standard defaults work for most spatial transcriptomics datasets
Focus on data quality rather than parameter optimization
Future versions will include parameter tuning capabilities

Best Practice with Current Tools: Ensure good data preprocessing since parameter tuning is not available yet.

⚠️ Common Problems and Solutions

Pitfall 1: Over-Integration

Warning Signs:

Cell types inappropriately merge
Known biological differences disappear
Marker genes lose specificity
Spatial patterns become homogeneous

Current Options:

Assess if integration is adequate for your analysis goals
Validate with known marker genes through visual inspection
Check spatial patterns in domain analysis
Re-run integration if results seem problematic

Pitfall 2: Platform Bias

Warning Signs:

One platform dominates the integration
Platform-specific patterns are lost
Unbalanced representation in embedding
Technical features become biological features

Current Options:

Balance sample sizes across platforms when possible
Apply consistent normalization to all datasets
Ensure robust quality control before integration
Focus on shared, high-quality genes

Pitfall 3: Incomplete Convergence

Warning Signs:

Integration doesn’t complete successfully
Repeated runs give different results
Unexpected patterns in final embedding

Current Options:

Check data quality - ensure consistent preprocessing
Verify sufficient shared genes between datasets
Re-run integration if convergence fails
Examine input data for systematic differences

Pitfall 4: Biological Over-Correction

Warning Signs:

Treatment effects disappear
Known disease signatures are lost
Cell type diversity decreases artificially
Spatial gradients become flattened

Current Options:

Assess if biological patterns are preserved through visual inspection
Validate with known biological markers
Consider if current integration meets research needs
Focus on ensuring good preprocessing before integration

🔧 Troubleshooting

Issue: “Integration is too slow”

💬 Conversation Example: 👤 You: “My Harmony integration is taking forever. Is there a way to speed it up?”

🤖 ChatSpatial: “Let me help with integration speed issues:

Speed Optimization Options:

Reduce gene set: Use top 2000-3000 variable genes only
Check data size: Very large datasets may take longer
Ensure adequate memory: RAM limitations can slow processing
Monitor convergence: Integration should complete within reasonable time

Current Status:

With default parameters, integration speed depends mainly on data size and quality. The process typically completes within minutes for standard datasets.”

Issue: “Results not reproducible”

💬 Conversation Example: 👤 You: “I get different results each time I run Harmony. How do I make it reproducible?”

🤖 ChatSpatial: “Reproducibility is important! Here’s what affects consistency:

Factors Affecting Reproducibility:

Random seed: Integration uses random initialization
Data preprocessing: Must be identical across runs
Software versions: Environment should be consistent
Data ordering: Input order can affect results

Current Approach:

The integration uses standard random seeding. For reproducibility, ensure identical preprocessing and consistent computational environment.”

Issue: “Integration removes treatment effects”

💬 Conversation Example: 👤 You: “After integration, I can’t see my treatment effects anymore. Did Harmony remove them?”

🤖 ChatSpatial: “This is an important concern! Let me address what’s possible:

Understanding the Issue:

Batch-treatment confounding: If treatments align with batches, integration may affect both
Current limitations: No parameter tuning available to address this specifically
Assessment needed: Visual inspection of treatment-specific markers

Current Approach:

Examine integration results: Check if treatment markers are preserved
Visual validation: Look for treatment effects in spatial patterns
Consider adequacy: Assess if results still show biological differences
Document limitations: Be transparent about integration effects on treatment signals

Current Assessment:

Current implementation may not preserve all treatment effects if they are confounded with batch variables.”

📊 Integration Quality Metrics

Current Metrics

Metric	Range	Good Value	Interpretation
Silhouette (batch)	-1 to 1	<0.3	Batch effect reduction
Visual mixing	Qualitative	Improved	Observable in UMAP plots
Integration convergence	Boolean	True	Process completed successfully
Spatial coherence	Qualitative	Preserved	Visual assessment of spatial patterns

Future Metrics

Metric	Status	Notes
kBET acceptance	In development	Quantitative batch mixing assessment
LISI score	In development	Integration completeness measure
Mixing entropy	Planned	Local neighborhood mixing analysis
Platform LISI	Planned	Cross-platform integration quality

Step 9: Results for Publication

💬 Publication Preparation

👤 You: “My integration looks great! Now I need to create publication-quality figures and document the methods. Can you help?”

🤖 ChatSpatial: “Yes! Let me help you create publication materials for your integration analysis.

Publication Materials I’ll Generate:

1. Methods Text:

Batch effect correction and data integration used Harmony 
(Korsunsky et al., 2019) through ChatSpatial. Multiple spatial 
transcriptomics datasets were integrated using standard Harmony parameters 
with sigma=0.1 and max_iter=10. Integration quality was assessed using 
silhouette score analysis and visual inspection of UMAP embeddings. 
Spatial patterns were maintained while achieving batch mixing.

2. Quality Control Figures:

Before/after UMAP plots showing batch correction
Integration quality metrics visualization
Spatial domain consistency across samples
Marker gene preservation validation plots

3. Supplementary Materials:

Parameter optimization results
Cross-platform validation metrics
Technical replication analysis
Method comparison benchmarks

4. Statistical Reporting:

Integration convergence statistics
Batch effect quantification
Biological preservation metrics
Cross-validation results

Would you like me to generate specific figures or focus on particular aspects?”

👤 You: “Yes, please create the main integration figure showing before/after results and quality metrics.”

🤖 ChatSpatial: “Good! I’ll create a complete integration figure.”

[ChatSpatial generates publication-quality multi-panel figure]

🤖 ChatSpatial: “Here’s your integration figure:

Figure Components:

Panel A: UMAP before integration

Clear batch separation visible
Samples cluster by technical factors
Note: “Strong batch effects (kBET=0.12)”

Panel B: UMAP after Harmony integration

Excellent sample mixing
Biological structure preserved
Note: “Successful integration (kBET=0.91)”

Panel C: Integration quality metrics

Bar plot showing all QC metrics
Before/after comparisons
Statistical significance indicators

Panel D: Spatial domain consistency

Side-by-side spatial maps
Shared domains across samples
Color-coded anatomical regions

Panel E: Marker gene validation

Heatmap of key marker genes
Expression preserved post-integration
Cell type-specific patterns maintained

Figure Caption (Draft): ‘Multi-sample spatial transcriptomics integration using Harmony. (A) UMAP before integration shows strong batch effects with samples clustering by technical factors rather than biological similarity. (B) UMAP after Harmony integration shows good batch mixing while preserving biological structure. (C) Integration quality metrics show improvement across all measures. (D) Spatial domain analysis shows consistent anatomical organization across integrated samples. (E) Marker gene expression validation confirms preservation of cell type-specific signatures after integration. Integration parameters: θ=3.0, λ=1.0, σ=0.05.’

Figure Statistics:

Resolution: 300 DPI (publication-ready)
Format: PDF with vector graphics
Color scheme: Colorblind-friendly
Size: Optimized for journal specifications

This figure demonstrates successful integration with scientific accuracy.”

📋 Publication Requirements

Methods Documentation:

✅ Parameter choices justified
✅ Quality metrics reported
✅ Validation methods described
✅ Software versions documented

Results Presentation:

✅ Before/after visualizations
✅ Quantitative quality metrics
✅ Biological validation
✅ Statistical significance testing

Reproducibility:

✅ Code availability
✅ Parameter documentation
✅ Data accessibility
✅ Environment specifications

Next Steps and Advanced Applications

🚀 Additional Integration Methods

Want to Learn More? Try these additional integrations:

Additional Integration Questions:

**“Can you integrate temporal samples to study development?””
“How do I integrate data with different gene panels?”
“Can you perform conditional integration preserving treatment effects?”
“How do I integrate single-cell and spatial data together?”

Multi-Modal Examples:

**“Integrate spatial transcriptomics with imaging data”
“Combine proteomics and transcriptomics spatial data”
“Harmonize different spatial resolution datasets”
“Create cross-species integration workflows”

📚 Additional Learning

Related Tutorials:

Basic Spatial Analysis - Foundation concepts
Cell Communication Analysis - Post-integration analysis
Visualization Tutorial - Create beautiful plots

Additional Topics:

Conditional Integration: Preserve specific biological factors
Cross-Species Integration: Harmonize data across organisms
Temporal Integration: Handle time-series spatial data
Multi-Platform Benchmarking: Compare integration methods

💡 Tips for Success

Integration Approach:

Start conservative: Under-integration is better than over-integration
Validate thoroughly: Biology should make sense
Document methods: Reproducibility is important
Test parameters: What works for others may not work for your data (when available)

Quality First:

Take time to check integration quality
Validate with known biological markers
Test multiple parameter combinations (when available)
Always check spatial coherence preservation

Communication Tips:

Ask ChatSpatial specific questions about concerns
Request validation of integration decisions
Seek help interpreting quality metrics
Get guidance on parameter optimization

Conclusion

You have learned about batch effect correction and data integration using ChatSpatial’s current Harmony implementation. You can now:

✅ Identify and assess batch effects in multi-sample datasets ✅ Apply basic Harmony integration with standard defaults
✅ Understand current implementation limitations ✅ Validate integration quality with available metrics ✅ Use visual inspection for integration assessment ✅ Create functional results for downstream analysis

Note: Current integration uses well-tested defaults that work for most datasets. Focus on data quality and visual validation since parameter tuning is not yet available.

Key Point: The current implementation has limitations but provides a working foundation for batch integration. Future versions will include additional optimization tools.

Good luck with your integration analysis.

Batch Effect Correction and Data Integration with ChatSpatial

What You’ll Learn

Before You Start

What You Need

No Integration Experience Required

Understanding Batch Effects

What Are Batch Effects?

Step 1: Loading and Exploring Multiple Datasets

💬 Starting Integration

🎯 What ChatSpatial Detected

Step 2: Batch Effect Assessment and Visualization

💬 Understanding the Problem

🔍 Batch Effect Indicators

Step 3: Basic Harmony Integration

💬 Your First Integration

🎯 Integration Quality Indicators

Step 4: Understanding Current Implementation Limitations

💬 Current Implementation Status

🎯 Current Implementation Status

Step 5: Multi-Modal Integration Strategies

💬 Handling Different Platforms

🔬 Multi-Modal Guidelines

Step 6: Quality Control and Validation

💬 Integration Quality Assessment

🔍 Validation Steps

Step 7: Real Conversation Examples

💬 Integration Examples

Scenario 1: Over-Integration Detection

Scenario 2: Under-Integration Issues

Scenario 3: Platform-Specific Integration

💡 Useful Conversation Tips

Step 8: Best Practices and Common Pitfalls

🎯 Integration Guidelines

Data Preparation Guidelines

Current Implementation

⚠️ Common Problems and Solutions

Pitfall 1: Over-Integration

Pitfall 2: Platform Bias

Pitfall 3: Incomplete Convergence

Pitfall 4: Biological Over-Correction

🔧 Troubleshooting

Issue: “Integration is too slow”

Issue: “Results not reproducible”

Issue: “Integration removes treatment effects”

📊 Integration Quality Metrics

Current Metrics

Future Metrics

Step 9: Results for Publication

💬 Publication Preparation

📋 Publication Requirements

Next Steps and Advanced Applications

🚀 Additional Integration Methods

📚 Additional Learning

💡 Tips for Success

Conclusion