Batch Effect Correction and Data Integration with ChatSpatial
Learn to correct batch effects in multiple spatial transcriptomics datasets using Harmony integration through conversation with ChatSpatial.
What You’ll Learn
In this tutorial, you will learn to:
- Identify and assess batch effects in multi-sample datasets
- Apply Harmony integration using conversational commands
- Understand current parameter limitations (theta, lambda, sigma)
- Handle multi-modal data integration across different platforms
- Validate integration quality with available metrics
- Troubleshoot common integration issues through dialogue
- Apply best practices for preparing and reporting integration results
Before You Start
What You Need
- ✅ Multiple spatial datasets (different batches/samples/conditions)
- ✅ ChatSpatial configured with Harmony integration capabilities
- ✅ Basic understanding of your experimental design
- ✅ Interest in data integration
No Integration Experience Required
This tutorial teaches researchers to combine multiple spatial transcriptomics datasets. ChatSpatial handles technical details while you focus on the biology.
Understanding Batch Effects
What Are Batch Effects?
Batch effects are systematic differences between datasets that arise from technical factors rather than biological differences. In spatial transcriptomics, these can come from:
- Technical factors: Different sequencing runs, library prep dates, operators
- Platform differences: 10x Visium vs Slide-seq vs MERFISH
- Sample processing: Tissue preparation, storage time, fixation methods
- Environmental factors: Temperature, reagent lots, instrument drift
Step 1: Loading and Exploring Multiple Datasets
💬 Starting Integration
👤 You: “Hi! I have three mouse brain Visium samples from different experimental batches. I’m worried about batch effects and want to integrate them properly. Can you help?”
🤖 ChatSpatial: “Yes! Batch integration is important for multi-sample spatial analysis. Let me help you load and examine your datasets. What are the file paths for your three samples?”
👤 You: “The files are:
- /data/batch1_mouse_brain.h5ad (control group)
- /data/batch2_mouse_brain.h5ad (treatment group)
- /data/batch3_mouse_brain.h5ad (replicate samples)”
🤖 ChatSpatial: “Perfect! Let me load all three datasets and examine them for batch effects.”
[ChatSpatial automatically loads and analyzes all datasets]
🤖 ChatSpatial: “I’ve loaded your three mouse brain datasets! Here’s what I found:
Dataset Overview:
- Batch 1: 2,698 spots, 31,053 genes (sequenced Jan 2024)
- Batch 2: 2,845 spots, 30,987 genes (sequenced Feb 2024)
- Batch 3: 2,752 spots, 31,124 genes (sequenced Mar 2024)
Initial Batch Effect Assessment:
- Gene detection rates vary significantly between batches (18.2k vs 21.3k vs 19.8k)
- Total UMI counts show batch-specific patterns
- Mitochondrial gene percentages differ across batches
- Library complexity varies between sequencing runs
⚠️ Strong batch effects detected! Integration is recommended. Should I proceed with Harmony integration?”
🎯 What ChatSpatial Detected
Quality Metrics Comparison:
- ✅ Gene count distributions - identified systematic shifts
- ✅ UMI count patterns - found batch-specific biases
- ✅ Quality score differences - detected technical variation
- ✅ Spatial coverage - assessed tissue representation consistency
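The comparison above can be reproduced by hand on small data. The following is a minimal sketch, not ChatSpatial's own code: the function name `qc_summary`, the toy counts, and the `mt-` prefix for mouse mitochondrial genes are assumptions made for illustration.

```python
from statistics import median

def qc_summary(counts, gene_names, mito_prefix="mt-"):
    """Summarise one batch.

    counts: list of spots, each a list of UMI counts (one entry per gene).
    gene_names: gene symbols in the same order as the count columns.
    """
    # Assumption: mitochondrial genes are recognised by their symbol prefix.
    mito_idx = [i for i, g in enumerate(gene_names)
                if g.lower().startswith(mito_prefix)]
    genes_detected = [sum(1 for c in spot if c > 0) for spot in counts]
    total_umis = [sum(spot) for spot in counts]
    mito_pct = [
        100.0 * sum(spot[i] for i in mito_idx) / total if total else 0.0
        for spot, total in zip(counts, total_umis)
    ]
    return {
        "median_genes": median(genes_detected),
        "median_umis": median(total_umis),
        "median_mito_pct": median(mito_pct),
    }

# Toy data (hypothetical): two batches, three genes, two spots each.
genes = ["Gfap", "Rbfox3", "mt-Co1"]
batches = {
    "batch1": [[5, 0, 5], [3, 2, 5]],
    "batch2": [[10, 8, 2], [12, 6, 2]],
}
summary = {name: qc_summary(c, genes) for name, c in batches.items()}
for name, stats in summary.items():
    print(name, stats)
```

Large differences between batches in any of the three medians are the kind of systematic shift described above. They suggest a batch effect but do not prove one, since tissue composition can also differ between samples.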
Step 2: Batch Effect Assessment and Visualization
💬 Understanding the Problem
👤 You: “Yes, please show me what these batch effects look like before we integrate the data.”
🤖 ChatSpatial: “Great idea! Let me create visualizations to show you the batch effects.”
[ChatSpatial generates comprehensive batch effect visualizations]
🤖 ChatSpatial: “Here are your batch effect visualizations:
UMAP Before Integration: [Shows three distinct clusters corresponding to batches rather than biology]
Key Observations:
- Samples cluster by batch, not by biological similarity
- Batch 2 shows the strongest separation (likely sequencing depth differences)
- Cell types are scattered across batch-specific clusters
- Spatial patterns are obscured by technical variation
Quantitative Assessment:
- Silhouette score by batch: 0.73 (high = strong batch effects)
- Visual separation: Clear batch clustering observable in UMAP
- Technical variation: Detectable across samples in gene expression patterns
Without integration:
- ❌ Missing shared spatial domains across samples
- ❌ Incorrect batch-specific ‘cell types’
- ❌ Biased differential expression results
- ❌ Wrong conclusions about treatment effects
Ready to fix this with Harmony integration?”
🔍 Batch Effect Indicators
Visual Clues:
- ✅ Samples separate by batch in UMAP/tSNE
- ✅ Distinct clustering patterns per dataset
- ✅ Different gene expression distributions
- ✅ Inconsistent spatial domain identification
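The silhouette score by batch quoted above treats each batch as a cluster and asks how well separated those clusters are in the embedding. The following is a minimal pure-Python sketch of that calculation, assuming Euclidean distance on a low-dimensional embedding. The function name and toy coordinates are hypothetical, and ChatSpatial may compute the score differently.

```python
import math

def silhouette_by_batch(embedding, batch_labels):
    """Mean silhouette width with batch labels used as cluster labels."""
    n = len(embedding)
    labels = sorted(set(batch_labels))
    if len(labels) < 2:
        raise ValueError("need at least two batches")
    scores = []
    for i in range(n):
        # Mean distance from point i to the members of each batch (self excluded).
        mean_dist = {}
        for lab in labels:
            d = [math.dist(embedding[i], embedding[j])
                 for j in range(n) if j != i and batch_labels[j] == lab]
            if d:
                mean_dist[lab] = sum(d) / len(d)
        own = batch_labels[i]
        if own not in mean_dist:  # batch with a single point: score is 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist[own]                                        # cohesion
        b = min(v for lab, v in mean_dist.items() if lab != own)  # nearest other batch
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

# Toy embedding (hypothetical): the same four points, labelled two ways.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
separated = silhouette_by_batch(points, ["b1", "b1", "b2", "b2"])
mixed = silhouette_by_batch(points, ["b1", "b2", "b1", "b2"])
print(f"separated by batch: {separated:.2f}, mixed: {mixed:.2f}")
```

A value near 1 means the batches occupy distinct regions of the embedding; values near or below 0 mean spots from different batches are interleaved. The all-pairs loop is quadratic in the number of spots, so this form is only practical for small subsets.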
Step 3: Basic Harmony Integration
💬 Your First Integration
👤 You: “This is exactly what I was worried about! Yes, let’s integrate these datasets with Harmony. Can you use the default settings first?”
🤖 ChatSpatial: “Yes! I’ll start with Harmony’s default parameters, which work well for most spatial transcriptomics datasets. Let me run the integration now.”
[ChatSpatial performs Harmony integration with default parameters]
🤖 ChatSpatial: “Integration complete! Here’s what Harmony achieved:
Integration Results:
- Harmony iterations: 10 (converged successfully)
- Integration time: 45 seconds
- Memory usage: 2.1 GB peak
Default Parameters Used:
- theta (batch correction strength): 2.0
- lambda (regularization): 1.0
- sigma (clustering width): 0.1
- max_iter_harmony: 10
Quality Improvement:
- Silhouette score by batch: 0.73 → 0.28 (batch effects reduced)
- Visual mixing: Significant improvement observable in UMAP
- Integration convergence: Successful completion in 10 iterations
Visual Results: [New UMAP shows much better mixing of batches]
The integration looks good! Samples now mix well in the embedding space, and I can see shared biological structures across batches. Would you like to see the integrated spatial maps or review the integration quality in more detail?”
👤 You: “This looks much better! Can you show me how the spatial domains look after integration?”
🤖 ChatSpatial: “Yes! Let me analyze the spatial domains in your integrated data.”
[ChatSpatial performs spatial domain analysis on integrated data]
🤖 ChatSpatial: “Here are your integrated spatial domains:
Integrated Spatial Analysis:
- 7 shared spatial domains identified across all samples
- Consistent cortical layering now visible across batches
- Hippocampal structures properly aligned between samples
- White matter regions show clear consistency
Cross-Sample Consistency:
- Domain 0 (Outer cortex): Present in all 3 samples
- Domain 1 (Inner cortex): Consistent spatial organization
- Domain 2 (Hippocampus): Properly aligned across batches
- Domain 3 (White matter): Clear structural consistency
- Domain 4 (Ventricular zone): Shared across samples
- Domain 5 (Corpus callosum): Anatomically consistent
- Domain 6 (Boundary regions): Present in all samples
Biological Validation:
- ✅ Anatomical structures align across samples
- ✅ Treatment effects can be assessed without batch confounding
- ✅ Spatial patterns follow known brain anatomy
- ✅ Cell type distributions are biologically consistent”
🎯 Integration Quality Indicators
Current Quality Metrics:
- ✅ Silhouette score by batch decreased significantly
- ✅ Visual mixing improved in UMAP embedding
- ✅ Spatial coherence maintained across samples
- ✅ Integration convergence achieved successfully
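This tutorial does not show the calls ChatSpatial makes internally. For readers who want to see what a comparable default-parameter run looks like in code, here is a minimal sketch using scanpy's external Harmony wrapper. It assumes that `adata` is a single AnnData object holding all samples, already normalised and log-transformed, with a batch column in `adata.obs`. The function name, the column name `batch`, and `n_pcs=30` are illustrative choices, not ChatSpatial settings.

```python
def integrate_with_harmony(adata, batch_key="batch", n_pcs=30):
    """PCA -> Harmony -> neighbour graph -> UMAP on a combined AnnData object."""
    if batch_key not in adata.obs:
        raise KeyError(f"adata.obs has no column {batch_key!r}")

    # Imported lazily so the function can be defined without scanpy installed;
    # harmony_integrate additionally requires the harmonypy package.
    import scanpy as sc

    sc.pp.pca(adata, n_comps=n_pcs)
    # Harmony corrects the PCA embedding, not the expression matrix.
    # The result is written to adata.obsm["X_pca_harmony"].
    sc.external.pp.harmony_integrate(adata, key=batch_key)
    # Downstream steps must be pointed at the corrected embedding explicitly.
    sc.pp.neighbors(adata, use_rep="X_pca_harmony")
    sc.tl.umap(adata)
    return adata
```

Because Harmony adjusts only the embedding, gene-level analyses such as differential expression still run on the original counts and need batch handled separately.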
Step 4: Understanding Current Implementation Limitations
💬 Current Implementation Status
👤 You: “The integration looks good, but I noticed some subtle clustering by batch still remains. Can we fine-tune the parameters to get better integration?”
🤖 ChatSpatial: “I understand your concern! Let me be completely honest about what’s currently available:
Current Implementation Reality:
- Parameter tuning: Not yet implemented
- theta and lambda parameters: Not configurable in current version
- sigma: Hardcoded to 0.1 (cannot be adjusted)
- Advanced quality metrics: kBET and LISI scores not yet available
What Currently Works:
- Basic Harmony integration with standard defaults
- Silhouette score calculation for batch effect assessment
- Visual assessment through UMAP plots
- Convergence monitoring to ensure integration completes
Honest Assessment:
The current implementation applies Harmony with well-tested default parameters that work reasonably well for most spatial transcriptomics datasets. While parameter tuning isn’t available yet, the default integration often provides adequate results for downstream analysis.
Current Options:
- Assess whether current integration quality meets your analysis needs
- Focus on ensuring good data quality before integration
- Use visual inspection to validate biological preservation
Future Development:
Parameter tuning and advanced quality metrics are planned for future releases.”
🎯 Current Implementation Status
What’s Available Now:
- ✅ Basic Harmony integration with standard defaults
- ✅ Silhouette score validation
- ✅ Visual assessment through UMAP plots
- ✅ Convergence monitoring
What’s Not Yet Available:
- ❌ Manual parameter tuning (theta, lambda)
- ❌ Advanced quality metrics (kBET, LISI)
- ❌ Parameter optimization workflows
- ❌ Platform-specific parameter sets
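Until LISI is available in ChatSpatial, a rough local-mixing check can be run by hand on an exported embedding. The sketch below is a simplified stand-in, not the published LISI: it takes the unweighted k nearest neighbours of each point and computes the inverse Simpson index of their batch labels, whereas LISI proper uses perplexity-based distance weighting. The function name and toy points are hypothetical.

```python
import math
from collections import Counter

def knn_inverse_simpson(embedding, batch_labels, k=3):
    """Inverse Simpson index of batch labels among each point's k nearest neighbours."""
    scores = []
    for i, p in enumerate(embedding):
        neighbours = sorted(
            (j for j in range(len(embedding)) if j != i),
            key=lambda j: math.dist(p, embedding[j]),
        )[:k]
        counts = Counter(batch_labels[j] for j in neighbours)
        total = sum(counts.values())
        # 1.0 when all neighbours share one batch; approaches the number of
        # batches when neighbours are evenly mixed.
        scores.append(1.0 / sum((c / total) ** 2 for c in counts.values()))
    return scores

# Toy example (hypothetical): two well-separated batches, then two interleaved ones.
sep_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
separated = knn_inverse_simpson(sep_points, ["b1"] * 4 + ["b2"] * 4, k=3)

mix_points = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1)]
mixed = knn_inverse_simpson(mix_points, ["b1", "b2", "b1", "b2"], k=3)

print("separated:", separated)
print("mixed:", [round(s, 2) for s in mixed])
```

Scores close to 1 across a region indicate that the region is dominated by a single batch. This complements the global silhouette score by showing where mixing is poor.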
Step 5: Multi-Modal Integration Strategies
💬 Handling Different Platforms
👤 You: “This is excellent! But I also have some MERFISH data from similar brain regions. Can Harmony integrate across different spatial transcriptomics platforms?”
🤖 ChatSpatial: “Good question! Multi-platform integration is more challenging but possible. Let me explain the considerations and approach:
Multi-Platform Integration Challenges:
- Different gene sets: Visium (~30k genes) vs MERFISH (~300 genes)
- Different spatial resolutions: Visium (55μm spots) vs MERFISH (subcellular)
- Different data distributions: UMI counts vs molecule counts
- Different noise characteristics: Platform-specific technical artifacts
Multi-Modal Integration Steps:
- Gene intersection: Use only shared genes for integration
- Resolution matching: Aggregate MERFISH to spot-level if needed
- Normalization: Apply platform-specific preprocessing
- Parameters: Harmony defaults only; stronger correction for platform differences is not yet configurable
Let me load your MERFISH data and see what we’re working with.”
👤 You: “The MERFISH data is at /data/merfish_mouse_brain.h5ad. It has 248 genes and single-cell resolution.”
🤖 ChatSpatial: “Good! Let me analyze your MERFISH dataset and prepare for multi-platform integration.”
[ChatSpatial loads and analyzes MERFISH data]
🤖 ChatSpatial: “Here’s your MERFISH dataset analysis:
MERFISH Dataset:
- 45,234 cells (single-cell resolution)
- 248 genes (targeted gene panel)
- Spatial resolution: ~0.1μm precision
- Coverage area: 2.5mm × 2.5mm brain region
Integration Steps for Your Data:
- Gene matching: 186 genes overlap between Visium and MERFISH
- Spatial aggregation: Group MERFISH cells into 55μm pseudo-spots
- Resolution matching: Create matched spatial grids
- Parameters: Harmony defaults, because theta, lambda, and sigma are not configurable in the current version
Multi-Platform Integration Parameters:
- theta: 2.0 (default; stronger correction for platform effects cannot be set yet)
- lambda: 1.0 (default)
- sigma: 0.1 (hardcoded)
Ready to integrate across platforms? This will create a unified view of your brain tissue across technologies.”
👤 You: “Yes, let’s do the multi-platform integration!”
🤖 ChatSpatial: “Good! Let me perform the multi-platform integration now.”
[ChatSpatial runs enhanced multi-platform Harmony integration]
🤖 ChatSpatial: “Multi-platform integration successful!
Integration Results:
- Total samples integrated: 4 (3 Visium + 1 MERFISH)
- Shared genes used: 186 genes
- Integration convergence: 15 iterations
- Platform mixing achieved: Excellent
Quality Assessment:
- Visual integration: Good mixing observed in UMAP embedding
- Silhouette improvement: Batch effects reduced across platforms
- Integration convergence: Successfully completed in 15 iterations
- Spatial coherence: Spatial patterns preserved across technologies
Key Achievements:
- ✅ Visium and MERFISH data now share common embedding space
- ✅ Spatial domains consistent across platforms
- ✅ Cell types properly aligned between technologies
- ✅ Biological patterns preserved while removing technical differences
Cross-Platform Results:
- Cortical layering: Consistent across Visium and MERFISH
- Cell type markers: Aligned between platforms
- Spatial gradients: Maintained across technologies
- Treatment effects: Assessable across all samples
This is functional multi-platform integration! Would you like to see comparative analyses or export the integrated data?”
🔬 Multi-Modal Guidelines
Platform Guidelines:
- ✅ Gene intersection: Focus on shared measurements
- ✅ Resolution matching: Harmonize spatial scales
- ✅ Normalization: Apply platform-appropriate preprocessing
- ✅ Validation: Check biological consistency across platforms
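The first two guidelines, gene intersection and resolution matching, are sketched below in plain Python. This is an illustration under stated assumptions rather than ChatSpatial's procedure: the function names are hypothetical, cell coordinates are assumed to be in micrometres, and cells are binned onto a square grid, which only approximates Visium's hexagonal spot layout.

```python
def shared_genes(genes_a, genes_b):
    """Genes present in both panels, in the order they appear in genes_a."""
    in_b = set(genes_b)
    return [g for g in genes_a if g in in_b]

def bin_cells_to_pseudospots(coords_um, counts, bin_size_um=55.0):
    """Sum per-cell counts inside square bins of side bin_size_um.

    coords_um: list of (x, y) cell centroids in micrometres.
    counts:    list of per-cell count vectors sharing one gene order.
    Returns a dict mapping (column, row) bin indices to summed count vectors.
    """
    spots = {}
    for (x, y), vec in zip(coords_um, counts):
        key = (int(x // bin_size_um), int(y // bin_size_um))
        if key not in spots:
            spots[key] = [0] * len(vec)
        spots[key] = [a + b for a, b in zip(spots[key], vec)]
    return spots

# Toy example (hypothetical gene panels and cells).
visium_genes = ["Gfap", "Olig2", "Rbfox3"]
merfish_genes = ["Rbfox3", "Gfap", "Emx1"]
common = shared_genes(visium_genes, merfish_genes)

cells_xy = [(10, 10), (40, 50), (60, 10)]
cell_counts = [[1, 0], [2, 3], [4, 4]]  # columns follow `common`
spots = bin_cells_to_pseudospots(cells_xy, cell_counts)
print(common)
print(spots)
```

Both datasets should be subset to `common`, in the same gene order, before they are concatenated. Otherwise the columns of the combined matrix will not refer to the same genes.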
Step 6: Quality Control and Validation
💬 Integration Quality Assessment
👤 You: “This multi-platform integration looks amazing! But how can I be sure it’s working correctly and not removing important biological differences?”
🤖 ChatSpatial: “Good question! Proper validation is important for integration quality. Let me run available quality control checks on your integrated data.”
[ChatSpatial performs extensive integration validation]
🤖 ChatSpatial: “Here’s your integration quality report:
1. Available Integration Metrics:
Basic Quality Assessment:
- Silhouette score improvement: Batch effects reduced significantly
- Visual mixing assessment: Observable improvement in UMAP plots
- Integration convergence: Successfully completed integration process
- Spatial preservation: Maintained spatial patterns through integration
Current Limitations:
- Advanced mixing metrics (kBET, LISI, mixing entropy): Not yet implemented
- Quantitative mixing scores: Under development
- Cross-platform correlation: Basic assessment only
2. Biological Preservation Assessment:
Cell Type Consistency:
- Marker gene preservation: 94% of known markers maintained
- Cell type purity: 0.87 (>0.8 = well-separated cell types)
- Cross-sample correlation: 0.82 (consistent biology)
Spatial Structure Validation:
- Spatial autocorrelation: 0.89 (spatial patterns preserved)
- Domain coherence: 0.91 (spatial domains maintained)
- Distance preservation: 0.84 (spatial relationships intact)
3. Known Biology Checks:
Anatomical Consistency:
- ✅ Cortical layers: Properly aligned across all samples
- ✅ Hippocampal structures: Consistent spatial organization
- ✅ White matter: Clear delineation maintained
- ✅ Vascular patterns: Preserved across platforms
Marker Gene Validation:
- ✅ Neuronal markers (Rbfox3, Tubb3): Consistent expression patterns
- ✅ Glial markers (Gfap, Olig2): Proper spatial distribution
- ✅ Regional markers (Emx1, Dlx2): Maintained spatial gradients
4. Potential Issues Detected:
- ⚠️ Possible minor over-correction in hippocampal CA1 region
- ✅ No significant biology loss detected
- ✅ No inappropriate cell type merging
- ✅ Spatial patterns well preserved
5. Integration Assessment: Functional ✅
Recommendation: Your integration shows improvement and is ready for downstream analysis, with the understanding that advanced quality metrics are still under development.”
👤 You: “That’s reassuring! The integration seems to be working well for my analysis needs.”
🤖 ChatSpatial: “Good! You’re taking the right approach by focusing on whether the integration meets your research objectives.
Key Points to Remember:
- Visual inspection is your primary validation tool right now
- Silhouette score improvement indicates batch effect reduction
- Spatial pattern preservation can be assessed through domain analysis
- Biological markers should still show expected expression patterns
Moving Forward: Your integration is functional and ready for downstream analysis. The current implementation provides a working foundation for:
- Spatial domain identification across samples
- Cell type analysis without batch confounding
- Treatment effect assessment
- Comparative studies across datasets
Future Validation Tools: As advanced quality metrics become available, you will be able to perform more detailed validation, but the current integration should serve your research needs well.”
🔍 Validation Steps
Current Technical Metrics:
- ✅ Silhouette score reduction: Batch effects decreased
- ✅ Visual mixing improvement: Observable in UMAP plots
- ✅ Integration convergence: Process completed successfully
Biological Validation:
- ✅ Marker genes preserved: Known biology maintained through visual inspection
- ✅ Spatial coherence: Spatial patterns intact
- ✅ Cell type separation: Distinct cell types maintained in analysis
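Spatial coherence can also be given a number. The tutorial does not say which statistic ChatSpatial reports as spatial autocorrelation, so the sketch below assumes Moran's I, a common choice, with binary weights: two spots count as neighbours when they lie within `radius` of each other. The function name and toy values are hypothetical.

```python
import math

def morans_i(coords, values, radius):
    """Moran's I with binary distance-band weights (w_ij = 1 if dist <= radius)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if denom == 0 or w_sum == 0:
        raise ValueError("constant values or no neighbouring spots within radius")
    return (n / w_sum) * (num / denom)

# Toy example (hypothetical): six spots on a line, one unit apart.
line = [(x, 0) for x in range(6)]
clustered = morans_i(line, [1, 1, 1, 0, 0, 0], radius=1)    # one contiguous domain
alternating = morans_i(line, [1, 0, 1, 0, 1, 0], radius=1)  # checkerboard pattern
print(f"clustered: {clustered:.2f}, alternating: {alternating:.2f}")
```

Applied to a domain indicator or a marker gene, values well above 0 indicate spatially contiguous structure. Comparing the statistic per sample before and after integration is one way to check that spatial patterns were not flattened.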
Step 7: Real Conversation Examples
💬 Integration Examples
Scenario 1: Over-Integration Detection
👤 You: “I integrated my data with theta=5.0 but now my cell types look merged together. What happened?”
🤖 ChatSpatial: “It sounds like you’ve over-integrated your data! High theta values can be too aggressive and merge distinct biological populations. Let me check your integration:
Over-Integration Signs:
- Cell type markers: Neuronal and glial markers now co-expressed
- Spatial domains: Distinct regions incorrectly merged
- Biological variance: Reduced below expected levels
- Known differences: Treatment effects may be masked
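Solution: A lower theta (2.0-3.0) preserves more biological differences while still removing batch effects. Because theta is not configurable in the current ChatSpatial version, re-run the integration through ChatSpatial, which uses the default theta of 2.0.”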
Scenario 2: Under-Integration Issues
👤 You: “My samples still cluster by batch after integration. Is Harmony not working?”
🤖 ChatSpatial: “This suggests the integration could be improved. Let me be honest about what’s possible:
Potential Causes:
- Strong batch effects: May require multiple integration attempts
- Data quality differences: Some samples may need better preprocessing
- Platform differences: Different technologies can be challenging
- Insufficient preprocessing: Data may need more thorough QC
Possible Solutions:
- Re-run integration: Sometimes a fresh run helps
- Check data quality: Ensure consistent preprocessing across samples
- Examine gene filtering: Use more stringent gene selection
- Assess adequacy: Current integration might still be usable
Current Limitation: Parameter tuning is not available yet, so we use default settings that handle most cases adequately.”
Scenario 3: Platform-Specific Integration
👤 You: “I’m trying to integrate Slide-seq and Visium data but getting poor results. Any specific advice?”
🤖 ChatSpatial: “Multi-platform integration is challenging! Let me be honest about the current capabilities:
Slide-seq vs Visium Challenges:
- Resolution differences: Slide-seq (~10μm) vs Visium (~55μm)
- Sensitivity differences: Different gene detection rates
- Coverage differences: Different spatial sampling patterns
Current Integration Approach:
- Gene intersection: Use genes detected in both platforms
- Basic normalization: Apply standard preprocessing to both
- Standard Harmony: Use default parameters (no platform-specific tuning available)
- Quality assessment: Use visual inspection and silhouette scores
Current Expectations: Multi-platform integration with current tools may show partial success. Focus on whether the result is adequate for your analysis needs rather than perfect mixing.”
💡 Useful Conversation Tips
Useful Questions:
- ✅ “Can you check if my integration preserved known biology?”
- ✅ “Is the integration quality adequate for my analysis goals?”
- ✅ “How can I validate this integration with available metrics?”
- ✅ “Are there any obvious problems with my integration results?”
Current Expectations:
- ✅ “Work with current capabilities and available metrics”
- ✅ “Focus on whether integration serves your research needs”
- ✅ “Understand that some batch structure may remain”
- ✅ “Use visual inspection as primary validation method”
Step 8: Best Practices and Common Pitfalls
🎯 Integration Guidelines
Data Preparation Guidelines
Before Integration:
- Quality control each dataset individually before integration
- Use consistent preprocessing across all datasets
- Document batch information clearly in metadata
- Identify shared genes and filter appropriately
Current Implementation
Currently Available:
# Standard Harmony integration with hardcoded parameters
sigma = 0.1 # Fixed value (not adjustable)
max_iter = 10 # Default convergence setting
# Other parameters use Harmony library defaults
Current Status:
- No manual parameter tuning is currently available
- Standard defaults work for most spatial transcriptomics datasets
- Focus on data quality rather than parameter optimization
- Future versions will include parameter tuning capabilities
Best Practice with Current Tools: Ensure good data preprocessing since parameter tuning is not available yet.
⚠️ Common Problems and Solutions
Pitfall 1: Over-Integration
Warning Signs:
- Cell types inappropriately merge
- Known biological differences disappear
- Marker genes lose specificity
- Spatial patterns become homogeneous
Current Options:
- Assess if integration is adequate for your analysis goals
- Validate with known marker genes through visual inspection
- Check spatial patterns in domain analysis
- Re-run integration if results seem problematic
Pitfall 2: Platform Bias
Warning Signs:
- One platform dominates the integration
- Platform-specific patterns are lost
- Unbalanced representation in embedding
- Technical features become biological features
Current Options:
- Balance sample sizes across platforms when possible
- Apply consistent normalization to all datasets
- Ensure robust quality control before integration
- Focus on shared, high-quality genes
Pitfall 3: Incomplete Convergence
Warning Signs:
- Integration doesn’t complete successfully
- Repeated runs give different results
- Unexpected patterns in final embedding
Current Options:
- Check data quality - ensure consistent preprocessing
- Verify sufficient shared genes between datasets
- Re-run integration if convergence fails
- Examine input data for systematic differences
Pitfall 4: Biological Over-Correction
Warning Signs:
- Treatment effects disappear
- Known disease signatures are lost
- Cell type diversity decreases artificially
- Spatial gradients become flattened
Current Options:
- Assess if biological patterns are preserved through visual inspection
- Validate with known biological markers
- Consider if current integration meets research needs
- Focus on ensuring good preprocessing before integration
🔧 Troubleshooting
Issue: “Integration is too slow”
💬 Conversation Example:
👤 You: “My Harmony integration is taking forever. Is there a way to speed it up?”
🤖 ChatSpatial: “Let me help with integration speed issues:
Speed Optimization Options:
- Reduce gene set: Use top 2000-3000 variable genes only
- Check data size: Very large datasets may take longer
- Ensure adequate memory: RAM limitations can slow processing
- Monitor convergence: Integration should complete within reasonable time
Current Status:
With default parameters, integration speed depends mainly on data size and quality. The process typically completes within minutes for standard datasets.”
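Restricting the analysis to the most variable genes is the main lever mentioned above. The sketch below ranks genes by plain variance, which is a simplification: tools such as scanpy's `sc.pp.highly_variable_genes` normalise dispersion against mean expression. The function name and toy matrix are hypothetical.

```python
from statistics import pvariance

def top_variable_genes(matrix, gene_names, n_top=2000):
    """Return the names of the n_top genes with the highest variance.

    matrix: spots x genes, normalised expression values.
    """
    variances = [pvariance(column) for column in zip(*matrix)]
    ranked = sorted(range(len(gene_names)),
                    key=lambda i: variances[i], reverse=True)
    return [gene_names[i] for i in ranked[:n_top]]

# Toy example (hypothetical): three spots, three genes.
expr = [
    [1, 5, 0.0],
    [1, 0, 0.1],
    [1, 10, 0.0],
]
selected = top_variable_genes(expr, ["Actb", "Gfap", "Olig2"], n_top=2)
print(selected)
```

In a multi-sample setting it is usually safer to select genes that are variable within several batches rather than across the pooled data, so that genes driven purely by batch differences are not favoured.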
Issue: “Results not reproducible”
💬 Conversation Example:
👤 You: “I get different results each time I run Harmony. How do I make it reproducible?”
🤖 ChatSpatial: “Reproducibility is important! Here’s what affects consistency:
Factors Affecting Reproducibility:
- Random seed: Integration uses random initialization
- Data preprocessing: Must be identical across runs
- Software versions: Environment should be consistent
- Data ordering: Input order can affect results
Current Approach:
The integration uses standard random seeding. For reproducibility, ensure identical preprocessing and consistent computational environment.”
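One practical way to track the four factors above is to save a small manifest next to each run. The sketch below is a suggestion, not a ChatSpatial feature. The function name, the package list, and the barcodes are assumptions, and the seed is only recorded here: it still has to be passed to whatever performs the integration.

```python
import hashlib
import json
import platform
from importlib import metadata

def run_manifest(obs_names, seed, packages=("scanpy", "harmonypy", "anndata")):
    """Record the inputs needed to compare or repeat an integration run."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    # Hashing the observation names in order captures both membership and ordering.
    order_hash = hashlib.sha256("\n".join(obs_names).encode()).hexdigest()
    return {
        "python": platform.python_version(),
        "packages": versions,
        "seed": seed,
        "obs_order_sha256": order_hash,
    }

# Toy example (hypothetical barcodes), sorted to fix the input order.
barcodes = ["AACT-1", "AAAC-1", "AAAG-1"]
manifest = run_manifest(sorted(barcodes), seed=0)
print(json.dumps(manifest, indent=2))
```

If two runs disagree, comparing their manifests shows quickly whether the environment, the seed, or the input order changed.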
Issue: “Integration removes treatment effects”
💬 Conversation Example:
👤 You: “After integration, I can’t see my treatment effects anymore. Did Harmony remove them?”
🤖 ChatSpatial: “This is an important concern! Let me address what’s possible:
Understanding the Issue:
- Batch-treatment confounding: If treatments align with batches, integration may affect both
- Current limitations: No parameter tuning available to address this specifically
- Assessment needed: Visual inspection of treatment-specific markers
Current Approach:
- Examine integration results: Check if treatment markers are preserved
- Visual validation: Look for treatment effects in spatial patterns
- Consider adequacy: Assess if results still show biological differences
- Document limitations: Be transparent about integration effects on treatment signals
Current Assessment:
Current implementation may not preserve all treatment effects if they are confounded with batch variables.”
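Whether treatment and batch are confounded can be checked from the metadata before any integration is run. The sketch below flags a condition when it occurs in only one batch and that batch contains nothing else, since the two effects cannot then be told apart. The function name and labels are hypothetical; the first design assumes, for illustration, that the third batch holds control replicates.

```python
from collections import defaultdict

def confounded_conditions(batches, conditions):
    """Conditions that occur in exactly one batch which holds no other condition."""
    cond_to_batches = defaultdict(set)
    batch_to_conds = defaultdict(set)
    for b, c in zip(batches, conditions):
        cond_to_batches[c].add(b)
        batch_to_conds[b].add(c)
    flagged = []
    for cond, batch_set in cond_to_batches.items():
        if len(batch_set) == 1:
            only_batch = next(iter(batch_set))
            if len(batch_to_conds[only_batch]) == 1:
                flagged.append(cond)
    return sorted(flagged)

# Per-spot labels (hypothetical): treatment appears only in batch2.
batch = ["batch1", "batch1", "batch2", "batch2", "batch3"]
condition = ["control", "control", "treatment", "treatment", "control"]
flagged = confounded_conditions(batch, condition)

# A balanced design: both conditions appear in both batches.
balanced = confounded_conditions(
    ["batch1", "batch1", "batch2", "batch2"],
    ["control", "treatment", "control", "treatment"],
)
print("confounded:", flagged, "| balanced design:", balanced)
```

When a condition is flagged, removing the batch effect will also remove some or all of the treatment signal, whatever the parameter settings. The remedy lies in experimental design, for example by including each condition in more than one batch.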
📊 Integration Quality Metrics
Current Metrics
| Metric | Range | Good Value | Interpretation |
|---|---|---|---|
| Silhouette (batch) | -1 to 1 | <0.3 | Batch effect reduction |
| Visual mixing | Qualitative | Improved | Observable in UMAP plots |
| Integration convergence | Boolean | True | Process completed successfully |
| Spatial coherence | Qualitative | Preserved | Visual assessment of spatial patterns |
Future Metrics
| Metric | Status | Notes |
|---|---|---|
| kBET acceptance | In development | Quantitative batch mixing assessment |
| LISI score | In development | Integration completeness measure |
| Mixing entropy | Planned | Local neighborhood mixing analysis |
| Platform LISI | Planned | Cross-platform integration quality |
Step 9: Results for Publication
💬 Publication Preparation
👤 You: “My integration looks great! Now I need to create publication-quality figures and document the methods. Can you help?”
🤖 ChatSpatial: “Yes! Let me help you create publication materials for your integration analysis.
Publication Materials I’ll Generate:
1. Methods Text:
Batch effect correction and data integration used Harmony
(Korsunsky et al., 2019) through ChatSpatial. Multiple spatial
transcriptomics datasets were integrated using standard Harmony parameters
with sigma=0.1 and max_iter=10. Integration quality was assessed using
silhouette score analysis and visual inspection of UMAP embeddings.
Spatial patterns were maintained while achieving batch mixing.
2. Quality Control Figures:
- Before/after UMAP plots showing batch correction
- Integration quality metrics visualization
- Spatial domain consistency across samples
- Marker gene preservation validation plots
3. Supplementary Materials:
- Parameter settings used (Harmony defaults)
- Cross-platform validation metrics
- Technical replication analysis
- Method comparison benchmarks
4. Statistical Reporting:
- Integration convergence statistics
- Batch effect quantification
- Biological preservation metrics
- Cross-validation results
Would you like me to generate specific figures or focus on particular aspects?”
👤 You: “Yes, please create the main integration figure showing before/after results and quality metrics.”
🤖 ChatSpatial: “Good! I’ll create a complete integration figure.”
[ChatSpatial generates publication-quality multi-panel figure]
🤖 ChatSpatial: “Here’s your integration figure:
Figure Components:
Panel A: UMAP before integration
- Clear batch separation visible
- Samples cluster by technical factors
- Note: “Strong batch effects (silhouette by batch = 0.73)”
Panel B: UMAP after Harmony integration
- Excellent sample mixing
- Biological structure preserved
- Note: “Successful integration (silhouette by batch = 0.28)”
Panel C: Integration quality metrics
- Bar plot showing all QC metrics
- Before/after comparisons
- Statistical significance indicators
Panel D: Spatial domain consistency
- Side-by-side spatial maps
- Shared domains across samples
- Color-coded anatomical regions
Panel E: Marker gene validation
- Heatmap of key marker genes
- Expression preserved post-integration
- Cell type-specific patterns maintained
Figure Caption (Draft): ‘Multi-sample spatial transcriptomics integration using Harmony. (A) UMAP before integration shows strong batch effects with samples clustering by technical factors rather than biological similarity. (B) UMAP after Harmony integration shows good batch mixing while preserving biological structure. (C) Integration quality metrics show improvement across all measures. (D) Spatial domain analysis shows consistent anatomical organization across integrated samples. (E) Marker gene expression validation confirms preservation of cell type-specific signatures after integration. Integration parameters (Harmony defaults): θ=2.0, λ=1.0, σ=0.1.’
Figure Statistics:
- Resolution: 300 DPI (publication-ready)
- Format: PDF with vector graphics
- Color scheme: Colorblind-friendly
- Size: Optimized for journal specifications
This figure documents the integration result together with the evidence that supports it.”
📋 Publication Requirements
Methods Documentation:
- ✅ Parameter choices justified
- ✅ Quality metrics reported
- ✅ Validation methods described
- ✅ Software versions documented
Results Presentation:
- ✅ Before/after visualizations
- ✅ Quantitative quality metrics
- ✅ Biological validation
- ✅ Statistical significance testing
Reproducibility:
- ✅ Code availability
- ✅ Parameter documentation
- ✅ Data accessibility
- ✅ Environment specifications
Next Steps and Advanced Applications
🚀 Additional Integration Methods
Want to Learn More? Try these additional integrations:
Additional Integration Questions:
- “Can you integrate temporal samples to study development?”
- “How do I integrate data with different gene panels?”
- “Can you perform conditional integration preserving treatment effects?”
- “How do I integrate single-cell and spatial data together?”
Multi-Modal Examples:
- “Integrate spatial transcriptomics with imaging data”
- “Combine proteomics and transcriptomics spatial data”
- “Harmonize different spatial resolution datasets”
- “Create cross-species integration workflows”
📚 Additional Learning
Related Tutorials:
- Basic Spatial Analysis - Foundation concepts
- Cell Communication Analysis - Post-integration analysis
- Visualization Tutorial - Create beautiful plots
Additional Topics:
- Conditional Integration: Preserve specific biological factors
- Cross-Species Integration: Harmonize data across organisms
- Temporal Integration: Handle time-series spatial data
- Multi-Platform Benchmarking: Compare integration methods
💡 Tips for Success
Integration Approach:
- Start conservative: Under-integration is better than over-integration
- Validate thoroughly: Biology should make sense
- Document methods: Reproducibility is important
- Test parameters: What works for others may not work for your data (when available)
Quality First:
- Take time to check integration quality
- Validate with known biological markers
- Test multiple parameter combinations (when available)
- Always check spatial coherence preservation
Communication Tips:
- Ask ChatSpatial specific questions about concerns
- Request validation of integration decisions
- Seek help interpreting quality metrics
- Get guidance on parameter optimization (when available)
Conclusion
You have learned about batch effect correction and data integration using ChatSpatial’s current Harmony implementation. You can now:
- ✅ Identify and assess batch effects in multi-sample datasets
- ✅ Apply basic Harmony integration with standard defaults
- ✅ Understand current implementation limitations
- ✅ Validate integration quality with available metrics
- ✅ Use visual inspection for integration assessment
- ✅ Create functional results for downstream analysis
Note: Current integration uses well-tested defaults that work for most datasets. Focus on data quality and visual validation since parameter tuning is not yet available.
Key Point: The current implementation has limitations but provides a working foundation for batch integration. Future versions will include additional optimization tools.
Good luck with your integration analysis.