
Contributing Guide
Chen Yang
2025-05-10
Contributing to mLLMCelltype
Thank you for your interest in contributing to mLLMCelltype! This guide explains how to set up a development environment, follow the project's code style and testing conventions, and submit your changes for review.
Getting Started
Fork and Clone the Repository
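- Fork the mLLMCelltype repository on GitHub
- Clone your fork to your local machine with git clone
- Add the original repository as a remote named upstream (git remote add upstream, followed by the original repository's URL) so you can fetch later changes from it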
Setting Up the Development Environment
For R package development:
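```r
# Install required packages for development
install.packages(c(
  "devtools", "roxygen2", "testthat", "mockery", "knitr", "rmarkdown"
))
# From the repository root, install the package from its source directory
devtools::install("R")
# Alternatively, load the source without installing (faster while iterating)
devtools::load_all("R")
```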
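Day-to-day development usually cycles through a few devtools calls. The wrapper below is a hypothetical convenience function, not part of mLLMCelltype; it is a sketch that assumes you run it from the repository root, where "R" is the package directory shown in the project structure.

```r
# Hypothetical wrapper around the standard devtools development loop.
# Assumes the working directory is the repository root and that the
# package source lives in the "R" subdirectory.
dev_cycle <- function(pkg = "R", check = FALSE) {
  devtools::document(pkg)  # rebuild man/ and NAMESPACE from roxygen2
  devtools::load_all(pkg)  # load the current source without installing
  devtools::test(pkg)      # run the testthat suite
  if (check) {
    devtools::check(pkg)   # full R CMD check, useful before a pull request
  }
  invisible(pkg)
}

# Usage:
# dev_cycle()              # document, load, and test
# dev_cycle(check = TRUE)  # also run R CMD check
```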
Project Structure
The mLLMCelltype project has the following structure:
```
mLLMCelltype/
├── R/                 # R package source code
│   ├── R/             # R functions
│   ├── man/           # Documentation
│   ├── tests/         # Unit tests
│   ├── vignettes/     # Package vignettes
│   └── DESCRIPTION    # Package metadata
├── python/            # Python package source code
├── .github/           # GitHub workflows and templates
├── assets/            # Images and other assets
├── examples/          # Example notebooks and scripts
└── README.md          # Project overview
```
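Because the R package lives in the R/ subdirectory rather than at the repository root, a path such as "R" in the devtools calls only resolves from the root. A small check like the one below can confirm your working directory first. It is a hypothetical helper, not part of the package, and its expected paths are simply copied from the tree above.

```r
# Hypothetical helper: report which expected paths are missing under `root`.
# The expected paths mirror the project structure listed above.
check_repo_layout <- function(root = ".") {
  expected <- c(
    "R/R", "R/man", "R/tests", "R/vignettes", "R/DESCRIPTION",
    "python", "README.md"
  )
  missing <- expected[!file.exists(file.path(root, expected))]
  if (length(missing) > 0) {
    warning(
      "This does not look like the mLLMCelltype root. Missing: ",
      paste(missing, collapse = ", ")
    )
  }
  invisible(missing)
}

# Usage: an empty result means the layout matches
# check_repo_layout()
```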
Development Workflow
Creating a New Feature
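- Create a new branch for your feature, for example with git checkout -b and a descriptive branch name
- Make your changes to the codebase
- Stage and commit your changes with git add and git commit
- Push the branch to your fork with git push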
- Create a pull request from your fork to the main repository
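If you prefer to stay in R, the branch, commit, and push steps can also be scripted with the gert package. The two helpers below are a hypothetical sketch, not part of mLLMCelltype. They assume gert is installed, that your fork is the origin remote, and that your Git identity is either configured or passed as author.

```r
# Hypothetical helpers for the feature-branch workflow, built on gert.

# Create a feature branch from the current HEAD and switch to it.
start_feature <- function(branch, repo = ".") {
  gert::git_branch_create(branch, ref = "HEAD", checkout = TRUE, repo = repo)
  invisible(branch)
}

# Stage the given files, commit them, and optionally push to origin.
commit_feature <- function(files, message, author = NULL, repo = ".",
                           push = FALSE) {
  gert::git_add(files, repo = repo)
  commit_id <- gert::git_commit(message, author = author, repo = repo)
  if (push) {
    # set_upstream = TRUE links the local branch to the branch on your fork
    gert::git_push(remote = "origin", set_upstream = TRUE, repo = repo)
  }
  invisible(commit_id)
}

# Usage (edit files as usual between the two calls):
# start_feature("add-newprovider-support")
# commit_feature("R/R/process_newprovider.R", "Add newprovider support",
#                push = TRUE)
```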
Code Style Guidelines
R Code Style
We follow the tidyverse style guide for R code:
- Use snake_case for variable and function names
- Use spaces around operators and after commas
- Use 2 spaces for indentation
- Limit line length to 80 characters
- Use roxygen2 for documentation
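These rules do not have to be applied by hand. Assuming the styler and lintr packages are installed (they are not among the development packages listed above), styler can rewrite code to follow the tidyverse style and lintr can report what remains, such as lines longer than 80 characters. A minimal sketch:

```r
# Runs the demonstration only when styler is available.
if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() returns restyled code without touching any files
  styled <- styler::style_text("x<-c(1,2)")
  print(styled)
}

# On the real package, from the repository root:
# styler::style_pkg("R")    # restyle all package source files in place
# lintr::lint_package("R")  # report remaining style issues
```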
Example of properly formatted R code:
```r
#' Annotate Cell Types
#'
#' This function annotates cell types based on marker genes.
#'
#' @param input A data frame containing marker genes.
#' @param tissue_name The name of the tissue.
#' @param model The LLM model to use.
#' @param api_key The API key for the LLM provider.
#'
#' @return A vector of cell type annotations.
#' @export
annotate_cell_types <- function(input, tissue_name, model, api_key) {
  # Function implementation
  results <- process_markers(input, top_n = 10)
  for (i in seq_along(results)) {
    if (is_valid_result(results[[i]])) {
      results[[i]] <- clean_result(results[[i]])
    }
  }
  # The tidyverse style guide reserves return() for early returns
  results
}
```
Testing
We use the testthat package for testing. Tests should be placed in the R/tests/testthat/ directory.
To run the tests:
```r
# From the package directory (R/)
devtools::test()
# Or from the repository root
devtools::test("R")
```
Example test file (test-annotate_cell_types.R):
```r
test_that("annotate_cell_types returns expected format", {
  # Set up test data
  test_markers <- data.frame(
    cluster = c(0, 0, 1, 1),
    gene = c("CD3D", "CD3E", "CD19", "MS4A1"),
    avg_log2FC = c(2.5, 2.3, 3.1, 2.8),
    p_val_adj = c(0.001, 0.002, 0.001, 0.003)
  )

  # Mock the API response so the test never calls a real provider
  mockery::stub(
    annotate_cell_types,
    "get_model_response",
    function(...) c("T cells", "B cells")
  )

  # Run the function
  result <- annotate_cell_types(
    input = test_markers,
    tissue_name = "test tissue",
    model = "test-model",
    api_key = "test-key"
  )

  # Assertions
  expect_type(result, "character")
  expect_length(result, 2)
  expect_equal(result, c("T cells", "B cells"))
})
```
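The test above depends on the package's own functions. To see the stubbing pattern in isolation, the self-contained sketch below uses two hypothetical functions: label_clusters() calls fetch_labels(), which stands in for a real API call, and mockery::stub() replaces that call only inside label_clusters() for the duration of the test.

```r
library(testthat)

# Hypothetical stand-in for a network call that must not run in tests
fetch_labels <- function(clusters) {
  stop("This would call a real LLM API")
}

# Hypothetical function under test: one label per unique cluster
label_clusters <- function(markers) {
  clusters <- unique(markers$cluster)
  labels <- fetch_labels(clusters)
  stats::setNames(labels, clusters)
}

test_that("label_clusters names labels by cluster without calling the API", {
  markers <- data.frame(
    cluster = c(0, 0, 1, 1),
    gene = c("CD3D", "CD3E", "CD19", "MS4A1")
  )
  # Replace fetch_labels() only inside label_clusters(), only in this test
  mockery::stub(
    label_clusters, "fetch_labels", function(...) c("T cells", "B cells")
  )
  result <- label_clusters(markers)
  expect_type(result, "character")
  expect_equal(unname(result), c("T cells", "B cells"))
  expect_equal(names(result), c("0", "1"))
})
```

Outside the test, label_clusters() still calls the real fetch_labels(), so the stub cannot leak into other tests.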
Contributing Areas
Adding Support for New LLM Models
To add support for a new LLM model:
- Identify the model provider and API endpoint
- Create a new processing function in R/R/process_[provider].R
- Update the get_provider() function in R/R/get_provider.R
- Add the model to the supported models list
- Create tests for the new model
- Update documentation
Example of adding a new model:
```r
# In process_newprovider.R
process_newprovider <- function(prompt, api_key) {
  # Implementation for the new provider
  url <- "https://api.newprovider.com/v1/completions"
  body <- list(
    model = "newprovider-model",
    prompt = prompt,
    max_tokens = 1000,
    temperature = 0.1
  )

  # Make the API request using httr; encode = "json" serializes `body`
  # with jsonlite and sets the Content-Type header
  response <- httr::POST(
    url = url,
    httr::add_headers(Authorization = paste("Bearer", api_key)),
    body = body,
    encode = "json"
  )

  # Check for HTTP errors
  httr::stop_for_status(response)

  # Parse the response; simplifyVector = FALSE keeps `choices` a list,
  # so choices[[1]]$text selects the first completion
  content <- httr::content(response, "text", encoding = "UTF-8")
  parsed_response <- jsonlite::fromJSON(content, simplifyVector = FALSE)
  parsed_response$choices[[1]]$text
}

# In get_provider.R
get_provider <- function(model) {
  # Add the new model to the model-to-provider mapping
  model_mapping <- list(
    # Existing models...
    "newprovider-model" = "newprovider"
  )
  provider <- model_mapping[[model]]
  if (is.null(provider)) {
    stop("Unsupported model: ", model)
  }
  provider
}
```
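Network calls make a function like process_newprovider() awkward to unit test. One option, sketched here as an assumption rather than the package's existing design, is to move JSON parsing into a separate hypothetical helper, parse_completion_text(), that can be tested offline against a fake response body shaped like {"choices": [{"text": ...}]}.

```r
# Hypothetical helper: extract the first completion from a raw JSON body.
parse_completion_text <- function(content) {
  # simplifyVector = FALSE keeps "choices" as a list of objects
  parsed <- jsonlite::fromJSON(content, simplifyVector = FALSE)
  if (length(parsed$choices) == 0) {
    stop("Response contains no choices")
  }
  parsed$choices[[1]]$text
}

# Usage with a fake response body, so no API key or network is needed
fake_body <- '{"choices": [{"text": "T cells"}, {"text": "B cells"}]}'
parse_completion_text(fake_body)
#> [1] "T cells"
```

With this split, the provider function would fetch the body and return parse_completion_text(content), and the tests for the new model can exercise the parsing without calling the API.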
Improving Documentation
Documentation improvements are always welcome:
- Update function documentation with roxygen2
- Improve vignettes with more examples and explanations
- Add tutorials for specific use cases
- Fix typos and clarify existing documentation
Adding New Features
Some ideas for new features:
- Integration with additional single-cell analysis frameworks
- Support for spatial transcriptomics data
- Interactive visualization tools
- Batch processing for large datasets
- Performance optimizations
Reporting Issues
When reporting issues, please include:
- A minimal reproducible example
- The version of mLLMCelltype you’re using
- The error message or unexpected behavior
- Your R session information (the output of sessionInfo())
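The version and session details can be gathered in one step. The function below is a hypothetical helper, not part of mLLMCelltype; it is a sketch that returns NA for the version when the package is not installed.

```r
# Hypothetical helper: collect details to paste into an issue report.
collect_issue_info <- function(pkg = "mLLMCelltype") {
  version <- if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
  list(package_version = version, session = utils::sessionInfo())
}

# Usage:
# info <- collect_issue_info()
# info$package_version
# print(info$session)
```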
Pull Request Process
- Ensure your code follows the style guidelines
- Add or update tests as necessary
- Update documentation to reflect your changes
- Ensure all tests pass
- Submit your pull request with a clear description of the changes
Code Review Process
All pull requests will be reviewed by the maintainers. The review process includes:
- Checking that the code follows style guidelines
- Verifying that tests pass
- Ensuring documentation is updated
- Evaluating the overall design and implementation
Release Process
mLLMCelltype follows semantic versioning (MAJOR.MINOR.PATCH):
- MAJOR version for incompatible API changes
- MINOR version for new functionality in a backward-compatible manner
- PATCH version for backward-compatible bug fixes
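As a worked example of these rules, starting from 1.2.3 a bug fix gives 1.2.4, a new backward-compatible feature gives 1.3.0, and a breaking change gives 2.0.0; lower-order components reset to 0. The helper below is a hypothetical sketch of that logic, not a package or release-tool API. For comparing versions, base R's package_version() orders components numerically.

```r
# Hypothetical helper: bump one component of a MAJOR.MINOR.PATCH version.
# Lower-order components reset to 0; any fourth component is dropped.
bump_version <- function(version, part = c("patch", "minor", "major")) {
  part <- match.arg(part)
  pieces <- strsplit(version, ".", fixed = TRUE)[[1]]
  parts <- suppressWarnings(as.integer(pieces))
  if (length(parts) < 3 || anyNA(parts[1:3])) {
    stop("Expected a version of the form MAJOR.MINOR.PATCH: ", version)
  }
  new_parts <- switch(part,
    major = c(parts[1] + 1L, 0L, 0L),
    minor = c(parts[1], parts[2] + 1L, 0L),
    patch = c(parts[1], parts[2], parts[3] + 1L)
  )
  paste(new_parts, collapse = ".")
}

bump_version("1.2.3", "patch")
#> [1] "1.2.4"
bump_version("1.2.3", "minor")
#> [1] "1.3.0"
bump_version("1.2.3", "major")
#> [1] "2.0.0"

# Compare versions numerically rather than as strings
package_version("1.10.0") > package_version("1.9.0")
#> [1] TRUE
```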
Community Guidelines
License
By contributing to mLLMCelltype, you agree that your contributions will be licensed under the same license as the project (MIT License).
Next Steps
Now that you know how to contribute to mLLMCelltype, you can:
- Review the version history to understand recent changes
- Explore advanced features to identify areas for improvement
- Check the FAQ to see common questions that might need better documentation