Contributing Guide

Contributing to mLLMCelltype

Thank you for your interest in contributing to mLLMCelltype! This guide will help you understand how to contribute to the project effectively.

Getting Started

Fork and Clone the Repository

Fork the mLLMCelltype repository on GitHub
Clone your fork to your local machine:

git clone https://github.com/YOUR-USERNAME/mLLMCelltype.git
cd mLLMCelltype

Add the original repository as an upstream remote:

git remote add upstream https://github.com/cafferychen777/mLLMCelltype.git

Setting Up the Development Environment

For R package development:

# Install required packages for development
install.packages(c("devtools", "roxygen2", "testthat", "knitr", "rmarkdown"))

# Install the local package from the R/ subdirectory
# (run from the repository root)
devtools::install("R")

Project Structure

The mLLMCelltype project has the following structure:

mLLMCelltype/
├── R/                  # R package source code
│   ├── R/              # R functions
│   ├── man/            # Documentation
│   ├── tests/          # Unit tests
│   ├── vignettes/      # Package vignettes
│   └── DESCRIPTION     # Package metadata
├── python/             # Python package source code
├── .github/            # GitHub workflows and templates
├── assets/             # Images and other assets
├── examples/           # Example notebooks and scripts
└── README.md           # Project overview

Development Workflow

Creating a New Feature

Create a new branch for your feature:

git checkout -b feature/your-feature-name

Make your changes to the codebase
Add and commit your changes:

git add .
git commit -m "Add your descriptive commit message here"

Push your changes to your fork:

git push origin feature/your-feature-name

Create a pull request from your fork to the main repository

Code Style Guidelines

R Code Style

We follow the tidyverse style guide for R code:

Use snake_case for variable and function names
Use spaces around operators and after commas
Use 2 spaces for indentation
Limit line length to 80 characters
Use roxygen2 for documentation

Example of properly formatted R code:

#' Annotate Cell Types
#'
#' This function annotates cell types based on marker genes.
#'
#' @param input A data frame containing marker genes.
#' @param tissue_name The name of the tissue.
#' @param model The LLM model to use.
#' @param api_key The API key for the LLM provider.
#'
#' @return A vector of cell type annotations.
#' @export
annotate_cell_types <- function(input, tissue_name, model, api_key) {
  # Function implementation
  results <- process_markers(input, top_n = 10)

  for (i in seq_along(results)) {
    if (is_valid_result(results[i])) {
      results[i] <- clean_result(results[i])
    }
  }

  # The tidyverse style guide reserves return() for early exits
  results
}
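The line-length and naming rules above can be spot-checked with a few lines of base R. The sketch below is only an illustration: `check_style()` is a hypothetical helper, not part of mLLMCelltype, and it only recognises function definitions written as `name <- function(` at the start of a line. A dedicated linter such as the lintr package covers far more cases.

```r
# Hypothetical helper (not part of mLLMCelltype): report two kinds of style
# problems in a character vector of R source lines.
check_style <- function(lines, max_width = 80) {
  # Indices of lines longer than the allowed width
  too_long <- which(nchar(lines) > max_width)

  # Names that are assigned a function, e.g. "my_fun <- function("
  pattern <- "^[[:space:]]*([A-Za-z0-9._]+)[[:space:]]*<-[[:space:]]*function\\("
  is_def <- grepl(pattern, lines)
  fun_names <- sub(paste0(pattern, ".*$"), "\\1", lines[is_def])

  # snake_case: lowercase letters and digits separated by single underscores
  snake <- "^[a-z][a-z0-9]*(_[a-z0-9]+)*$"
  not_snake <- fun_names[!grepl(snake, fun_names)]

  list(too_long = too_long, not_snake = not_snake)
}

# Toy input: one well-named function, one camelCase function, one long line
source_lines <- c(
  "annotate_cell_types <- function(input) {",
  "  input",
  "}",
  "annotateCellTypes <- function(input) input",
  paste0("# ", strrep("x", 90))
)

report <- check_style(source_lines)
print(report)
```

Here the camelCase definition and the 92-character comment line are the two entries that get flagged.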

Documentation Guidelines

All functions should be documented using roxygen2 with the following sections:

Title (first line)
Description (paragraph after title)
@param for each parameter
@return for the return value
@examples for usage examples
@export if the function should be exported
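As an illustration of the sections listed above, here is a sketch of a fully documented function. `select_top_markers()` is a hypothetical helper invented for this example; the column names follow the marker table used elsewhere in this guide.

```r
#' Select Top Marker Genes per Cluster
#'
#' Keeps the `top_n` genes with the highest `avg_log2FC` in each cluster.
#' (Hypothetical helper, used only to illustrate the roxygen2 layout.)
#'
#' @param markers A data frame with columns `cluster`, `gene`, and
#'   `avg_log2FC`.
#' @param top_n Maximum number of genes to keep per cluster.
#'
#' @return A named list with one character vector of gene symbols
#'   per cluster.
#'
#' @examples
#' markers <- data.frame(
#'   cluster = c(0, 0, 1),
#'   gene = c("CD3D", "CD3E", "CD19"),
#'   avg_log2FC = c(2.5, 2.3, 3.1)
#' )
#' select_top_markers(markers, top_n = 1)
#'
#' @export
select_top_markers <- function(markers, top_n = 10) {
  # One data frame per cluster, named by cluster label
  by_cluster <- split(markers, markers$cluster)

  lapply(by_cluster, function(df) {
    # Sort by effect size, largest first, then keep the leading genes
    ordered <- df[order(df$avg_log2FC, decreasing = TRUE), ]
    as.character(head(ordered$gene, top_n))
  })
}
```

The title, description, `@param`, `@return`, `@examples`, and `@export` tags appear in the same order as in the list above, which keeps the generated help page predictable.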

Testing

We use the testthat package for testing. Tests should be placed in the R/tests/testthat/ directory.

To run tests from the repository root:

devtools::test("R")

Example test file (test-annotate_cell_types.R):

test_that("annotate_cell_types returns expected format", {
  # Setup test data
  test_markers <- data.frame(
    cluster = c(0, 0, 1, 1),
    gene = c("CD3D", "CD3E", "CD19", "MS4A1"),
    avg_log2FC = c(2.5, 2.3, 3.1, 2.8),
    p_val_adj = c(0.001, 0.002, 0.001, 0.003)
  )

  # Mock the API response
  mockery::stub(
    annotate_cell_types,
    "get_model_response",
    function(...) c("T cells", "B cells")
  )

  # Run the function
  result <- annotate_cell_types(
    input = test_markers,
    tissue_name = "test tissue",
    model = "test-model",
    api_key = "test-key"
  )

  # Assertions
  expect_type(result, "character")
  expect_length(result, 2)
  expect_equal(result, c("T cells", "B cells"))
})

Contributing Areas

Adding Support for New LLM Models

To add support for a new LLM model:

Identify the model provider and API endpoint
Create a new processing function in R/R/process_[provider].R
Update the get_provider() function in R/R/get_provider.R
Add the model to the supported models list
Create tests for the new model
Update documentation

Example of adding a new model:

# In process_newprovider.R
process_newprovider <- function(prompt, api_key) {
  # Implementation for the new provider
  url <- "https://api.newprovider.com/v1/completions"

  headers <- c(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", api_key)
  )

  body <- list(
    model = "newprovider-model",
    prompt = prompt,
    max_tokens = 1000,
    temperature = 0.1
  )

  # Make API request using httr; encode = "json" serializes the list body
  response <- httr::POST(
    url = url,
    httr::add_headers(.headers = headers),
    body = body,
    encode = "json"
  )

  # Check for HTTP errors
  httr::stop_for_status(response)

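  # Parse the response; simplifyVector = FALSE keeps `choices` as a list,
  # so choices[[1]]$text selects the text of the first choice
  content <- httr::content(response, "text", encoding = "UTF-8")
  parsed_response <- jsonlite::fromJSON(content, simplifyVector = FALSE)
  result <- parsed_response$choices[[1]]$text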

  return(result)
}

# In get_provider.R
get_provider <- function(model) {
  # Add to the model mapping
  model_mapping <- list(
    # Existing models...
    "newprovider-model" = "newprovider"
  )

  provider <- model_mapping[[model]]
  if (is.null(provider)) {
    stop("Unsupported model: ", model)
  }

  return(provider)
}
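To show how the provider lookup and the processing function might fit together, here is a minimal dispatch sketch. Everything in it is an assumption made for illustration: `provider_registry`, `lookup_provider()`, `dispatch_request()`, and `process_newprovider_stub()` are hypothetical names, and the stub replaces the real API call so the sketch runs without network access. The package's actual wiring may differ.

```r
# Stub standing in for process_newprovider(); returns a fixed string
# instead of calling a remote API.
process_newprovider_stub <- function(prompt, api_key) {
  paste("stub response for:", prompt)
}

# Hypothetical registry mapping provider names to processing functions
provider_registry <- list(
  newprovider = process_newprovider_stub
)

# Simplified stand-in for get_provider() with a single mapping entry
lookup_provider <- function(model) {
  model_mapping <- list("newprovider-model" = "newprovider")
  provider <- model_mapping[[model]]
  if (is.null(provider)) {
    stop("Unsupported model: ", model)
  }
  provider
}

# Resolve the provider for a model, then call its processing function
dispatch_request <- function(model, prompt, api_key) {
  provider <- lookup_provider(model)
  handler <- provider_registry[[provider]]
  if (is.null(handler)) {
    stop("No processing function registered for provider: ", provider)
  }
  handler(prompt, api_key)
}

print(dispatch_request("newprovider-model", "CD3D, CD3E", api_key = "test-key"))
```

With a registry like this, supporting another provider would mean adding one mapping entry and one registry entry, and an unknown model fails early with a clear error.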

Improving Documentation

Documentation improvements are always welcome:

Update function documentation with roxygen2
Improve vignettes with more examples and explanations
Add tutorials for specific use cases
Fix typos and clarify existing documentation

Adding New Features

Some ideas for new features:

Integration with additional single-cell analysis frameworks
Support for spatial transcriptomics data
Interactive visualization tools
Batch processing for large datasets
Performance optimizations

Reporting Issues

When reporting issues, please include:

A minimal reproducible example
The version of mLLMCelltype you’re using
The error message or unexpected behavior
Your R session information (sessionInfo())
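A report covering these points could be assembled from a short script like the one below. It is only a template sketch: the marker table is toy data, and the `tryCatch()` body is a placeholder where the call that actually fails would go.

```r
# Toy input: replace with the smallest data set that still shows the problem
markers <- data.frame(
  cluster = c(0, 0, 1, 1),
  gene = c("CD3D", "CD3E", "CD19", "MS4A1"),
  avg_log2FC = c(2.5, 2.3, 3.1, 2.8),
  p_val_adj = c(0.001, 0.002, 0.001, 0.003)
)

# Report the installed package version, if the package is available
pkg_version <- if (requireNamespace("mLLMCelltype", quietly = TRUE)) {
  as.character(utils::packageVersion("mLLMCelltype"))
} else {
  "not installed"
}
cat("mLLMCelltype version:", pkg_version, "\n")

# Placeholder for the failing call; the error message is captured as text
outcome <- tryCatch(
  {
    stopifnot(is.data.frame(markers), nrow(markers) > 0)
    "ran without error"
  },
  error = function(e) conditionMessage(e)
)
print(outcome)

# Session details to paste into the issue
print(sessionInfo())
```

Pasting the script together with its full console output into the issue gives maintainers the input, the version, the error, and the session details in one place.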

Pull Request Process

Ensure your code follows the style guidelines
Add or update tests as necessary
Update documentation to reflect your changes
Ensure all tests pass
Submit your pull request with a clear description of the changes
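The documentation and test steps above can be run in one go before opening a pull request. The wrapper below is a hypothetical convenience function, assuming the R package lives in the `R` subdirectory of the repository root as described under Project Structure. It relies only on `devtools::document()`, `devtools::test()`, and `devtools::check()`.

```r
# Hypothetical wrapper (not part of mLLMCelltype): run the usual local
# checks on the package located at `pkg`.
run_pre_pr_checks <- function(pkg = "R") {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("devtools is required; install it with install.packages('devtools')")
  }
  devtools::document(pkg) # regenerate man/ pages and NAMESPACE from roxygen2
  devtools::test(pkg)     # run the testthat suite
  devtools::check(pkg)    # run R CMD check
}

# Call from the repository root before opening a pull request:
# run_pre_pr_checks()
```

`devtools::check()` is the slowest step, so it may be convenient to run `devtools::test("R")` alone while iterating and the full wrapper once at the end.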

Code Review Process

All pull requests will be reviewed by the maintainers. The review process includes:

Checking that the code follows style guidelines
Verifying that tests pass
Ensuring documentation is updated
Evaluating the overall design and implementation

Release Process

mLLMCelltype follows semantic versioning (MAJOR.MINOR.PATCH):

MAJOR version for incompatible API changes
MINOR version for new functionality in a backward-compatible manner
PATCH version for backward-compatible bug fixes
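The three rules can be made concrete with a small version-bumping sketch. `bump_version()` is a hypothetical helper written for this guide, and the version strings are arbitrary examples rather than actual mLLMCelltype releases.

```r
# Hypothetical helper: bump a MAJOR.MINOR.PATCH version string.
# Components to the right of the bumped one are reset to zero.
bump_version <- function(version, level = c("patch", "minor", "major")) {
  level <- match.arg(level)
  parts <- suppressWarnings(
    as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  )
  if (length(parts) != 3 || anyNA(parts)) {
    stop("Expected a version of the form MAJOR.MINOR.PATCH: ", version)
  }
  parts <- switch(level,
    major = c(parts[1] + 1L, 0L, 0L),
    minor = c(parts[1], parts[2] + 1L, 0L),
    patch = c(parts[1], parts[2], parts[3] + 1L)
  )
  paste(parts, collapse = ".")
}

print(bump_version("1.2.3", "major")) # incompatible API change
print(bump_version("1.2.3", "minor")) # new backward-compatible functionality
print(bump_version("1.2.3", "patch")) # backward-compatible bug fix
```

Starting from 1.2.3, these calls give 2.0.0, 1.3.0, and 1.2.4 respectively.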

Community Guidelines

Code of Conduct

We follow a code of conduct to ensure a welcoming and inclusive community:

Be respectful and inclusive
Be collaborative
Be open to feedback
Focus on the best solution for the community

Communication Channels

GitHub Issues: For bug reports, feature requests, and discussions
GitHub Discussions: For general questions and community discussions
Pull Requests: For code contributions

Acknowledgment

Contributors will be acknowledged in the package documentation and README.

License

By contributing to mLLMCelltype, you agree that your contributions will be licensed under the same license as the project (MIT License).

Next Steps

Now that you know how to contribute to mLLMCelltype, you can:

Review the version history to understand recent changes
Explore advanced features to identify areas for improvement
Check the FAQ to see common questions that might need better documentation

Chen Yang

2025-06-14