Contributing to ChatSEEK

Thank you for your interest in contributing to ChatSEEK! This document provides guidelines for contributing to the project.

Getting Started

Prerequisites

  • Python 3.8 or higher
  • Neo4j database (5.18.1+)
  • Git
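
The Python and Git prerequisites can be checked up front with a short script. This is only a sketch: `check_prerequisites` is a hypothetical helper, not part of ChatSEEK, and it does not check the Neo4j version.

```python
import shutil
import sys
from typing import List, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 8)  # from the prerequisites list


def check_prerequisites(
    python_version: Tuple[int, int] = sys.version_info[:2],
    required_tools: Tuple[str, ...] = ("git",),
) -> List[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if python_version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {python_version[0]}.{python_version[1]}"
        )
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a contributor see everything that is missing in a single run.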

Development Setup

  1. Clone the repository:

    git clone https://github.com/yourorg/chatseek.git
    cd chatseek

  2. Create a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate

  3. Install development dependencies:

    pip install -e ".[dev]"

  4. Set up environment variables:

    cp .env.example .env
    # Edit .env with your Neo4j and API credentials

  5. Run tests to verify setup:

    pytest tests/ -v
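
If the tests fail right after setup, an incomplete `.env` is a likely cause. The sketch below shows one way to read such a file with the standard library and report missing keys. The key names `NEO4J_URI` and `NEO4J_USER` are hypothetical; the real ones are listed in `.env.example`, and ChatSEEK's own configuration loader may work differently.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical keys for illustration only; check .env.example for the real ones.
sample = """
# Neo4j connection
NEO4J_URI=bolt://localhost:7687
NEO4J_USER="neo4j"
"""

config = parse_env(sample)
missing = [key for key in ("NEO4J_URI", "NEO4J_USER") if key not in config]
print("missing keys:", missing)
```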

Development Workflow

1. Create a Feature Branch

git checkout -b feature/your-feature-name

2. Make Your Changes

  • Write clear, documented code
  • Follow existing code style (Black formatter, line length 100)
  • Add docstrings to all public functions and classes
  • Update relevant documentation

3. Run Tests

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=chatseek --cov-report=html

# Run specific test file
pytest tests/unit/test_query_engine.py -v

4. Format Code

# Format with Black
black chatseek/ tests/

# Check with Ruff
ruff check chatseek/ tests/

5. Commit Your Changes

Use clear, descriptive commit messages:

git add .
git commit -m "feat: Add support for new GEO template type"

Commit message format:

  • feat: - New feature
  • fix: - Bug fix
  • docs: - Documentation changes
  • test: - Test additions or modifications
  • refactor: - Code refactoring
  • chore: - Maintenance tasks
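
The prefix convention is simple enough to check mechanically. The following is a minimal sketch of such a check, for example for use in a local commit hook; `is_valid_commit_subject` is a hypothetical name, and the project does not necessarily enforce the format this way.

```python
import re

# Prefixes taken from the commit message format list
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")
COMMIT_PATTERN = re.compile(r"^(%s): \S.*$" % "|".join(COMMIT_TYPES))


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if the subject line is '<type>: <description>' with a known type."""
    return COMMIT_PATTERN.match(subject) is not None


if __name__ == "__main__":
    for subject in ("feat: Add support for new GEO template type", "Add support"):
        print(subject, "->", is_valid_commit_subject(subject))
```

The function expects only the first line of the commit message, not the full body.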

6. Push and Create Pull Request

git push origin feature/your-feature-name

Then create a pull request on GitHub with:

  • Clear description of changes
  • Reference to any related issues
  • Screenshots/examples if applicable

Code Style Guidelines

Python Code

  • Follow PEP 8 style guide
  • Use Black formatter (line length: 100)
  • Use type hints where appropriate
  • Write comprehensive docstrings

Example:

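from typing import Any, Dict

# Assumed import: BaseChatModel is taken here to be LangChain's chat model base class
from langchain_core.language_models import BaseChatModel


def extract_entities(query: str, llm: BaseChatModel) -> Dict[str, Any]: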
    """
    Extract entities from a natural language query.

    Args:
        query: Natural language question
        llm: Language model instance for extraction

    Returns:
        Dictionary containing extracted entities with keys:
        - study: Study name (if present)
        - samples: List of sample UIDs
        - assay: Assay type

    Raises:
        ExtractionError: If entity extraction fails
    """
    # Implementation

Documentation

  • Update README.md if adding user-facing features
  • Add examples to examples/ directory for new features
  • Update ROADMAP.md if implementing planned features
  • Keep IMPLEMENTATION_STATUS.md current with progress

Testing Guidelines

Test Structure

tests/
├── unit/              # Unit tests for individual components
├── integration/       # Integration tests for workflows
└── fixtures/          # Test data and fixtures

Writing Tests

  • Write tests for all new functionality
  • Aim for 80%+ code coverage
  • Use descriptive test names
  • Mock external dependencies (Neo4j, LLMs)

Example:

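# Assumed import path, based on chatseek/graphrag/entity_extractor.py
from chatseek.graphrag.entity_extractor import EntityExtractor


def test_entity_extractor_identifies_study_name(mock_llm):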
    """Test that entity extractor correctly identifies study names."""
    query = "Find samples in the GBM Study"
    extractor = EntityExtractor(mock_llm)

    result = extractor.extract(query)

    assert result["study"] == "GBM Study"
    assert result["intent"] == "find_samples"
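
The example test relies on a `mock_llm` fixture. One way such a fixture could be built is sketched below with `unittest.mock`. It assumes the extractor calls `llm.invoke(prompt)` and reads a `.content` attribute holding JSON, as LangChain chat models do; `make_mock_llm` is a hypothetical helper, and the real fixture in the test suite may differ.

```python
import json
from unittest.mock import MagicMock


def make_mock_llm(payload: dict) -> MagicMock:
    """Build a stand-in LLM whose invoke() returns a canned JSON response.

    Assumption: the code under test calls llm.invoke(prompt) and reads
    `.content` from the result. Adjust to match the real call pattern.
    """
    llm = MagicMock(name="mock_llm")
    llm.invoke.return_value = MagicMock(content=json.dumps(payload))
    return llm


# In conftest.py this could be wrapped as a fixture:
#
#     @pytest.fixture
#     def mock_llm():
#         return make_mock_llm({"study": "GBM Study", "intent": "find_samples"})

llm = make_mock_llm({"study": "GBM Study", "intent": "find_samples"})
response = json.loads(llm.invoke("Find samples in the GBM Study").content)
print(response["study"])
```

Because the mock records its calls, a test can also assert on the prompt that was sent, for instance with `llm.invoke.assert_called_once()`.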

Adding New Features

New GEO Templates

See docs/guides/CUSTOM_TEMPLATE_GUIDE.md for detailed instructions.

Quick overview:

  1. Create template in chatseek/geo/templates.py
  2. Define required fields and sections
  3. Add validation logic
  4. Write tests in tests/unit/test_geo_templates.py
  5. Add example to examples/geo_examples.py
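
Steps 2 and 3 can be pictured with a small, self-contained sketch. The class name `TemplateSketch`, its attributes, and the field and section names are all invented for illustration; the actual template interface is described in docs/guides/CUSTOM_TEMPLATE_GUIDE.md and may look quite different.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TemplateSketch:
    """Hypothetical template: a name, required fields, and named sections."""

    name: str
    required_fields: List[str]
    sections: List[str] = field(default_factory=list)

    def validate(self, record: Dict[str, Any]) -> List[str]:
        """Return the required fields that are missing or empty in `record`."""
        return [
            name
            for name in self.required_fields
            if record.get(name) in (None, "", [])
        ]


# Field and section names below are invented for illustration.
template = TemplateSketch(
    name="example_assay",
    required_fields=["title", "organism", "samples"],
    sections=["SERIES", "SAMPLES"],
)

errors = template.validate({"title": "Demo series", "organism": "", "samples": ["S1"]})
print(errors)
```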

New Query Types

  1. Add query template to chatseek/graphrag/query_builder.py
  2. Update entity extraction patterns in chatseek/graphrag/entity_extractor.py
  3. Add integration test
  4. Document in README.md
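
As a rough illustration of step 1, a query template can be kept as parameterized Cypher keyed by intent. Everything in this sketch is an assumption: `QUERY_TEMPLATES`, `build_query`, and the node labels and properties are invented, and the real structure of chatseek/graphrag/query_builder.py may differ.

```python
from typing import Any, Dict, Tuple

# Hypothetical registry: query type -> parameterized Cypher text.
# Node labels and properties are invented; use the project's real schema.
QUERY_TEMPLATES: Dict[str, str] = {
    "find_samples": (
        "MATCH (s:Sample)-[:PART_OF]->(st:Study {name: $study}) "
        "RETURN s.uid AS uid"
    ),
}


def build_query(intent: str, entities: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Look up the template for `intent` and return (cypher, parameters)."""
    if intent not in QUERY_TEMPLATES:
        raise KeyError(f"No query template registered for intent '{intent}'")
    # Values travel as query parameters and are never formatted into the text
    return QUERY_TEMPLATES[intent], dict(entities)


cypher, params = build_query("find_samples", {"study": "GBM Study"})
print(cypher)
print(params)
```

Keeping user-supplied values in the parameter dictionary, rather than interpolating them into the Cypher string, avoids injection problems and lets the database cache the query plan.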

Project Structure

chatseek/
├── chatseek/          # Main package
│   ├── core/         # Core infrastructure (config, database)
│   ├── graphrag/     # GraphRAG query system
│   ├── geo/          # GEO submission system
│   ├── cli/          # Command-line interface
│   └── utils/        # Utility functions
├── tests/            # Test suite
├── examples/         # Example scripts
├── demos/            # Streamlit demo app
├── docs/             # Documentation
└── notebooks/        # Jupyter tutorials

Documentation Structure

  • User-facing docs: Root-level .md files
  • Guides: docs/guides/
  • Archived docs: docs/archive/
  • Code docs: Inline docstrings

Getting Help

  • Issues: Check existing issues or open a new one
  • Discussions: Use GitHub Discussions for questions
  • Documentation: Review README.md and QUICKSTART.md

Code Review Process

All contributions go through code review:

  1. Automated checks: Tests, coverage, linting must pass
  2. Manual review: Maintainer reviews code quality and design
  3. Feedback: Address any requested changes
  4. Merge: Once approved, PR is merged

Release Process

ChatSEEK uses semantic versioning (MAJOR.MINOR.PATCH):

  • MAJOR: Breaking API changes
  • MINOR: New features (backward compatible)
  • PATCH: Bug fixes
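
The bump rules can be stated precisely in a few lines. This is a sketch for illustration only; `bump_version` is a hypothetical helper, not the tooling the maintainers use for releases.

```python
def bump_version(version: str, part: str) -> str:
    """Bump one part of a MAJOR.MINOR.PATCH version string.

    Bumping a part resets every part to its right to zero.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown version part: {part!r}")


if __name__ == "__main__":
    for part in ("major", "minor", "patch"):
        print(part, "->", bump_version("1.4.2", part))
```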

Releases are managed by project maintainers.

License

By contributing, you agree that your contributions will be licensed under the MIT License.

Questions?

Feel free to open an issue or reach out to the maintainers. We appreciate your contributions!


Thank you for helping make ChatSEEK better!