Thank you for your interest in contributing to ChatSEEK! This document provides guidelines for contributing to the project.
Prerequisites:
- Python 3.8 or higher
- Neo4j database (5.18.1+)
- Git
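
The prerequisites above can be checked with a short script before you start. This is a minimal sketch and not part of ChatSEEK: the function name `check_prerequisites` is hypothetical, and it deliberately skips the Neo4j version check, since that depends on how your database is deployed.

```python
import shutil
import sys
from typing import List, Optional, Tuple

# Minimum Python version stated in the prerequisites list.
MIN_PYTHON: Tuple[int, int] = (3, 8)


def check_prerequisites(
    python_version: Optional[Tuple[int, int]] = None,
    git_available: Optional[bool] = None,
) -> List[str]:
    """Return a list of problems; an empty list means every check passed.

    Both arguments default to values detected from the running environment
    and can be overridden, which keeps the function easy to test.
    """
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    if git_available is None:
        git_available = shutil.which("git") is not None

    problems: List[str] = []
    if tuple(python_version) < MIN_PYTHON:
        found = ".".join(str(part) for part in python_version)
        problems.append(f"Python 3.8 or higher is required, found {found}")
    if not git_available:
        problems.append("Git was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

Running the script prints one line per failed check and nothing when both checks pass.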

To set up a development environment:
- Clone the repository:

      git clone https://github.com/yourorg/chatseek.git
      cd chatseek

- Create a virtual environment:

      python -m venv .venv
      source .venv/bin/activate  # On Windows: .venv\Scripts\activate

- Install development dependencies:

      pip install -e ".[dev]"

- Set up environment variables:

      cp .env.example .env
      # Edit .env with your Neo4j and API credentials

- Run tests to verify setup:

      pytest tests/ -v

Create a feature branch for your work:

    git checkout -b feature/your-feature-name

- Write clear, documented code
- Follow existing code style (Black formatter, line length 100)
- Add docstrings to all public functions and classes
- Update relevant documentation

Run the tests:

    # Run all tests
    pytest tests/ -v
    # Run with coverage
    pytest tests/ --cov=chatseek --cov-report=html
    # Run specific test file
    pytest tests/unit/test_query_engine.py -v

Format and lint the code:

    # Format with Black
    black chatseek/ tests/
    # Check with Ruff
    ruff check chatseek/ tests/

Use clear, descriptive commit messages:

    git add .
    git commit -m "feat: Add support for new GEO template type"

Commit message format:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- test: Test additions or modifications
- refactor: Code refactoring
- chore: Maintenance tasks
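
The commit prefixes listed above lend themselves to an automated check, for example in a local commit hook. The following is only a sketch under assumptions: the helper name `is_valid_commit_subject` is hypothetical, and it assumes the format is a lowercase prefix, a colon, a single space, then a non-empty summary. The project may enforce something looser or stricter.

```python
import re

# Prefixes taken from the commit message format list.
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")

# Assumed shape: "<type>: <summary>" where the summary starts with a
# non-space character.
_SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+): (?P<summary>\S.*)$")


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if the subject line starts with one of the listed prefixes."""
    match = _SUBJECT_RE.match(subject)
    if match is None:
        return False
    return match.group("type") in COMMIT_TYPES
```

With this sketch, the subject from the example commit passes, while a subject with an unlisted prefix such as `feature:` or a missing space after the colon is rejected.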

Push your branch:

    git push origin feature/your-feature-name

Then create a pull request on GitHub with:
- Clear description of changes
- Reference to any related issues
- Screenshots/examples if applicable

Code style:
- Follow PEP 8 style guide
- Use Black formatter (line length: 100)
- Use type hints where appropriate
- Write comprehensive docstrings
Example:

    def extract_entities(query: str, llm: BaseChatModel) -> Dict[str, Any]:
        """
        Extract entities from a natural language query.

        Args:
            query: Natural language question
            llm: Language model instance for extraction

        Returns:
            Dictionary containing extracted entities with keys:
            - study: Study name (if present)
            - samples: List of sample UIDs
            - assay: Assay type

        Raises:
            ExtractionError: If entity extraction fails
        """
        # Implementation

Documentation updates:
- Update README.md if adding user-facing features
- Add examples to the examples/ directory for new features
- Update ROADMAP.md if implementing planned features
- Keep IMPLEMENTATION_STATUS.md current with progress
tests/
├── unit/ # Unit tests for individual components
├── integration/ # Integration tests for workflows
└── fixtures/ # Test data and fixtures
- Write tests for all new functionality
- Aim for 80%+ code coverage
- Use descriptive test names
- Mock external dependencies (Neo4j, LLMs)
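
One way to mock an external dependency such as an LLM is with the standard library's `unittest.mock`. The sketch below is illustrative only: the helper name `make_mock_llm`, the `invoke` method, and the JSON reply shape are assumptions about the LLM interface, not ChatSEEK's actual API, so adapt them to whatever your code calls.

```python
import json
from typing import Any, Dict
from unittest.mock import MagicMock


def make_mock_llm(reply: Dict[str, Any]) -> MagicMock:
    """Build a stand-in LLM that makes no network calls.

    Assumption: the code under test calls `llm.invoke(prompt)` and parses
    the returned text as JSON. Rename `invoke` if your interface differs.
    """
    llm = MagicMock()
    llm.invoke.return_value = json.dumps(reply)
    return llm


# Example: the mock returns the canned reply and records how it was called.
llm = make_mock_llm({"study": "GBM Study", "intent": "find_samples"})
parsed = json.loads(llm.invoke("Find samples in the GBM Study"))
llm.invoke.assert_called_once_with("Find samples in the GBM Study")
```

Wrapping `make_mock_llm(...)` in a function decorated with `@pytest.fixture` and named `mock_llm` would make it available to tests that take a `mock_llm` argument.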
Example:

    def test_entity_extractor_identifies_study_name(mock_llm):
        """Test that entity extractor correctly identifies study names."""
        query = "Find samples in the GBM Study"
        extractor = EntityExtractor(mock_llm)
        result = extractor.extract(query)
        assert result["study"] == "GBM Study"
        assert result["intent"] == "find_samples"

To add a new GEO template, see docs/guides/CUSTOM_TEMPLATE_GUIDE.md for detailed instructions.
Quick overview:
- Create template in chatseek/geo/templates.py
- Define required fields and sections
- Add validation logic
- Write tests in tests/unit/test_geo_templates.py
- Add example to examples/geo_examples.py

To add a new query type:
- Add query template to chatseek/graphrag/query_builder.py
- Update entity extraction patterns in chatseek/graphrag/entity_extractor.py
- Add integration test
- Document in README.md
chatseek/
├── chatseek/ # Main package
│ ├── core/ # Core infrastructure (config, database)
│ ├── graphrag/ # GraphRAG query system
│ ├── geo/ # GEO submission system
│ ├── cli/ # Command-line interface
│ └── utils/ # Utility functions
├── tests/ # Test suite
├── examples/ # Example scripts
├── demos/ # Streamlit demo app
├── docs/ # Documentation
└── notebooks/ # Jupyter tutorials
- User-facing docs: Root-level .md files
- Guides: docs/guides/
- Archived docs: docs/archive/
- Code docs: Inline docstrings
- Issues: Check existing issues or open a new one
- Discussions: Use GitHub Discussions for questions
- Documentation: Review README.md and QUICKSTART.md
All contributions go through code review:
- Automated checks: Tests, coverage, linting must pass
- Manual review: Maintainer reviews code quality and design
- Feedback: Address any requested changes
- Merge: Once approved, PR is merged
ChatSEEK uses semantic versioning (MAJOR.MINOR.PATCH):
- MAJOR: Breaking API changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes
Releases are managed by project maintainers.
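
As an illustration of the MAJOR.MINOR.PATCH rules, the sketch below bumps a version string and resets the lower-order parts, as semantic versioning requires. The function name `bump_version` is hypothetical and this is not the project's release tooling. It assumes plain three-part versions with no pre-release or build suffix.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with the given part incremented.

    Bumping MAJOR resets MINOR and PATCH to 0; bumping MINOR resets PATCH.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":  # breaking API change
        return f"{major + 1}.0.0"
    if part == "minor":  # backward-compatible feature
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor', or 'patch', got {part!r}")
```

For example, a bug fix on 1.4.2 gives 1.4.3, a new backward-compatible feature gives 1.5.0, and a breaking change gives 2.0.0.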
By contributing, you agree that your contributions will be licensed under the MIT License.
Feel free to open an issue or reach out to the maintainers. We appreciate your contributions!
Thank you for helping make ChatSEEK better!