diff --git a/docs/changelog.md b/docs/changelog.md
index 17292923..37dbfcc3 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -8,6 +8,18 @@ All notable changes to SynapseKit are documented here.
 
 ---
 
+## Unreleased
+
+### Added
+- **`GitLoader`** — load files from any Git repository (local path or remote URL) at a specific revision; glob pattern filtering; metadata includes path, commit hash, author, date; sync `load()` and async `aload()`; `pip install synapsekit[git]`
+- **`GoogleSheetsLoader`** — load rows from a Google Sheets spreadsheet as Documents; service account auth via credentials file; auto-detects first sheet if none specified; header-based row-to-text formatting; sync `load()` and async `aload()`; `pip install synapsekit[gsheets]`
+- **`JiraLoader`** — load Jira issues via JQL queries; full Atlassian Document Format (ADF) parsing; pagination; rate-limit retry; async `aload()` via httpx; optional `limit`; `pip install synapsekit[jira]`
+- **`SupabaseLoader`** — load rows from a Supabase table as Documents; configurable text/metadata columns; env var auth (`SUPABASE_URL`, `SUPABASE_KEY`); sync `load()` and async `aload()`; `pip install synapsekit[supabase]`
+
+**Stats:** 1752 tests · 30 LLM providers · 46 tools · 33 loaders · 9 text splitters · 9 vector store backends
+
+---
+
 ## v1.5.0 — New Loaders, Tools & Providers
 
 **Released:** 2026-04-07
diff --git a/docs/intro.md b/docs/intro.md
index 18cfddc9..975b295f 100644
--- a/docs/intro.md
+++ b/docs/intro.md
@@ -4,7 +4,7 @@ sidebar_position: 1
 
 # Introduction
 
-**SynapseKit** is an async-native Python framework for building LLM applications — RAG pipelines, tool-using agents, and graph workflows. Streaming-first, transparent API, 2 hard deps. 30 providers · 46 tools · 29 loaders · 9 vector stores.
+**SynapseKit** is an async-native Python framework for building LLM applications — RAG pipelines, tool-using agents, and graph workflows. Streaming-first, transparent API, 2 hard deps. 30 providers · 46 tools · 33 loaders · 9 vector stores.
 
 It is designed from the ground up to be **async-native** and **streaming-first**. Every public API is `async`. Streaming tokens is the default, not an opt-in. There are no hidden chains, no magic callbacks, no global state.
 
@@ -54,9 +54,9 @@ InMemoryVectorStore (built-in, `.npz` persistence), ChromaDB, FAISS, Qdrant, Pin
 
 → [Vector store docs](/docs/rag/vector-stores)
 
-### 29 document loaders
+### 33 document loaders
 
-`TextLoader`, `StringLoader`, `PDFLoader`, `HTMLLoader`, `CSVLoader`, `JSONLoader`, `YAMLLoader`, `XMLLoader`, `DiscordLoader`, `SlackLoader`, `NotionLoader`, `GoogleDriveLoader`, `DirectoryLoader`, `WebLoader`, `ExcelLoader`, `PowerPointLoader`, `DocxLoader`, `MarkdownLoader`, `AudioLoader`, `VideoLoader`, `WikipediaLoader`, `ArXivLoader`, `EmailLoader`, `ImageLoader`, `ConfluenceLoader`, `RSSLoader`, `GCSLoader`, `SQLLoader`, `GitHubLoader`.
+`TextLoader`, `StringLoader`, `PDFLoader`, `HTMLLoader`, `CSVLoader`, `JSONLoader`, `YAMLLoader`, `XMLLoader`, `DiscordLoader`, `SlackLoader`, `NotionLoader`, `GoogleDriveLoader`, `GoogleSheetsLoader`, `DirectoryLoader`, `WebLoader`, `ExcelLoader`, `PowerPointLoader`, `DocxLoader`, `MarkdownLoader`, `AudioLoader`, `VideoLoader`, `WikipediaLoader`, `ArXivLoader`, `EmailLoader`, `ImageLoader`, `ConfluenceLoader`, `RSSLoader`, `GCSLoader`, `SQLLoader`, `GitHubLoader`, `GitLoader`, `JiraLoader`, `SupabaseLoader`.
 
 → [Loader docs](/docs/rag/loaders)
 
diff --git a/docs/rag/loaders.md b/docs/rag/loaders.md
index e887fec2..78870a93 100644
--- a/docs/rag/loaders.md
+++ b/docs/rag/loaders.md
@@ -892,6 +892,150 @@ Includes retry with exponential back-off for rate limits (HTTP 429) and 5xx erro
 
 ---
 
+## GitLoader
+
+Load files from a Git repository — local path or remote URL — at any revision. Supports glob pattern filtering.
+
+```bash
+pip install synapsekit[git]
+```
+
+```python
+from synapsekit import GitLoader
+
+# Local repo, all files at HEAD
+loader = GitLoader("/path/to/repo")
+
+# Remote repo, specific revision, only Python files
+loader = GitLoader(
+    repo="https://github.com/org/repo.git",
+    revision="v2.0.0",
+    glob_pattern="**/*.py",
+)
+
+docs = loader.load()
+# or
+docs = await loader.aload()
+```
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `repo` | `str` | required | Local path or remote URL |
+| `revision` | `str` | `"HEAD"` | Git revision (branch, tag, commit hash) |
+| `glob_pattern` | `str` | `"**/*"` | Glob filter for file paths |
+
+Each document's metadata includes `path`, `commit_hash`, `author`, and `date`.
+
+---
+
+## GoogleSheetsLoader
+
+Load rows from a Google Sheets spreadsheet as Documents. Each row becomes one document; headers become field names.
+
+```bash
+pip install synapsekit[gsheets]
+```
+
+```python
+from synapsekit import GoogleSheetsLoader
+
+loader = GoogleSheetsLoader(
+    spreadsheet_id="1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgVE2upms",
+    sheet_name="Sheet1",  # optional — auto-detects first sheet
+    credentials_path="credentials.json",
+)
+
+docs = loader.load()
+# or
+docs = await loader.aload()
+```
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `spreadsheet_id` | `str` | required | Google Sheets document ID from the URL |
+| `sheet_name` | `str \| None` | `None` | Sheet tab name; first sheet used if omitted |
+| `credentials_path` | `str` | `"credentials.json"` | Path to service account credentials file |
+
+Row text format: `"ColumnA: value, ColumnB: value, ..."`. Metadata includes `source` URL, `sheet`, and `row` index.
+
+---
+
+## JiraLoader
+
+Load Jira issues using a JQL query. Handles Atlassian Document Format (ADF) descriptions, pagination, and rate-limit retry automatically.
+
+```bash
+pip install synapsekit[jira]
+```
+
+```python
+from synapsekit import JiraLoader
+
+loader = JiraLoader(
+    url="https://your-domain.atlassian.net",
+    username="your-email@example.com",
+    api_token="your-api-token",
+    jql="project = MYPROJ AND status = Open",
+    limit=100,  # optional
+)
+
+# Async (recommended)
+docs = await loader.aload()
+
+# Sync
+docs = loader.load()
+```
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `url` | `str` | required | Jira instance base URL |
+| `username` | `str` | required | Jira account email |
+| `api_token` | `str` | required | Jira API token |
+| `jql` | `str` | required | JQL query string |
+| `limit` | `int \| None` | `None` | Maximum number of issues to load |
+
+Each document includes the issue summary, description, and comments. Metadata includes `key`, `status`, `assignee`, `priority`, and `source`.
+
+---
+
+## SupabaseLoader
+
+Load rows from a Supabase table as Documents. Supports column selection and environment variable auth.
+
+```bash
+pip install synapsekit[supabase]
+```
+
+```python
+from synapsekit import SupabaseLoader
+
+# All columns, credentials from env vars (SUPABASE_URL, SUPABASE_KEY)
+loader = SupabaseLoader(table="articles")
+
+# Specific text and metadata columns
+loader = SupabaseLoader(
+    table="articles",
+    supabase_url="https://xyz.supabase.co",
+    supabase_key="your-anon-key",
+    text_columns=["title", "content"],
+    metadata_columns=["id", "author", "created_at"],
+)
+
+docs = loader.load()
+# or
+docs = await loader.aload()
+```
+
+| Parameter | Type | Default | Description |
+|---|---|---|---|
+| `table` | `str` | required | Supabase table name |
+| `supabase_url` | `str \| None` | `SUPABASE_URL` env | Supabase project URL |
+| `supabase_key` | `str \| None` | `SUPABASE_KEY` env | Supabase anon/service key |
+| `text_columns` | `list[str] \| None` | `None` | Columns to include in document text; all columns used if omitted |
+| `metadata_columns` | `list[str] \| None` | `None` | Columns to include in metadata |
+
+---
+
 ## Loading into the RAG facade
 
 All loaders return `List[Document]`, which you can pass directly to `add_documents()`: