diff --git a/docs/agents/tools.md b/docs/agents/tools.md
index 05ee86c9..c0f99e59 100644
--- a/docs/agents/tools.md
+++ b/docs/agents/tools.md
@@ -1424,3 +1424,44 @@ result = await tool.run(action="list_products", limit=10)
 | `limit` | `10` | Max results for list endpoints (max 100) |
 
 Auth via `STRIPE_API_KEY` env var (use restricted read-only keys in production).
+
+## LinearTool
+
+Manage Linear issues via the Linear GraphQL API. Stdlib `urllib` only — no extra dependencies.
+
+```python
+from synapsekit import LinearTool
+
+tool = LinearTool()
+# set LINEAR_API_KEY env var
+
+# List issues for a team
+result = await tool.run(action="list_issues", team_id="team-uuid")
+
+# Get a single issue
+result = await tool.run(action="get_issue", issue_id="ISS-42")
+
+# Create an issue
+result = await tool.run(
+    action="create_issue",
+    team_id="team-uuid",
+    title="Add dark mode",
+    description="Users have been asking for it",
+    priority=2,
+)
+
+# Update an issue's state
+result = await tool.run(action="update_issue", issue_id="ISS-42", status="state-uuid")
+```
+
+| Parameter | Default | Description |
+|---|---|---|
+| `action` | — | `list_issues`, `get_issue`, `create_issue`, `update_issue` (required) |
+| `team_id` | — | Linear team ID (required for `list_issues` and `create_issue`) |
+| `issue_id` | — | Linear issue ID (required for `get_issue` and `update_issue`) |
+| `title` | — | Issue title (required for `create_issue`) |
+| `description` | — | Issue body markdown |
+| `priority` | `0` | `0` none, `1` urgent, `2` high, `3` medium, `4` low |
+| `status` | — | New `stateId` for `update_issue` |
+
+Auth via constructor arg or `LINEAR_API_KEY` env var.
diff --git a/docs/changelog.md b/docs/changelog.md
index 20dff7ed..d94bba0c 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -17,8 +17,19 @@ All notable changes to SynapseKit are documented here.
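The LinearTool section above wraps Linear's single GraphQL endpoint. For readers curious what a `create_issue` call plausibly looks like on the wire, here is a stdlib-only sketch; the query LinearTool actually builds is internal, so the mutation text, selected fields, and helper names below are illustrative approximations of Linear's public `issueCreate` mutation, not the tool's real implementation.

```python
import json
import urllib.request

LINEAR_URL = "https://api.linear.app/graphql"

# Illustrative approximation of Linear's public issueCreate mutation;
# the query LinearTool actually sends may select different fields.
MUTATION = """
mutation CreateIssue($input: IssueCreateInput!) {
  issueCreate(input: $input) { success issue { id identifier } }
}
"""

def build_payload(team_id: str, title: str, description: str = "", priority: int = 0) -> bytes:
    """Assemble the JSON body for a create_issue-style request."""
    variables = {
        "input": {
            "teamId": team_id,
            "title": title,
            "description": description,
            "priority": priority,  # 0 none, 1 urgent, 2 high, 3 medium, 4 low
        }
    }
    return json.dumps({"query": MUTATION, "variables": variables}).encode()

def create_issue(api_key: str, team_id: str, title: str, **kwargs) -> dict:
    """POST the mutation; Linear API keys go directly in the Authorization header."""
    req = urllib.request.Request(
        LINEAR_URL,
        data=build_payload(team_id, title, **kwargs),
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# create_issue(api_key="lin_api_...", team_id="team-uuid", title="Add dark mode", priority=2)
```

Nothing here is needed to use `LinearTool`; it is only a peek under the hood of the request shape.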
 - **`CodeSplitter`** — split source code using language-aware separators; supports Python, JavaScript, TypeScript, Go, Rust, Java, C++; preserves logical structures (classes, functions); falls back to recursive character splitting
 - **`SentenceWindowSplitter`** — one chunk per sentence, padded with up to `window_size` surrounding sentences; `split_with_metadata()` adds `target_sentence` to each chunk's metadata; useful for retrieval systems that embed with context but score by target sentence
 - **`TwilioTool`** — send SMS and WhatsApp messages via the Twilio REST API; stdlib `urllib` only, no extra deps; auth via constructor args or env vars; automatic `whatsapp:` prefix handling for both sender and recipient; security warning logged on instantiation
-
-**Stats:** 1500 tests · 27 LLM providers · 43 tools · 26 loaders · 8 text splitters · 9 vector store backends
+- **`NewsTool`** — fetch top headlines and search articles via NewsAPI; actions: `get_headlines`, `search`; stdlib `urllib` only; auth via constructor arg or `NEWS_API_KEY` env var
+- **`WeatherTool`** — get current weather and short-term forecasts via OpenWeatherMap; actions: `current`, `forecast` (1–5 day); async-safe with `run_in_executor`; auth via `OPENWEATHERMAP_API_KEY`
+- **`StripeTool`** — read-only Stripe data lookup: `get_customer`, `list_invoices`, `get_charge`, `list_products`; stdlib `urllib` only; auth via `STRIPE_API_KEY`; async-safe with `run_in_executor`
+- **`LinearTool`** — manage Linear issues via the Linear GraphQL API; actions: `list_issues`, `get_issue`, `create_issue`, `update_issue`; stdlib `urllib` only, no extra deps; auth via constructor arg or `LINEAR_API_KEY`
+- **`XaiLLM`** — xAI Grok LLM provider; OpenAI-compatible API; supports `grok-beta`, `grok-2`, `grok-2-mini`; streaming and tool calling; `pip install synapsekit[openai]`
+- **`NovitaLLM`** — NovitaAI LLM provider; OpenAI-compatible API; supports Llama, Mistral, Qwen, and other open models; streaming and tool calling; `pip install synapsekit[openai]`
+- **`WriterLLM`** — Writer (Palmyra) LLM provider; OpenAI-compatible API; supports `palmyra-x-004`, `palmyra-x-003-instruct`, `palmyra-med`, `palmyra-fin`; streaming and tool calling; `pip install synapsekit[openai]`
+- **`HTMLTextSplitter`** — split HTML documents on block-level tags (h1–h6, p, div, section, article, li, blockquote, pre); strips tags to plain text; falls back to `RecursiveCharacterTextSplitter` for long sections; stdlib `html.parser` only
+- **`GCSLoader`** — load files from Google Cloud Storage buckets as Documents; service account auth (file path or dict) or default credentials; prefix filtering, `max_files` limit, binary file handling; sync `load()` and async `aload()`; `pip install synapsekit[gcs]`
+- **`SQLLoader`** — load rows from any SQLAlchemy-supported database (PostgreSQL, MySQL, SQLite, etc.) as Documents; configurable text/metadata columns; full SQL query support; sync `load()` and async `aload()`; `pip install synapsekit[sql]`
+- **`GitHubLoader`** — load README, issues, pull requests, or repository files from GitHub via the REST API; retry with exponential back-off for rate limits and 5xx; optional token auth for higher rate limits; path filtering and limit for files; uses existing `httpx` dep; sync `load_sync()` and async `load()`
+
+**Stats:** 1715 tests · 30 LLM providers · 46 tools · 29 loaders · 9 text splitters · 9 vector store backends
 
 ---
 
diff --git a/docs/intro.md b/docs/intro.md
index 445d8e78..83c33e52 100644
--- a/docs/intro.md
+++ b/docs/intro.md
@@ -4,7 +4,7 @@ sidebar_position: 1
 
 # Introduction
 
-**SynapseKit** is an async-native Python framework for building LLM applications — RAG pipelines, tool-using agents, and graph workflows. Streaming-first, transparent API, 2 hard deps. 30 providers · 45 tools · 26 loaders · 9 vector stores.
+**SynapseKit** is an async-native Python framework for building LLM applications — RAG pipelines, tool-using agents, and graph workflows. Streaming-first, transparent API, 2 hard deps. 30 providers · 46 tools · 29 loaders · 9 vector stores.
 
 It is designed from the ground up to be **async-native** and **streaming-first**. Every public API is `async`. Streaming tokens is the default, not an opt-in. There are no hidden chains, no magic callbacks, no global state.
 
@@ -42,9 +42,9 @@ Full retrieval-augmented generation with chunking, embedding, vector search, BM2
 
 → [RAG Pipeline docs](/docs/rag/pipeline)
 
-### 27 LLM providers
+### 30 LLM providers
 
-OpenAI, Anthropic, Ollama, Cohere, Mistral, Gemini, AWS Bedrock, Azure OpenAI, Groq, DeepSeek, OpenRouter, Together, Fireworks, Perplexity, Cerebras, Vertex AI, Moonshot, Zhipu, Cloudflare, AI21 Labs, Databricks, Baidu ERNIE, llama.cpp, Minimax, Aleph Alpha, Hugging Face, SambaNova — all behind `BaseLLM`. Auto-detected from the model name.
+OpenAI, Anthropic, Ollama, Cohere, Mistral, Gemini, AWS Bedrock, Azure OpenAI, Groq, DeepSeek, OpenRouter, Together, Fireworks, Perplexity, Cerebras, Vertex AI, Moonshot, Zhipu, Cloudflare, AI21 Labs, Databricks, Baidu ERNIE, llama.cpp, Minimax, Aleph Alpha, Hugging Face, SambaNova, xAI (Grok), NovitaAI, Writer (Palmyra) — all behind `BaseLLM`. Auto-detected from the model name.
 
 → [LLM Provider docs](/docs/llms/overview)
 
@@ -54,9 +54,9 @@ InMemoryVectorStore (built-in, `.npz` persistence), ChromaDB, FAISS, Qdrant, Pin
 
 → [Vector store docs](/docs/rag/vector-stores)
 
-### 26 document loaders
+### 29 document loaders
 
-`TextLoader`, `StringLoader`, `PDFLoader`, `HTMLLoader`, `CSVLoader`, `JSONLoader`, `YAMLLoader`, `XMLLoader`, `DiscordLoader`, `SlackLoader`, `NotionLoader`, `GoogleDriveLoader`, `DirectoryLoader`, `WebLoader`, `ExcelLoader`, `PowerPointLoader`, `DocxLoader`, `MarkdownLoader`, `AudioLoader`, `VideoLoader`, `WikipediaLoader`, `ArXivLoader`, `EmailLoader`, `ImageLoader`, `ConfluenceLoader`, `RSSLoader`.
+`TextLoader`, `StringLoader`, `PDFLoader`, `HTMLLoader`, `CSVLoader`, `JSONLoader`, `YAMLLoader`, `XMLLoader`, `DiscordLoader`, `SlackLoader`, `NotionLoader`, `GoogleDriveLoader`, `DirectoryLoader`, `WebLoader`, `ExcelLoader`, `PowerPointLoader`, `DocxLoader`, `MarkdownLoader`, `AudioLoader`, `VideoLoader`, `WikipediaLoader`, `ArXivLoader`, `EmailLoader`, `ImageLoader`, `ConfluenceLoader`, `RSSLoader`, `GCSLoader`, `SQLLoader`, `GitHubLoader`.
 
 → [Loader docs](/docs/rag/loaders)
 
@@ -65,7 +65,7 @@ `ReActAgent` — Thought → Action → Observation loop, works with any LLM.
 
 `FunctionCallingAgent` — native `tool_calls` / `tool_use` for OpenAI, Anthropic, Gemini, and Mistral. `AgentExecutor` — unified runner, picks the right agent from config.
 
-45 built-in tools: Calculator, PythonREPL, FileRead, FileWrite, FileList, WebSearch, DuckDuckGoSearch, SQL, HTTP, GraphQL, DateTime, Regex, JSONQuery, HumanInput, Wikipedia, Summarization, SentimentAnalysis, Translation, WebScraper, Shell, SQLSchemaInspection, PDFReader, ArxivSearch, TavilySearch, Email, GitHubAPI, PubMedSearch, VectorSearch, YouTubeSearch, Slack, Notion, Jira, BraveSearch, APIBuilder, GoogleCalendar, AWSLambda, ImageAnalysis, TextToSpeech, SpeechToText, BingSearch, WolframAlpha, GoogleSearch, Twilio, NewsTool, WeatherTool, StripeTool.
+46 built-in tools: Calculator, PythonREPL, FileRead, FileWrite, FileList, WebSearch, DuckDuckGoSearch, SQL, HTTP, GraphQL, DateTime, Regex, JSONQuery, HumanInput, Wikipedia, Summarization, SentimentAnalysis, Translation, WebScraper, Shell, SQLSchemaInspection, PDFReader, ArxivSearch, TavilySearch, Email, GitHubAPI, PubMedSearch, VectorSearch, YouTubeSearch, Slack, Notion, Jira, BraveSearch, APIBuilder, GoogleCalendar, AWSLambda, ImageAnalysis, TextToSpeech, SpeechToText, BingSearch, WolframAlpha, GoogleSearch, Twilio, NewsTool, WeatherTool, StripeTool, LinearTool.
 
 → [Agent docs](/docs/agents/overview)
 
diff --git a/docs/rag/loaders.md b/docs/rag/loaders.md
index a63b74cb..e887fec2 100644
--- a/docs/rag/loaders.md
+++ b/docs/rag/loaders.md
@@ -795,6 +795,103 @@ Each feed entry becomes one `Document`. Metadata fields (`title`, `published`, `
 
 ---
 
+## GCSLoader
+
+Load files from a Google Cloud Storage bucket as Documents. Install with `pip install synapsekit[gcs]`.
+
+```python
+from synapsekit import GCSLoader
+
+loader = GCSLoader(
+    bucket_name="my-bucket",
+    prefix="documents/",
+    credentials_path="service-account.json",
+    max_files=100,
+)
+
+docs = await loader.aload()
+```
+
+| Parameter | Type | Description |
+|---|---|---|
+| `bucket_name` | `str` | GCS bucket name (required) |
+| `prefix` | `str \| None` | Optional prefix filter (e.g. `"documents/"`) |
+| `credentials_path` | `str \| None` | Path to a service account JSON file |
+| `credentials_dict` | `dict \| None` | Service account credentials as a dict |
+| `max_files` | `int \| None` | Maximum number of files to load |
+
+If neither `credentials_path` nor `credentials_dict` is provided, default application credentials are used. Binary files get a placeholder string as content, with their content type recorded in metadata.
+
+---
+
+## SQLLoader
+
+Load rows from any SQLAlchemy-supported database (PostgreSQL, MySQL, SQLite, etc.) as Documents. Install with `pip install synapsekit[sql]`.
+
+```python
+from synapsekit import SQLLoader
+
+loader = SQLLoader(
+    connection_string="postgresql://user:pass@localhost/db",
+    query="SELECT id, title, body, author FROM articles WHERE published = true",
+    text_columns=["title", "body"],
+    metadata_columns=["id", "author"],
+)
+
+docs = await loader.aload()
+```
+
+| Parameter | Type | Description |
+|---|---|---|
+| `connection_string` | `str` | SQLAlchemy database URL (required) |
+| `query` | `str` | SQL query to execute (required) |
+| `text_columns` | `list[str] \| None` | Columns concatenated into the document text. Defaults to all columns. |
+| `metadata_columns` | `list[str] \| None` | Columns included in metadata. Defaults to all columns. |
+
+Each Document gets `metadata["source"] = "sql"` and `metadata["row_index"]` automatically.
+
+---
+
+## GitHubLoader
+
+Load README, issues, pull requests, or repository files from a GitHub repository via the REST API. Uses the existing `httpx` dependency — no new install needed if you already have `synapsekit[web]`.
+
+```python
+from synapsekit import GitHubLoader
+
+# README
+loader = GitHubLoader(repo="SynapseKit/SynapseKit", content_type="readme")
+
+# Issues (filters out PRs automatically)
+loader = GitHubLoader(repo="SynapseKit/SynapseKit", content_type="issues", limit=20)
+
+# Pull requests
+loader = GitHubLoader(repo="SynapseKit/SynapseKit", content_type="prs", limit=10)
+
+# Repository files (recursive Git Trees API)
+loader = GitHubLoader(
+    repo="SynapseKit/SynapseKit",
+    content_type="files",
+    path="src/synapsekit/llm/",
+    limit=50,
+    token="ghp_...",  # optional but recommended for higher rate limits
+)
+
+docs = await loader.load()
+```
+
+| Parameter | Type | Description |
+|---|---|---|
+| `repo` | `str` | Repository in `owner/repo` format (required) |
+| `content_type` | `"readme" \| "issues" \| "prs" \| "files"` | What to load. Defaults to `"readme"`. |
+| `token` | `str \| None` | GitHub token for higher rate limits |
+| `path` | `str \| None` | Path prefix filter (only for `files`) |
+| `limit` | `int \| None` | Maximum number of items to load |
+
+Retries with exponential back-off on rate limits (HTTP 429) and 5xx errors.
+
+---
+
 ## Loading into the RAG facade
 
 All loaders return `List[Document]`, which you can pass directly to `add_documents()`:
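A minimal sketch of that hand-off, written with stand-in definitions so it runs on its own; the `RAGFacade` name and the exact `add_documents` signature below are assumptions for illustration, not confirmed SynapseKit API:

```python
import asyncio
from dataclasses import dataclass, field

# Stand-ins for illustration only: in real code, Document and the RAG facade
# come from synapsekit (see /docs/rag/pipeline); names here are assumed.
@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)

class RAGFacade:
    def __init__(self) -> None:
        self.documents: list[Document] = []

    async def add_documents(self, docs: list[Document]) -> int:
        # A real facade would chunk, embed, and index here.
        self.documents.extend(docs)
        return len(docs)

async def main() -> None:
    # Any loader's output is a List[Document]; it goes straight in.
    docs = [
        Document(text="First row", metadata={"source": "sql", "row_index": 0}),
        Document(text="Second row", metadata={"source": "sql", "row_index": 1}),
    ]
    rag = RAGFacade()
    added = await rag.add_documents(docs)
    print(f"indexed {added} document(s)")  # indexed 2 document(s)

asyncio.run(main())
```

The point is simply the contract: whatever loader produced the documents, its `load()` / `aload()` result plugs straight into indexing with no conversion step.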