From 0e0d1d0a34332d1c369f8363a268f02b2e779828 Mon Sep 17 00:00:00 2001 From: Dominik Safaric Date: Thu, 5 Mar 2026 20:57:57 +0100 Subject: [PATCH 1/3] MCP 1.0 docs --- agentic/mcp.mdx | 21 ++ agentic/setup-guide.mdx | 139 ++++++++++++ agentic/tool-use.mdx | 224 ++++++++++++++++++ agentic/tutorials/databricks.mdx | 375 +++++++++++++++++++++++++++++++ docs.json | 15 ++ integrations/mcp.mdx | 327 --------------------------- 6 files changed, 774 insertions(+), 327 deletions(-) create mode 100644 agentic/mcp.mdx create mode 100644 agentic/setup-guide.mdx create mode 100644 agentic/tool-use.mdx create mode 100644 agentic/tutorials/databricks.mdx delete mode 100644 integrations/mcp.mdx diff --git a/agentic/mcp.mdx b/agentic/mcp.mdx new file mode 100644 index 0000000..e009e1f --- /dev/null +++ b/agentic/mcp.mdx @@ -0,0 +1,21 @@ +--- +title: "Model Context Protocol" +description: "" +--- + +Connect your AI tools to TabPFN using the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP). + +## What is TabPFN MCP? + +TabPFN MCP is a remote MCP with OAuth that gives AI tools secure access to TabPFN-2.5, our SOTA tabular foundation model. Our MCP server is available at: + +``` +https://api.priorlabs.ai/mcp/server +``` + +It integrates with popular AI assistants like Claude, enabling you to run predictions using natural language. TabPFN MCP implements the latest +[MCP Authorization](https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization) and [Streamable HTTP](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http) specifications. + + +To use the TabPFN MCP you need a Prior Labs account. You can sign up or log in at [ux.priorlabs.ai](https://ux.priorlabs.ai). 
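Under the hood, every tool invocation is a standard MCP `tools/call` JSON-RPC request over the Streamable HTTP transport. As a sketch of what a client sends for the inline fit-and-predict tool described later in these docs (the request shape follows the MCP specification; the argument values here are illustrative, and any MCP client builds this envelope for you):

```python
import json

# Illustrative JSON-RPC 2.0 envelope for an MCP tool call.
# Tool and argument names come from the TabPFN tool reference;
# authentication (OAuth or a bearer API key) happens at the HTTP layer.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "fit_and_predict_inline",
        "arguments": {
            "X_train": [[1.5, "red", 3], [2.0, "blue", 4], [1.8, "red", 5]],
            "y_train": [0, 1, 0],
            "X_test": [[1.8, "red", 5], [2.3, "green", 2]],
            "task_type": "classification",
        },
    },
}

payload = json.dumps(request)
decoded = json.loads(payload)
assert decoded["method"] == "tools/call"
assert decoded["params"]["name"] == "fit_and_predict_inline"
```

In practice your MCP client serializes and sends this for you; the sketch only shows what "running a prediction in natural language" translates to on the wire.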
+ diff --git a/agentic/setup-guide.mdx b/agentic/setup-guide.mdx new file mode 100644 index 0000000..680a2cb --- /dev/null +++ b/agentic/setup-guide.mdx @@ -0,0 +1,139 @@ +--- +title: "Setup Guide" +description: "" +--- + +Connect your AI client to TabPFN MCP and authorize access to run inference through a natural language interface. + +#### Claude Code + +```bash +# If you haven't, install Claude Code +npm install -g @anthropic-ai/claude-code + +# Navigate to your project +cd your-tabpfn-project + +# Add TabPFN MCP (general access) +claude mcp add --transport http tabpfn https://api.priorlabs.ai/mcp/server + +# Start coding with Claude +claude + +# Authenticate the MCP tools by typing /mcp +# This will trigger the OAuth flow +/mcp +``` + +#### Claude.ai and Claude for desktop + + + +1. Open Settings in the sidebar +2. Navigate to Connectors and select Add custom connector +3. Configure the connector: + - Name: TabPFN + - URL: https://api.priorlabs.ai/mcp/server + + +Custom connectors using remote MCP are available on Claude and Claude Desktop for users on Pro, Max, Team, and Enterprise plans. + + +Alternatively, you may add the MCP server by editing the Claude Desktop config file: + +1. Locate your Claude Desktop config file based on your operating system: +2. Get your API key from Prior Labs: + - Navigate to [ux.priorlabs.ai](https://ux.priorlabs.ai) + - Log in to your account (or sign up if you don't have one) + - Copy your API key from the dashboard +3. Edit the config file to add the TabPFN server: +```json +{ + "mcpServers": { + "tabpfn": { + "url": "https://api.priorlabs.ai/mcp/server", + "headers": { + "Authorization": "Bearer YOUR_API_KEY_HERE" + } + } + } + } +``` +4. Replace `YOUR_API_KEY_HERE` with your actual API key from step 2 +5. Save the config file and restart Claude Desktop for the changes to take effect + +#### ChatGPT + + + +Follow these steps to set up TabPFN as a connector in ChatGPT: + +1. 
Enable Developer mode: + - Go to Settings → Connectors → Advanced settings → Developer mode +2. Open ChatGPT settings +3. In the Connectors tab, `Create` a new connector: + - Give it a name: TabPFN + - MCP server URL: https://api.priorlabs.ai/mcp/server + - Authentication: OAuth +4. Click Create + + +Custom connectors using MCP are available on ChatGPT for Pro and Plus accounts on the web. + + +#### Codex CLI + +Codex CLI is OpenAI's local coding agent that can run directly from your terminal. + +```bash +# Install Codex +npm i -g @openai/codex + +# Add TabPFN MCP +codex mcp add tabpfn --url https://api.priorlabs.ai/mcp/server + +# Start Codex +codex +``` + +When adding the MCP server, Codex will detect OAuth support and open your browser to authorize the connection. + +#### Cursor + +To add TabPFN MCP to your Cursor environment, add the snippet below to your project-specific or global `.cursor/mcp.json` file manually. For more details, see the [Cursor documentation](https://docs.cursor.com/en/context/mcp). + +```json +{ + "mcpServers": { + "tabpfn": { + "url": "https://api.priorlabs.ai/mcp/server" + } + } +} +``` + +Once the server is added, Cursor will attempt to connect and display a Needs login prompt. Click on this prompt to authorize Cursor to access your Prior Labs account. + +#### n8n + +Watch the video below to learn how to integrate TabPFN with n8n workflows. + + \ No newline at end of file diff --git a/agentic/tool-use.mdx b/agentic/tool-use.mdx new file mode 100644 index 0000000..a501a8c --- /dev/null +++ b/agentic/tool-use.mdx @@ -0,0 +1,224 @@ +--- +title: "Tool Use" +description: "" +--- + +The TabPFN MCP Server exposes a number of tools for performing classification and regression on Prior Labs’ +managed GPU infrastructure: + +### `upload_dataset` + +Get a secure upload link for your dataset. 
We recommend this for most workflows: your data is sent directly to storage instead of through the chat, so TabPFN can handle larger datasets without running into context limits or long execution time.
+
+
+Uploading the file to the link requires a sandbox execution environment and outbound network access. Not all MCP clients support this, and some may require a paid plan.
+
+
+This tool returns a `dataset_id` and `upload_url` - valid for 60 minutes. Call this tool once for separately uploading the training set and test dataset.
+
+#### Required Parameters
+
+
+  Which kind of file you’re uploading.
+
+  - `"train.csv"` — Training data (the data the model learns from)
+  - `"test.csv"` — Test data (the data you want predictions for)
+
+
+### `fit_and_predict_from_dataset`
+
+Fit the TabPFN-2.5 model on your pre-uploaded data and generate predictions. Use this tool when you want to fit a new model from scratch.
+
+Upload both CSVs with `upload_dataset` first, then pass the two dataset IDs here.
+
+#### Required Parameters
+
+
+  The dataset ID you got from `upload_dataset` for your **training** CSV. That file should include all input columns plus the column you want to predict (the target).
+
+
+
+  The dataset ID you got from `upload_dataset` for your **test** CSV. It should have the same input columns as the training file, but not the target column.
+
+
+
+  The name of the column you want to predict in the training file — e.g. `"price"` or `"churned"`.
+
+
+
+  The type of the predictive task.
+
+  - `"classification"` — Predicting a category or class, including probability distributions
+  - `"regression"` — Predicting a continuous value
+
+
+#### Optional Parameters
+
+
+  Prediction output type, default `"preds"` for classification and `"mean"` for regression.
+
+
+### `predict_from_dataset`
+
+Run predictions with a previously fitted model, using a different test set.
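The new test set must keep the training features and drop the target column. A minimal stdlib sketch of deriving such a test CSV from a full export before uploading it (the column names and values are illustrative):

```python
import csv
import io

TARGET = "churned"  # illustrative target column name

# A full export that still contains the target column.
full_csv = "age,plan,churned\n34,pro,0\n51,basic,1\n29,pro,0\n"

rows = list(csv.DictReader(io.StringIO(full_csv)))
feature_names = [c for c in rows[0] if c != TARGET]

# Write a test CSV with the same input columns but without the target,
# which is what the dataset-based prediction tools expect.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=feature_names)
writer.writeheader()
for row in rows:
    writer.writerow({k: row[k] for k in feature_names})

test_csv = buf.getvalue()
assert TARGET not in test_csv.splitlines()[0]
```

The resulting file is what you would then send to the `upload_url` returned by `upload_dataset`.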
+
+Use `upload_dataset` to upload your new test dataset file, then pass that `dataset_id` and the model ID you got from a previous `fit_and_predict_*` call. The test CSV must have the same set of features the model was fitted on.
+
+#### Required Parameters
+
+
+  The model ID of the previously fitted model, from `fit_and_predict_from_dataset` or `fit_and_predict_inline`.
+
+
+
+  The dataset ID of the test dataset to generate predictions from.
+
+
+
+  The type of the predictive task.
+
+  - `"classification"` — Predicting a category or class, including probability distributions
+  - `"regression"` — Predicting a continuous value
+
+#### Optional Parameters
+
+
+  Prediction output type, default `"preds"` for classification and `"mean"` for regression.
+
+
+### `fit_and_predict_inline`
+
+Use this tool when you want to fit a new model from scratch. It fits on your data and immediately returns predictions for your test set, along with a `model_id` for future reuse.
+
+
+Best for small datasets that fit in the conversation without running into context limits.
+
+
+#### Required Parameters
+
+
+  Training features as a 2D array where rows represent samples and columns represent features.
+
+  - **Shape:** `(n_train_samples, n_features)`
+  - **Data types:** Numeric (int/float) or categorical (string) values
+  - **Flexibility:** Handles missing values, outliers, and mixed data types automatically
+
+
+  ```python
+  X_train = [
+    [1.5, "red", 3],
+    [2.0, "blue", 4],
+    [1.8, "red", 5]
+  ]
+  ```
+
+
+
+
+  Training targets as a 1D array that must align with `X_train` rows.
+
+  - **Shape:** `(n_train_samples,)`
+  - **Classification:** Class labels (e.g., `[0, 1, 0]` or `["cat", "dog", "cat"]`)
+  - **Regression:** Numeric values (e.g., `[23.5, 45.2, 12.8]`)
+
+
+
+  Test features as a 2D array for generating predictions.
+
+  - **Shape:** `(n_test_samples, n_features)`
+  - **Critical:** Must have the **same number of features** as `X_train`
+
+
+  ```python
+  X_test = [
+    [1.8, "red", 5],
+    [2.3, "green", 2]
+  ]
+  ```
+
+
+
+
+  The type of the predictive task.
+
+  - `"classification"` — Predicting a category or class, including probability distributions
+  - `"regression"` — Predicting a continuous value
+
+
+#### Optional Parameters
+
+
+  Prediction output type, default `"preds"` for classification and `"mean"` for regression.
+
+
+#### Returns
+
+
+  Unique identifier for the fitted model. **Save this value** to reuse the model later with the `predict` tool.
+
+
+
+  Prediction results based on the specified `output_type`. Format varies by task type and output type.
+
+
+### `predict`
+
+Generate new predictions using a previously fitted TabPFN model.
+
+Use this tool after a previous `fit_and_predict_*` call to make predictions on new data using an existing model.
+
+
+Best for small datasets that fit in the conversation without running into context limits.
+
+
+#### Required Parameters
+
+
+  Identifier of a previously fitted model, returned from a previous `fit_and_predict_*` call.
+
+
+  ```python
+  model_id = "9f1526b2-388b-4849-b965-6373d35f1a6b"
+  ```
+
+
+
+
+  Test features as a 2D array for generating predictions.
+
+  - **Shape:** `(n_test_samples, n_features)`
+  - **Critical:** Must have the **same number of features** the model was originally fitted on
+
+
+  ```python
+  X_test = [
+    [1.2, "red", 7],
+    [3.4, "blue", 1]
+  ]
+  ```
+
+
+
+
+  The type of the predictive task.
+
+  - `"classification"` — Predicting a category or class, including probability distributions
+  - `"regression"` — Predicting a continuous value
+
+
+#### Optional Parameters
+
+
+  Prediction output type, default `"preds"` for classification and `"mean"` for regression.
+
+
+#### Returns
+
+
+  Echo of the model ID used for prediction.
+
+
+
+  Prediction results based on the specified `output_type`.
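As a sketch of consuming these returns, a client can reduce a classification `probas` payload (one probability row per test sample) to labels and confidences with plain Python; the probability values below are illustrative:

```python
# Illustrative "probas" output for a binary task: one row per test sample,
# with columns in class order. The numbers are made up for demonstration.
predictions = [
    [0.8, 0.2],
    [0.3, 0.7],
]

labels = [row.index(max(row)) for row in predictions]  # argmax per row
confidences = [max(row) for row in predictions]        # winning probability

assert labels == [0, 1]
```

For regression or `output_type="preds"`, the payload is already a flat list of values and needs no post-processing.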
+
\ No newline at end of file
diff --git a/agentic/tutorials/databricks.mdx b/agentic/tutorials/databricks.mdx
new file mode 100644
index 0000000..f69313c
--- /dev/null
+++ b/agentic/tutorials/databricks.mdx
@@ -0,0 +1,375 @@
+---
+title: "Databricks"
+description: ""
+---
+
+This tutorial shows you how to connect your Databricks Delta table to TabPFN's MCP server
+and run a churn prediction pipeline with an AI agent.
+
+### Prerequisites
+
+- A Databricks workspace with a running SQL warehouse
+- A Delta table named `customer_analytics` (the next section shows how to create one)
+- Your [Prior Labs API key](/api-reference/getting-started#1-get-your-access-token)
+
+### Overview
+
+Here's what happens end-to-end:
+1. Sample customer data is pulled from a Databricks Delta table using a SQL connector.
+2. The data is split into train/test sets and saved to disk locally.
+3. An agent - built with the [OpenAI Agents SDK](https://github.com/openai/openai-agents-python) - uploads the splits to TabPFN via MCP, fits a model, and generates churn predictions.
+4. You get a churn model evaluation summary printed to stdout.
+
+### Databricks Delta table
+
+If you don't have a `customer_analytics` table yet, you can create one from Databricks' built-in
+TPC-DS sample data.
Run this in a Databricks notebook or SQL editor: + +```sql query.sql expandable +CREATE OR REPLACE TABLE customer_analytics AS +SELECT + c.c_customer_sk, + DATEDIFF( + DATE('2003-01-02'), + MAKE_DATE(c.c_birth_year, c.c_birth_month, c.c_birth_day) + ) / 365 AS customer_age, + c.c_preferred_cust_flag, + MIN(d.d_date) AS first_purchase_date, + MAX(d.d_date) AS last_purchase_date, + DATEDIFF(DATE('2003-01-02'), MAX(d.d_date)) AS days_since_last_purchase, + COUNT(DISTINCT d.d_date) AS total_purchase_days, + COUNT(*) AS total_transactions, + SUM(s.ss_net_paid) AS lifetime_value, + AVG(s.ss_net_paid) AS avg_transaction_value, + CASE + WHEN DATEDIFF(DATE('2003-01-02'), MAX(d.d_date)) > 180 THEN 1 + ELSE 0 + END AS is_churned +FROM samples.tpcds_sf1.customer c +JOIN samples.tpcds_sf1.store_sales s ON c.c_customer_sk = s.ss_customer_sk +JOIN samples.tpcds_sf1.date_dim d ON s.ss_sold_date_sk = d.d_date_sk +WHERE d.d_year BETWEEN 1998 AND 2003 + AND s.ss_customer_sk IS NOT NULL +GROUP BY + c.c_customer_sk, c.c_preferred_cust_flag, + c.c_birth_year, c.c_birth_month, c.c_birth_day +``` + +### Example + +The following code block contains all necessary code to run the OpenAI Agent with Databricks +and the TabPFN MCP server. 
+ +```python example.py +import asyncio +import json +import os +import tempfile + +import httpx +import pandas as pd +from databricks import sql as dbsql +from databricks.sdk import WorkspaceClient +from sklearn.metrics import ( + accuracy_score, + f1_score, + precision_score, + recall_score, + roc_auc_score, +) +from sklearn.model_selection import train_test_split + +from agents import Agent, Runner, function_tool, enable_verbose_stdout_logging +from agents.mcp import MCPServerStreamableHttp + + +# Enable verbose stdout logging +enable_verbose_stdout_logging() + +# --------------------------------------------------------------------------- +# Configuration - edit these to match your environment +# --------------------------------------------------------------------------- + +PRIOR_LABS_API_KEY = os.environ["PRIORLABS_API_KEY"] +DATABRICKS_PROFILE = "DEFAULT" # Databricks CLI profile (~/.databrickscfg) +TABLE = os.environ["DATABRICKS_TABLE"] +TARGET_COLUMN = "is_churned" +DROP_COLS = ["c_customer_sk", "first_purchase_date", "last_purchase_date"] +LIMIT = 1_000 # rows to fetch for training context +TEMP_DIR = tempfile.mkdtemp() + +# --------------------------------------------------------------------------- +# Step 1 - Fetch data from Databricks +# --------------------------------------------------------------------------- + +def _get_warehouse_http_path() -> str: + """Return the HTTP path of the first available SQL warehouse in your workspace. + + If you want to target a specific warehouse, replace this with a hardcoded path: + return "/sql/1.0/warehouses/" + """ + ws = WorkspaceClient(profile=DATABRICKS_PROFILE) + warehouses = list(ws.warehouses.list()) + if not warehouses: + raise RuntimeError("No SQL warehouses found in your Databricks workspace.") + return f"/sql/1.0/warehouses/{warehouses[0].id}" + + +def fetch_data() -> pd.DataFrame: + """Query the customer_analytics Delta table and return the result as a DataFrame. 
+ + Credentials are read from your Databricks CLI profile. + """ + ws = WorkspaceClient(profile=DATABRICKS_PROFILE) + host = ws.config.host.replace("https://", "").rstrip("/") + + with dbsql.connect( + server_hostname=host, + http_path=_get_warehouse_http_path(), + access_token=ws.config.token, + ) as conn: + with conn.cursor() as cursor: + cursor.execute(f"SELECT * FROM {TABLE} LIMIT {LIMIT}") + return cursor.fetchall_arrow().to_pandas() + +# --------------------------------------------------------------------------- +# Step 2 - Preprocess and split +# --------------------------------------------------------------------------- + +def preprocess(df: pd.DataFrame) -> pd.DataFrame: + """Drop columns that shouldn't be used as features and remove all-null rows. + + DROP_COLS contains identifiers and date columns that would either leak information + or add noise - adjust it to match your table schema. + """ + return ( + df.drop(columns=[c for c in DROP_COLS if c in df.columns]) + .dropna(how="all") + ) + + +def split(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]: + """Split the dataset 80/20 into train and test sets. + + Returns three DataFrames: + - train_df: full training set including the target column + - test_features: test set with the target column removed (what TabPFN predicts on) + - test_labels: held-out ground truth for evaluation + + In production you'd typically do a time-based split to avoid data leakage, but an + 80/20 random split works fine for demonstration purposes. 
+ """ + # Split the data into train and test sets + train, test = train_test_split(df, test_size=0.2, random_state=42) + train_df = train.reset_index(drop=True) + + # Drop the target column from the test set so that the agent can predict on it + test_features = test.drop(columns=[TARGET_COLUMN]).reset_index(drop=True) + test_labels = test[[TARGET_COLUMN]].reset_index(drop=True) + return train_df, test_features, test_labels + + +def save_splits( + train_df: pd.DataFrame, + test_features: pd.DataFrame, + test_labels: pd.DataFrame, +) -> tuple[str, str, str]: + """Write the three splits to CSV files in a temp directory and return their paths. + + The agent needs local file paths to upload the data to TabPFN via presigned URLs. + A temp directory keeps things clean - it's automatically removed when the process exits. + """ + train_path = os.path.join(TEMP_DIR, "train.csv") + test_path = os.path.join(TEMP_DIR, "test.csv") + labels_path = os.path.join(TEMP_DIR, "test_labels.csv") + train_df.to_csv(train_path, index=False) + test_features.to_csv(test_path, index=False) + test_labels.to_csv(labels_path, index=False) + return train_path, test_path, labels_path + +# --------------------------------------------------------------------------- +# Step 3 - Agent tools +# +# The agent has three local tools alongside the TabPFN MCP tools: +# - put_file_to_signed_url: uploads a CSV to the presigned URL TabPFN provides +# - save_predictions: persists the raw TabPFN output to disk +# - compute_metrics: evaluates churn-specific classification performance +# --------------------------------------------------------------------------- + +@function_tool +async def put_file_to_signed_url(upload_url: str, filepath: str) -> str: + """Upload a local CSV file to the presigned URL returned by TabPFN's upload_dataset tool. + + TabPFN's upload_dataset MCP tool returns a short-lived presigned URL pointing to cloud + storage. 
This tool does the actual HTTP PUT so the agent doesn't have to handle raw bytes. + Returns "ok" on success, or raises on HTTP error. + """ + with open(filepath, "rb") as f: + async with httpx.AsyncClient() as client: + resp = await client.put(upload_url, content=f.read()) + resp.raise_for_status() + return "ok" + + +@function_tool +def save_predictions(predictions_json: str) -> str: + """Persist the raw JSON output from fit_and_predict_from_dataset to a CSV file. + + TabPFN returns churn probabilities as a JSON array of [p_active, p_churned] pairs. + This tool parses that output and writes it to predictions.csv with columns class_0 + and class_1, where class_1 is the probability of churn. + + Returns `predictions_path=` on success, or an ERROR string on failure. + """ + try: + predictions = json.loads(predictions_json) + + # If the predictions are a list of probabilities, create a DataFrame with the columns + # class_0, class_1, ..., otherwise create a DataFrame with a single column prediction + n_classes = len(predictions[0]) if isinstance(predictions[0], list) else 1 + if n_classes > 1: + df = pd.DataFrame(predictions, columns=[f"class_{i}" for i in range(n_classes)]) + else: + df = pd.DataFrame({"prediction": predictions}) + + # Save the predictions to a CSV file + path = os.path.join(TEMP_DIR, "predictions.csv") + df.to_csv(path, index=False) + return f"predictions_path={path}" + except Exception as e: + return f"ERROR saving predictions: {type(e).__name__}: {e}" + + +@function_tool +def compute_metrics(predictions_path: str, labels_path: str) -> str: + """Compute churn-specific evaluation metrics and return them as a JSON string. + + Reads the probability outputs from predictions_path (columns class_0, class_1) and the + ground truth from labels_path, then computes: + + roc_auc - how well the model ranks churners vs. 
actives across all thresholds + accuracy - overall fraction of correctly classified customers + f1_churn - F1 score for the churned class specifically; more informative than + accuracy when churners are a small fraction of your customer base + precision_churn - of customers the model flags as churners, how many actually churned + recall_churn - of actual churners, how many the model successfully identified + + Returns a JSON metrics object, or an ERROR string if files can't be loaded. + """ + try: + probas = pd.read_csv(predictions_path).values + y_true = pd.read_csv(labels_path).iloc[:, 0].values + except Exception as e: + return f"ERROR loading files: {type(e).__name__}: {e}" + + try: + y_score = probas[:, 1] + y_pred = probas.argmax(axis=1) + metrics = { + "roc_auc": round(float(roc_auc_score(y_true, y_score)), 4), + "accuracy": round(float(accuracy_score(y_true, y_pred)), 4), + "f1_churn": round(float(f1_score(y_true, y_pred, pos_label=1)), 4), + "precision_churn": round(float(precision_score(y_true, y_pred, pos_label=1)), 4), + "recall_churn": round(float(recall_score(y_true, y_pred, pos_label=1)), 4), + } + except Exception as e: + return f"ERROR computing metrics: {type(e).__name__}: {e}" + + return json.dumps(metrics, indent=2) + +# --------------------------------------------------------------------------- +# Step 4 - Build the agent +# --------------------------------------------------------------------------- + +INSTRUCTIONS = """You are a churn prediction assistant. The training and test CSV files +have already been prepared locally. Execute the following steps without pausing: + +STEP 1 - Upload training data. +Call `upload_dataset` with filename="train.csv" → get upload_url and train_dataset_id. +Call `put_file_to_signed_url(upload_url=, filepath=)`. + +STEP 2 - Upload test data. +Call `upload_dataset` with filename="test.csv" → get upload_url and test_dataset_id. +Call `put_file_to_signed_url(upload_url=, filepath=)`. + +STEP 3 - Run prediction. 
+Call `fit_and_predict_from_dataset` with: + - train_dataset_id: from STEP 1 + - test_dataset_id: from STEP 2 + - target_column: "{target_column}" + - task_type: "classification" + - output_type: "probas" + +STEP 4 - Save predictions. +Call `save_predictions(predictions_json=)`. +Parse the result to extract predictions_path. + +STEP 5 - Evaluate. +Call `compute_metrics(predictions_path=, labels_path=)`. + +STEP 6 - Report. +Return a concise churn model evaluation summary using the metrics from STEP 5. +""" + + +def build_agent(tabpfn_server: MCPServerStreamableHttp) -> Agent: + """Construct the churn prediction agent. + + The agent has access to two TabPFN MCP tools (upload_dataset and + fit_and_predict_from_dataset) plus three local function tools for file upload, + saving predictions, and computing metrics. + """ + return Agent( + name="Churn Predictor", + instructions=INSTRUCTIONS.format(target_column=TARGET_COLUMN), + mcp_servers=[tabpfn_server], + tools=[put_file_to_signed_url, save_predictions, compute_metrics], + ) + +# --------------------------------------------------------------------------- +# Entry point +# --------------------------------------------------------------------------- + +async def main() -> None: + """Orchestrate the full pipeline: fetch → preprocess → split → agent → print results.""" + # Fetch the data using the Databricks SQL connector + df = fetch_data() + + # Drop columns that shouldn't be used as features and remove all-null rows + df = preprocess(df) + + # Split the data into train and test sets and save them to disk + train_df, test_features, test_labels = split(df) + train_path, test_path, labels_path = save_splits(train_df, test_features, test_labels) + + print(f"Data ready - Train: {len(train_df)} rows | Test: {len(test_features)} rows") + + # The TabPFN MCP server. The tool_filter limits the agent to only the two tools it needs. 
+ async with MCPServerStreamableHttp( + name="TabPFN", + params={ + "url": "https://api.priorlabs.ai/mcp/server", + "headers": {"Authorization": f"Bearer {PRIOR_LABS_API_KEY}"}, + "timeout": 60, + "sse_read_timeout": 300, + }, + client_session_timeout_seconds=120, + require_approval="never", + tool_filter={"allowed_tool_names": ["upload_dataset", "fit_and_predict_from_dataset"]}, + ) as tabpfn_server: + + result = await Runner.run( + build_agent(tabpfn_server), + ( + # The instructions for the agent; change this to your own instructions + "Which customers are most at risk of churning, and how well does the model perform?\n\n" + f"train_path={train_path}\n" + f"test_path={test_path}\n" + f"test_labels_path={labels_path}" + ) + ) + + print(result.final_output) + + +asyncio.run(main()) +``` \ No newline at end of file diff --git a/docs.json b/docs.json index f8156cd..a4c762f 100644 --- a/docs.json +++ b/docs.json @@ -98,6 +98,21 @@ "capabilities/embeddings" ] }, + { + "group": "Agentic", + "icon": "robot", + "pages": [ + "agentic/mcp", + "agentic/setup-guide", + "agentic/tool-use", + { + "group": "Tutorials", + "pages": [ + "agentic/tutorials/databricks" + ] + } + ] + }, { "group": "Extensions", "icon": "puzzle-piece", diff --git a/integrations/mcp.mdx b/integrations/mcp.mdx deleted file mode 100644 index 51eed5e..0000000 --- a/integrations/mcp.mdx +++ /dev/null @@ -1,327 +0,0 @@ ---- -title: "MCP" -description: "" ---- - -Connect your AI tools to TabPFN using the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP). - -## What is TabPFN MCP? - -TabPFN MCP is a remote MCP with OAuth that gives AI tools secure access to TabPFN-2.5, our SOTA tabular foundation model. Our MCP server is available at: - -``` -https://api.priorlabs.ai/mcp/server -``` - -It integrates with popular AI assistants like Claude, enabling you to run predictions using natural language. 
TabPFN MCP implements the latest -[MCP Authorization](https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization) and [Streamable HTTP](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#streamable-http) specifications. - - -To use the TabPFN MCP you need a Prior Labs account. You can sign up or log in at [ux.priorlabs.ai](https://ux.priorlabs.ai). - - -## Available tools - -The TabPFN MCP Server exposes two core tools for performing classification and regression on Prior Labs’ managed GPU infrastructure: - -* `fit_and_predict` - Fit and generate predictions in a single step using TabPFN-2.5. -* `predict` - Generate predictions using a previously fitted model. - -## Setup - -Connect your AI client to TabPFN MCP and authorize access to run inference through a natural language interface. - -#### Claude Code - -```bash -# If you haven't, install Claude Code -npm install -g @anthropic-ai/claude-code - -# Navigate to your project -cd your-tabpfn-project - -# Add TabPFN MCP (general access) -claude mcp add --transport http tabpfn https://api.priorlabs.ai/mcp/server - -# Start coding with Claude -claude - -# Authenticate the MCP tools by typing /mcp -# This will trigger the OAuth flow -/mcp -``` - -#### Claude.ai and Claude for desktop - - - -1. Open Settings in the sidebar -2. Navigate to Connectors and select Add custom connector -3. Configure the connector: - - Name: TabPFN - - URL: https://api.priorlabs.ai/mcp/server - - -Custom connectors using remote MCP are available on Claude and Claude Desktop for users on Pro, Max, Team, and Enterprise plans. - - -Alternatively, you may add the MCP server by editing the Claude Desktop config file: - -1. Locate your Claude Desktop config file based on your operating system: -2. Get your API key from Prior Labs: - - Navigate to [ux.priorlabs.ai](https://ux.priorlabs.ai) - - Log in to your account (or sign up if you don't have one) - - Copy your API key from the dashboard -3. 
Edit the config file to add the TabPFN server: -```json -{ - "mcpServers": { - "tabpfn": { - "url": "https://api.priorlabs.ai/mcp/server", - "headers": { - "Authorization": "Bearer YOUR_API_KEY_HERE" - } - } - } - } -``` -4. Replace `YOUR_API_KEY_HERE` with your actual API key from step 2 -5. Save the config file and restart Claude Desktop for the changes to take effect - -#### ChatGPT - - - -Follow these steps to set up TabPFN as a connector in ChatGPT: - -1. Enable Developer mode: - - Go to Settings → Connectors → Advanced settings → Developer mode -2. Open ChatGPT settings -3. In the Connectors tab, `Create` a new connector: - - Give it a name: TabPFN - - MCP server URL: https://api.priorlabs.ai/mcp/server - - Authentication: OAuth -4. Click Create - - -Custom connectors using MCP are available on ChatGPT for Pro and Plus accounts on the web. - - -#### Codex CLI - -Codex CLI is OpenAI's local coding agent that can run directly from your terminal. - -```bash -# Install Codex -npm i -g @openai/codex - -# Add TabPFN MCP -codex mcp add tabpfn --url https://api.priorlabs.ai/mcp/server - -# Start Codex -codex -``` - -When adding the MCP server, Codex will detect OAuth support and open your browser to authorize the connection. - -#### Cursor - -To add TabPFN MCP to your Cursor environment, add the snippet below to your project-specific or global `.cursor/mcp.json` file manually. For more details, see the [Cursor documentation](https://docs.cursor.com/en/context/mcp). - -```json -{ - "mcpServers": { - "tabpfn": { - "url": "https://api.priorlabs.ai/mcp/server" - } - } -} -``` - -Once the server is added, Cursor will attempt to connect and display a Needs login prompt. Click on this prompt to authorize Cursor to access your Prior Labs account. - -#### n8n - -Watch the video below to learn how to integrate TabPFN with n8n workflows. - - - -## Tool Reference - -### `fit_and_predict` - - -Fit the TabPFN-2.5 model on your data and generate predictions. 
- - -Use this tool when you want to fit a new model from scratch. It fits on your data and immediately returns predictions for your test set, along with a `model_id` for future reuse. - -#### Required Parameters - - - Training features as a 2D array where rows represent samples and columns represent features. - - - **Shape:** `(n_train_samples, n_features)` - - **Data types:** Numeric (int/float) or categorical (string) values - - **Flexibility:** Handles missing values, outliers, and mixed data types automatically - - - ```python - X_train = [ - [1.5, "red", 3], - [2.0, "blue", 4], - [1.8, "red", 5] - ] - ``` - - - - - Training targets as a 1D array that must align with `X_train` rows. - - - **Shape:** `(n_train_samples,)` - - **Classification:** Class labels (e.g., `[0, 1, 0]` or `["cat", "dog", "cat"]`) - - **Regression:** Numeric values (e.g., `[23.5, 45.2, 12.8]`) - - - - Test features as a 2D array for generating predictions. - - - **Shape:** `(n_test_samples, n_features)` - - **Critical:** Must have the **same number of features** as `X_train` - - - ```python - X_test = [ - [1.8, "red", 5], - [2.3, "green", 2] - ] - ``` - - - - - Prediction task type. Must be specified explicitly. - - - `"classification"` - For classification tasks. - - `"regression"` - For regression tasks. - - -#### Optional Parameters - - - Format of the prediction output. - - - `"preds"` - Predicted class labels (classification) or mean values (regression) - - `"probas"` - Class probability distributions (**classification only**) - - - `output_type="probas"` is only valid for classification tasks and will return an error for regression. - - - - ```python - # For a binary classification problem - predictions = [ - [0.8, 0.2], # 80% probability class 0, 20% class 1 - [0.3, 0.7] # 30% probability class 0, 70% class 1 - ] - ``` - - - -#### Returns - - - Unique identifier for the fitted model. **Save this value** to reuse the model later with the `predict` tool. 
- - - - Prediction results based on the specified `output_type`. Format varies by task type and output type. - - ---- - -### `predict` - - -Generate new predictions using a previously fitted TabPFN model. - - -Use this tool after calling `fit_and_predict` to make predictions on new data using an existing model. - -#### Required Parameters - - - Identifier of a previously fitted model, returned from `fit_and_predict`. - - - ```python - model_id = "9f1526b2-388b-4849-b965-6373d35f1a6b" - ``` - - - - - Test features as a 2D array for generating predictions. - - - **Shape:** `(n_test_samples, n_features)` - - **Critical:** Must have the **same number of features** the model was originally fitted on - - - ```python - X_test = [ - [1.2, "red", 7], - [3.4, "blue", 1] - ] - ``` - - - - - Must match the task type the model was fitted for. - - - `"classification"` - For classification tasks - - `"regression"` - For regression tasks - - -#### Optional Parameters - - - Format of the prediction output. - - - `"preds"` - Predicted labels (classification) or values (regression) - - `"probas"` - Probability distributions (**classification models only**) - - - `output_type="probas"` only works with classification models. - - - -#### Returns - - - Echo of the model ID used for prediction. - - - - Prediction results based on the specified `output_type`. - From 78ea5cb03000d0c13f600d1823cd2e268aff291e Mon Sep 17 00:00:00 2001 From: Dominik Safaric Date: Thu, 5 Mar 2026 21:09:15 +0100 Subject: [PATCH 2/3] Minor fixes --- agentic/mcp.mdx | 8 ++++++++ agentic/tool-use.mdx | 38 +++++++++++++++++++------------------- 2 files changed, 27 insertions(+), 19 deletions(-) diff --git a/agentic/mcp.mdx b/agentic/mcp.mdx index e009e1f..0129119 100644 --- a/agentic/mcp.mdx +++ b/agentic/mcp.mdx @@ -19,3 +19,11 @@ It integrates with popular AI assistants like Claude, enabling you to run predic To use the TabPFN MCP you need a Prior Labs account. 
You can sign up or log in at [ux.priorlabs.ai](https://ux.priorlabs.ai). + +## Getting Started + + + + A churn prediction pipeline with an AI agent using TabPFN MCP and Databricks Delta tables. + + diff --git a/agentic/tool-use.mdx b/agentic/tool-use.mdx index a501a8c..8f4d81a 100644 --- a/agentic/tool-use.mdx +++ b/agentic/tool-use.mdx @@ -3,33 +3,33 @@ title: "Tool Use" description: "" --- -The TabPFN MCP Server exposes a number of tools for performing classification and regression on Prior Labs’ +The TabPFN MCP server exposes a number of tools for performing classification and regression on Prior Labs’ managed GPU infrastructure: ### `upload_dataset` -Get a secure upload link for your dataset. We recommend this for most workflows: your data is sent directly to storage instead of through the chat, so TabPFN can handle larger datasets without running into context limits or long execution time. +Get a secure upload URL for your dataset. We recommend this for most workflows: your data is sent directly to cloud storage instead of through the chat, so the agent can handle larger datasets without running into context limits or long execution time. -Uploading the file to the link requires a sandbox execution environment and outbound network access. Not all MCP clients support this, and some may require a paid plan. +Uploading the file to the URL requires a sandbox execution environment and outbound network access. Not all MCP clients support this, and some may require a paid plan. -This tool returns a `dataset_id` and `upload_url` - valid for 60 minutes. Call this tool once for separately uploading the training set and test dataset. +This tool returns a `dataset_id` and `upload_url` - valid for 60 minutes. Call this tool separately for uploading the training set and test dataset. #### Required Parameters - Which kind of file you’re uploading. + The kind of dataset you're uploading. 
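In an agent sandbox, sending the CSV to the returned `upload_url` is typically a plain HTTP `PUT` of the file bytes. A minimal standard-library sketch — the URL is a placeholder, and the exact PUT semantics are an assumption based on common presigned-URL flows:

```python
import csv
import io
import urllib.request

# Serialize rows to CSV in memory, then PUT the bytes to the presigned
# upload_url returned by upload_dataset. Assumption: the storage endpoint
# accepts a plain HTTP PUT of the raw file body, as presigned URLs usually do.
def to_csv_bytes(header, rows):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

def put_csv(upload_url, payload):
    req = urllib.request.Request(
        upload_url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "text/csv"},
    )
    return urllib.request.urlopen(req)  # HTTP response from storage

payload = to_csv_bytes(["age", "plan", "churned"], [[34, "pro", 0], [51, "free", 1]])
print(payload.decode().splitlines()[0])  # → age,plan,churned
# put_csv("https://storage.example.com/presigned-upload-url", payload)
```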
- - `"train.csv"` — Training data (the data the model learns from) - - `"test.csv"` — Test data (the data you want predictions for) + - `"train.csv"` — Training data + - `"test.csv"` — Test data, predictions will be generated for `fit_and_predict_from_dataset` Fit the TabPFN-2.5 model on your pre-uploaded data and generate predictions. Use this tool when you want to fit a new model from scratch. -Upload both CSVs with `upload_dataset` first, then pass the two dataset IDs here. +Upload both dataset CSV files with `upload_dataset` first, then pass the two dataset IDs here. #### Required Parameters @@ -42,14 +42,14 @@ Upload both CSVs with `upload_dataset` first, then pass the two dataset IDs here - The name of the column you want to predict in the training file — e.g. `"price"` or `"churned"`. + The name of the target column in the training dataset — e.g. `"price"` or `"churned"`. The type of the predictive task. - - `"classification"` — Predicting a category or class, including probability distributions - - `"regression"` — Predicting a continuous value + - `"classification"` — Predict a category or class, including probability distributions + - `"regression"` — Predict a continuous value #### Optional Parameters @@ -62,12 +62,12 @@ Upload both CSVs with `upload_dataset` first, then pass the two dataset IDs here Run predictions with a previously model, using a different test set. -Use `upload_dataset` to upload your new test dataset file, then pass that `dataset_id` and the model ID you got from a previous `fit_and_predict_*` call. The test CSV must have the same set of features the model was fitted on. +Use `upload_dataset` to upload your new test dataset file, then pass that `dataset_id` and the `model_id` you got from a previous `fit_and_predict_*` call. The test CSV must have the same set of features the model was fitted on. #### Required Parameters - The model ID of the previously model, from `fit_and_predict_from_dataset` or `fit_and_predict_inline`. 
+ The model ID of the previously fitted model, from `fit_and_predict_from_dataset` or `fit_and_predict_inline`. @@ -77,8 +77,8 @@ Use `upload_dataset` to upload your new test dataset file, then pass that `datas The type of the predictive task. - - `"classification"` — Predicting a category or class, including probability distributions - - `"regression"` — Predicting a continuous value + - `"classification"` — Predict a category or class, including probability distributions + - `"regression"` — Predict a continuous value #### Optional Parameters @@ -142,8 +142,8 @@ Best for small datasets that fit in the conversation without running into contex The type of the predictive task. - - `"classification"` — Predicting a category or class, including probability distributions - - `"regression"` — Predicting a continuous value + - `"classification"` — Predict a category or class, including probability distributions + - `"regression"` — Predict a continuous value #### Optional Parameters @@ -203,8 +203,8 @@ Best for small datasets that fit in the conversation without running into contex The type of the predictive task. 
- - `"classification"` — Predicting a category or class, including probability distributions - - `"regression"` — Predicting a continuous value + - `"classification"` — Predict a category or class, including probability distributions + - `"regression"` — Predict a continuous value #### Optional Parameters From 5f645b2d63502b45da928ef8dfec5626618263e2 Mon Sep 17 00:00:00 2001 From: Dominik Safaric Date: Thu, 5 Mar 2026 21:32:20 +0100 Subject: [PATCH 3/3] Polishing docs --- agentic/mcp.mdx | 15 ++++++------- agentic/setup-guide.mdx | 4 +--- agentic/tool-use.mdx | 38 ++++++++++++++++++++++++-------- agentic/tutorials/databricks.mdx | 20 +++++++++++++++-- 4 files changed, 55 insertions(+), 22 deletions(-) diff --git a/agentic/mcp.mdx b/agentic/mcp.mdx index 0129119..a139247 100644 --- a/agentic/mcp.mdx +++ b/agentic/mcp.mdx @@ -1,13 +1,9 @@ --- title: "Model Context Protocol" -description: "" +description: "Connect AI tools to TabPFN using the Model Context Protocol (MCP) for natural language predictions on tabular data." --- -Connect your AI tools to TabPFN using the [Model Context Protocol](https://modelcontextprotocol.io/) (MCP). - -## What is TabPFN MCP? - -TabPFN MCP is a remote MCP with OAuth that gives AI tools secure access to TabPFN-2.5, our SOTA tabular foundation model. Our MCP server is available at: +TabPFN MCP is a remote [MCP]((https://modelcontextprotocol.io/)) with OAuth that gives AI tools secure access to TabPFN-2.5, our SOTA tabular foundation model. Our MCP server is available at: ``` https://api.priorlabs.ai/mcp/server @@ -22,8 +18,11 @@ To use the TabPFN MCP you need a Prior Labs account. You can sign up or log in a ## Getting Started - + + + Connect Claude, ChatGPT, Cursor, Codex CLI, or n8n to the TabPFN MCP server. + A churn prediction pipeline with an AI agent using TabPFN MCP and Databricks Delta tables. 
- + \ No newline at end of file diff --git a/agentic/setup-guide.mdx b/agentic/setup-guide.mdx index 680a2cb..58ad257 100644 --- a/agentic/setup-guide.mdx +++ b/agentic/setup-guide.mdx @@ -1,10 +1,8 @@ --- title: "Setup Guide" -description: "" +description: "Step-by-step instructions for connecting Claude, ChatGPT, Cursor, Codex CLI, and n8n to the TabPFN MCP server." --- -Connect your AI client to TabPFN MCP and authorize access to run inference through a natural language interface. - #### Claude Code ```bash diff --git a/agentic/tool-use.mdx b/agentic/tool-use.mdx index 8f4d81a..c02024f 100644 --- a/agentic/tool-use.mdx +++ b/agentic/tool-use.mdx @@ -1,6 +1,6 @@ --- title: "Tool Use" -description: "" +description: "Reference for all tools exposed by the TabPFN MCP server." --- The TabPFN MCP server exposes a number of tools for performing classification and regression on Prior Labs’ @@ -19,13 +19,13 @@ This tool returns a `dataset_id` and `upload_url` - valid for 60 minutes. Call t #### Required Parameters - The kind of dataset you're uploading. + The filename for the dataset. Must be `train.csv` for training data or `test.csv` for test data. - `"train.csv"` — Training data - `"test.csv"` — Test data, predictions will be generated for -`fit_and_predict_from_dataset` +### `fit_and_predict_from_dataset` Fit the TabPFN-2.5 model on your pre-uploaded data and generate predictions. Use this tool when you want to fit a new model from scratch. @@ -58,9 +58,19 @@ Upload both dataset CSV files with `upload_dataset` first, then pass the two dat Prediction output type, default `"preds"` for classification and `"mean"` for regression. +#### Returns + + + Unique ID for the fitted model. Save this value to reuse the model later with predict tools. + + + + Prediction results in the format specified by `output_type`. For classification with `"preds"`, returns a 1D array of class labels. With `"probas"`, returns a 2D array of class probabilities. 
For regression with `"mean"`, returns a 1D array of predicted values. + + ### `predict_from_dataset` -Run predictions with a previously model, using a different test set. +Run predictions with a previously fitted model on a new test set. Use `upload_dataset` to upload your new test dataset file, then pass that `dataset_id` and the `model_id` you got from a previous `fit_and_predict_*` call. The test CSV must have the same set of features the model was fitted on. @@ -87,6 +97,16 @@ Use `upload_dataset` to upload your new test dataset file, then pass that `datas Prediction output type, default `"preds"` for classification and `"mean"` for regression. +#### Returns + + + Echo of the model ID used for prediction. + + + + Prediction results in the format specified by `output_type`. For classification with `"preds"`, returns a 1D array of class labels. With `"probas"`, returns a 2D array of class probabilities. For regression with `"mean"`, returns a 1D array of predicted values. + + ### `fit_and_predict_inline` Use this tool when you want to fit a new model from scratch. It fits on your data and immediately returns predictions for your test set, along with a `model_id` for future reuse. @@ -155,18 +175,18 @@ Best for small datasets that fit in the conversation without running into contex #### Returns - Unique identifier for the fitted model. **Save this value** to reuse the model later with the `predict` tool. + Unique ID for the fitted model. Save this value to reuse the model later with predict tools. - Prediction results based on the specified `output_type`. Format varies by task type and output type. + Prediction results in the format specified by `output_type`. For classification with `"preds"`, returns a 1D array of class labels. With `"probas"`, returns a 2D array of class probabilities. For regression with `"mean"`, returns a 1D array of predicted values. ### `predict` Generate new predictions using a previously fitted TabPFN model. 
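When you request `output_type="probas"`, hard labels can be recovered client-side with an argmax over each probability row. A small illustrative helper — `classes` is hypothetical, since this reference doesn't show how class ordering is returned:

```python
# Map a probas-style response (2D array of class probabilities) back to
# labels by taking the argmax of each row. The `classes` list is a
# hypothetical stand-in for whatever class ordering your response uses.
def probas_to_labels(probas, classes):
    return [classes[max(range(len(p)), key=p.__getitem__)] for p in probas]

probas = [
    [0.8, 0.2],  # most likely class 0
    [0.3, 0.7],  # most likely class 1
]
print(probas_to_labels(probas, ["no_churn", "churn"]))  # → ['no_churn', 'churn']
```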
-Use this tool after calling `fit_and_predict` to make predictions on new data using an existing model. +Use this tool after calling `fit_and_predict_*` to make predictions on new data using an existing model. Best for small datasets that fit in the conversation without running into context limits. @@ -175,7 +195,7 @@ Best for small datasets that fit in the conversation without running into contex #### Required Parameters - Identifier of a previously fitted model, returned from `fit_and_predict`. + ID of a previously fitted model, returned from `fit_and_predict_*`. ```python @@ -220,5 +240,5 @@ Best for small datasets that fit in the conversation without running into contex - Prediction results based on the specified `output_type`. + Prediction results in the format specified by `output_type`. Same structure as `fit_and_predict_inline`. \ No newline at end of file diff --git a/agentic/tutorials/databricks.mdx b/agentic/tutorials/databricks.mdx index f69313c..5cadbce 100644 --- a/agentic/tutorials/databricks.mdx +++ b/agentic/tutorials/databricks.mdx @@ -1,6 +1,6 @@ --- title: "Databricks" -description: "" +description: "Build a customer churn prediction pipeline using an AI agent, TabPFN MCP, and Databricks Delta tables." --- This tutorial shows you how to connect your Databricks Delta table to TabPFN's MCP server @@ -11,6 +11,10 @@ and run a churn prediction pipeline with an AI agent. 
- Databricks workspace with a SQL warehouse running - Create a Delta table named `customer_analytics` - Get your [Prior Labs API key](/api-reference/getting-started#1-get-your-access-token) +- Install the required Python packages: +```bash +pip install openai-agents databricks-sdk databricks-sql-connector httpx pandas scikit-learn +``` ### Overview Here's what happens end-to-end: @@ -372,4 +376,16 @@ async def main() -> None: asyncio.run(main()) -``` \ No newline at end of file +``` + +### Running the script + +Set the required environment variables and run the script: +```bash +export PRIORLABS_API_KEY="your-prior-labs-api-key" +export DATABRICKS_TABLE="catalog.schema.customer_analytics" + +python example.py +``` + +On success, the agent prints a churn model evaluation summary with ROC-AUC, accuracy, F1, precision, and recall for the churned class. \ No newline at end of file