
feat(mcp): proxy Kapa knowledge MCP server for inline Airbyte doc search #988

Draft
Aaron ("AJ") Steers (aaronsteers) wants to merge 1 commit into main from devin/1772232983-kapa-mcp-proxy

Conversation

@aaronsteers Aaron ("AJ") Steers (aaronsteers) commented Feb 27, 2026

Summary

Mounts Kapa's hosted MCP server as a proxy into the PyAirbyte MCP server (airbyte-mcp), so agents get inline Airbyte documentation search alongside native PyAirbyte tools — without users needing to install a separate MCP server.

When KAPA_API_KEY is set, a FastMCP.as_proxy() proxy is created and mounted with a kapa prefix. When not set, this is a no-op. The Kapa server URL defaults to https://airbyte.mcp.kapa.ai and is configurable via KAPA_MCP_SERVER_URL.
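
The conditional mount described above can be sketched as follows. This is a minimal illustration rather than the PR's actual code: the helper name `build_kapa_proxy_config` is hypothetical, while the environment variable names, the default URL, and the config shape are taken from this PR.

```python
import os
from typing import Mapping, Optional

# Default server URL as stated in the PR summary.
DEFAULT_KAPA_MCP_SERVER_URL = "https://airbyte.mcp.kapa.ai"


def build_kapa_proxy_config(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[dict]:
    """Return an MCP proxy config for Kapa, or None when no API key is set.

    Hypothetical helper for illustration; the name is not part of PyAirbyte.
    """
    env = os.environ if env is None else env
    api_key = env.get("KAPA_API_KEY")
    if not api_key:
        # No key configured: the integration is a no-op.
        return None
    url = env.get("KAPA_MCP_SERVER_URL") or DEFAULT_KAPA_MCP_SERVER_URL
    return {
        "mcpServers": {
            "kapa": {
                "url": url,
                "transport": "http",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }
```

In the PR, a config of this shape is passed to FastMCP.as_proxy() and the result is mounted with app.mount(kapa_proxy, prefix="kapa"). Keeping the config construction separate from the mount makes the no-key path easy to test without network access.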

Follows the Kapa proxy tutorial pattern. Companion PR for connector-builder-mcp: https://github.com/airbytehq/connector-builder-mcp/pull/209

Refs: https://github.com/airbytehq/airbyte-internal-issues/issues/15904

Review & Testing Checklist for Human

  • Verify FastMCP.as_proxy is lazy at startup: _mount_kapa_proxy runs at module load time. Confirm that FastMCP.as_proxy() + mount() do NOT eagerly connect to the Kapa server — otherwise a misconfigured or unreachable URL with KAPA_API_KEY set could block/crash server startup.
  • Test with a real KAPA_API_KEY: Only the no-op path (no key) was verified. The actual proxy mount path was not tested end-to-end with real Kapa credentials. Recommended test: set KAPA_API_KEY and use poe mcp-tool-test (or equivalent) to call the proxied Kapa tool and verify it returns results.
  • Confirm prefix="kapa" namespacing: Verify that Kapa's tools appear with a kapa_ prefix (e.g., kapa_search_knowledge_base) and don't collide with existing PyAirbyte tools.
  • Rate limit awareness: Kapa has 40 req/hr and 200 req/day limits per API key. All proxied requests share the server's key. No client-side rate limiting is added. Decide if that's acceptable for initial rollout.
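
For the namespacing item in the checklist above, a collision check could look like the sketch below. It assumes mounted tools are exposed as `<prefix>_<tool name>`, as the kapa_search_knowledge_base example suggests. The function name and the native tool name in the usage lines are hypothetical.

```python
from typing import Iterable, Set


def find_prefix_collisions(
    native_tools: Iterable[str],
    proxied_tools: Iterable[str],
    prefix: str = "kapa",
) -> Set[str]:
    """Return prefixed proxy tool names that clash with native tool names."""
    native = set(native_tools)
    # Assumption: the mount exposes each proxied tool as "<prefix>_<name>".
    prefixed = {f"{prefix}_{name}" for name in proxied_tools}
    return native & prefixed


# Hypothetical tool names, for illustration only.
collisions = find_prefix_collisions(
    native_tools=["list_connectors"],
    proxied_tools=["search_knowledge_base"],
)
print(collisions)
```

An empty result means no prefixed Kapa tool shadows a native tool. The real tool lists would have to come from the running server, for example via poe mcp-tool-test as mentioned in the checklist.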

Notes

Summary by CodeRabbit

  • New Features
    • Added optional Kapa knowledge base integration to the Airbyte MCP server. When environment variables are configured with the necessary API credentials and server URL, the server automatically mounts the Kapa knowledge proxy during startup, providing enhanced knowledge capabilities. If credentials are not configured, the server operates with standard functionality unaffected.

Mount Kapa's hosted MCP server as a proxy when KAPA_API_KEY is set,
giving agents inline access to Airbyte documentation search alongside
native PyAirbyte MCP tools.

- Mount Kapa proxy with 'kapa' prefix to namespace tools
- Gracefully skip mounting if KAPA_API_KEY is not configured
- Configurable via KAPA_MCP_SERVER_URL and KAPA_API_KEY env vars

Refs: airbytehq/airbyte-internal-issues#15904
Co-Authored-By: AJ Steers <aj@airbyte.io>
@devin-ai-integration

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.


Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1772232983-kapa-mcp-proxy' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1772232983-kapa-mcp-proxy'

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /fix-pr - Fixes most formatting and linting issues
  • /uv-lock - Updates uv.lock file
  • /test-pr - Runs tests with the updated PyAirbyte
  • /prerelease - Builds and publishes a prerelease version to PyPI

Community Support

Questions? Join the #pyairbyte channel in our Slack workspace.


@coderabbitai

coderabbitai bot commented Feb 27, 2026

Walkthrough

This change conditionally integrates a Kapa knowledge MCP proxy into the Airbyte MCP server during initialization. When the KAPA_API_KEY environment variable is present, a proxy is mounted using a configurable server URL and API key-based authentication headers. If the variable is absent, the function logs a diagnostic message.

Changes

Cohort / File(s) | Summary
Kapa MCP Proxy Integration (airbyte/mcp/server.py) | Added conditional mounting of Kapa knowledge MCP proxy with environment-based configuration, authentication via Authorization header, and diagnostic logging for startup validation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a Kapa knowledge MCP server proxy for inline Airbyte documentation search within the MCP server.
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
airbyte/mcp/server.py (2)

99-107: Consider documenting rate limits in the docstring?

The PR notes mention Kapa's rate limits (40 req/hr, 200 req/day). Since there's no client-side rate limiting implemented, it might be helpful for future maintainers to have a quick note in this docstring about those limits, so they understand the constraint if users report issues with throttling. Just a thought!
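
If a local limiter is added later, one possible shape is sketched below. This is a hypothetical sliding-window limiter, not part of the PR. The default limits mirror the 40 req/hr and 200 req/day figures from the PR notes, and whether Kapa enforces sliding or fixed windows is an assumption here.

```python
import time
from collections import deque
from typing import Callable, Deque, Sequence, Tuple


class SlidingWindowLimiter:
    """Allow a call only if every (max_calls, window_seconds) limit permits it."""

    def __init__(
        self,
        limits: Sequence[Tuple[int, float]] = ((40, 3600.0), (200, 86400.0)),
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limits = list(limits)
        self._clock = clock
        self._longest = max(window for _, window in self._limits)
        self._calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if a limit is hit."""
        now = self._clock()
        # Drop timestamps that are older than the longest window.
        while self._calls and now - self._calls[0] >= self._longest:
            self._calls.popleft()
        for max_calls, window in self._limits:
            recent = sum(1 for t in self._calls if now - t < window)
            if recent >= max_calls:
                return False
        self._calls.append(now)
        return True
```

A caller would check try_acquire() before forwarding a request to Kapa and return a "rate limited" message to the agent when it is False. Because all proxied requests share one key, such a limiter would only be accurate for a single server process.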

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@airbyte/mcp/server.py` around lines 99 - 107, The _mount_kapa_proxy docstring
should mention Kapa's documented rate limits and lack of client-side throttling
so maintainers know to expect server-side throttling; update the
_mount_kapa_proxy(app: FastMCP) docstring to add a short note stating Kapa
enforces ~40 requests/hour and ~200 requests/day and that this code does not
implement client-side rate limiting (include a brief TODO suggesting adding a
retry/backoff or local rate limiter in the future).

109-114: Consider using logging instead of print for the diagnostic message?

This message will appear on every server startup when KAPA_API_KEY isn't set. For users who never intend to use Kapa integration, this might be a bit noisy. Would using logging.debug() or logging.info() give better control over verbosity, wdyt?

That said, I see the existing code uses print(..., file=sys.stderr) consistently throughout the file, so this does follow the established pattern.
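
A logging-based variant might look like the sketch below. The logger name, the helper names, and the message text are assumptions for illustration. The handler writes to stderr, matching the existing print(..., file=sys.stderr) pattern and keeping stdout free for MCP stdio traffic.

```python
import logging
import sys

# Hypothetical logger name; the module may prefer logging.getLogger(__name__).
logger = logging.getLogger("airbyte.mcp.server")


def configure_stderr_logging(level: int = logging.INFO) -> None:
    """Send log records to stderr so stdout stays free for MCP stdio traffic."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)


def report_kapa_skipped() -> None:
    """Log that the optional Kapa proxy was not mounted."""
    # DEBUG keeps the message out of default (INFO) startup output.
    logger.debug("KAPA_API_KEY not set; skipping Kapa knowledge proxy mount.")
```

With the level left at INFO, users who never configure Kapa see nothing at startup. Setting the level to DEBUG brings the message back for troubleshooting.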

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@airbyte/mcp/server.py` around lines 109 - 114, Replace the stderr print used
when kapa_api_key is missing with a logging call so verbosity can be controlled;
in the module initialize a logger (e.g., logger = logging.getLogger(__name__))
if not already present and change the print in the kapa_api_key check to
logger.debug(...) or logger.info(...), keeping the same message text and return
path; update any imports if needed to include logging and ensure the
logging-based message is used in place of the print in the kaps API key check
(reference the kapa_api_key existence check in server.py).

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa52519 and 9d71455.

📒 Files selected for processing (1)
  • airbyte/mcp/server.py

Comment on lines +117 to +130
    kapa_proxy = FastMCP.as_proxy(
        {
            "mcpServers": {
                "kapa": {
                    "url": kapa_url,
                    "transport": "http",
                    "headers": {
                        "Authorization": f"Bearer {kapa_api_key}",
                    },
                }
            }
        }
    )
    app.mount(kapa_proxy, prefix="kapa")

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat airbyte/mcp/server.py | head -150

Repository: airbytehq/PyAirbyte

Length of output: 5110


🏁 Script executed:

# Check for fastmcp dependency version
find . -name "*.txt" -o -name "*.toml" -o -name "*.cfg" -o -name "*.in" | xargs grep -l "fastmcp" 2>/dev/null | head -5

Repository: airbytehq/PyAirbyte

Length of output: 79


🏁 Script executed:

# Search for any existing error handling patterns in the file
rg "try:" airbyte/mcp/server.py -A 5 | head -50

Repository: airbytehq/PyAirbyte

Length of output: 320


🏁 Script executed:

grep -A 2 -B 2 "fastmcp" pyproject.toml | head -20

Repository: airbytehq/PyAirbyte

Length of output: 957


🏁 Script executed:

# Check if there are any other usages of FastMCP or as_proxy in the codebase
rg "FastMCP|as_proxy" --type py

Repository: airbytehq/PyAirbyte

Length of output: 1791


🏁 Script executed:

# Look for any test files related to mcp
find . -path "*/test*" -name "*mcp*" -type f | head -10

Repository: airbytehq/PyAirbyte

Length of output: 154


🌐 Web query:

FastMCP 2.11.3 as_proxy lazy initialization documentation

💡 Result:

FastMCP 2.11.3 (released 2025-08-11) is documented as having proxy support via FastMCP.as_proxy(), but the docs/API reference do not describe an explicit lazy= / “lazy initialization” option for as_proxy itself. [1][2]

What is documented (and effectively gives you “lazy” backend startup semantics) is that proxying is driven by a client factory / disconnected client, so the backend session/handshake happens when a request is made (e.g., tools/list, tools/call), not at proxy object construction:

  • Recommended pattern: pass a disconnected ProxyClient(...) into FastMCP.as_proxy(...) (fresh isolated backend session per request). [3]
  • For explicit control, build a proxy with a client_factory (fresh or shared client) via FastMCPProxy / create_proxy(...). [3][2]

If what you mean by “lazy initialization” is delaying the MCP initialize handshake of a Client, that is documented via auto_initialize=False (but that applies to using Client, not a special as_proxy flag). [4]

Key docs to read:

  • Proxy servers guide (covers FastMCP.as_proxy, ProxyClient, session behavior, and FastMCPProxy + client_factory). [3]
  • SDK reference for create_proxy(...) (recommended helper) and as_proxy(...) signature. [2][1]
  • Client “auto initialization” control (auto_initialize=False). [4]

Sources
[1] https://fastmcp.wiki/en/python-sdk/fastmcp-server-server
[2] https://fastmcp.mintlify.app/python-sdk/fastmcp-server-server
[3] https://fastmcp.wiki/en/servers/proxy
[4] https://fastmcp.mintlify.app/clients/client


🏁 Script executed:

# Check test files for any hints about error handling or proxy behavior
cat tests/unit_tests/test_mcp_connector_registry.py | head -50

Repository: airbytehq/PyAirbyte

Length of output: 1603


🏁 Script executed:

# Search for any environment setup or initialization patterns in tests
rg "try|except|error" tests/unit_tests/test_mcp*.py

Repository: airbytehq/PyAirbyte

Length of output: 1012


Add error handling around proxy creation and mounting to prevent optional dependency from breaking server startup.

Since the Kapa integration is optional and _mount_kapa_proxy() is invoked at module level, if FastMCP.as_proxy() or app.mount() throws (invalid config, network issues, etc.), the entire server fails to start. Worth wrapping in try/except to gracefully skip Kapa when unavailable, wdyt?

🛡️ Proposed defensive wrapper
+    try:
         kapa_proxy = FastMCP.as_proxy(
             {
                 "mcpServers": {
                     "kapa": {
                         "url": kapa_url,
                         "transport": "http",
                         "headers": {
                             "Authorization": f"Bearer {kapa_api_key}",
                         },
                     }
                 }
             }
         )
         app.mount(kapa_proxy, prefix="kapa")
         print(
             f"Kapa knowledge proxy mounted from {kapa_url}.",
             file=sys.stderr,
         )
+    except Exception as ex:
+        print(
+            f"Failed to mount Kapa knowledge proxy: {ex}",
+            file=sys.stderr,
+        )

Note: FastMCP.as_proxy() in v2.11.3 uses a lazy proxy pattern—the backend session/handshake happens when a request is made (not at construction)—so as_proxy() itself won't trigger eager connections. However, the app.mount() call should still be protected since it's at initialization time and failures there would crash startup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@airbyte/mcp/server.py` around lines 117 - 130, Wrap the FastMCP.as_proxy(...)
creation and app.mount(kapa_proxy, prefix="kapa") call inside a try/except that
catches Exception (e.g., in _mount_kapa_proxy), so failures creating or mounting
the Kapa proxy do not crash server startup; on exception log a warning including
the exception details (use the module/server logger or logging.warning) and skip
mounting (return/continue) so the server starts without Kapa. Ensure you
reference FastMCP.as_proxy and app.mount in the handler so only the optional
integration is protected.

@github-actions

PyTest Results (Fast Tests Only, No Creds)

343 tests  ±0   343 ✅ ±0   5m 42s ⏱️ +3s
  1 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 9d71455. ± Comparison against base commit fa52519.

@github-actions

PyTest Results (Full)

413 tests  ±0   396 ✅ ±0   25m 18s ⏱️ +19s
  1 suites ±0    17 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit 9d71455. ± Comparison against base commit fa52519.
