feat(mcp): proxy Kapa knowledge MCP server for inline Airbyte doc search #988
Aaron ("AJ") Steers (aaronsteers) wants to merge 1 commit into main
Mount Kapa's hosted MCP server as a proxy when KAPA_API_KEY is set, giving agents inline access to Airbyte documentation search alongside native PyAirbyte MCP tools.

- Mount Kapa proxy with 'kapa' prefix to namespace tools
- Gracefully skip mounting if KAPA_API_KEY is not configured
- Configurable via KAPA_MCP_SERVER_URL and KAPA_API_KEY env vars

Refs: airbytehq/airbyte-internal-issues#15904
Co-Authored-By: AJ Steers <aj@airbyte.io>
Testing This PyAirbyte Version
You can test this version of PyAirbyte using the following:
# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1772232983-kapa-mcp-proxy' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1772232983-kapa-mcp-proxy'
Community Support
Questions? Join the #pyairbyte channel in our Slack workspace.
📝 Walkthrough
This change conditionally integrates a Kapa knowledge MCP proxy into the Airbyte MCP server during initialization.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (2)
airbyte/mcp/server.py (2)
99-107: Consider documenting rate limits in the docstring?
The PR notes mention Kapa's rate limits (40 req/hr, 200 req/day). Since there's no client-side rate limiting implemented, it might be helpful for future maintainers to have a quick note in this docstring about those limits, so they understand the constraint if users report issues with throttling. Just a thought!
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte/mcp/server.py` around lines 99 - 107, The _mount_kapa_proxy docstring should mention Kapa's documented rate limits and lack of client-side throttling so maintainers know to expect server-side throttling; update the _mount_kapa_proxy(app: FastMCP) docstring to add a short note stating Kapa enforces ~40 requests/hour and ~200 requests/day and that this code does not implement client-side rate limiting (include a brief TODO suggesting adding a retry/backoff or local rate limiter in the future).
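The TODO proposed in this comment could eventually take the shape of a small local limiter. Below is a minimal sketch, assuming the 40 requests/hour and 200 requests/day figures quoted above. `SlidingWindowLimiter`, `try_acquire`, and `KAPA_LIMITERS` are hypothetical names for illustration and are not part of PyAirbyte or FastMCP.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `limit` calls per `window_seconds`."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls: Deque[float] = deque()

    def has_capacity(self) -> bool:
        """Drop timestamps that left the window, then check remaining capacity."""
        cutoff = self._clock() - self.window_seconds
        while self._calls and self._calls[0] <= cutoff:
            self._calls.popleft()
        return len(self._calls) < self.limit

    def record(self) -> None:
        """Register one call at the current clock time."""
        self._calls.append(self._clock())


def try_acquire(limiters: Iterable[SlidingWindowLimiter]) -> bool:
    """Record a call only if every limiter still has capacity."""
    limiters = list(limiters)
    if not all(limiter.has_capacity() for limiter in limiters):
        return False
    for limiter in limiters:
        limiter.record()
    return True


# Assumed limits, taken from the figures quoted in the review comment.
KAPA_LIMITERS = [
    SlidingWindowLimiter(limit=40, window_seconds=3600),
    SlidingWindowLimiter(limit=200, window_seconds=86400),
]
```

A caller would check `try_acquire(KAPA_LIMITERS)` before forwarding a request and back off when it returns `False`. Server-side throttling by Kapa would still apply regardless.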
109-114: Consider using logging instead of print for the diagnostic message?
This message will appear on every server startup when KAPA_API_KEY isn't set. For users who never intend to use Kapa integration, this might be a bit noisy. Would using logging.debug() or logging.info() give better control over verbosity, wdyt?
That said, I see the existing code uses print(..., file=sys.stderr) consistently throughout the file, so this does follow the established pattern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@airbyte/mcp/server.py` around lines 109 - 114, Replace the stderr print used when kapa_api_key is missing with a logging call so verbosity can be controlled; in the module initialize a logger (e.g., logger = logging.getLogger(__name__)) if not already present and change the print in the kapa_api_key check to logger.debug(...) or logger.info(...), keeping the same message text and return path; update any imports if needed to include logging and ensure the logging-based message is used in place of the print in the Kapa API key check (reference the kapa_api_key existence check in server.py).
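To make the trade-off concrete, here is a minimal sketch of the logging-based variant, assuming a module-level logger. `read_kapa_api_key` and the logger name are hypothetical and only illustrate the pattern; they are not the code in server.py.

```python
import logging
import os
from typing import Mapping, Optional

# Hypothetical logger name; a real module would typically use __name__.
logger = logging.getLogger("airbyte.mcp.kapa_sketch")


def read_kapa_api_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the Kapa API key, or None (with a debug log) when it is unset."""
    env = os.environ if env is None else env
    api_key = env.get("KAPA_API_KEY")
    if not api_key:
        # DEBUG keeps default startups quiet; users can opt in to see it.
        logger.debug("KAPA_API_KEY is not set; skipping Kapa knowledge proxy.")
        return None
    return api_key
```

With Python's default WARNING threshold the message stays hidden. Calling `logging.basicConfig(level=logging.DEBUG)` before startup would surface it on stderr.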
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@airbyte/mcp/server.py`:
- Around line 117-130: Wrap the FastMCP.as_proxy(...) creation and
app.mount(kapa_proxy, prefix="kapa") call inside a try/except that catches
Exception (e.g., in _mount_kapa_proxy), so failures creating or mounting the
Kapa proxy do not crash server startup; on exception log a warning including the
exception details (use the module/server logger or logging.warning) and skip
mounting (return/continue) so the server starts without Kapa. Ensure you
reference FastMCP.as_proxy and app.mount in the handler so only the optional
integration is protected.
    kapa_proxy = FastMCP.as_proxy(
        {
            "mcpServers": {
                "kapa": {
                    "url": kapa_url,
                    "transport": "http",
                    "headers": {
                        "Authorization": f"Bearer {kapa_api_key}",
                    },
                }
            }
        }
    )
    app.mount(kapa_proxy, prefix="kapa")
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
cat airbyte/mcp/server.py | head -150
Repository: airbytehq/PyAirbyte
Length of output: 5110
🏁 Script executed:
# Check for fastmcp dependency version
find . -name "*.txt" -o -name "*.toml" -o -name "*.cfg" -o -name "*.in" | xargs grep -l "fastmcp" 2>/dev/null | head -5
Repository: airbytehq/PyAirbyte
Length of output: 79
🏁 Script executed:
# Search for any existing error handling patterns in the file
rg "try:" airbyte/mcp/server.py -A 5 | head -50
Repository: airbytehq/PyAirbyte
Length of output: 320
🏁 Script executed:
grep -A 2 -B 2 "fastmcp" pyproject.toml | head -20
Repository: airbytehq/PyAirbyte
Length of output: 957
🏁 Script executed:
# Check if there are any other usages of FastMCP or as_proxy in the codebase
rg "FastMCP|as_proxy" --type py
Repository: airbytehq/PyAirbyte
Length of output: 1791
🏁 Script executed:
# Look for any test files related to mcp
find . -path "*/test*" -name "*mcp*" -type f | head -10
Repository: airbytehq/PyAirbyte
Length of output: 154
🌐 Web query:
FastMCP 2.11.3 as_proxy lazy initialization documentation
💡 Result:
FastMCP 2.11.3 (released 2025-08-11) is documented as having proxy support via FastMCP.as_proxy(), but the docs/API reference do not describe an explicit lazy= / “lazy initialization” option for as_proxy itself. [1][2]
What is documented (and effectively gives you “lazy” backend startup semantics) is that proxying is driven by a client factory / disconnected client, so the backend session/handshake happens when a request is made (e.g., tools/list, tools/call), not at proxy object construction:
- Recommended pattern: pass a disconnected ProxyClient(...) into FastMCP.as_proxy(...) (fresh isolated backend session per request). [3]
- For explicit control, build a proxy with a client_factory (fresh or shared client) via FastMCPProxy / create_proxy(...). [3][2]
If what you mean by “lazy initialization” is delaying the MCP initialize handshake of a Client, that is documented via auto_initialize=False (but that applies to using Client, not a special as_proxy flag). [4]
Key docs to read:
- Proxy servers guide (covers FastMCP.as_proxy, ProxyClient, session behavior, and FastMCPProxy + client_factory). [3]
- SDK reference for create_proxy(...) (recommended helper) and as_proxy(...) signature. [2][1]
- Client "auto initialization" control (auto_initialize=False). [4]
Sources
[1] https://fastmcp.wiki/en/python-sdk/fastmcp-server-server
[2] https://fastmcp.mintlify.app/python-sdk/fastmcp-server-server
[3] https://fastmcp.wiki/en/servers/proxy
[4] https://fastmcp.mintlify.app/clients/client
🏁 Script executed:
# Check test files for any hints about error handling or proxy behavior
cat tests/unit_tests/test_mcp_connector_registry.py | head -50
Repository: airbytehq/PyAirbyte
Length of output: 1603
🏁 Script executed:
# Search for any environment setup or initialization patterns in tests
rg "try|except|error" tests/unit_tests/test_mcp*.py
Repository: airbytehq/PyAirbyte
Length of output: 1012
Add error handling around proxy creation and mounting to prevent an optional dependency from breaking server startup.
Since the Kapa integration is optional and _mount_kapa_proxy() is invoked at module level, if FastMCP.as_proxy() or app.mount() throws (invalid config, network issues, etc.), the entire server fails to start. Worth wrapping in try/except to gracefully skip Kapa when unavailable, wdyt?
🛡️ Proposed defensive wrapper
+    try:
         kapa_proxy = FastMCP.as_proxy(
             {
                 "mcpServers": {
                     "kapa": {
                         "url": kapa_url,
                         "transport": "http",
                         "headers": {
                             "Authorization": f"Bearer {kapa_api_key}",
                         },
                     }
                 }
             }
         )
         app.mount(kapa_proxy, prefix="kapa")
         print(
             f"Kapa knowledge proxy mounted from {kapa_url}.",
             file=sys.stderr,
         )
+    except Exception as ex:
+        print(
+            f"Failed to mount Kapa knowledge proxy: {ex}",
+            file=sys.stderr,
+        )
Note: FastMCP.as_proxy() in v2.11.3 uses a lazy proxy pattern: the backend session/handshake happens when a request is made (not at construction), so as_proxy() itself won't trigger eager connections. However, the app.mount() call should still be protected since it's at initialization time and failures there would crash startup.
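The graceful-fallback behavior of the proposed wrapper can be exercised without a network connection or a FastMCP install. Below is a minimal sketch, assuming only that the app exposes `mount(server, prefix=...)` as used in the diff. `mount_optional_proxy`, `RecordingApp`, and `failing_builder` are hypothetical stand-ins for illustration.

```python
import sys
from typing import Any, Callable, List, Tuple


def mount_optional_proxy(
    app: Any,
    build_proxy: Callable[[], Any],
    prefix: str,
) -> bool:
    """Build and mount an optional proxy; return False instead of raising."""
    try:
        app.mount(build_proxy(), prefix=prefix)
    except Exception as ex:  # broad on purpose: the integration is optional
        print(f"Failed to mount {prefix!r} proxy: {ex}", file=sys.stderr)
        return False
    return True


class RecordingApp:
    """Hypothetical stand-in for the server app; records mount() calls."""

    def __init__(self) -> None:
        self.mounted: List[Tuple[Any, str]] = []

    def mount(self, server: Any, prefix: str) -> None:
        self.mounted.append((server, prefix))


def failing_builder() -> Any:
    """Simulates proxy construction failing, e.g. on an invalid config."""
    raise ValueError("invalid proxy config")


app = RecordingApp()
ok = mount_optional_proxy(app, lambda: "fake-proxy", prefix="kapa")
failed = mount_optional_proxy(app, failing_builder, prefix="kapa")
```

In the real function, `build_proxy` would wrap the `FastMCP.as_proxy(...)` call, so only the optional integration sits inside the `try` block and the server still starts when it fails.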
Summary
Mounts Kapa's hosted MCP server as a proxy into the PyAirbyte MCP server (airbyte-mcp), so agents get inline Airbyte documentation search alongside native PyAirbyte tools, without users needing to install a separate MCP server.
When KAPA_API_KEY is set, a FastMCP.as_proxy() proxy is created and mounted with a kapa prefix. When not set, this is a no-op. The Kapa server URL defaults to https://airbyte.mcp.kapa.ai and is configurable via KAPA_MCP_SERVER_URL.
Follows the Kapa proxy tutorial pattern. Companion PR for connector-builder-mcp: https://github.com/airbytehq/connector-builder-mcp/pull/209
Refs: https://github.com/airbytehq/airbyte-internal-issues/issues/15904
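The environment handling described in this summary can be sketched as follows. This is a minimal illustration rather than the PR's actual code: `build_kapa_proxy_config` and `DEFAULT_KAPA_MCP_SERVER_URL` are hypothetical names, and the returned dictionary mirrors the config shape passed to FastMCP.as_proxy() in the diff under review.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Default taken from the PR summary.
DEFAULT_KAPA_MCP_SERVER_URL = "https://airbyte.mcp.kapa.ai"


def build_kapa_proxy_config(
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Dict[str, Any]]:
    """Return the proxy config dict, or None when KAPA_API_KEY is unset (no-op)."""
    env = os.environ if env is None else env
    api_key = env.get("KAPA_API_KEY")
    if not api_key:
        return None
    # An empty or missing override falls back to the default URL.
    url = env.get("KAPA_MCP_SERVER_URL") or DEFAULT_KAPA_MCP_SERVER_URL
    return {
        "mcpServers": {
            "kapa": {
                "url": url,
                "transport": "http",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }
```

Keeping the environment lookup in a pure function like this makes both paths (key set, key missing) testable without starting the server.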
Review & Testing Checklist for Human
- FastMCP.as_proxy is lazy at startup: _mount_kapa_proxy runs at module load time. Confirm that FastMCP.as_proxy() + mount() do NOT eagerly connect to the Kapa server; otherwise a misconfigured or unreachable URL with KAPA_API_KEY set could block/crash server startup.
- KAPA_API_KEY: Only the no-op path (no key) was verified. The actual proxy mount path was not tested end-to-end with real Kapa credentials. Recommended test: set KAPA_API_KEY and use poe mcp-tool-test (or equivalent) to call the proxied Kapa tool and verify it returns results.
- prefix="kapa" namespacing: Verify that Kapa's tools appear with a kapa_ prefix (e.g., kapa_search_knowledge_base) and don't collide with existing PyAirbyte tools.
Notes
Summary by CodeRabbit