
ci: add MCP tool name validation against live servers #95

Open
jordigilh wants to merge 3 commits into RHEcosystemAppEng:main from jordigilh:feat/mcp-tool-validation-ci

Conversation

@jordigilh
Contributor

Summary

  • Adds a CI check that starts each pack's MCP servers (from mcps.json) via podman and cross-references the allowed-tools declared in SKILL.md frontmatter against the actual tools exposed by the server
  • Catches tool name mismatches (e.g., pod_list vs pods_list) that pass all static linters but silently break skills at runtime
  • Uses a Kind cluster to provide a valid kubeconfig for MCP server startup
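The cross-reference step described above can be sketched roughly as follows. This is a simplified illustration with hypothetical helper names — the actual script parses full YAML frontmatter rather than using this minimal regex:

```python
# Hypothetical sketch: extract `allowed-tools` from SKILL.md frontmatter
# and diff it against the tool names a live MCP server reports.
import re

def parse_allowed_tools(skill_md):
    """Extract an inline allowed-tools list from frontmatter (simplified)."""
    match = re.search(r"^allowed-tools:\s*\[(.*?)\]", skill_md, re.MULTILINE)
    if not match:
        return []
    return [t.strip().strip("'\"") for t in match.group(1).split(",") if t.strip()]

def find_mismatches(declared, exposed):
    """Return declared tool names the server does not actually expose."""
    return [tool for tool in declared if tool not in exposed]

skill = "---\nallowed-tools: [pod_list, pods_log]\n---\n"
live_tools = {"pods_list", "pods_log"}
print(find_mismatches(parse_allowed_tools(skill), live_tools))  # ['pod_list']
```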

Motivation

While working on PRs #79 and #80, we discovered 6 tool name mismatches between our SKILL.md declarations and the actual MCP server tool registry. These went undetected by the existing skill-linter and compliance-check because they only validate static structure, not runtime tool availability.

SKILL.md declared      Actual MCP tool
pod_list               pods_list
pod_logs               pods_log
get_metric_names       (doesn't exist)
get_metric_metadata    (doesn't exist)
get_series             (doesn't exist)
query                  prometheus_query

Components

  • scripts/validate-mcp-tools.sh — Starts container-based MCP servers via podman, sends JSON-RPC initialize + tools/list, and validates each skill's allowed-tools
  • .github/workflows/mcp-tool-validation.yml — GitHub Actions workflow using Kind cluster + podman, triggers on changes to mcps.json or SKILL.md files
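The two JSON-RPC calls the script sends can be sketched as the wire messages below. This is a hedged sketch of the MCP stdio handshake — the exact protocolVersion and clientInfo values are assumptions, not necessarily what the script emits:

```python
# Sketch of the JSON-RPC messages sent over the MCP server's stdio
# transport: initialize, the initialized notification, then tools/list.
import json

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",  # assumed version string
        "capabilities": {},
        "clientInfo": {"name": "validate-mcp-tools", "version": "0.1"},
    },
}
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
list_tools = {"jsonrpc": "2.0", "id": 2, "method": "tools/list", "params": {}}

# Each message is newline-delimited JSON on the server's stdin:
wire = "\n".join(json.dumps(m) for m in (initialize, initialized, list_tools))
```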

Test plan

  • Tested locally against rh-developer pack — correctly detected 6 mismatches in incident-triage (pre-fix) and passed debug-scc/debug-rbac (post-fix)
  • Verify workflow runs successfully in CI with Kind cluster
  • Verify credential-gated MCP servers (github, lightspeed-mcp) are gracefully skipped

Made with Cursor

jordigilh and others added 2 commits May 8, 2026 13:28
Adds a CI check that starts each pack's MCP servers (from mcps.json)
and cross-references the `allowed-tools` declared in SKILL.md
frontmatter against the actual tools exposed by the server.

This catches tool name mismatches (e.g., pod_list vs pods_list) that
would silently break skills at runtime but pass all static linters.

Components:
- scripts/validate-mcp-tools.sh: bash script that starts container-
  based MCP servers via podman, sends JSON-RPC initialize + tools/list,
  and validates each skill's allowed-tools against the response
- .github/workflows/mcp-tool-validation.yml: GitHub Actions workflow
  using Kind cluster + podman, triggers on changes to mcps.json or
  SKILL.md files

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace bash script with a pure-Python implementation for cross-OS
portability. Key improvements:

- Response-based MCP communication via subprocess + select (no sleeps)
- Explicit image pulling with error handling
- JSON-RPC pagination support (follows nextCursor)
- Levenshtein-based "did you mean?" suggestions for mismatched tools
- Graceful handling of servers that exit immediately (missing creds)
- File path and line number in error output
- Logging of skipped non-container MCP servers

Workflow hardening:
- Pin all GitHub Actions to SHA
- Add permissions: contents: read
- Add concurrency group with cancel-in-progress
- Add timeout-minutes: 10
- Explicit KUBECONFIG capture from Kind
- Add workflow_dispatch pack input for manual runs
- Inline pack detection (remove dependency on detect-changed-packs.sh)

Co-authored-by: Cursor <cursoragent@cursor.com>
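The nextCursor pagination mentioned in the commit message can be sketched like this, where `send_request` is a hypothetical stand-in for the script's JSON-RPC transport:

```python
# Keep calling tools/list until the response no longer carries a
# nextCursor, accumulating tool names across pages.
def list_all_tools(send_request):
    tools, cursor = [], None
    while True:
        params = {"cursor": cursor} if cursor else {}
        result = send_request("tools/list", params)
        tools.extend(t["name"] for t in result.get("tools", []))
        cursor = result.get("nextCursor")
        if not cursor:
            return tools

# Fake transport returning two pages, for illustration:
pages = {
    None: {"tools": [{"name": "pods_list"}], "nextCursor": "p2"},
    "p2": {"tools": [{"name": "pods_log"}]},
}
fake = lambda method, params: pages[params.get("cursor")]
print(list_all_tools(fake))  # ['pods_list', 'pods_log']
```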
@dmartinol
Collaborator

Hey @jordigilh thanks for your contribution to proactively catch these mismatches!
To avoid risky changes, we decided to postpone commits that could affect the skills' execution until after Summit, hope you can understand.

Apart from that, two comments on this awesome PR:
1. We should improve the suggestion mechanism to propose matching names when a toolset is missing or a toolset/tool name is wrong. Maybe with semantic search using a small embedding model we can avoid coding complex match logic?
2. Try to extend the check to non-containerized commands (uvx, npm).

@jordigilh
Contributor Author

Thanks @dmartinol — totally understand the post-Summit freeze. No rush on merging this one.

On your two suggestions:

  1. Semantic search for suggestions — interesting idea. The current Levenshtein approach catches simple typos (pod_list → pods_list) but misses semantic gaps like query → prometheus_query. An embedding model would handle that better. I'll explore lightweight options (e.g., a small sentence-transformer or even TF-IDF over tool names + descriptions) as a follow-up so we don't add heavy dependencies to CI.

  2. Non-containerized commands (npx, uvx) — agreed, this would future-proof the check. Right now no skills declare allowed-tools referencing tools from npx-based servers, so there's zero false-positive risk from skipping them. But as more skills adopt allowed-tools, we'll want coverage. I'll look into adding Node.js to the CI environment and handling the different startup patterns.

Happy to iterate on both after Summit. Let me know if there's anything else you'd like adjusted in the meantime.
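For reference, the Levenshtein-based suggestion idea discussed above can be sketched in a few lines. This standalone version is illustrative only, not the script's actual implementation:

```python
# Levenshtein edit distance via the standard two-row dynamic program,
# plus a "did you mean?" helper that suggests the closest known tool.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def suggest(name, known, max_dist=3):
    """Return the closest known tool name, or None if nothing is close."""
    best = min(known, key=lambda k: levenshtein(name, k))
    return best if levenshtein(name, best) <= max_dist else None

known = ["pods_list", "pods_log", "prometheus_query"]
print(suggest("pod_list", known))  # 'pods_list'
print(suggest("query", known))     # None — edit distance alone misses it
```

The last call shows exactly the semantic gap mentioned above: `query` is many edits away from `prometheus_query`, so pure edit distance offers no suggestion.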

@dmartinol
Collaborator

> 2. Non-containerized commands (npx, uvx) — agreed, this would future-proof the check. Right now no skills declare allowed-tools referencing tools from npx-based servers, so there's zero false-positive risk from skipping them. But as more skills adopt allowed-tools, we'll want coverage. I'll look into adding Node.js to the CI environment and handling the different startup patterns.

Thanks for being available to extend this solution! And yes, we'll also extend the coverage of allowed-tools fields early next week.

Collaborator

pls also add this step under the validate make target

Contributor Author

Done in 527a2f7. Here's what this commit adds:

  1. validate_mcp_tools.py added to the main validate target — runs automatically with make validate.

  2. Graceful prerequisite skip — the script checks for podman on PATH and a valid KUBECONFIG at startup. If either is missing it prints a SKIP: message and exits 0, so make validate (and the compliance-check CI workflow that calls it) never breaks in environments without container tooling.

  3. Standalone validate-mcp-tools target — for running the check in isolation, with optional PACK= filter: make validate-mcp-tools PACK=rh-developer.

  4. Improved suggestions — alongside Levenshtein distance, the suggestion engine now uses substring/prefix matching and component overlap (e.g. pods-list → pods_list) to catch more mismatches.
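The graceful prerequisite skip in point 2 can be sketched roughly like this; the helper name, messages, and default kubeconfig path are illustrative, not the script's actual code:

```python
# Check for podman on PATH and a readable kubeconfig; return a skip
# reason instead of failing, so callers can print SKIP and exit 0.
import os
import shutil
import sys

def check_prerequisites():
    """Return a skip reason string, or None when the full check can run."""
    if shutil.which("podman") is None:
        return "podman not found on PATH"
    kubeconfig = os.environ.get("KUBECONFIG", os.path.expanduser("~/.kube/config"))
    if not os.path.isfile(kubeconfig):
        return f"no kubeconfig at {kubeconfig}"
    return None

if __name__ == "__main__":
    reason = check_prerequisites()
    if reason:
        print(f"SKIP: {reason}")
        sys.exit(0)  # never break `make validate` over missing infra
    # ... run the actual MCP tool validation here ...
```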

@jordigilh
Contributor Author

@dmartinol @r2dedios Thanks for the feedback on integrating this into the validate make target. Before pushing changes, I'd like to get your input on the approach since validate_mcp_tools.py has heavier prerequisites (podman + KUBECONFIG/Kind) than the existing validation steps.

Option A: Add it to make validate with graceful skip

The script would detect missing prerequisites (no podman, no KUBECONFIG) and exit 0 with a warning instead of failing. This keeps make validate and compliance-check.yml working as-is — the MCP step simply auto-skips when the infra isn't there. Locally, devs with podman get the full check; those without get a skip message.

  • Pro: Single make validate command always works everywhere
  • Con: Silent skipping can mask the fact that MCP validation didn't actually run

Option B: Standalone target + keep workflows separate

Add make validate-mcp-tools as its own target (supports PACK=rh-developer), but don't add it to make validate. The existing compliance-check.yml stays fast and lightweight (static checks only), while mcp-tool-validation.yml handles the runtime validation with its own podman/Kind setup and targeted triggers (mcps.json, SKILL.md changes).

  • Pro: Clean separation of static vs. runtime validation; no extra CI minutes on docs-only PRs
  • Con: make validate doesn't cover MCP tools — devs need to know to run make validate-mcp-tools separately

I'm leaning toward Option B since MCP validation is fundamentally a runtime check (starts containers, JSON-RPC handshake) while the other validate steps are static. But happy to go with either — what's your preference?

@dmartinol
Collaborator

> I'm leaning toward Option B since MCP validation is fundamentally a runtime check (starts containers, JSON-RPC handshake) while the other validate steps are static. But happy to go with either — what's your preference?

Hey @jordigilh, good catch!
Whichever option we pick (initially I opted for A, since I did not experience long waits during my tests, but B is fine as well), the important thing is to give the developer tools to identify the mismatches and to help with the fix (e.g. extend the suggestion range).

BTW: other tools we should consider are the CLI tools sometimes referenced in SKILL.md, like oc, but those are harder to find with a Python script.

- Add validate_mcp_tools.py to the main `validate` Makefile target
- Add standalone `validate-mcp-tools` target with PACK= filter support
- Add graceful prerequisite detection: script exits 0 with a SKIP
  message when podman or KUBECONFIG is unavailable, so `make validate`
  never breaks in environments without container tooling
- Improve tool name suggestions with substring/prefix matching and
  component overlap scoring alongside the existing Levenshtein distance

Co-authored-by: Cursor <cursoragent@cursor.com>