fix: Debug and enhance Exgentic A2A runner #10
Open
yoavkatz wants to merge 36 commits into kagenti:main
Implement complete test harness for Exgentic benchmarks following the flow described in kagenti/kagenti#963

Key features:
- MCP client using official Python SDK with streamable HTTP transport
- Sequential session processing with full lifecycle management
- A2A protocol integration for agent communication
- OpenTelemetry instrumentation for metrics and tracing
- Comprehensive configuration and documentation

Components:
- mcp_client.py: MCP protocol client for Exgentic server
- exgentic_adapter.py: High-level adapter for session management
- runner.py: Main orchestration with telemetry
- config.py: Configuration management
- prompt.py: Prompt builder with session_id injection
- otel.py: OpenTelemetry setup
- a2a_client.py: A2A protocol client (from appworld_a2a_runner)

Testing:
- Successfully connects to Exgentic MCP server (tau2 benchmark)
- Verified session creation with 114 available tasks
- Proper error handling and logging configuration

Documentation:
- README.md: Complete usage guide
- QUICKSTART.md: Quick start for Kagenti cluster
- Architecture and implementation docs

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Changes:
- Add list_tasks() method to MCPClient to fetch all available task IDs
- Add get_task_ids() method to ExgenticAdapter
- Update iterate_sessions() to accept task_ids list and respect max_tasks
- Update create_session() to accept optional task_id parameter
- Update runner to fetch task IDs first, then iterate over them
- Remove debug exit(99) statement
- Improve logging to show progress (task X/Y)

This ensures we know the total number of tasks upfront and can properly limit processing with max_tasks configuration.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
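The flow described in this commit (fetch all task IDs, cap them with max_tasks, then iterate with a progress label) could look roughly like the sketch below. The helper names `select_task_ids` and `iterate_with_progress` are hypothetical and are not taken from the runner; treating `None` or a non-positive `max_tasks` as "no limit" is also an assumption.

```python
from typing import Iterator, List, Optional, Tuple


def select_task_ids(task_ids: List[str], max_tasks: Optional[int]) -> List[str]:
    """Return the task IDs to run, capped at max_tasks when it is positive.

    Assumption: None or a non-positive value means "run everything".
    """
    if max_tasks is not None and max_tasks > 0:
        return task_ids[:max_tasks]
    return list(task_ids)


def iterate_with_progress(
    task_ids: List[str], max_tasks: Optional[int] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (task_id, progress_label) pairs such as ("t1", "task 1/2")."""
    selected = select_task_ids(task_ids, max_tasks)
    total = len(selected)  # known upfront because IDs were fetched first
    for index, task_id in enumerate(selected, start=1):
        yield task_id, f"task {index}/{total}"
```

Because the full list is available before the loop starts, the total in the "task X/Y" label is stable for the whole run.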
Remove all '# Made with ...' comments from Python files for cleaner code. Signed-off-by: Yoav Katz <katz@il.ibm.com>
The agent card may advertise an internal URL (e.g., 0.0.0.0:8000) that is not accessible from outside the pod. This change ensures we always use the configured A2A_BASE_URL (e.g., localhost:8080 via port-forward) instead of the URL from the agent card. This fixes the 404 error when connecting to agents behind port-forwards or proxies. Signed-off-by: Yoav Katz <katz@il.ibm.com>
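A minimal sketch of the idea in this commit, assuming the agent card is available as a plain dict with a `url` key. The function name `resolve_agent_url` and its parameters are hypothetical, not the runner's actual API.

```python
def resolve_agent_url(
    configured_base_url: str, agent_card: dict, endpoint_path: str = "/"
) -> str:
    """Build the request URL from the configured base URL.

    The URL advertised in the agent card (agent_card["url"]) is deliberately
    ignored: inside a pod it may be something like http://0.0.0.0:8000, which
    is not reachable through a port-forward or proxy.
    """
    base = configured_base_url.rstrip("/")
    path = "/" + endpoint_path.lstrip("/")
    return base + path
```

For example, `resolve_agent_url("http://localhost:8080", {"url": "http://0.0.0.0:8000"})` uses the configured host and port regardless of what the card advertises.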
- Fix syntax errors in run-with-port-forward.sh:
  * Add missing comment symbol on line 36
  * Fix unclosed quote on line 40
  * Replace parentheses in echo statements to avoid syntax errors
  * Update service names to match actual cluster services
- Configure A2A endpoint to use root path (/) instead of /v1/chat
- Enable OTEL trace collection to local Jaeger instance (localhost:4317)
- Enhance OTEL instrumentation:
  * Add full prompt text to span attributes (prompt.text)
  * Add full response text to span attributes (response.text)
  * Improve visibility of inputs/outputs in Jaeger traces
- Improve prompt instructions:
  * Add explicit instruction to call submit MCP tool when asked
- Enhance logging:
  * Add evaluation result details to session evaluation logs

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add AGENT_SERVICE and BENCHMARK_SERVICE to example.env
- Update run-with-port-forward.sh to read service names from .env
- Use default values if environment variables are not set
- Improves configurability and makes it easier to switch between different deployments

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add MAX_PARALLEL_SESSIONS configuration parameter (default: 1)
- Implement ThreadPoolExecutor for concurrent session execution
- Add thread-safe result collection with mutex lock
- Display max parallel sessions in run summary
- Maintain backward compatibility with sequential processing (max_parallel_sessions=1)
- Support abort_on_failure in parallel mode by canceling remaining futures

Benefits:
- Significantly improves throughput for I/O-bound workloads
- Allows users to configure parallelism based on their needs
- Maintains all existing functionality and error handling

Signed-off-by: Yoav Katz <katz@il.ibm.com>
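The pattern this commit describes (a thread pool, a lock around the shared result list, and cancellation of pending futures on failure) might be sketched as follows. The function `run_sessions`, the `run_one` callback, and the result dict shape are illustrative assumptions and do not mirror the runner's real signatures.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def run_sessions(
    task_ids: List[str],
    run_one: Callable[[str], str],
    max_parallel_sessions: int = 1,
    abort_on_failure: bool = False,
) -> List[Dict]:
    """Run run_one(task_id) for each task, up to max_parallel_sessions at once."""
    results: List[Dict] = []
    lock = threading.Lock()  # guards appends to the shared results list

    def worker(task_id: str) -> Dict:
        try:
            outcome = {"task_id": task_id, "ok": True, "output": run_one(task_id)}
        except Exception as exc:  # record the failure instead of raising
            outcome = {"task_id": task_id, "ok": False, "error": str(exc)}
        with lock:
            results.append(outcome)
        return outcome

    with ThreadPoolExecutor(max_workers=max(1, max_parallel_sessions)) as pool:
        futures = [pool.submit(worker, task_id) for task_id in task_ids]
        for future in as_completed(futures):
            if abort_on_failure and not future.result()["ok"]:
                # cancel() only stops futures that have not started yet;
                # sessions already running are allowed to finish.
                for pending in futures:
                    pending.cancel()
                break
    return results
```

With `max_parallel_sessions=1` the pool has a single worker, so tasks run one after another in submission order, which matches the backward-compatible sequential mode mentioned above.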
- Display table of all failed sessions with their error messages at end of run summary
- Truncate long error messages to 50 characters for readability
- Only show table if there are failed sessions
- Helps quickly identify and diagnose session failures

Signed-off-by: Yoav Katz <katz@il.ibm.com>
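As a rough illustration of the summary table, the sketch below formats failed sessions and truncates errors to 50 characters. The result dict keys (`session_id`, `ok`, `error`), the column layout, and the choice to count the trailing "..." within the 50-character limit are assumptions.

```python
from typing import Dict, List


def truncate(text: str, limit: int = 50) -> str:
    """Shorten text to at most `limit` characters, ending in '...' if cut."""
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."


def format_failed_sessions(results: List[Dict], limit: int = 50) -> str:
    """Return a plain-text table of failed sessions, or '' if none failed."""
    failed = [r for r in results if not r.get("ok")]
    if not failed:
        return ""  # caller prints nothing when every session succeeded
    id_width = max(len("Session"), max(len(str(r["session_id"])) for r in failed))
    lines = [
        f"{'Session':<{id_width}}  Error",
        f"{'-' * id_width}  {'-' * limit}",
    ]
    for r in failed:
        error = truncate(str(r.get("error", "")), limit)
        lines.append(f"{str(r['session_id']):<{id_width}}  {error}")
    return "\n".join(lines)
```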
- Extract text from artifacts and result first, regardless of state
- Then handle failed/canceled/rejected states with extracted information
- Include extracted output in error messages for better debugging
- Provides complete context when tasks don't complete successfully

Signed-off-by: Yoav Katz <katz@il.ibm.com>
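The ordering described here, extract whatever text exists first and only then inspect the task state, can be sketched as below. The task dict shape is a simplified assumption loosely modeled on an A2A task (artifacts containing parts with a `text` field, and `status.state`); it only covers artifacts, not a separate result field, and `TaskFailedError` is a hypothetical exception name.

```python
from typing import Optional

FAILURE_STATES = {"failed", "canceled", "rejected"}


class TaskFailedError(RuntimeError):
    """Raised when a task ends in a failure state (hypothetical name)."""


def extract_text(task: dict) -> Optional[str]:
    """Collect text from all artifact parts; return None if there is none."""
    chunks = []
    for artifact in task.get("artifacts") or []:
        for part in artifact.get("parts") or []:
            text = part.get("text")
            if isinstance(text, str):
                chunks.append(text)
    return "\n".join(chunks) if chunks else None


def handle_task(task: dict) -> Optional[str]:
    """Extract output first, then raise with that output if the task failed."""
    text = extract_text(task)  # done regardless of the final state
    state = (task.get("status") or {}).get("state")
    if state in FAILURE_STATES:
        detail = f" Output so far: {text}" if text else ""
        raise TaskFailedError(f"Task ended in state '{state}'.{detail}")
    return text
```

Extracting before the state check means a failed task's partial output ends up in the error message rather than being discarded.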
…cluster

Add three new scripts to automate deployment and configuration of Exgentic benchmark system on Kagenti Kubernetes cluster:

1. deploy-benchmark.sh: Deploy MCP tools via Kagenti API
   - Syncs local container images to cluster registry
   - Authenticates with Keycloak using password grant flow
   - Deploys tools with proper service configuration
   - Patches imagePullPolicy for local images
   - Waits for deployment readiness
2. deploy-agent.sh: Deploy A2A agents from source
   - Fetches and parses environment variables from GitHub
   - Deploys agents using Shipwright builds
   - Monitors build progress and waits for completion
   - Waits for deployment creation and readiness
   - Tests agent accessibility via A2A protocol
   - Fixes port configuration (8080 -> 8000)
3. configure-agent-environment.sh: Configure agent environment
   - Updates OpenAI API secret via kubectl patch
   - Patches agent deployment with Azure OpenAI settings
   - Accepts benchmark name as parameter
   - Waits for rollout completion

These scripts enable automated deployment and testing of the Exgentic benchmark system without manual kubectl commands or UI interaction.

Fixes:
- Agent port mismatch (container port 8000 vs service port 8080)
- MCP_URLS environment variable configuration
- Azure OpenAI endpoint and model configuration

Signed-off-by: Yoav Katz <katz@il.ibm.com>
…agenti-ui

Port 8080 was being used by both the A2A agent port-forward and the kagenti-ui service (via Istio gateway), causing intermittent access issues to http://kagenti-ui.localtest.me:8080/.

Changes:
- Updated A2A_BASE_URL from localhost:8080 to localhost:8081 in example.env
- Modified run-with-port-forward.sh to forward A2A agent to port 8081
- Updated connectivity test to check port 8081

This allows kagenti-ui to be accessed on port 8080 via Istio gateway while the A2A agent uses port 8081, eliminating port conflicts.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Changes:
- Made configure-agent-environment.sh executable (chmod +x)
- Fixed tool name in deploy-agent.sh: removed duplicate '-mcp' suffix
from 'exgentic-mcp-${BENCHMARK_NAME}-mcp' to 'exgentic-mcp-${BENCHMARK_NAME}'
- Fixed tool name in deploy-benchmark.sh: removed duplicate '-mcp' suffix
from 'exgentic-mcp-${BENCHMARK_NAME}-mcp' to 'exgentic-mcp-${BENCHMARK_NAME}'
This ensures consistent tool naming across deployment scripts and makes
the configuration script directly executable.
Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
… auth

Changes:
- Updated QUICKSTART.md with comprehensive deployment instructions
- Added Option 1: Deploy Your Own Benchmark and Agent
- Added Option 2: Use Existing Services
- Documented deploy-benchmark.sh and deploy-agent.sh usage
- Updated configuration section with new port (8081) for A2A agent
- Added reference documentation for deployment scripts
- Fixed Keycloak authentication error in deployment scripts
- Added automatic enabling of Direct Access Grants for kagenti client
- Both deploy-benchmark.sh and deploy-agent.sh now configure Keycloak
- Added better error messages for authentication failures
- Renumbered steps after adding Keycloak configuration step

This resolves the 'unauthorized_client' error when running deployment scripts and provides clear documentation for deploying benchmarks and agents to the Kagenti cluster.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Changes:

1. Renamed configure-agent-environment.sh to configure-agent-and-benchmark-environment.sh
   - Use 'kubectl set env' instead of JSON patch for cleaner updates
   - Extended script to configure both agent and benchmark deployments
   - Added clear separation between agent and benchmark configuration sections
   - Improved output formatting with dedicated sections for each component
   - Added deployment-specific configuration summaries
   - Agent gets: LLM_API_BASE, OPENAI_API_BASE, LLM_MODEL
   - Benchmark gets: OPENAI_API_BASE, EXGENTIC_SET_BENCHMARK_USER_SIMULATOR_MODEL
2. Enhanced deploy-benchmark.sh
   - Added fetching and parsing of benchmark-specific environment variables
   - Fetches .env.<benchmark> from agent-examples repository
   - Parses environment variables using Kagenti API
   - Includes env vars in tool deployment configuration
   - Added graceful handling when env file is not found
   - Renumbered steps after adding env var fetching step

These improvements ensure:
- Consistent LLM configuration across agent and benchmark
- Better visibility into what's being configured
- Benchmark-specific settings are properly applied from repository
- Clearer output for troubleshooting
- Proper separation of concerns between agent and benchmark configuration

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Changed port-forward cleanup to kill processes by port number instead of service name. This ensures all existing port-forwards on ports 8000 and 8081 are cleaned up regardless of which benchmark or agent service they were forwarding to. Uses lsof to find processes using the ports and kills them, making the script more robust when switching between different benchmarks/agents. Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add resource limits (2Gi memory) to benchmark pod deployments
- Rename close_session to delete_session throughout the stack
- Add validation for delete_session response (supports both 'success' and 'status' fields)
- Conditionally set EXGENTIC_SET_BENCHMARK_USER_SIMULATOR_MODEL only for tau benchmarks
- Create evaluate_benchmark.sh script that accepts benchmark name as parameter
- Set AGENT_SERVICE and BENCHMARK_SERVICE dynamically based on benchmark name

Signed-off-by: Yoav Katz <katz@il.ibm.com>
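One way the delete_session validation could accept either response field is sketched below. The function name `delete_succeeded` and the set of accepted `status` strings are assumptions; the commit only states that both the 'success' and 'status' fields are supported.

```python
def delete_succeeded(response: dict) -> bool:
    """Interpret a delete_session response that uses 'success' or 'status'."""
    if "success" in response:
        return bool(response["success"])
    status = response.get("status")
    if isinstance(status, str):
        # Accepted values are an assumption, not taken from the server.
        return status.lower() in {"success", "ok", "deleted"}
    return False  # neither field present: treat as a failed delete
```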
- Move .env loading before service name exports to prevent override
- Set A2A_ENDPOINT_PATH=/ for JSON-RPC protocol (was /v1/chat)
- Fix BENCHMARK_SERVICE to include -mcp suffix
- Set MAX_TASKS=1 in example.env for testing

This fixes the 404 errors when connecting to the A2A agent endpoint. The agent uses JSON-RPC at the root path, not /v1/chat.

Tested with gsm8k benchmark: 100% success rate (1/1 sessions)

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add optional model-name parameter with Azure/gpt-4o as default
- Replace hardcoded model references with MODEL_NAME variable
- Set benchmark pod memory limit to 3Gi (3GB)
- Update usage documentation and examples
- Add memory limit to configuration summary output

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Remove static resource limits (CPU and memory) from deployment JSON
- Resource limits are now set dynamically via configure script
- Allows for flexible resource allocation per benchmark

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Update evaluate_benchmark.sh to use port 7770 for MCP server (was 8000)
- Update evaluate_benchmark.sh to use port 7701 for A2A agent (was 8081)
- Update example.env with new port numbers
- Update README.md with deployment instructions and usage examples
- Increase default MAX_TASKS and MAX_PARALLEL_SESSIONS in example.env
- Enable OTEL_EXPORTER_OTLP_ENDPOINT by default in example.env

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Update agent-examples repo URL to yoavkatz/agent-examples
- Update workload-harness repo URL to yoavkatz/workload-harness
- Add git checkout for feature/exgentic-mcp-server branch
- Add git checkout for feature/exgentic-a2a-runner branch
- Ensures users clone from correct repos and use correct feature branches
- Both repositories are publicly accessible

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Uncomment cleanup function in evaluate_benchmark.sh
- Enable trap to cleanup port forwards on exit (EXIT, INT, TERM)
- Update README to reflect automatic cleanup behavior
- Update feature list: parallel session processing (not sequential)
- Add port forwarding details (7770 for MCP, 7701 for agent)
- Clarify configure script parameters and defaults
- Remove limitation about manual port forward cleanup

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add kubectl and Kagenti cluster prerequisites
- Add note about optional Keycloak credentials for deploy scripts
- Remove QUICKSTART.md (information merged into README)
- README now contains all necessary setup and usage information

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Modified otel.py to not use ConsoleSpanExporter when OTEL_EXPORTER_OTLP_ENDPOINT is not set
- Traces are still collected internally but not exported or printed
- Added comprehensive Jaeger setup instructions to README.md
- Updated example.env with clearer OTEL configuration comments

This prevents unwanted console spam when OTEL is not configured while still allowing full observability when Jaeger or another collector is set up.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Modified otel.py to not use ConsoleSpanExporter when OTEL_EXPORTER_OTLP_ENDPOINT is not set
- Traces are still collected internally but not exported or printed
- Added comprehensive Jaeger setup instructions to README.md
- Reorganized README configuration section: main settings, debug, tracing, advanced
- Updated example.env with clearer OTEL configuration comments
- Modified evaluate_benchmark.sh to automatically set EXGENTIC_MCP_SERVER_URL and A2A_BASE_URL
- Removed run-with-port-forward.sh (functionality integrated into evaluate_benchmark.sh)
- Users no longer need to manually configure MCP and A2A URLs when using evaluate_benchmark.sh

This prevents unwanted console spam when OTEL is not configured while still allowing full observability when Jaeger or another collector is set up.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Set asyncio logger to WARNING level to suppress 'Using selector' debug messages
- Redirected kubectl port-forward output to /dev/null to suppress 'Handling connection' messages
- Keeps console output clean and focused on actual runner progress

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add --log-level CLI argument (DEBUG, INFO, WARNING, ERROR)
- Remove --verbose flag in favor of explicit log level control
- Update evaluate_benchmark.sh to pass LOG_LEVEL from environment
- Document LOG_LEVEL configuration in README and example.env
- Default log level remains INFO for balanced output
- Priority: CLI arg > LOG_LEVEL env var > default (INFO)

Signed-off-by: Yoav Katz <katz@il.ibm.com>
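The stated priority (CLI arg > LOG_LEVEL env var > default INFO) can be expressed as a small resolver. This is a sketch only: the function name `resolve_log_level` is hypothetical, and ignoring an unrecognized `LOG_LEVEL` value (falling back to INFO) is an assumption about behavior the commit does not specify.

```python
import argparse
from typing import Mapping, Sequence

LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def resolve_log_level(argv: Sequence[str], env: Mapping[str, str]) -> str:
    """Return the effective log level: CLI arg, then LOG_LEVEL, then INFO."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--log-level", choices=LEVELS, default=None)
    # parse_known_args lets the runner's other arguments pass through.
    args, _unknown = parser.parse_known_args(list(argv))
    if args.log_level:
        return args.log_level
    env_level = (env.get("LOG_LEVEL") or "").upper()
    if env_level in LEVELS:
        return env_level
    return "INFO"
```

A caller would typically pass `sys.argv[1:]` and `os.environ`, then hand the result to `logging.basicConfig(level=getattr(logging, level))`.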
…lity

- Add prompt text to exgentic_a2a.prompt.build span
- Add prompt, response, and duration to exgentic_a2a.a2a.send_prompt span
- Add evaluation result and duration to exgentic_a2a.mcp.evaluate_session span
- Maintain backward compatibility by keeping attributes on parent span

This improves Jaeger trace analysis by making relevant data visible on the specific operation spans rather than only on the root span.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Add context field to SessionData dataclass as Optional[Dict[str, Any]]
- Update MCP client to extract context from create_session response
- Modify build_prompt to accept and format context in the prompt
- Pass session context through the entire pipeline to the agent

The context dictionary from the MCP server is now included in the prompt sent to the agent, providing additional information for task completion.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
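A rough sketch of how an optional context dictionary might flow from the session data into the prompt. Only the `context` field and its `Optional[Dict[str, Any]]` type come from the commit; the other `SessionData` fields, the prompt layout, and this `build_prompt` signature are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SessionData:
    session_id: str  # hypothetical field
    task: str  # hypothetical field
    context: Optional[Dict[str, Any]] = None  # field added by this commit


def build_prompt(session: SessionData) -> str:
    """Render the task, plus a Context section when context is present."""
    lines = [f"Session ID: {session.session_id}", "", "Task:", session.task]
    if session.context:
        rendered = json.dumps(session.context, indent=2, sort_keys=True)
        lines += ["", "Context:", rendered]
    return "\n".join(lines)
```

Keeping `context` optional with a `None` default means sessions created by servers that return no context still build a valid prompt.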
- Add debug logging to log full task response for troubleshooting
- Change text extraction to check for None instead of falsy values to allow empty strings
- Return empty string for completed tasks with no extracted text
- Improve handling of completed tasks without text content

Signed-off-by: Yoav Katz <katz@il.ibm.com>
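The difference between checking for `None` and checking for falsy values matters when an empty string is a legitimate result. The helpers below are hypothetical and only illustrate that distinction.

```python
from typing import Iterable, Optional


def first_text(candidates: Iterable[Optional[str]]) -> Optional[str]:
    """Return the first candidate that is not None, even if it is ''."""
    for candidate in candidates:
        if candidate is not None:  # `if candidate:` would skip ''
            return candidate
    return None


def completed_output(extracted: Optional[str]) -> str:
    """For a completed task, fall back to '' when nothing was extracted."""
    return "" if extracted is None else extracted
```

With a truthiness check, `first_text([None, "", "later"])` would skip the empty string and return `"later"`; with the `is not None` check it returns `""`.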
- Add estimated setup time (~15 minutes)
- Clarify Python version requirements (3.13+ not supported)
- Add note about uv automatically using Python 3.12
- Include secret_values.yaml creation step in Kagenti setup
- Improve configuration section with clearer structure
- Emphasize required .env file creation before running evaluations

Signed-off-by: Yoav Katz <katz@il.ibm.com>
- Added note that project has been tested only locally with Podman
- Clarifies Docker compatibility has not been validated

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Signed-off-by: Yoav Katz <katz@il.ibm.com>
This PR adds a test harness for running Exgentic benchmarks.
For: kagenti/kagenti#963
as part of Epic: kagenti/kagenti#962