feat(mcp): download log content for self-hosted jobs in jobs_logs by cchristous · Pull Request #863 · semaphoreio/semaphore

cchristous · 2026-02-16T13:41:43Z

Summary

The jobs_logs MCP tool previously had split behavior:

Hosted jobs: returned actual log content (up to 200 lines) with cursor-based pagination
Self-hosted jobs: returned only a JWT token and URL, requiring the caller to manually curl the URL

This forced MCP clients (like Claude Code) to fall back to Bash curl to view self-hosted logs, defeating the purpose of having an MCP tool.

This PR makes fetchSelfHostedLogs download the log content via HTTP after generating the token, parse the JSON events response, and return a paginated preview — the same experience as hosted jobs. If the download fails for any reason (network error, non-200 status, JSON parse error), it falls back gracefully to the existing token/URL-only behavior with a logged warning.

Changes

mcp_server/pkg/tools/jobs/logs.go

Added computePaginationWindow helper that computes slice indices, display metadata, and next cursor — used by both fetchHostedLogs and paginateSelfHostedLines to eliminate duplicated windowing logic
computePaginationWindow only reports Truncated = true when the window genuinely doesn't cover all available lines (start > 0 || end < totalLines), rather than unconditionally when a cursor is provided
Added logDownloader function type and threaded it through logsHandler → fetchSelfHostedLogs as a parameter, replacing the previous downloadLogsFn package-level variable (enables clean test injection without global mutation)
Added downloadSelfHostedLogs function that GETs the logs URL, parses {"events": [{"output": "..."}]} JSON, and returns lines
Added io.LimitReader with 10 MiB cap to prevent OOM from unexpectedly large responses — reads maxLogDownloadBytes+1 so that responses of exactly 10 MiB are accepted while anything larger is rejected with a descriptive errLogResponseTooLarge error; uses explicit int64 cast on both sides of the size comparison; the error message is derived from maxLogDownloadBytes to stay in sync if the constant changes
downloadSelfHostedLogs accepts context.Context and uses http.NewRequestWithContext so the HTTP download respects parent cancellation/deadlines
Uses a shared package-level selfHostedHTTPClient for connection pooling instead of creating a new http.Client per call; on non-200 responses the body is drained before returning the error so the TCP connection can be reused
Added in-memory cache (logCache) keyed by job ID with 60-second TTL — avoids re-downloading the full log response on every pagination call, since self-hosted logs are fetched via a single HTTP GET with no server-side windowing; only finished jobs are cached — running jobs re-download on each call so new output is picked up immediately
Cache stores token metadata (token, tokenType, tokenTtlSeconds, logsURL) so cached responses include the same fields as fresh responses; note that tokenTtlSeconds reflects the original TTL at issuance, not the remaining lifetime (the 60s cache TTL is well within the 300s token TTL so the token is always still valid)
Both getCachedLogLines and cacheLogLines perform an expired-entry sweep on each call for consistent cleanup
Cache is bounded to 10 entries (~100 MiB worst case) with LRU-by-expiry eviction to prevent unbounded memory growth
Cache check is performed before gRPC GenerateToken call, so paginated requests that hit the cache skip the gRPC round-trip entirely (the token from the initial cache-miss call has a 300s TTL, well exceeding the 60s cache TTL)
Empty download results (jobs still starting up) are intentionally not cached so they are re-checked on subsequent calls
Added DownloadEmpty field to logsResult — when the log download succeeds but returns no events, formatSelfHostedLogsMarkdown now shows a distinct "no output yet, retry shortly" message instead of falling back to the token/curl instructions
Updated fetchSelfHostedLogs to accept startingLine for pagination and jobFinished to control caching behavior
Updated formatSelfHostedLogsMarkdown to show log preview when available, empty-download message when download succeeded but returned no lines, and token/URL fallback otherwise. When at cursor=0 the truncation message now says "This is the beginning of the log" instead of the misleading "No further logs are available"
Removed dead if start <= 0 { start = 0 } guards from both formatHostedLogsMarkdown and formatSelfHostedLogsMarkdown — StartLine is always >= 0
Updated logsFullDescription to reflect unified behavior
Updated success tracking to mark success when preview lines OR token are present
Stored tokenTypeToString result in a local variable to avoid redundant call

mcp_server/pkg/tools/jobs/register.go

Updated Register to pass downloadSelfHostedLogs as the logDownloader argument to logsHandler

mcp_server/pkg/tools/jobs/jobs_test.go

Extracted newSelfHostedTestEnv helper that creates the full test environment (httptest server, stubs, provider, handler, cache reset) and callLogsHandler helper for making requests — reduces ~40-60 lines of boilerplate per test
newSelfHostedTestEnv uses t.Cleanup for automatic teardown and passes a test downloader directly to logsHandler — no global state mutation, safe for parallel use
newSelfHostedTestEnv defaults job state to FINISHED; accepts selfHostedTestOpt to override (e.g. STARTED for running-job tests)
Test downloader now asserts that the URL it receives matches the expected logs URL, verifying the correct URL is passed through the call chain
Added shared constants (selfHostedTestJobID, selfHostedTestOrgID, selfHostedTestUser)
Updated TestFetchSelfHostedLogs to use helpers and verify preview lines are returned
Added TestSelfHostedLogsDownloadFailureFallback — HTTP 500 gracefully falls back to token/URL
Added TestSelfHostedLogsEmptyDownloadFallback — empty events response shows distinct "no output yet" message and sets DownloadEmpty=true
Added TestSelfHostedLogsPagination — 5 sub-tests mirroring hosted pagination test cases (including out-of-range cursor)
Added TestDownloadSelfHostedLogs — 5 sub-tests for the download function (valid, empty, error, bad JSON, response too large)
Added TestSelfHostedLogsCacheAvoidsDuplicateDownload — uses atomic.Int32 for the download counter to avoid a data race between the httptest handler goroutine and the test goroutine; verifies the HTTP server is only hit once across two handler calls for the same job, and asserts cached response includes Token/LogsURL metadata
Added TestSelfHostedRunningJobSkipsCache — verifies that a STARTED job downloads logs on every call (no caching), confirming running jobs always get fresh output
Added cursorBeyondEndReturnsEmptyPreview test case to both TestFetchHostedLogsPagination and TestSelfHostedLogsPagination — verifies a cursor past the end returns an empty preview with the correct backward cursor

Security considerations

The logs URL is constructed from trusted internal sources (server BaseURL config, gRPC org service, UUID-validated jobID, gRPC-generated token) — not user-controllable
HTTP response body is capped at 10 MiB via io.LimitReader to prevent OOM, with a clear error when the cap is hit
HTTP download uses http.NewRequestWithContext to respect parent context cancellation
Shared selfHostedHTTPClient has a 30-second timeout as a safety net; response bodies are drained on non-200 status to allow connection reuse
JWT token (300s TTL) travels over HTTPS; only shown in markdown when log download fails (fallback path). When preview succeeds, the token is intentionally omitted from the markdown but remains in StructuredContent for programmatic MCP client access
In-memory cache entries expire after 60 seconds; expired entries are evicted opportunistically on both read and write; cache is bounded to 10 entries with LRU-by-expiry eviction; only finished jobs are cached
Token security reminder preserved in the fallback display path

Test plan

All 25 tests pass (go test ./pkg/tools/jobs/... -v -count=1)
go vet ./pkg/tools/jobs/... clean
go build ./... clean
Manual verification with a real self-hosted job via Claude Code MCP integration

The jobs_logs MCP tool previously returned only a JWT token and URL for self-hosted jobs, requiring the caller to manually curl the URL. Now fetchSelfHostedLogs makes an HTTP GET to the logs endpoint, parses the JSON events, and returns a paginated log preview — the same experience as hosted jobs. If the download fails (network error, non-200 status, parse error), it falls back gracefully to the existing token/URL behavior.

…nload Pass context.Context to downloadSelfHostedLogs so the HTTP request respects parent cancellation/deadlines, and return a descriptive error when the response exceeds the 10 MiB safety cap instead of falling through to a confusing JSON parse failure.

Add in-memory cache (60s TTL) keyed by job ID to avoid re-downloading the full log response on every pagination call. Also: reuse a shared HTTP client for connection pooling, fix LimitReader off-by-one to accept exactly 10 MiB responses, improve the cursor=0 truncation message, extract test helpers to reduce boilerplate, and document the intentional token omission from markdown when preview succeeds.

Skip the gRPC GenerateToken call on cache hits by checking the cache at the top of fetchSelfHostedLogs, avoiding unnecessary latency during pagination. Bound the in-memory cache to 10 entries (~100 MiB worst case) with LRU-by-expiry eviction. Add clarifying comments for the intentional empty-result cache skip, the truncated flag behavior on explicit cursors, and the test parallel-safety constraint.

- Extract shared pagination into computePaginationWindow helper - Replace downloadLogsFn package-level var with logDownloader parameter injection - Add expired-entry sweep in getCachedLogLines matching cacheLogLines pattern - Use explicit int64 cast on maxLogDownloadBytes comparison - Store token metadata in cache so cached responses include Token/LogsURL - Remove dead start <= 0 guards in format functions - Add cursorBeyondEndReturnsEmptyPreview tests for both hosted and self-hosted - Refactor newSelfHostedTestEnv to use t.Cleanup instead of manual cleanup func

Use atomic.Int32 for downloadCount in cache test to avoid a data race between the httptest handler goroutine and the test goroutine. Add a comment noting that cached tokenTtlSeconds reflects issuance time, not remaining lifetime. Fix computePaginationWindow to only report truncated when the window genuinely doesn't cover all available lines.

… harden tests - Only cache self-hosted log downloads for finished jobs; running jobs re-download on each call so new output is picked up immediately. - Show a distinct "no output yet" message when log download succeeds but returns empty events, instead of falling back to token/curl instructions. - Assert the logs URL passed to the test downloader matches expectations. - Store tokenTypeToString result in a local variable to avoid redundant call.

- Use errors.New for errLogResponseTooLarge sentinel (item 6) - Clamp NextCursor to last real page when cursor is beyond total lines, so self-hosted pagination redirects to actual data instead of another empty page (item 3) - Extract writePreviewMarkdown helper to deduplicate preview rendering between hosted and self-hosted formatters (item 4) - Skip log cache on the read path for running jobs, matching the existing write-side bypass for consistency (item 1)

…stant Drain the HTTP response body on non-200 status in downloadSelfHostedLogs so the shared selfHostedHTTPClient can reuse the TCP connection. Derive the errLogResponseTooLarge message from maxLogDownloadBytes instead of hardcoding the byte count.

…pping When a cursor points past all available log lines, the markdown now shows "Cursor is past the end" with a redirect cursor instead of misleading "no output yet" (hosted) or token/URL fallback (self-hosted). Also adds a comment explaining the virtualTotal/sliceOffset mapping in fetchHostedLogs.

github-project-automation bot added this to Roadmap Feb 16, 2026

github-project-automation bot moved this to Backlog in Roadmap Feb 16, 2026

cchristous added 10 commits February 16, 2026 08:49

Merge branch 'main' into feat/mcp-self-hosted-log-download

8106ca6

cchristous marked this pull request as ready for review February 17, 2026 01:14

cchristous requested review from DamjanBecirovic, dexyk and hamir-suspect as code owners February 17, 2026 01:14

DamjanBecirovic approved these changes Mar 30, 2026

View reviewed changes

DamjanBecirovic added 2 commits March 30, 2026 12:20

Merge branch 'main' into feat/mcp-self-hosted-log-download

fc9c5b3

Merge branch 'main' into feat/mcp-self-hosted-log-download

7cc5fa8

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(mcp): download log content for self-hosted jobs in jobs_logs#863

feat(mcp): download log content for self-hosted jobs in jobs_logs#863
cchristous wants to merge 13 commits intosemaphoreio:mainfrom
cchristous:feat/mcp-self-hosted-log-download

cchristous commented Feb 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cchristous commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Security considerations

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cchristous commented Feb 16, 2026 •

edited

Loading