Skip to content

Symbol-lookup MCP tools backed by vendored cellar#599

Open
oyvindberg wants to merge 1 commit into
masterfrom
mcp-symbols
Open

Symbol-lookup MCP tools backed by vendored cellar#599
oyvindberg wants to merge 1 commit into
masterfrom
mcp-symbols

Conversation

@oyvindberg
Copy link
Copy Markdown
Owner

Summary

Four MCP tools that give coding agents JVM API knowledge straight from the build's resolved classpath:

  • bleep.symbol.get — full info for an FQN: signature, members, companion / Java statics as an object Foo: block, subtypes, deprecation tag, docstring
  • bleep.symbol.find — discovery: scope enumerates a package or class, pattern substring-searches the whole classpath, combine for filtered enumeration. Results carry source-file locations: (src/.../Foo.scala:42) for in-project code, (in group:artifact:version) for deps
  • bleep.symbol.get_source — fetch source from a published -sources.jar via Coursier, slice to the symbol's line range. Coordinate is optional; inferred from the project's resolved artifacts when omitted

Powered by bleep-cellar, a trimmed fork of VirtusLab/cellar (MPL-2.0) at liberated/cellar, reduced to the tasty-query glue and the output formatters. Bleep already owns the classpath, JRE info, and Coursier cache, so cellar's classpath extraction, build-tool detection, Coursier wrappers, CLI, HOCON config, and Mill/Nix build files were dropped (~4,300 LOC vs upstream). fs2 dropped; cats-effect kept (already a bleep dep via the MCP server).

The bridge layer in bleep-cli/src/scala-3/bleep/symbols/:

  • SymbolsBridge: ResolvedProject / ResolvedJvm → tasty-query Context with project classpath + JRE modules
  • SourceFetcher: on-demand Coursier -sources.jar fetch + zip slice

BleepMcpServer wiring:

  • per-project Context cache, invalidated on bleep.yaml change AND on every successful BSP compile/test cycle (since compile rewrites classesDir bytes)
  • staleness banner when source files are newer than the last compile
  • auto-coordinate inference for get_source via ResolvedProject.resolution

Output choices aimed at agent token-efficiency:

  • Scala-source-style rendering instead of Markdown chrome; modifiers baked into keywords (sealed trait, abstract def, inline def, implicit val); models parse Scala-shaped output more cheaply than free-form Markdown
  • scala.FunctionN[A,..,R] renders with arrow syntax
  • Java signatures hide <FromJavaObject> and NothingType bounds; drop synthetic x$0/x$1 param names
  • Constructors render as new[Tparams](args) not def <init>(): Unit
  • find/pattern groups by package to elide repeated FQN prefixes; Scala-2-limited-info note hoisted to a single header
  • inherited constructors filtered; module-class type-alias shims (Foo\$ traits) hidden from search noise
  • per-call member limit defaults to 30 to bound output on large types
  • members sorted by category (ctor → abstract → concrete → vals → types) then alphabetical

The cellar submodule is the bleep-build fork; the trimming commit is on a mcp-symbols branch on that fork (https://github.com/bleep-build/cellar/tree/mcp-symbols). It needs to land on bleep-build/cellar main (separate small PR) before this can merge.

Test plan

  • bleep test bleep-cellar-tests — 102/102 pass (resolution, lister, formatters, near-match, type printer, etc.)
  • bleep cellar-benchmark — 22/22 scenarios pass quality checks across Spring Boot + Reactor + Netty + Hibernate + Guava + Jackson classpath, avg ~110ms per scenario
  • bleep compile all touched projects: bleep-cellar, bleep-cli, bench-bigdeps, scripts-dev
  • bleep fmt
  • Manually exercise bleep.symbol.get / find / get_source via Claude Code against a real bleep build after binary is published

🤖 Generated with Claude Code

Adds four MCP tools to bleep-cli that give coding agents API knowledge
straight from the build's resolved classpath:

  bleep.symbol.get           — full info for an FQN (signature, members,
                               companion as `object Foo:` block, subtypes,
                               deprecation tag, docstring)
  bleep.symbol.find          — discovery: set `scope` (package or class) to
                               enumerate, set `pattern` to substring-search
                               the whole classpath, or combine the two.
                               Results carry source-location annotations:
                               `(src/.../Foo.scala:42)` for in-project code,
                               `(in group:artifact:version)` for deps
  bleep.symbol.get_source    — fetch source code from a published
                               `-sources.jar` via Coursier, slice to the
                               symbol's line range. Coordinate is optional;
                               inferred from the project's resolved artifacts
                               when omitted.

Powered by `bleep-cellar`, a trimmed fork of VirtusLab/cellar at
liberated/cellar (MPL-2.0), reduced to the tasty-query glue and the
output formatters. Bleep already owns the classpath, JRE info, and
Coursier cache, so we skipped cellar's classpath extraction, build-tool
detection, Coursier wrappers, CLI, HOCON config, and Mill/Nix build
files — ~4,300 LOC gone vs upstream. Dropped fs2; kept cats-effect
(already a bleep dep via the MCP server).

Bridge layer (bleep-cli/src/scala-3/bleep/symbols):
  - SymbolsBridge: ResolvedProject/ResolvedJvm → tasty-query Context
                   with the project's classpath + JRE modules from jrt-fs
  - SourceFetcher: on-demand Coursier fetch + zip-slice for sources jars

BleepMcpServer wiring:
  - per-project Context cache, invalidated on bleep.yaml change AND on
    every successful BSP compile/test cycle (since compile rewrites
    classesDir bytes)
  - staleness banner when source files are newer than the last compile
  - auto-coordinate inference for get_source using
    ResolvedProject.resolution

Output choices aimed at agent token-efficiency:
  - Scala-source-style rendering (no markdown chrome, no code fences),
    modifiers baked into keywords (`sealed trait`, `abstract def`,
    `inline def`, `implicit val`); models parse Scala-shaped output more
    cheaply than free-form markdown
  - `scala.FunctionN[A,..,R]` renders with arrow syntax (`A => R`,
    `(A, B) => R`)
  - Java signatures: hide `<FromJavaObject>` and `NothingType` bounds;
    drop synthetic `x$0`/`x$1` param names; render Object as Object
  - Constructors render as `new[Tparams](args)` not `def <init>(): Unit`
  - find/pattern groups results by package to elide repeated FQN
    prefixes; Scala-2-limited-info note hoisted to a single header
  - inherited constructors filtered out; module-class type-alias shims
    (`Foo$` traits) filtered out of find/pattern noise
  - per-call member limit defaults to 30 to bound output size on large
    types like Reactor Flux
  - members sorted by category (ctor → abstract → concrete → vals →
    types) then alphabetical, for predictable output

bench-bigdeps + `bleep cellar-benchmark`: a Spring-Boot-heavy bleep
project (Spring web/data-jpa/security/webflux, Reactor, Netty,
Hibernate, Jackson, Guava, Apache Commons) plus a benchmark script
exercising 22 representative scenarios across the four tools.
Currently: 22/22 pass quality checks, avg ~110ms per scenario.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant