Decide whether lc eval / the eval harness ships on launch #131

@cailmdaley

With launch on May 13, worth a quick call on whether the eval harness (lc eval, src/lightcone/eval/, the Eval workflow) is part of what we want users to discover.

What ships if we do nothing

  • lc eval shows up in lc --help
  • src/lightcone/eval/ (six modules + four test files)
  • evals/tasks/snae/ — one eval task (Type Ia SN MAP fit against Union2.1)
  • the Eval workflow in .github/workflows/, which runs on every PR
  • docs/contributing/testing.md and docs/skills/authoring.md reference it

The harness has its own loop-prompt machinery: a single claude -p invocation with a high max-turns setting, inside which the agent loops over outputs internally. That is a different shape from the ralph skill, which PR #86 establishes as the canonical loop substrate for agent-driven work.

Why this is a question

lc eval and the eval module aren't where Lightcone is investing right now. The focus is paper reproduction, the ralph-loop substrate, ASTRA spec discipline, and the agentic layer wrapping all of it. A user who finds lc eval in the CLI help, follows the doc links, and tries to use it would land on something we're not actively maintaining. That is a confusing thing to leave discoverable on the launch surface.

Options

  1. Ship. Keep lc eval discoverable; commit to maintaining the eval signal alongside the rest of the launch surface.
  2. Punt. Leave the code in place but hide it from launch surfaces — drop the doc references, remove from lc --help, document as internal-only. Revisit post-launch.
  3. Retire. Remove src/lightcone/eval/, the lc eval subcommand, the test files, the workflow, the doc references. Take the simplification. Git history is the recovery path if we want it back.
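For a sense of what "remove from lc --help" in option (2) could involve, here is a minimal sketch. It assumes an argparse-based CLI, which may not be what lc actually uses; the `run` subcommand and the `build_parser` helper are hypothetical and exist only for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy `lc`-style CLI with one visible and one hidden subcommand."""
    parser = argparse.ArgumentParser(prog="lc")
    # metavar replaces the auto-generated {run,eval} choice list in the usage
    # line, which would otherwise still reveal the hidden name.
    sub = parser.add_subparsers(dest="command", metavar="COMMAND")

    # Hypothetical visible subcommand, here only for contrast.
    sub.add_parser("run", help="hypothetical public subcommand")

    # No help= argument: argparse leaves this entry out of the subcommand
    # listing, but the parser is still registered and callable.
    sub.add_parser("eval")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.format_help())
    print(parser.parse_args(["eval"]).command)
```

With this approach the subcommand stays callable for anyone who already knows it, so it hides rather than removes. The name may still surface in argparse's invalid-choice error messages, which is one reason option (3) is the cleaner cut if nobody depends on the command.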

My instinct is (3) — the simplest launch surface communicates what Lightcone is focused on most clearly. But if lc eval is load-bearing for someone's workflow, or if there's a case for keeping eval signal alive through launch, that's a real argument for (1) or (2).

cc @EiffL — what's your read?

— Claude on behalf of Cail

Labels: question (Further information is requested)