Merged
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -198,8 +198,8 @@ PlanProof values correctness over persuasion.
 - [X] PR 2.4: Validator Unit Tests
 - [X] PR 3.1: Metadata Extractor
 - [x] PR 3.2: Validation Wiring
-- [ ] PR 3.3: 1-Shot Repair Loop
-- [ ] PR 4.1: Opik Trace Scaffolding
+- [X] PR 3.3: 1-Shot Repair Loop
+- [X] PR 4.1: Opik Trace Scaffolding
Comment on lines +201 to +202

Copilot AI Jan 21, 2026

This PR marks multiple tasks as complete (PR 3.3 and PR 4.1), but according to the Micro-PR Mandate, "One Checkbox = One PR." If PR 3.3 was completed in a previous PR, it should have been marked complete in that PR. If it was never completed before, this PR should not include its implementation. This PR should only mark PR 4.1 as complete, as it implements the Opik tracing scaffolding.

Copilot generated this review using guidance from repository custom instructions.
 - [ ] PR 4.2: Opik Metrics Integration

 ## Infrastructure
2 changes: 1 addition & 1 deletion apps/api/src/planproof_api/agent/extractor.py
@@ -70,7 +70,7 @@ def _normalize_entities(entities: list[str]) -> list[str]:
     return normalized


-@opik.track(name="extract_metadata")
+@opik.track(name="extraction_step")
 def extract_metadata(context: str) -> ExtractedMetadata:
     client = OpenAI()
     response = client.chat.completions.create(
2 changes: 0 additions & 2 deletions apps/api/src/planproof_api/agent/planner.py
@@ -5,7 +5,6 @@
 from openai import OpenAI

 from planproof_api.agent.schemas import ExtractedMetadata, PlanItem
-from planproof_api.observability.opik import opik

 _SYSTEM_PROMPT = (
     "You are a planning assistant. Return ONLY valid JSON with keys: "
@@ -19,7 +18,6 @@ class PlanGenerationError(RuntimeError):
     pass


-@opik.track(name="generate_plan")
 def generate_plan(
     context: str,
     metadata: ExtractedMetadata,
44 changes: 41 additions & 3 deletions apps/api/src/planproof_api/observability/opik.py
@@ -1,12 +1,30 @@
 import os
+import socket
+import sys

 if "OPIK_PROJECT_NAME" not in os.environ:
-    os.environ["OPIK_PROJECT_NAME"] = "Hackaton"
+    os.environ["OPIK_PROJECT_NAME"] = "PlanProof"
+
+
+def _warn(message: str) -> None:
+    print(f"OPIK WARNING: {message}", file=sys.stderr)
+
+
+def _network_available() -> bool:
+    try:
+        socket.getaddrinfo("www.comet.com", 443)
+        return True
+    except OSError:
+        return False
+

 try:
     import opik as _opik  # type: ignore
-except Exception:  # pragma: no cover - optional tracing dependency
+    from opik import opik_context as _opik_context  # type: ignore
+except Exception as exc:  # pragma: no cover - optional tracing dependency
+    _warn(f"Opik import failed; tracing disabled. ({exc})")
     _opik = None
+    _opik_context = None


 class _NoOpOpik:
@@ -18,4 +36,24 @@ def decorator(func):
         return decorator


-opik = _opik if _opik is not None else _NoOpOpik()
+class _NoOpOpikContext:
+    @staticmethod
+    def update_current_span(*_args, **_kwargs) -> None:
+        return None
+
+    @staticmethod
+    def update_current_trace(*_args, **_kwargs) -> None:
+        return None
+
+
+_opik_enabled = bool(_opik and _opik_context)
+if not os.environ.get("OPIK_API_KEY"):
+    _warn("OPIK_API_KEY not set; tracing disabled.")
+    _opik_enabled = False
+
+if _opik_enabled and not _network_available():
+    _warn("Network unavailable; tracing disabled.")
+    _opik_enabled = False
+
+opik = _opik if _opik_enabled else _NoOpOpik()
+opik_context = _opik_context if _opik_enabled else _NoOpOpikContext()
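The fallback pattern in this file can be sketched in isolation. The following is a minimal illustration, assuming the tracker is only ever used as a `track(name=...)` decorator factory; `NoOpTracker` and `pick_tracker` are hypothetical names for this sketch and are not part of the PR or of the Opik API.

```python
import functools


class NoOpTracker:
    """Hypothetical stand-in that mimics a `track(name=...)` decorator factory."""

    def track(self, *_args, **_kwargs):
        def decorator(func):
            # Preserve the wrapped function's name and docstring.
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                return func(*args, **kwargs)

            return wrapper

        return decorator


def pick_tracker(real_tracker, api_key, network_ok):
    """Return the real tracker only when every precondition holds."""
    if real_tracker is not None and api_key and network_ok:
        return real_tracker
    return NoOpTracker()


# Simulate an environment with no tracing library, key, or network.
tracker = pick_tracker(real_tracker=None, api_key="", network_ok=False)


@tracker.track(name="extraction_step")
def extract(text: str) -> list[str]:
    return text.split()
```

Because the no-op decorator returns a plain pass-through wrapper, decorated functions behave identically whether or not tracing is enabled, which is the property the module relies on.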
40 changes: 34 additions & 6 deletions apps/api/src/planproof_api/routes.py
@@ -22,7 +22,7 @@
     PlanValidation,
     ValidationMetrics,
 )
-from planproof_api.observability.opik import opik
+from planproof_api.observability.opik import opik, opik_context

 router = APIRouter()

@@ -43,7 +43,19 @@ def _format_plan(plan: list[PlanItem]) -> str:
     return json.dumps([item.model_dump() for item in plan], indent=2)


-@opik.track(name="validate_plan")
+@opik.track(name="initial_planning_step")
+def _initial_planning_step(
+    request: PlanRequest, metadata: ExtractedMetadata
+) -> tuple[list[PlanItem], list[str], list[str]]:
+    return generate_plan(
+        request.context,
+        metadata,
+        request.current_time,
+        request.timezone,
+    )
+
+
+@opik.track(name="validation_step")
 def _validate_plan(
     plan: list[PlanItem], metadata: ExtractedMetadata, current_time: str
 ) -> PlanValidation:
@@ -104,10 +116,22 @@ def _validate_plan(
         keyword_recall_score=keyword_recall_score,
         human_feasibility_flags=human_feasibility_flags,
     )
+    try:
+        opik_context.update_current_span(
+            metadata={
+                "constraint_violation_count": constraint_violation_count,
+                "overlap_minutes": overlap_minutes,
+                "hallucination_count": hallucination_count,
+                "keyword_recall_score": keyword_recall_score,
+                "human_feasibility_flags": human_feasibility_flags,
+            }
+        )
+    except Exception:

Copilot AI Jan 21, 2026

'except' clause does nothing but pass and there is no explanatory comment.

+        pass
     return PlanValidation(status=status, metrics=metrics, errors=errors)


-@opik.track(name="repair_plan")
+@opik.track(name="repair_step")
 def _repair_plan(
     request: PlanRequest, metadata: ExtractedMetadata, failed_plan: list[PlanItem], errors: list[str]
 ) -> tuple[list[PlanItem], list[str], list[str]]:
@@ -129,12 +153,16 @@ def _repair_plan(


 @router.post("/api/plan", response_model=PlanResponse)
+@opik.track(name="plan_request")
 def create_plan(request: PlanRequest) -> PlanResponse:
+    try:
+        opik_context.update_current_trace(metadata={"variant": request.variant})
+    except Exception:

Copilot AI Jan 21, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
-    except Exception:
+    except Exception:
+        # Telemetry/tracing must never break request handling; ignore failures here.

+        pass
+
     metadata = extract_metadata(request.context)
     try:
-        plan, assumptions, questions = generate_plan(
-            request.context, metadata, request.current_time, request.timezone
-        )
+        plan, assumptions, questions = _initial_planning_step(request, metadata)
     except PlanGenerationError as exc:
         validation = PlanValidation(
             status="fail",
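One possible way to address the two review comments about bare `except ... pass` blocks is to route both telemetry updates through a small helper that swallows the failure but still records it. This is only a sketch: `safe_telemetry`, the logger name, and the two demo hooks are hypothetical and do not appear in the PR.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("planproof.telemetry")  # hypothetical logger name


def safe_telemetry(update: Callable[..., Any], **kwargs: Any) -> bool:
    """Call a telemetry hook; log and swallow any failure.

    Returns True when the hook ran cleanly, False otherwise.
    """
    try:
        update(**kwargs)
        return True
    except Exception:
        # Telemetry must never break request handling, but leave a record.
        logger.debug("telemetry update failed", exc_info=True)
        return False


# Demo hooks standing in for a tracing backend (assumptions for illustration).
calls: list[dict[str, Any]] = []


def working_hook(**kwargs: Any) -> None:
    calls.append(kwargs)


def broken_hook(**_kwargs: Any) -> None:
    raise RuntimeError("tracing backend unreachable")


ok = safe_telemetry(working_hook, metadata={"variant": "a"})
failed = safe_telemetry(broken_hook, metadata={"variant": "a"})
```

In the route code, a call along the lines of `safe_telemetry(opik_context.update_current_trace, metadata={...})` would then replace each `try`/`except` pair, keeping the intent documented in one place.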
6 changes: 3 additions & 3 deletions docs/assistant_prompts/codex_tasks.md
@@ -42,18 +42,18 @@ The implementation must conform to:

 - [X] **PR 2.1:** Implement `eval/time_math.py` to detect overlaps between `start_time` and `end_time`.
 - [X] **PR 2.2:** Implement `eval/hallucination.py` for Proper Noun matching between context and plan.
-- [ ] **PR 2.3:** Implement `eval/recall.py` for Keyword Recall score calculation (deterministic string match).
+- [X] **PR 2.3:** Implement `eval/recall.py` for Keyword Recall score calculation (deterministic string match).
 - [X] **PR 2.4:** Add unit tests for all validators in `apps/api/tests/`.

 ## Phase 3 — Agent: The Sandwich Pipeline

 - [X] **PR 3.1:** Implement the "Extractor" logic (LLM call to parse constraints and keywords).
 - [X] **PR 3.2:** Wire the Validator to run after the Planner and populate `validation.status` and `errors`.
-- [ ] **PR 3.3:** Implement the 1-shot "Repair Attempt" logic (if FAIL, retry once with errors in prompt).
+- [X] **PR 3.3:** Implement the 1-shot "Repair Attempt" logic (if FAIL, retry once with errors in prompt).

 ## Phase 4 — Observability

-- [ ] **PR 4.1:** Integrate Opik tracing hooks for each step (Extract -> Plan -> Validate -> Repair).
+- [X] **PR 4.1:** Integrate Opik tracing hooks for each step (Extract -> Plan -> Validate -> Repair).
Comment on lines +45 to +56

Copilot AI Jan 21, 2026

This PR marks multiple tasks as complete (PR 2.3, PR 3.3, and PR 4.1), but according to the Micro-PR Mandate, "One Checkbox = One PR." If PR 2.3 and PR 3.3 were completed in previous PRs, they should have been marked complete in those respective PRs. If they were never completed before, this PR should not include their implementation. This PR should only mark PR 4.1 as complete, as it implements the Opik tracing scaffolding.

Copilot generated this review using guidance from repository custom instructions.
 - [ ] **PR 4.2:** Ensure `validation.metrics` are logged as properties in the Opik trace.

 ---
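The 1-shot repair flow described in PR 3.3 (if validation fails, retry once with the errors in the prompt) can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: `plan_with_one_repair`, `Plan`, and the fake generator and validator are hypothetical names, and plan items are reduced to plain strings.

```python
from typing import Callable

Plan = list[str]  # hypothetical stand-in for a list of plan items


def plan_with_one_repair(
    generate: Callable[[list[str]], Plan],
    validate: Callable[[Plan], list[str]],
) -> tuple[Plan, list[str], int]:
    """Generate a plan, validate it, and retry at most once on failure.

    `generate` receives the errors from the previous attempt (empty on the
    first call); `validate` returns error strings (an empty list means pass).
    Returns (plan, remaining_errors, attempts).
    """
    plan = generate([])
    errors = validate(plan)
    if not errors:
        return plan, [], 1
    repaired = generate(errors)  # single repair attempt, errors fed back in
    return repaired, validate(repaired), 2


def fake_generate(errors: list[str]) -> Plan:
    # The first draft contains an overlap; the repair drops the second item.
    if errors:
        return ["lunch 12:00-13:00"]
    return ["lunch 12:00-13:00", "call 12:30-13:00"]


def fake_validate(plan: Plan) -> list[str]:
    return ["overlap detected"] if len(plan) > 1 else []


final_plan, remaining_errors, attempts = plan_with_one_repair(
    fake_generate, fake_validate
)
```

Capping the loop at one retry keeps the number of model calls bounded, and any errors that survive the repair are returned to the caller rather than hidden.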