
PR -> feat: Unified Telemetry Layer for Non-LangGraph Trace Pipelines (M2) Body#64

Draft
mjehanzaib999 wants to merge 62 commits into AgentOpt:experimental from mjehanzaib999:m2-unified-telemetry

Conversation

@mjehanzaib999

Summary

This PR implements the "Generic Unified Telemetry" layer (Milestone 2), enabling OTEL span emission for non-LangGraph Trace pipelines while preserving all existing LangGraph instrumentation behavior.

After M1, only LangGraph pipelines could emit OTEL spans. This PR extends telemetry coverage so that any Trace pipeline using @trace.bundle or call_llm can produce OTEL-compatible spans when a TelemetrySession is active — with zero changes to existing code when no session is active.

What's new

  • Session activation via contextvars — TelemetrySession supports `with` context-manager usage and activate() for global discovery by Trace hooks
  • OTEL spans around @trace.bundle ops — controlled by BundleSpanConfig (enable/disable, suppress default ops, capture inputs)
  • MessageNode-to-span binding — MessageNodeTelemetryConfig binds message.id to the current span for stable node identity in TGJ conversion
  • call_llm provider span — emits a child OTEL span with trace.temporal_ignore=true when a session is active (visible for monitoring, excluded from output node selection)
  • Session activation in the LangGraph root span — InstrumentedGraph._root_invocation_span now calls session.activate() so Trace-level hooks discover the session automatically
  • Optional MLflow autologging — opto.features.mlflow.autolog() enables mlflow.trace wrapping on bundle ops; safe no-op when MLflow is not installed
  • Export naming alignment — export_run_bundle() now writes otlp.json / tgj.json (aligned with repo demos), with backward-compatible aliases (otlp_trace.json / trace_graph.json)
  • Manifest + node records — manifest.json and message_nodes.jsonl are included in the export bundle for debugging
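The activation and discovery contract described above can be sketched in a few lines of stdlib Python. This is a hypothetical re-implementation for illustration only, not the actual code in opto/trace/io/telemetry_session.py; only the names TelemetrySession, current(), and activate() are taken from the PR text.

```python
import contextvars

# Hypothetical sketch of the discovery pattern, not the real opto code.
_active_session = contextvars.ContextVar("active_session", default=None)

class TelemetrySession:
    def __init__(self):
        self._token_stack = []

    @classmethod
    def current(cls):
        # Hooks in @trace.bundle / call_llm check this; None means no-op.
        return _active_session.get()

    def __enter__(self):
        self._token_stack.append(_active_session.set(self))
        return self

    def __exit__(self, *exc):
        _active_session.reset(self._token_stack.pop())

    # activate()/deactivate() share the token stack with the context
    # manager, so the two activation styles compose safely.
    def activate(self):
        self._token_stack.append(_active_session.set(self))

    def deactivate(self):
        _active_session.reset(self._token_stack.pop())

assert TelemetrySession.current() is None
with TelemetrySession() as s:
    assert TelemetrySession.current() is s
assert TelemetrySession.current() is None
```

Guarding every hook on `TelemetrySession.current() is None` is what gives the "zero changes when no session is active" property claimed above.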

Files changed (9 files, +664 / -71)

File Change
opto/trace/settings.py New — global MLflow autologging toggle
opto/features/mlflow/__init__.py New — MLflow integration package
opto/features/mlflow/autolog.py New — autolog() / disable_autolog()
opto/trace/__init__.py Expose settings and mlflow in public API
opto/trace/bundle.py Optional OTEL span in sync_forward/async_forward; MLflow mlflow.trace wrapping
opto/trace/io/telemetry_session.py Major expansion: activation, BundleSpanConfig, MessageNodeTelemetryConfig, span helpers, MLflow helpers, export alignment
opto/trace/io/instrumentation.py Wrap root span with session.activate()
opto/trace/nodes.py Hook MessageNode.__init__ to call on_message_node_created()
opto/trace/operators.py call_llm emits temporal-ignore provider span

Non-breaking guarantees

  • No session active → identical behavior — all hooks are guarded by TelemetrySession.current() is None checks
  • postprocess_output signature unchanged — preserves compatibility with existing callers
  • preprocess_inputs preserved — data extraction inside trace_nodes context is untouched
  • MLflow is optional — all imports are guarded; code works without MLflow installed

Test plan

  • opto.trace import works without errors
  • TelemetrySession + BundleSpanConfig + MessageNodeTelemetryConfig import correctly
  • Bundle ops without a session produce identical results to M1 (no regression)
  • Bundle ops with active session emit OTEL spans with trace.bundle.* and inputs.* attributes
  • TelemetrySession.current() returns None outside context, active session inside
  • export_run_bundle() produces otlp.json, tgj.json, manifest.json + legacy aliases
  • autolog(silent=True) gracefully disables when MLflow is not installed
  • Run M1 notebook end-to-end to confirm no regressions
  • Run M2 demo notebook (generic_unified_telemetry_demo.ipynb)
  • pytest suite passes in clean environment

doxav and others added 30 commits February 12, 2026 15:01
…tion do not lose initial node to optimize (TODO: trainer might have a better solution)
- Add T1 technical plan for LangGraph OTEL Instrumentation API
- Add architecture & strategy doc (unified OTEL instrumentation design)
- Add M0 README with before/after boilerplate reduction comparison
- Add feedback analysis and API strategy comparison (Trace-first, dual semconv)
- Add prototype_api_validation.py with real LangGraph StateGraph + OpenRouter/StubLLM
- Add Jupyter notebook (prototype_api_validation.ipynb) for Colab-ready demo
- Add example trace output JSON files (notebook_trace_output, optimization_traces)
- Add .env.example for OpenRouter configuration
- Replace hardcoded API key with 3-tier auto-lookup (Colab Secrets → env → .env)
- Save all trace outputs to RUN_FOLDER (Google Drive on Colab, local fallback)
- Add run_summary.json export with scores and history
- Update configuration docs with key setup priority table
- Fix Colab badge URL with actual repo/branch path
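The 3-tier key lookup can be sketched as follows. This is a hypothetical helper; the actual notebook cell may differ in details, and the key name OPENROUTER_API_KEY is assumed from the OpenRouter configuration context.

```python
import os

def find_openrouter_key(dotenv_path=".env"):
    """Resolve the API key: Colab Secrets -> env var -> .env file."""
    # 1. Colab Secrets (only importable inside Google Colab)
    try:
        from google.colab import userdata  # type: ignore
        key = userdata.get("OPENROUTER_API_KEY")
        if key:
            return key
    except Exception:
        pass
    # 2. Environment variable
    key = os.environ.get("OPENROUTER_API_KEY")
    if key:
        return key
    # 3. .env file in the working directory (simple KEY=VALUE lines)
    try:
        with open(dotenv_path) as f:
            for line in f:
                name, _, value = line.strip().partition("=")
                if name == "OPENROUTER_API_KEY" and value:
                    return value
    except FileNotFoundError:
        pass
    return None
```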
Deliver Milestone 1 — drop-in OTEL instrumentation and end-to-end
optimization for any LangGraph agent via two function calls.

New modules (opto/trace/io/):
- instrumentation.py: instrument_graph() + InstrumentedGraph wrapper
- optimization.py: optimize_graph() loop + EvalResult/EvalFn contracts
- telemetry_session.py: TelemetrySession (TracerProvider + flush/export)
- bindings.py: Binding dataclass + apply_updates() + make_dict_binding()
- otel_semconv.py: emit_reward(), emit_trace(), record_genai_chat()

Modified modules:
- langgraph_otel_runtime.py: TracingLLM dual semconv (param.* parent +
  gen_ai.* child spans with trace.temporal_ignore)
- __init__.py: export all new M1 public APIs

Tests (63 passing, StubLLM-only, CI-safe):
- Unit tests for bindings, semconv, session, instrumentation, optimization
- E2E integration test (test_e2e_m1_pipeline.py): real LangGraph with
  StubLLM proving full pipeline instrument → invoke → OTLP → TGJ →
  optimizer → apply_updates → re-invoke with updated template

Notebook + docs:
- 01_m1_instrument_and_optimize.ipynb: dual-mode (StubLLM + live
  OpenRouter), Colab badge, executed outputs, <=3 item dataset,
  temperature=0, max_tokens=256 budget guard
- docs/m1_README.md: architecture, API reference, data flow, semantic
  conventions, acceptance criteria status
- requirements.txt: pinned dependencies for uv/pip environments
A. Live mode error handling:
 - A1: TracingLLM raises LLMCallError on HTTP errors/empty content instead of passing error strings as assistant content
 - A2: Notebook only prints [OK] when provider call actually succeeds with non-empty content
 - A3: gen_ai.provider.name correctly set to "openrouter" (not "openai") when using OpenRouter
 - A4: optimize_graph forces score=0 on invocation failure, bypassing eval_fn

B. TelemetrySession API correctness + redaction:
 - B5: flush_otlp(clear=False) properly peeks at spans without clearing the exporter
 - B6: span_attribute_filter now applied during flush_otlp; supports drop (return {}), redact, and truncate

C. TGJ/ingest correctness and optimizer safety:
 - C7: _deduplicate_param_nodes() strips numeric suffixes to collapse duplicate ParameterNodes
 - C8: _select_output_node() excludes child LLM spans, selects the true sink (synthesizer)

D. OTEL topology and temporal chaining:
 - D9: Root invocation span wraps graph.invoke(), producing a single trace ID per invocation
 - D10: Temporal chaining uses trace.temporal_ignore attribute instead of OTEL parent presence

E. optimize_graph semantics + trace-linked reward:
 - E11: best_parameters is a real snapshot captured at the best-scoring iteration
 - E12: eval.score attached to root invocation span before flush, linking reward to trace

F. Non-saturating scoring for Stub mode:
 - F13: StubLLM and eval_fn are structure-aware; stub optimization demonstrates score improvement

Files changed:
 - langgraph_otel_runtime.py: LLMCallError, _validate_content, flush_otlp(clear=)
 - telemetry_session.py: flush_otlp delegation, _apply_attribute_filter
 - otel_adapter.py: root span exclusion, trace.temporal_ignore chaining
 - instrumentation.py: _root_invocation_span context manager, root span on invoke/stream
 - optimization.py: _deduplicate_param_nodes, _select_output_node, _snapshot_parameters, eval-in-trace
 - __init__.py: export LLMCallError
 - test_optimization.py: updated for best_parameters field
 - 01_m1_instrument_and_optimize.ipynb: all fixes reflected in notebook
 - test_client_feedback_fixes.py: 20 new tests covering all 13 issues
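Item D10's attribute-based selection, which C7/C8's _select_output_node relies on, can be sketched over plain span dicts. This is a simplified illustration under assumed span shapes, not the actual implementation.

```python
def select_output_span(spans):
    """Skip provider child spans flagged trace.temporal_ignore and
    return the last-ending remaining span, a simple proxy for the
    pipeline's true sink (e.g. the synthesizer)."""
    eligible = [s for s in spans
                if not s["attributes"].get("trace.temporal_ignore")]
    return max(eligible, key=lambda s: s["end_time"], default=None)

spans = [
    {"name": "planner", "end_time": 1, "attributes": {}},
    {"name": "llm.chat.completion", "end_time": 3,
     "attributes": {"trace.temporal_ignore": True}},
    {"name": "synthesizer", "end_time": 2, "attributes": {}},
]
assert select_output_span(spans)["name"] == "synthesizer"
```

Keying the exclusion on an explicit attribute, rather than on span names like "openai", is what makes the selection provider-agnostic.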
… code

Make the instrumentation layer fully generic and provider-agnostic:

- TracingLLM: default provider_name "openai" → "llm",
  default llm_span_name "openai.chat.completion" → "llm.chat.completion"
- init_otel_runtime: default service_name "trace-langgraph-demo" → "trace-otel-runtime"
- DEFAULT_EVAL_METRIC_KEYS: remove example-specific "plan_quality",
  add generic "score"
- instrument_graph: add llm_span_name, input_key, output_key parameters
  so callers explicitly configure provider/schema specifics
- InstrumentedGraph: add input_key field; invoke()/stream() use it
  instead of hardcoded "query" for the root span hint
- optimize_graph: add output_key parameter; _make_state uses
  graph.input_key instead of hardcoded "query"; error fallback
  no longer assumes result["answer"]
- _select_output_node: replace hardcoded "openai"/"chat.completion"
  name checks with trace.temporal_ignore attribute from info.otel
- otel_adapter: propagate temporal_ignore flag into TGJ info dict
- tgj_ingest: preserve info.otel metadata through conversion and
  onto MessageNode objects

Tests and notebook updated to explicitly pass example-specific values
(provider_name, llm_span_name, output_key) rather than relying on defaults.

All 88 tests pass.
…st iteration

Previously, best_updates was overwritten on every iteration where updates
were applied, regardless of whether that iteration achieved the best score.
This caused best_updates to always contain the last applied updates rather
than the updates that produced the best-performing parameters.

Introduce last_applied_updates to track the most recently applied updates
separately, and snapshot it at the start of each iteration as
applied_updates_for_this_iter. best_updates is now only assigned inside
the best-score guard (avg_score > best_score), ensuring it accurately
reflects the updates that led to best_parameters.

Addresses PR feedback item doxav#1: optimize_graph() best_updates tracking.
optimize_graph() previously ignored the graph's configured output_key
unless the caller explicitly passed output_key=..., causing incorrect
eval payload shape. Now auto-inherits graph.output_key when the parameter
is not provided, and logs a debug note when an explicit override disagrees
with the graph's configuration.

Addresses PR feedback item doxav#2: output_key fallback in optimize_graph.
enable_code_optimization was accepted by instrument_graph() but never
used — TracingLLM.emit_code_param always remained None. Now constructs
a _emit_code_param callback when the flag is True that emits source code,
SHA-256 hash, truncation metadata, and trainable marker as param.__code_*
span attributes. Source is capped at 10K chars with truncation flag.

Addresses PR feedback item doxav#3: enable_code_optimization no-op.
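A minimal sketch of the attribute payload such a callback might emit. The exact param.__code_* key names are assumptions inferred from the commit text, not the verified schema.

```python
import hashlib

MAX_CODE_CHARS = 10_000  # truncation cap mentioned in the commit

def code_param_attributes(source: str, trainable: bool = True):
    """Build span attributes for a trainable code parameter (sketch).

    Key names under param.__code_* are assumed, not verified.
    """
    truncated = len(source) > MAX_CODE_CHARS
    return {
        "param.__code_source": source[:MAX_CODE_CHARS],
        "param.__code_sha256": hashlib.sha256(source.encode()).hexdigest(),
        "param.__code_truncated": truncated,
        "param.__code_trainable": trainable,
    }

attrs = code_param_attributes("def example(x):\n    return x + 1\n")
```

Hashing the full (untruncated) source lets a consumer detect that a truncated payload corresponds to a known function version.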
(4A) otel_adapter: after temporal hierarchy resolution, null out
effective_psid when it still references a skipped root invocation span,
preventing dangling parent edges in the TGJ graph.

(4B) langgraph_otel_runtime: capture child LLM span ref and propagate
error/error.type attributes to it on LLMCallError and unexpected
exceptions, so OTEL UIs correctly flag the LLM call as failed.

Addresses PR feedback item doxav#4.
…race validation

Notebook trace validation used "openai" in name to detect child spans,
which silently matched nothing after the generic refactoring. Now uses
trace.temporal_ignore attribute for provider-agnostic detection and
asserts the set is non-empty. Also adds root invocation span assertion
to enforce the D9 single-trace-ID invariant.

Addresses PR feedback item doxav#6.
…into m1-for-upstream

…e spans

Library (langgraph_otel_runtime.py):
- Restructure child LLM span error handling: catch errors inside the
  child span context manager so attributes are set before the span ends
- Add error.message attribute (truncated to 500 chars) on both parent
  and child spans for LLMCallError and unexpected exceptions

Notebook (01_m1_instrument_and_optimize.ipynb):
- Rewrite graph to 6-node architecture aligned with reference demo:
  planner → executor → web_researcher/wikidata_researcher → synthesizer → evaluator
- Use Command routing from langgraph.types for dynamic node dispatch
- Switch to DEMO_QUERIES (French Revolution / Tesla / CRISPR)
- Add 3 trainable templates (planner, executor, synthesizer) with output_key=final_answer
- Rewrite StubLLM to produce JSON plans, routing JSON, and topic-aware
  answers; respond to prompt template changes for non-saturating scoring
- Rewrite stub_eval_fn: base 0.2 + plan richness + answer length, cap 0.95
- Fix live section: provider_name="openrouter", trace invariant checks,
  only print [OK] on actual success
- Fix ParameterNode deduplication in TGJ inspection (id-based dedup)
- Update Colab Drive paths to OpenTrace_runs/M1/{OPENTRACE_REF}
- Add optimization table output (iteration → avg_score → best_score)

Verified: 41 tests pass, notebook runs end-to-end, baseline=0.75 → best=0.95
mjehanzaib999 and others added 30 commits February 21, 2026 00:10
- Replace rate-limited meta-llama/llama-3.3-70b-instruct:free with
  qwen/qwen3-next-80b-a3b-instruct:free (instruction-tuned, no thinking traces)
- Use eval_fn=None in Section 9 live optimization so optimize_graph()
  uses the library's _default_eval_fn which reads eval.score from the
  evaluator span in the OTLP trace
- Fix Cell 30 header to say 'openai client' instead of 'Trace LiteLLM'
apply_updates() now normalizes ParameterNode object keys to strings
via _normalize_key(), so OptoPrimeV2 updates are no longer silently
skipped. ingest_tgj() gains a param_cache to reuse stable
ParameterNode instances across multi-query iterations. The backward
pass now iterates all output nodes, and stale OTLP spans are flushed
at the start of optimize_graph().

- bindings.py: accept Dict[Any, Any], return applied dict
- tgj_ingest.py: add param_cache kwarg for ParameterNode reuse
- optimization.py: flush stale spans, use param_cache, fix backward
  loop, use applied dict from apply_updates()
- notebook: enable INFO logging in live optimization cell
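The key-normalization idea can be sketched as follows. Both _normalize_key and apply_updates here are hypothetical simplifications, with a stand-in FakeNode playing the role of ParameterNode.

```python
def _normalize_key(key):
    """Normalize a parameter key to a plain string (sketch).

    Optimizer updates may be keyed by ParameterNode-like objects while
    bindings are keyed by name strings; mixing the two silently drops
    updates, which is the bug this addresses.
    """
    if isinstance(key, str):
        return key
    name = getattr(key, "name", None)  # ParameterNode-like objects
    return name if isinstance(name, str) else str(key)

def apply_updates(params, updates):
    """Apply updates whose keys may be strings or node objects."""
    applied = {}
    for key, value in updates.items():
        k = _normalize_key(key)
        if k in params:
            params[k] = value
            applied[k] = value
    return applied  # return what was actually applied, as in the commit

class FakeNode:
    def __init__(self, name):
        self.name = name

params = {"planner_template": "v1"}
applied = apply_updates(params, {FakeNode("planner_template"): "v2"})
assert params["planner_template"] == "v2"
```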
The GraphPropagator asserts that user_feedback is identical when
aggregating across multiple backward passes. Running zero_feedback →
backward → step per query (matching the BBEH notebook pattern) avoids
this and lets each query contribute updates independently.
…optimizer steps

Replace the per-query backward/step loop with Trace's canonical minibatch
pattern: batchify all output nodes into a single batched target and all
per-query feedback into a single batched feedback string, then call
backward() and step() once. This avoids the GraphPropagator assertion
("user feedback should be the same for all children") while ensuring all
queries' graph paths contribute to the optimization gradient.

The batchify import is lazy-loaded via _ensure_trace_imports() to avoid
pulling in numpy and the trainer package at module level.
Implement TelemetrySession activation via contextvars so @trace.bundle
ops and MessageNode creation can emit OTEL spans outside LangGraph.

- Add BundleSpanConfig and MessageNodeTelemetryConfig to control span
  emission and node-to-span binding (message.id)
- Add bundle_span() context manager and on_message_node_created() hook
  in TelemetrySession for non-LangGraph OTEL visibility
- Wrap sync_forward/async_forward in optional OTEL span when session active
- Emit temporal-ignore child span in call_llm for provider monitoring
- Activate session inside InstrumentedGraph root span so Trace hooks
  discover it automatically
- Add opto.features.mlflow with autolog/disable_autolog (safe no-op
  when MLflow not installed)
- Add opto.trace.settings for global MLflow toggle
- Align export naming to otlp.json/tgj.json with legacy aliases
- Add manifest.json and message_nodes.jsonl to export bundle
Covers all M2 features: TelemetrySession activation, bundle span
emission, default-op silencing, MessageNode binding, call_llm
temporal-ignore spans, export bundle naming, MLflow autolog API,
M1 non-breaking compatibility, and end-to-end non-LangGraph pipeline.
Includes live OpenRouter sections (auto-skipped if no API key).
…ooks and remove stale files

Move 02_m2_unified_telemetry.ipynb into examples/notebooks/ for
consistency with the M1 notebook location. Remove leftover files
from the repo root: M1 notebook copy, OVERVIEW.md, and PR diff files.

…ng works

postprocess_output (which creates the MessageNode) was called after the
bundle span had closed, so on_message_node_created could never find an
active span to attach message.id to. Move it inside the span_cm block
for both sync_forward and async_forward.
The install cell only cloned on first run but never pulled updates
when the repo folder already existed, causing stale code to persist
across runtime restarts. Added git fetch + pull to guarantee the clone stays current.
- Updated M2 notebook install cell to add repo root to sys.path
  when running locally, eliminating the need for pip install
- Added git fetch + pull to Colab install cell so restarts pick up
  latest commits instead of using stale cloned code
- Removed debug probe from MessageNode binding cell
- Relaxed setup.py python_requires from >=3.13 to >=3.12
Added Sections 8a-8c that install MLflow and validate real integration
paths: autolog enabling, bundle wrapping via mlflow.trace(), artifact
logging via TelemetrySession, and log_metric/log_param recording.
… compatibility

mlflow.trace() wrapping accesses fn.__name__ on the decorated callable.
FunModule (the object returned by @Bundle) did not expose this attribute,
causing an AttributeError when executing bundle-decorated functions inside
an active MLflow run. Forward the original function's __name__ and
__qualname__ onto the FunModule instance.
…flow.trace()

- Add Section 8d to M2 notebook: launches MLflow UI inline on Colab
  (port 5000) for visual inspection of experiments, runs, artifacts,
  and metrics. Falls back to terminal instructions when running locally.
- Expose __name__ and __qualname__ on FunModule so mlflow.trace()
  can resolve the function name without AttributeError.
- Update notebook summary tables (header + footer) to include Section 8d.
…n 8d)

Renders an embedded iframe and a direct "Open in new tab" link using
Colab's proxyPort API so users can visually inspect MLflow experiments,
runs, artifacts, and metrics logged by the preceding test cells.
Also exposes __name__/__qualname__ on FunModule to fix AttributeError
when mlflow.trace() wraps @bundle-decorated functions.
Replace proxyPort-based link (blocked by Colab pop-up blocker) with
subprocess.Popen + serve_kernel_port_as_iframe for reliable inline
rendering of the MLflow UI in notebook output.
The call was passing unsupported kwargs (operation, output_messages,
response, temperature, max_tokens) which silently raised TypeError
under the bare except, leaving gen_ai.input.messages and
gen_ai.output.messages unset. Use the actual signature parameters
(provider, model, input_messages, output_text) so the semconv
attributes are recorded on the LLM child span.
Replace single _token with _token_stack list so that nesting
with session: on the same TelemetrySession instance correctly
restores the context variable on each exit instead of leaking
the active session.
Allow activating a TelemetrySession without indenting all pipeline
code under a with-block. Useful in notebooks and long scripts.
Both methods share the _token_stack so they compose safely with
context-manager activation and nested calls.
… cells

Add Section 8e validating MessageNodeTelemetryConfig(mode="span")
which creates dedicated spans when no active span exists.
Add Section 8f validating the full OTLP -> TGJ -> ingest_tgj()
round-trip that underpins the optimization data path.
Update header and summary tables accordingly.
…_signature__

MLflow's capture_function_input_args uses inspect.signature(func) to bind
args. FunModule inherited Module.__call__(self, *args, **kwargs), so
inspect returned the wrong signature and bind failed or produced bad data.
Set __signature__ = inspect.signature(fun) so MLflow sees the real
parameter names (x, y) and can capture inputs correctly.
Remove the previous warning suppression and note from the notebook.
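The fix can be illustrated with a minimal FunModule-like wrapper (a hypothetical sketch; the real FunModule does far more): forwarding __name__, __qualname__, and __signature__ lets tools that introspect the callable, such as mlflow.trace(), see the wrapped function's real identity and parameters.

```python
import inspect

class FunModule:
    """Minimal stand-in for a @bundle wrapper (sketch)."""
    def __init__(self, fun):
        self.fun = fun
        # Forward identity metadata so fn.__name__ and
        # inspect.signature(fn) resolve to the real function.
        self.__name__ = fun.__name__
        self.__qualname__ = fun.__qualname__
        self.__signature__ = inspect.signature(fun)

    def __call__(self, *args, **kwargs):
        return self.fun(*args, **kwargs)

def add(x, y):
    return x + y

wrapped = FunModule(add)
assert wrapped.__name__ == "add"
# inspect.signature() honors __signature__, so the real parameter
# names (x, y) are visible instead of (*args, **kwargs).
assert list(inspect.signature(wrapped).parameters) == ["x", "y"]
assert wrapped(2, 3) == 5
```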