[Infrastructure] Integrate OpenTelemetry (OTel) Tracing for the Safety Pipeline #7
Open
Labels: backend, enhancement, level:advanced, others, type:devops, type:docs, type:feature, type:performance
🎯 Objective
Instrument the 3-stage cascade pipeline with OpenTelemetry spans so enterprise users can monitor HumaneProxy in Datadog, Grafana, or Jaeger.
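A minimal sketch of what this instrumentation could look like, assuming a Python codebase. The stage functions, the `ESCALATE_AT` threshold, the span names, and the free-standing `check_async()` here are hypothetical stand-ins, not HumaneProxy's real internals. Only `trace.get_tracer()` and `Tracer.start_as_current_span()` come from the OpenTelemetry API. When the `[telemetry]` extra is missing or `telemetry.enabled` is false, each span call collapses to one branch plus a `contextlib.nullcontext()`, which is close to, but not literally, zero overhead.

```python
import asyncio
import contextlib
from dataclasses import dataclass
from typing import Optional

# Optional dependency: present only after `pip install humane-proxy[telemetry]`.
try:
    from opentelemetry import trace
except ImportError:  # telemetry extra not installed, so tracing stays off
    trace = None


@dataclass
class CheckResult:
    final_score: float
    stage_reached: int


class Telemetry:
    """Real spans when enabled and OTel is importable, otherwise a no-op context."""

    def __init__(self, enabled: bool) -> None:
        # `enabled` would come from `telemetry.enabled` in config.yaml (loading not shown).
        self.active = enabled and trace is not None
        self._tracer = trace.get_tracer("humane_proxy") if self.active else None

    def span(self, name: str, attributes: Optional[dict] = None):
        if not self.active:
            return contextlib.nullcontext()  # yields None; nothing is recorded
        return self._tracer.start_as_current_span(name, attributes=attributes)


# Hypothetical stage stubs standing in for the real Heuristics/Embeddings/LLM stages.
async def heuristics(text: str) -> float:
    return 0.5 if "hurt" in text.lower() else 0.0


async def embeddings(text: str) -> float:
    return 0.6


async def reasoning_llm(text: str) -> float:
    await asyncio.sleep(0)  # placeholder for the remote Stage 3 API call
    return 0.9


STAGES = [("heuristics", heuristics), ("embeddings", embeddings), ("reasoning_llm", reasoning_llm)]
ESCALATE_AT = 0.3  # hypothetical: scores at or above this escalate to the next stage


async def check_async(text: str, session_id: str, tel: Telemetry) -> CheckResult:
    # Root span for the whole check; each stage gets its own child span so a
    # slow stage shows up as a long bar under this one in the trace view.
    with tel.span("humane_proxy.check", {"humane_proxy.session_id": session_id}) as root:
        score, stage_reached = 0.0, 0
        for index, (name, stage) in enumerate(STAGES, start=1):
            with tel.span(f"humane_proxy.stage.{name}"):
                score = await stage(text)
            stage_reached = index
            if score < ESCALATE_AT:
                break  # confident enough; skip the costlier stages
        if root is not None:  # None when telemetry is off
            root.set_attribute("humane_proxy.final_score", score)
            root.set_attribute("humane_proxy.stage_reached", stage_reached)
        return CheckResult(score, stage_reached)


if __name__ == "__main__":
    tel = Telemetry(enabled=True)
    print(asyncio.run(check_async("I might hurt someone", "sess-123", tel)))
```

With only `opentelemetry-api` present, these spans are non-recording no-ops. Sending them to Datadog, Grafana, or Jaeger would also need an SDK `TracerProvider` with an exporter (for example OTLP), which the host application would normally configure.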
💡 Why this matters
When an AI agent fails or is slow, developers need to know why. If HumaneProxy adds latency because Stage 3 (the Groq API call) is slow, that delay must be visible in the company's distributed tracing dashboard. OpenTelemetry is the vendor-neutral industry standard for this, so a single instrumentation layer can feed any of those backends.
✅ Acceptance Criteria
- Add `opentelemetry-api` and `opentelemetry-sdk` as optional dependencies (`pip install humane-proxy[telemetry]`).
- Wrap `proxy.check_async()` and the individual stages (Heuristics, Embeddings, Reasoning LLM) in OpenTelemetry spans.
- Attach custom span attributes (e.g., `humane_proxy.session_id`, `humane_proxy.final_score`, `humane_proxy.stage_reached`).
- Make telemetry configurable in `config.yaml` (`telemetry.enabled: true`). If disabled, there should be zero performance overhead.

📚 Resources