[FEATURE] First-class compatibility with Observe Inc (board, docs, monitors)

## Feature Description

First-class compatibility with [Observe Inc](https://www.observeinc.com/) so InferCost users running Observe as their monitoring backend can ingest GPU/token cost metrics, get a pre-built Observe board (the equivalent of the Grafana dashboard we ship), and use Observe's monitor primitives for budget alerting without writing OPAL from scratch.

## Problem Statement

> As an InferCost operator standardized on Observe (not Grafana/Datadog), I want the same per-namespace, per-model, per-GPU cost visibility I'd get from the Grafana dashboard, surfaced inside Observe's UI with sensible defaults so I don't have to reverse-engineer the metric semantics from scratch.

Today the README claims "Any monitoring tool (Grafana, Datadog, New Relic, etc.)" but the only concrete artifact we ship is the Grafana dashboard JSON. Anyone running Observe has to:

1. Stand up the Prometheus → Observe scrape path themselves.
2. Reverse-engineer the 12 InferCost metric names + label cardinality from the operator source.
3. Write OPAL queries by hand for cost-per-token, namespace breakdown, GPU power, etc.
4. Author monitors for budget breach alerts.

For a tool whose core promise is "AI cost attribution out of the box," asking operators to do that work themselves contradicts the value prop on the dimension that matters most to Observe customers.

## Proposed Solution

Three deliverables, each independently shippable:

### 1. `config/observe/` — pre-built board (equivalent of the Grafana JSON)

A reproducible Observe board definition (whatever format Observe currently supports for board-as-code; check their `terraform-provider-observe`, `observectl`, or board export JSON) covering the same panels as our Grafana dashboard:

- Cost-per-token (USD), faceted by model + namespace
- GPU power draw (watts), faceted by node + GPU index
- Hourly cost vs cloud equivalent
- UsageReport CRD summaries
- Cumulative spend per namespace

Ship as `config/observe/infercost-board.json` (or `.tf` if Terraform is the canonical path).

### 2. `docs/integrations/observe.md` — setup recipe

End-to-end recipe an Observe customer can follow:

- Point Observe's Prometheus collector at the InferCost operator's `/metrics` endpoint (or use the PodMonitor we already ship, paired with the kube-prometheus-stack remote-write to Observe).
- Import the board.
- (Optional) Map our `infercost_*` metric names to Observe's "Dataset" abstraction so they appear under a recognizable namespace.

### 3. `config/observe/monitors.yaml` — pre-built Observe monitors

Three alerts that mirror our `TokenBudget` CRD semantics:

- `infercost_budget_breach`: hourly cost exceeds the namespace budget for two consecutive samples.
- `infercost_gpu_idle_with_load`: GPU power < 50W while requests/sec > 0 (signals a stuck llama.cpp pod, dual-use with LLMKube's own alerts).
- `infercost_anomaly_per_token_cost`: per-token cost > 2× the 24h trailing mean (detects a misconfigured CostProfile or a stuck-loop model).

These are also useful as the source of truth for the `Alert` phase row in our Roadmap.

**Example operator config**: no new CRD fields, just a new optional values block in the Helm chart so `helm install ... --set integrations.observe.enabled=true` materializes the relevant ConfigMap + (eventually) auto-provisions the board via Observe's API. Phase 1 ships the artifacts; phase 2 (later) automates installation.

## Alternatives Considered

- **Do nothing; users figure it out.** Rejected: this is the same argument we already rejected for Grafana, and Grafana support is one of the most-cited reasons people pick InferCost. Parity for Observe is just consistency with the value prop.
- **Generic OpenTelemetry-only path.** Considered. We do plan to expose OTLP metrics in a future PR (it's the right long-term abstraction). But OTLP alone doesn't ship dashboards or alerts; those are the part operators actually need. The Observe artifacts on top of OTLP/Prometheus would still be the work item.
- **Wait for Observe to write the integration themselves.** Most vendors do publish third-party content; relying on it is fragile and the value prop is sharper when it's first-party. Compare: we ship the Grafana dashboard ourselves rather than depending on grafana.com community boards.

## Additional Context

- Related issues: #
- Similar features in other projects:
  - Datadog Marketplace integrations for cost tools (e.g. Vantage)
  - Grafana Cloud's first-party LLM cost dashboards
  - OpenCost / Kubecost — both publish vendor integration recipes for the major observability backends

## Priority

- [ ] Critical
- [ ] High
- [x] Medium — Nice to have (high value for Observe-shop prospects; low value for everyone else)
- [ ] Low

## Willingness to Contribute

- [x] Yes, I can submit a PR
- [ ] Yes, I can help test
- [ ] No, but I can provide feedback

## Notes for the implementer

- **Verify Observe's current board-as-code format first.** Their product surface evolves; what was "JSON board export" two years ago may be Terraform-only or API-only today. The right starting point is `observectl` or the `terraform-provider-observe` README.
- **Cardinality check.** Our 12 metrics have ~5-15 label combinations each. Observe's pricing is volume-based on event ingest; the recipe should call out the expected daily event volume so customers don't get a surprise bill.
- **Auth model.** Observe uses workspace API tokens. The setup doc should be specific about least-privilege scope (read on Datasets, write on Boards and Monitors, no admin).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] First-class compatibility with Observe Inc (board, docs, monitors) #65

Feature Description

Problem Statement

Proposed Solution

1. `config/observe/` — pre-built board (equivalent of the Grafana JSON)

2. `docs/integrations/observe.md` — setup recipe

3. `config/observe/monitors.yaml` — pre-built Observe monitors

Alternatives Considered

Additional Context

Priority

Willingness to Contribute

Notes for the implementer

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[FEATURE] First-class compatibility with Observe Inc (board, docs, monitors) #65

Description

Feature Description

Problem Statement

Proposed Solution

1. config/observe/ — pre-built board (equivalent of the Grafana JSON)

2. docs/integrations/observe.md — setup recipe

3. config/observe/monitors.yaml — pre-built Observe monitors

Alternatives Considered

Additional Context

Priority

Willingness to Contribute

Notes for the implementer

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. `config/observe/` — pre-built board (equivalent of the Grafana JSON)

2. `docs/integrations/observe.md` — setup recipe

3. `config/observe/monitors.yaml` — pre-built Observe monitors