[FEATURE] Support diffusion model cost tracking (Flux, Stable Diffusion)

## Feature Description

InferCost currently tracks cost for LLM inference workloads by scraping llama.cpp/vLLM token metrics. Diffusion model workloads (Flux, Stable Diffusion, ComfyUI) also consume significant GPU resources but are invisible to InferCost because they don't produce tokens.

## Problem Statement

A Flux image generation server consuming a full GPU (150W+) incurs real hardware and electricity costs, but InferCost doesn't track it because:
1. Pod discovery is hardcoded to the `inference.llmkube.dev/model` label
2. The scraper only understands llama.cpp token metrics
3. The cost model is token-specific (`cost_per_token`, `tokens_per_hour`)

## Proposed Solution

Generalize the cost model from tokens to "work units":

| Workload | Unit | Metric Source |
|----------|------|--------------|
| LLM | Tokens | `llamacpp:tokens_predicted_total` |
| Diffusion | Images or Steps | `images_generated_total`, `diffusion_steps_total` |
| Embeddings | Requests | `requests_total` |
| Audio/TTS | Seconds | `audio_seconds_total` |

The core cost formula doesn't change: `hourly_cost / units_per_hour = cost_per_unit`

### Implementation Approach

1. **Pluggable scraper interface**: Define a `WorkloadScraper` interface with adapters for llama.cpp, vLLM, and diffusion frameworks
2. **Configurable pod discovery**: Support custom label selectors or annotations beyond `inference.llmkube.dev/model`
3. **Generalize snapshots**: Replace `TokenSnapshot` with `WorkloadSnapshot` carrying a `UnitType` field
4. **Skip cloud comparison for non-token workloads**: Image generation APIs have no standard per-unit cloud pricing comparable to per-token LLM pricing

### Files That Would Need Changes

- `internal/scraper/` - Add scraper interface + diffusion adapter
- `internal/controller/costprofile_controller.go` - Dispatcher on workload type
- `internal/calculator/calculator.go` - Generic rate/cost computation (minimal change)
- `internal/api/store.go` - Generalize `ModelData` structure
- `internal/metrics/metrics.go` - New metric families for non-token workloads

## Why This Matters

Organizations running mixed AI workloads (LLMs + image generation + embeddings) on shared GPU infrastructure need cost visibility across all of them, not just LLMs. On a cluster where LLMs account for, say, 60% of GPU spend, this is the difference between tracking 60% of GPU costs and tracking 100%.

## Alternatives Considered

- Tracking only GPU-hours for non-LLM workloads (loses per-unit granularity)
- Requiring users to add LLMKube labels to non-LLMKube pods (hacky, breaks the "works with any stack" promise)
