35 commits
94a46c4
Name clarification
jonross May 12, 2026
d24735c
Improve comments & naming
jonross May 14, 2026
5c9f377
Rename resource type -> resource family
jonross May 14, 2026
9554829
Revert "Rename resource type -> resource family"
jonross May 15, 2026
48d08f6
Clarify when something is a resource type vs a resource
jonross May 15, 2026
61a709a
Add CLAUDE.md
jonross May 15, 2026
8e75b7a
Tweaks
jonross May 15, 2026
66253d1
Add row_source unit tests for multi-step, kv+parent-nav, and double-p…
jonross May 15, 2026
2ad9bc6
Doc tweak
jonross May 15, 2026
00e8a1d
Document how to run tests correctly in CLAUDE.md
jonross May 15, 2026
8f53efe
Add services and deployments as built-in Kubernetes tables
jonross May 15, 2026
bb6397f
Add events as a built-in Kubernetes table
jonross May 16, 2026
cc9f038
Add some Claude discussions
jonross May 16, 2026
44392b3
Doc tweaks
jonross May 16, 2026
ad264c4
Include README subset in index.rst
jonross May 16, 2026
31f0a19
Replace GPU example with a universally-relevant node memory pressure …
jonross May 16, 2026
8433b67
Tweak README
jonross May 16, 2026
587bdef
Rename cache/reckless flags; add --context
jonross May 16, 2026
5199ce4
Wire --context flag through to kubectl invocations
jonross May 16, 2026
7165b84
Add missing doc for --context
jonross May 16, 2026
fad2b83
Remove old Markdown docs
jonross May 16, 2026
aa5f1fb
Update changelogs
jonross May 16, 2026
b0e0ad2
Doc tweak
jonross May 16, 2026
1511915
Update row_source plan
jonross May 17, 2026
f1a0b72
Implement named scopes in row_source (Phase 1)
jonross May 17, 2026
9402acf
Move scope parsing into FieldRef.parse_scoped
jonross May 17, 2026
08cf705
Remove unused method
jonross May 17, 2026
2df4662
Clarify field name: Itemizer.scope_name
jonross May 17, 2026
48da528
Child scope optimization
jonross May 17, 2026
2bd7b42
Switch scope reference syntax from prefix to suffix: 'expr in scope'
jonross May 17, 2026
d3b9fc0
Add 'from:' column key with auto-detection of label vs path
jonross May 17, 2026
8eadc6c
Update docs, examples, and CHANGELOG for named scopes and from: key
jonross May 17, 2026
56903ce
Doc tweaks
jonross May 17, 2026
798580f
Doc tweaks
jonross May 17, 2026
e52baf2
Add comments
jonross May 17, 2026
201 changes: 201 additions & 0 deletions .claude/plans/row-source.md
@@ -0,0 +1,201 @@
# Implementation Plan: Named Scopes + `from:` Unification

Two related improvements to the YAML extension mechanism. They can be implemented
sequentially on one branch or separately; Phase 1 is a prerequisite for Phase 2's
scope-aware path resolution.

Scope references use a consistent `in <name>` suffix, mirroring the `as <name>` suffix
in `row_source` declarations: `as` binds a name, `in` references it.

---

## Phase 1: Named Scopes in `row_source`

**Status: implemented with `<scope>.` prefix syntax — needs revision to `in <scope>` suffix.**

### Goal

Replace the `^` parent-hop syntax with named scope references.

**Single row_source** (the common case — default `["items"]` or one explicit entry):
path expressions resolve against the one implicit object; no scope qualifier required.

**Multiple row_source entries**: every entry must carry `as <name>`, and every path /
label expression must end with an explicit `in <name>` qualifier. There is no implicit
"current object" when more than one level exists.

```yaml
create:
  - table: node_taints
    resource: nodes
    row_source:
      - items as node
      - spec.taints as taint
    columns:
      - name: node_uid
        path: metadata.uid in node
      - name: taint_key
        path: key in taint
```

### Changes

**`kugl/impl/tables.py` — `Itemizer`**

- Parse `as <name>` suffix from row_source entries. `"items as node"` yields
`Itemizer(expr="items", name="node", finder=..., unpack=False)`.
- Store `name: Optional[str]` on the dataclass.
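
A minimal sketch of the `as <name>` parsing described above. `ItemizerSketch` and `parse_row_source_entry` are hypothetical names used only for illustration; the plan's `Itemizer` also carries `finder` and `unpack`, which are left out here.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemizerSketch:
    """Illustrative stand-in for Itemizer (finder/unpack omitted)."""
    expr: str
    name: Optional[str] = None


# Expression, whitespace, the word "as", whitespace, then a single scope name.
_AS_SUFFIX = re.compile(r"^(?P<expr>.+?)\s+as\s+(?P<name>\w+)$")


def parse_row_source_entry(entry: str) -> ItemizerSketch:
    """Split 'items as node' into its expression and optional scope name."""
    entry = entry.strip()
    match = _AS_SUFFIX.match(entry)
    if match:
        return ItemizerSketch(expr=match.group("expr"), name=match.group("name"))
    # No suffix: the entry is unnamed, which is only valid for a single row_source.
    return ItemizerSketch(expr=entry)
```

An entry without the suffix comes back with `name=None`, which is the case the build-time validation has to reject when there is more than one entry.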

**`kugl/impl/tables.py` — `RowContext`**

- Add `_scopes: dict[int, dict[str, object]]`. Key is `id(child)`; value is the
map of scope names visible at that child's level.
- `set_scope(child, name, parent)` records the child's scope map, inheriting all
ancestor scopes from parent and adding `name → child`.
- Add `get_scope(obj, name) -> Optional[object]` that looks up the named object.
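
The scope bookkeeping above can be sketched roughly as follows. `RowContextSketch` is a hypothetical, simplified stand-in: it models only `_scopes`, `set_scope`, and `get_scope`, and assumes the parent's scope map has already been recorded when a child is registered.

```python
from typing import Optional


class RowContextSketch:
    """Illustrative only: tracks named scopes keyed by id(child)."""

    def __init__(self) -> None:
        self._scopes: dict[int, dict[str, object]] = {}

    def set_scope(self, child: object, name: str, parent: object) -> None:
        # Inherit everything visible at the parent's level, then bind `name`.
        inherited = self._scopes.get(id(parent), {})
        self._scopes[id(child)] = {**inherited, name: child}

    def get_scope(self, obj: object, name: str) -> Optional[object]:
        # Returns None when `obj` has no recorded scopes or the name is unknown.
        return self._scopes.get(id(obj), {}).get(name)


# Usage: a taint row can reach its enclosing node by name.
taint = {"key": "dedicated"}
node = {"metadata": {"uid": "n1"}, "spec": {"taints": [taint]}}
root = {"items": [node]}

ctx = RowContextSketch()
ctx.set_scope(node, "node", root)
ctx.set_scope(taint, "taint", node)
```

Keying on `id()` is only safe while the underlying objects stay alive, which should hold for the lifetime of one parsed response.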

**`kugl/impl/tables.py` — `TableFromConfig._itemize`**

- After calling `context.set_parent(child, item)`, also call
`context.set_scope(child, source.name, item)` when `source.name` is not None,
carrying forward all ancestor scopes so deeper levels can still reference `node`.

**`kugl/impl/extract.py` — `FieldRef` / `PathExtractor` / `LabelExtractor`**

- `FieldRef.parse`: remove `^` handling; detect a trailing ` in <word>` suffix as a
scope name. Store as `scope_name: Optional[str]` and strip it from the target
before JMESPath compilation.
- In `PathExtractor.extract` and `LabelExtractor.extract`, when `self._ref.scope_name`
is set, resolve the object via `context.get_scope(obj, scope_name)`.
- Validation at table-build time (`TableFromConfig.__init__`): if `len(row_source) > 1`,
every `row_source` entry must have a name and every column path/label must carry an
`in <name>` qualifier; raise a clear `ConfigError` if either constraint is violated.
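
As a rough illustration of the suffix detection, assuming scope names are single `\w+` words: the helper below is a hypothetical free function, not the actual `FieldRef` code, and it leaves JMESPath compilation out entirely.

```python
import re
from typing import Optional

# Target expression, whitespace, the word "in", whitespace, then one scope name.
_IN_SUFFIX = re.compile(r"^(?P<target>.+?)\s+in\s+(?P<scope>\w+)$")


def parse_scoped(expr: str) -> tuple[str, Optional[str]]:
    """Return (target, scope_name); scope_name is None when no suffix is present."""
    expr = expr.strip()
    match = _IN_SUFFIX.match(expr)
    if match:
        return match.group("target"), match.group("scope")
    return expr, None
```

The returned target is what would be handed to JMESPath compilation or label lookup, and the scope name is what the extractor would pass to `context.get_scope`.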

### Builtin Update

`kugl/builtins/schemas/kubernetes.yaml` — convert `node_taints` to use named scopes
as a self-contained example:

```yaml
row_source:
  - items as node
  - spec.taints as taint
columns:
  - name: node_uid
    path: metadata.uid in node
  - name: taint_key
    path: key in taint
```

### Tests

- Update the existing `node_taints` test (wherever it lives) to verify the new
syntax produces the same output.
- Add a new test with three levels of nesting (e.g. `pod → container → env`) using
two named scopes, verifying that both ancestor levels are reachable by name.
- Add a test that `^` in a path raises a clear parse error.
- Add a test that a multi-step `row_source` with a missing `as` name raises a `ConfigError`.
- Add a test that a multi-step `row_source` with a bare (un-scoped) column path raises a `ConfigError`.
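
For the multi-level nesting test, the expected behavior can be pictured with a small stand-alone model. This is a sketch under stated assumptions: `dig` is a hypothetical dotted-path lookup standing in for JMESPath, `itemize` is not kugl's `_itemize`, and the fixture data is invented.

```python
from typing import Any, Iterator


def dig(obj: Any, dotted: str) -> Any:
    """Minimal dotted-path lookup (a stand-in for a JMESPath search)."""
    for part in dotted.split("."):
        obj = obj[part]
    return obj


def itemize(root: Any, row_source: list[tuple[str, str]]) -> Iterator[dict[str, Any]]:
    """Yield one scope map per innermost item, with every ancestor bound by name."""

    def walk(obj: Any, level: int, scopes: dict[str, Any]) -> Iterator[dict[str, Any]]:
        if level == len(row_source):
            yield scopes
            return
        path, name = row_source[level]
        for child in dig(obj, path):
            yield from walk(child, level + 1, {**scopes, name: child})

    yield from walk(root, 0, {})


data = {"items": [{
    "metadata": {"name": "web"},
    "spec": {"containers": [
        {"name": "app", "env": [{"name": "A"}, {"name": "B"}]},
        {"name": "sidecar", "env": [{"name": "C"}]},
    ]},
}]}

steps = [("items", "pod"), ("spec.containers", "container"), ("env", "env")]
rows = [
    (s["pod"]["metadata"]["name"], s["container"]["name"], s["env"]["name"])
    for s in itemize(data, steps)
]
```

Each innermost `env` entry produces one row, and both ancestor levels remain reachable by name, which is the property the new test is meant to verify.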

---

## Phase 2: `from:` Key Unification

### Goal

Replace the two-key `path:` / `label:` vocabulary with a single `from:` key that
auto-detects extraction type. Named scope qualifiers compose naturally via the same
`in <name>` suffix.

Single row_source (no scope qualifier needed):

```yaml
columns:
  - name: node_pool
    from: karpenter.sh/nodepool  # auto-detected: label
  - name: provider_id
    from: spec.providerID        # auto-detected: JMESPath
```

Multi-step row_source (all entries named, all columns scoped):

```yaml
row_source:
  - items as pod
  - spec.containers as container
columns:
  - name: pod_name
    from: metadata.name in pod          # JMESPath on pod scope
  - name: pod_pool
    from: karpenter.sh/nodepool in pod  # label on pod scope (unambiguous)
  - name: container_name
    from: name in container             # JMESPath on container scope
```

### Auto-Detection Rule

Strip any trailing ` in <word>` suffix first, then apply to the remainder:

- Matches `[a-zA-Z0-9.-]+/[a-zA-Z0-9._/-]+` (K8s label format: DNS domain + `/` +
key) → `LabelExtractor`
- Otherwise → `PathExtractor`

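A value like `metadata.labels.foo/bar` is meant to be treated as a JMESPath, not a
label, because the `/` appears inside a path segment rather than as the label-domain
separator. The regex alone does not make this distinction: `metadata.labels.foo`
consists only of characters in `[a-zA-Z0-9.-]`, so the whole value matches the label
pattern. Handling this case needs an additional check, or the value must be written
with an explicit `path:` key.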

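Parsing a trailing ` in <word>` is safe in practice. Label keys never contain spaces,
and although JMESPath expressions may contain spaces (for example inside filter
expressions), JMESPath has no `in` keyword, so a valid expression should not end with
` in <word>`.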
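
The rule can be exercised with a short sketch. `classify` is a hypothetical helper, the choice of `fullmatch` is an assumption (the plan does not say whether the pattern is anchored), and the return value is a plain tuple rather than an extractor object.

```python
import re
from typing import Optional

# Pattern quoted from the rule above; anchoring via fullmatch is an assumption.
LABEL_PATTERN = re.compile(r"[a-zA-Z0-9.-]+/[a-zA-Z0-9._/-]+")
IN_SUFFIX = re.compile(r"\s+in\s+(\w+)$")


def classify(value: str) -> tuple[str, str, Optional[str]]:
    """Return (kind, target, scope) where kind is 'label' or 'path'."""
    scope = None
    match = IN_SUFFIX.search(value)
    if match:
        # Strip the scope qualifier before applying the label-vs-path test.
        scope = match.group(1)
        value = value[: match.start()]
    kind = "label" if LABEL_PATTERN.fullmatch(value) else "path"
    return kind, value, scope
```

One thing worth checking against the real implementation: under `fullmatch`, the pattern as written also classifies `metadata.labels.foo/bar` as a label, since every character of `metadata.labels.foo` is in `[a-zA-Z0-9.-]`.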

### Changes

**`kugl/impl/config.py` — `UserColumn`**

- Add `from_: Optional[str] = Field(None, alias="from")` (Pydantic alias needed
because `from` is a Python keyword).
- In `gen_extractor`, handle `from_` alongside `path` and `label`.
- If `from_` is set alongside `path` or `label`, raise `ValueError`.
- Strip any ` in <word>` suffix from `from_` to extract the scope name.
- Apply the label-vs-path regex to the remainder.
- Construct the appropriate extractor, passing the scope name through.
- Keep `path:` and `label:` fully supported so existing configs are not broken.

**`kugl/impl/extract.py` — `FieldRef`**

- Centralise the ` in <scope>` parsing in `FieldRef.parse_scoped(s)`; both
`gen_extractor` (for `from:`) and `FieldRef.parse` (for `path:`/`label:`) delegate
to it.
- Known scopes are not available at Pydantic parse time. Use lazy validation: accept
any ` in <word>` suffix as a potential scope; fail at table-build time in
`TableFromConfig.__init__` if the referenced scope name is not declared in
`row_source`.
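
The lazy validation step might look something like the following. This is a sketch, not the actual `TableFromConfig.__init__` logic: `validate_scopes` is a hypothetical function operating on raw strings, and the `ConfigError` class is redefined locally so the example is self-contained.

```python
import re


class ConfigError(Exception):
    """Local stand-in for kugl's ConfigError, so the sketch runs on its own."""


_AS = re.compile(r"\s+as\s+(\w+)$")
_IN = re.compile(r"\s+in\s+(\w+)$")


def validate_scopes(row_source: list[str], column_exprs: list[str]) -> None:
    """Check scope declarations and references once row_source is known."""
    declared = []
    for entry in row_source:
        match = _AS.search(entry)
        declared.append(match.group(1) if match else None)

    multi = len(row_source) > 1
    if multi and None in declared:
        raise ConfigError("every row_source entry needs 'as <name>' when there is more than one")

    names = {name for name in declared if name}
    for expr in column_exprs:
        match = _IN.search(expr)
        if match is None:
            if multi:
                raise ConfigError(f"column source {expr!r} needs an 'in <name>' qualifier")
            continue
        if match.group(1) not in names:
            raise ConfigError(f"unknown scope {match.group(1)!r} in {expr!r}")
```

Deferring the check this way keeps column parsing independent of `row_source`, at the cost of reporting a misspelled scope name slightly later.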

### Tests

- `from: karpenter.sh/nodepool` produces the same result as `label: karpenter.sh/nodepool`.
- `from: spec.providerID` produces the same result as `path: spec.providerID`.
- `from: metadata.name in pod` with a named `pod` scope resolves correctly.
- `from: karpenter.sh/nodepool in pod` with a named `pod` scope resolves as a label
on the pod object.
- Error: `from:` and `path:` both specified → validation error.
- Error: `from: foo in unknownscope` where `unknownscope` is not in `row_source` → clear
error message at table-build time.

---

## Files Touched

| File | Change |
|---|---|
| `kugl/impl/extract.py` | `FieldRef.parse`: detect ` in <scope>` suffix; extractors: resolve via scope |
| `kugl/impl/tables.py` | `Itemizer`: parse `as <name>`; `RowContext`: track named scopes |
| `kugl/impl/config.py` | `UserColumn`: add `from_` field and dispatch in `gen_extractor` |
| `kugl/builtins/schemas/kubernetes.yaml` | Convert `node_taints` to named scope syntax |
| `tests/` | Update node_taints test; add multi-level and `from:` tests |

---

## Out of Scope

- The broader resource-coverage gaps from `discuss.md` (deployments, containers table,
etc.) are separate work and should not be bundled here.
113 changes: 113 additions & 0 deletions .claude/plans/shortcomings.md
@@ -0,0 +1,113 @@
# Kugl Discussion Summary

## What Kugl Is

Kugl is a Python CLI tool that queries Kubernetes resources using SQL (SQLite). It runs `kubectl get` commands, caches the JSON output, and loads it into an in-memory SQLite database. Users write SQL queries directly on the command line or via saved shortcuts.
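
The overall flow can be pictured with a minimal sketch using only the standard library. The sample JSON, the `pods` column set, and the `load_pods` helper are invented for illustration and do not reflect kugl's actual schema or code; the `kubectl` call is replaced by an inline string.

```python
import json
import sqlite3

# Stand-in for the JSON that `kubectl get pods -o json` would return.
SAMPLE = json.dumps({"items": [
    {"metadata": {"name": "web-1", "namespace": "default"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "web-2", "namespace": "default"}, "status": {"phase": "Running"}},
    {"metadata": {"name": "job-1", "namespace": "batch"}, "status": {"phase": "Pending"}},
]})


def load_pods(conn: sqlite3.Connection, raw: str) -> None:
    """Flatten each item into one row of an in-memory table."""
    conn.execute("CREATE TABLE pods (name TEXT, namespace TEXT, phase TEXT)")
    rows = [
        (item["metadata"]["name"], item["metadata"]["namespace"], item["status"]["phase"])
        for item in json.loads(raw)["items"]
    ]
    conn.executemany("INSERT INTO pods VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load_pods(conn, SAMPLE)
counts = conn.execute(
    "SELECT phase, COUNT(*) FROM pods GROUP BY phase ORDER BY phase"
).fetchall()
```

Once the rows are in SQLite, aggregation is ordinary SQL, which is the point of the tool.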

Built-in tables: `pods`, `jobs`, `nodes`, `node_labels`, `pod_labels`, `job_labels`, `node_taints`. Resource types, namespaces, and cache TTL are controlled via CLI flags (`-a`, `-n`, `-u`, `-c`, `-t`).

Kugl automatically converts Kubernetes-specific value formats to queryable numerics: `50Mi` → bytes, `100m` CPU → float, ISO8601 timestamps → epoch seconds. Helper functions `to_size()`, `to_age()`, `to_utc()` convert back to human-readable strings for output.
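
A simplified sketch of these conversions, to make the numeric forms concrete. The function names are hypothetical and the suffix table covers only common Kubernetes quantity suffixes; kugl's own converters may handle more cases.

```python
from datetime import datetime, timezone

# Binary (Ki, Mi, ...) and decimal (k, M, ...) quantity suffixes.
_SIZE_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_size(text: str) -> int:
    """Convert a memory quantity such as '50Mi' to bytes."""
    for suffix, factor in _SIZE_SUFFIXES.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)


def parse_cpu(text: str) -> float:
    """Convert a CPU quantity such as '100m' (millicores) to cores."""
    if text.endswith("m"):
        return float(text[:-1]) / 1000
    return float(text)


def parse_timestamp(text: str) -> int:
    """Convert a UTC timestamp like '2026-05-17T12:00:00Z' to epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

With values stored this way, comparisons such as memory requests above a threshold work as plain numeric predicates in SQL.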

---

## Strengths

- **SQL is better than jq for aggregation.** Queries involving `GROUP BY`, `SUM`, `JOIN`, `ORDER BY`, and CTEs are dramatically more readable in SQL than in jq pipelines. The target use case — "how is compute distributed across node pools and taints?" — is well served.
- **Automatic type coercion.** CPU, memory, and timestamp conversion is handled transparently. Steampipe's Kubernetes plugin likely exposes these as raw strings or JSONB; kugl makes them directly comparable numerically.
- **Built-in caching.** A 2-minute TTL cache avoids hammering the API server during exploratory queries.
- **Declarative extensions require no code.** Adding a label or nested field to an existing table takes 4 lines of YAML, no build step, no Go, no Python. Far more accessible than Steampipe's Go plugin model.
- **Multi-schema queries.** Joining Kubernetes data with other JSON sources (files, exec output) via `kubernetes.nodes JOIN ec2.instances` is architecturally sound, even if the AWS side is experimental.

---

## Weaknesses

### Priority (blocking credibility)

1. **Narrow built-in resource coverage.** Only pods, jobs, and nodes are built in. Deployments, StatefulSets, DaemonSets, CronJobs, Services, Ingresses, Namespaces, PVs/PVCs are absent. Users can add them via YAML config, but requiring setup before querying standard resources is a significant barrier.

2. **No per-container table.** Pod-level resource data aggregates across all containers. For multi-container pods (sidecars, init containers), individual container visibility is lost. A `containers` table (one row per container, joinable to `pods` via pod UID) is needed.

3. **No context selection at invocation time.** Users must `kubectl config use-context` before running kugl. A `--context` flag is table stakes for anyone with more than one cluster.

4. **No structured output.** Output is human-readable tabular text only. Without `--output csv` or `--output json`, kugl cannot participate in pipelines or feed dashboards.

5. **No shortcut parameters.** Shortcuts are static query aliases. The docs acknowledge this gap and suggest wrapper scripts as the workaround. Named parameter substitution (e.g., `{{namespace}}`) is needed for real team adoption.

### Nice-to-Have

- **Events table.** `kubectl get events` is one of the most-used debugging commands; it should be built in.
- **PVs/PVCs.** Important for stateful workloads.
- **RBAC tables.** Roles, RoleBindings, ClusterRoles for security auditing.
- **Metrics integration.** Joining `kubectl top pods` data with resource requests would enable requests-vs-actual-usage analysis.
- **Shell completions,** especially for shortcuts.
- **Richer `--schema` output** (columns, types, source paths).

---

## Comparison to Steampipe (Kubernetes plugin)

| Capability | Kugl | Steampipe |
|---|---|---|
| Built-in resource types | pods, jobs, nodes + labels/taints | All standard K8s types |
| SQL dialect | SQLite | PostgreSQL (full) |
| CPU/memory type handling | Auto-converted to numerics | Likely raw strings/JSONB |
| Adding a label column | 4 lines of YAML | Go code + rebuild + reinstall |
| Adding a new resource type | YAML `create:` block | Go plugin with K8s client call |
| Ecosystem integration | CLI output only | Postgres wire protocol (Grafana, psql, etc.) |
| Multi-cluster | Not supported | Aggregator plugins |
| Cross-source joins | Experimental | Core feature, 100+ plugins |
| Caching | Built-in TTL cache | Plugin-level |
| Maintenance | Personal project | Turbot-backed, active community |

Steampipe's Kubernetes plugin likely does **not** pre-convert CPU/memory strings to numerics — this appears to be a genuine and specific kugl advantage for resource utilization queries.

---

## Extension Mechanism

### Current model

Users add columns via `~/.kugl/init.yaml` or `~/.kugl/kubernetes.yaml`:

```yaml
extend:
  - table: nodes
    columns:
      - name: node_pool
        type: text
        label: karpenter.sh/nodepool  # shortcut for metadata.labels."..."
      - name: provider_id
        type: text
        path: spec.providerID         # JMESPath expression
```

Special kugl types (`size`, `age`, `cpu`, `date`) handle K8s-specific string-to-numeric conversion.

Multi-row-per-resource tables (e.g., one row per container or taint) use `row_source:` — a sequential JMESPath pipeline — with `^` prefix to reference parent-level fields.

### Friction points

1. **Two-vocabulary system (`path:` vs `label:`).** Users who don't know about `label:` write awkward quoted JMESPath: `metadata.labels."karpenter.sh/nodepool"`. The shortcut is useful but invisible until you need it.
2. **`path:` is a required key even when it's the only thing expressed.** Three keys for a conceptually one-line mapping.
3. **`row_source` + `^` parent references** are non-obvious, but affect only the minority of multi-row-per-resource cases.

### Recommended improvement: unified `from:` key

Replace `path:` / `label:` with a single `from:` key that auto-detects the extraction type:
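- Value containing `/` with no leading dot-path segment → label name (matches prefixed K8s label keys such as `karpenter.sh/nodepool`; unprefixed keys such as `app` contain no `/` and would be treated as JMESPath)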
- Otherwise → JMESPath expression

```yaml
extend:
  - table: nodes
    columns:
      - name: node_pool
        type: text
        from: karpenter.sh/nodepool  # auto-detected as label
      - name: provider_id
        type: text
        from: spec.providerID        # auto-detected as JSON path
```

**Implementation:** add `from_` field to `UserColumn` in `config.py`; dispatch to `LabelExtractor` or `PathExtractor` in `gen_extractor` validator. Keep `path:` and `label:` for backward compatibility. Change is small and non-breaking.
33 changes: 32 additions & 1 deletion CHANGELOG.md
@@ -1,3 +1,34 @@
## 0.8.0

New tables in ``kubernetes`` schema:

- ``events``
- ``cronjobs`` and ``cronjob_labels``
- ``services`` and ``service_labels``
- ``deployments`` and ``deployment_labels``

CLI changes (breaking):

- Added ``-c``/``--context`` option to specify a Kubernetes context
- Renamed ``-a`` option to ``-A`` for consistency with ``kubectl``
- Renamed ``-c``/``--cache`` to ``-s``/``--stale``
- Renamed ``-u``/``--update`` to ``-r``/``--refresh``
- Renamed ``-r``/``--reckless`` to ``-q``/``--quiet`` (and ``reckless:`` in settings to ``quiet:``)

Extending tables:

- Breaking: named scope syntax for multi-step ``row_source``: each entry takes ``as <name>``, and
columns reference ancestor objects with an ``in <name>`` suffix (e.g. ``metadata.uid in node``);
the old ``^`` parent-hop syntax is removed
- New ``from:`` column key that auto-detects label vs JMESPath: values matching
``domain/key`` format (e.g. ``karpenter.sh/nodepool``) use label extraction, everything
else uses JMESPath (``path:`` and ``label:`` to be removed in a future release)

Documentation:

- New masthead example of ``kugl`` vs ``kubectl | jq``


## 0.7.0

- Add `init` subcommand to generate `kubernetes.yaml` per recommended post-install configuration
@@ -40,7 +71,7 @@
- Allow environment variables in `file` resource paths
- Fix the `exec` resource by adding a `cache_key` field; these resources would otherwise experience cache collisions
- Resource cache paths and file formats have changed, and cache now lives in `~/.kuglcache`
- `rm -r ~/.kugl/cache` is recommended to clear obsolete files
- `rm -r ~/.kuglcache` is recommended to clear obsolete files

## 0.3.3
