diff --git a/.claude/commands/pick-up-issue.md b/.claude/commands/pick-up-issue.md
new file mode 100644
index 0000000..57f77ab
--- /dev/null
+++ b/.claude/commands/pick-up-issue.md
@@ -0,0 +1,174 @@
+---
+description: Implement a GitHub issue end-to-end — research, plan, branch, code, test, PR
+argument-hint: <issue-number>
+---
+
+# /pick-up-issue
+
+Take a GitHub issue number, understand it deeply, plan the implementation, and ship a production-ready PR. The command stops and asks for your approval at three gates: **after planning** (before any code changes), **before pushing** the branch, and **before opening** the pull request. Never push or open a PR without explicit approval.
+
+If `$ARGUMENTS` is empty or not a positive integer, stop and ask for the issue number.
+
+## Workflow
+
+### 1. Fetch the issue and its context
+
+Run all of these in parallel — they're independent reads:
+
+- `gh issue view $ARGUMENTS --json number,title,body,state,labels,assignees,milestone,comments,closedByPullRequestsReferences`
+- `gh issue view $ARGUMENTS --comments` (full comment thread, formatted)
+- `git fetch origin && git log --oneline -20 origin/main` (ground truth for the base branch)
+- `git status` and `git branch --show-current` (confirm clean working tree on `main`; if not, stop and ask before proceeding)
+
+Reject early if:
+- The issue is **closed** — confirm with the user before doing anything.
+- The issue is **assigned to someone else** — confirm before claiming.
+- A linked PR already exists in `closedByPullRequestsReferences` — surface it and ask whether to continue.
+
+### 2. Understand the issue
+
+Read every linked artefact before writing any plan:
+
+- The issue body and full comment thread.
+- Any URLs in the body — fetch them with `WebFetch` if they're public docs/specs. Skip authenticated systems (Linear, Jira, internal Confluence) unless an MCP tool is available.
+- Cross-references to other issues/PRs — `gh issue view <n>` / `gh pr view <n>` for each.
+- Repo conventions that constrain the implementation: + - [CLAUDE.md](../../CLAUDE.md) — architecture, commands, conventions. + - [.claude/rules/](../../.claude/rules/) — read every file relevant to the area being changed (e.g. `translators.md` for a translator issue). + - [.golangci.yml](../../.golangci.yml) — lint rules that will block the PR. + - `go.mod` — minimum Go version, dependency constraints. + - `.github/workflows/ci.yml` — what CI will actually run. + +Then survey the code area the issue touches: read the relevant packages end-to-end before planning, not just the files mentioned in the issue. If the issue requests a new translator, read at least two existing translator packages plus `internal/translate/prepare.go` and `internal/translate/resolver.go`. If it's a bug fix, locate the failing path and read the surrounding code so the fix isn't a band-aid. + +### 3. Write the plan and get approval (Gate 1) + +Use the `Plan` subagent for non-trivial work (new feature, new package, cross-cutting change). For a small bug fix, draft the plan inline. + +The plan must include: + +- **Acceptance criteria** restated in concrete terms — what input produces what output, what existing tests must still pass, what new tests will exist. +- **Files to add or modify**, each with one-line intent. +- **Public API changes** (exported types, function signatures, CLI flags) — call these out explicitly so the user can object before implementation. +- **Risk and reversibility** — does this change generated output for existing users? Is there a deprecation path? Does it touch the spec parser (`internal/opendpi/`) or session loader where regressions are silent? +- **Test plan** — list each test case with a one-line description. For translators, follow the matrix in [.claude/rules/translators.md](../../.claude/rules/translators.md): primitives, formats, refs, required-vs-optional, inline objects, plus any constraint-driven behavior. 
+- **Out of scope** — explicit list of things the issue might suggest but this PR will not do, with rationale.
+
+Present the plan to the user. **Stop. Wait for explicit approval.** Iterate on the plan until the user agrees. Do not write code yet.
+
+### 4. Branch
+
+After plan approval, create the branch from `origin/main` (not local `main` — avoid stale base):
+
+```
+git fetch origin
+git switch -c <type>/<issue>-<slug> origin/main
+```
+
+`<type>` follows Conventional Commits: `feat`, `fix`, `docs`, `chore`, `test`, `refactor`, `ci`, `build`, `perf`, `style`. `<slug>` is 3-6 words from the issue title. Example: `feat/100-iceberg-translator`.
+
+If `make setup` hasn't been run in this clone, run it now so the `commit-msg` hook is active — Conventional Commits are enforced and a missing hook will let bad messages through.
+
+### 5. Implement
+
+Code to production standards. Non-negotiable:
+
+- **License header** on every new Go file:
+  ```
+  // SPDX-License-Identifier: Apache-2.0
+  // Copyright 2026 Daco Labs
+  ```
+- **Package comment** on the first file of each new package: `// Package x ...`.
+- **No new public API surface outside `internal/`** unless the issue explicitly requires it.
+- **Errors wrapped with `%w`** and lowercase strings (lint enforces `errorf` and `error-strings`).
+- **No new dependencies** without checking `go.mod` and surfacing the addition to the user before adding.
+- **Templates parsed once at init** with `template.Must(template.ParseFS(...))` — never per-call (per `.claude/rules/translators.md`).
+- **Tests alongside code**, not after. Each behavior the plan calls out gets a test case. Table tests are the default shape — see `internal/translate/pyspark/translator_test.go`.
+- **Deterministic output** — preserve property order (`Prepare` already does this; don't sort in templates), don't iterate Go maps in output paths.
+
+After each meaningful chunk of work, run the full local check before moving on:
+
+```
+make format
+make lint
+make test
+make build
+```
+
+If any of these fail, **fix the underlying issue** — never skip a hook with `--no-verify`, never `// nolint` to silence a real warning, never delete a failing test. If the failure suggests the plan was wrong, stop and revise the plan with the user.
+
+### 6. Commit
+
+Use Conventional Commits. Default to a single focused commit per logical step; squash later if needed. Message format:
+
+```
+<type>(<scope>): <subject>
+
+<body>
+
+Refs #<issue>
+```
+
+For the final commit on the branch, use `Closes #<issue>` so the PR auto-closes the issue on merge. **Stage specific files**, not `git add -A`. Commit messages must be passed via heredoc to preserve formatting (see CLAUDE.md). Never amend a published commit; create a new one if you need to fix something.
+
+### 7. Push (Gate 2)
+
+Before pushing, run the full check matrix one final time and report results:
+
+- `make format` — diff must be empty after this.
+- `make lint` — zero warnings.
+- `make test` — all green, including `-race`.
+- `make build` — binary produced.
+- `git diff origin/main...HEAD --stat` — show the user what's about to be pushed.
+
+**Stop. Confirm with the user before `git push -u origin <branch>`.** Pushing makes the branch visible on the remote and may trigger CI; do not push unattended.
+
+### 8. Open the PR (Gate 3)
+
+After the push succeeds, draft the PR title and body. **Show them to the user and wait for approval before running `gh pr create`.**
+
+PR title: same Conventional-Commits prefix as the branch, under 70 chars. Example: `feat: add Apache Iceberg schema translator (#100)`.
+
+PR body (use heredoc):
+
+```markdown
+## Summary
+<2-4 bullets — what changed and why, in plain language>
+
+## Closes
+Closes #<issue>
+
+## Implementation notes
+<decisions, trade-offs, anything a reviewer can't infer from the diff>
+
+## Test plan
+- [x] `make lint` passes
+- [x] `make test` passes (including `-race`)
+- [x] `make build` produces a working binary
+- [x] <new tests from the plan>
+
+## Out of scope
+<deferred items, with rationale>
+```
+
+After PR creation, return the PR URL to the user. Do not request reviewers, do not merge.
+
+## Stop conditions
+
+Halt and surface the situation to the user — do not improvise — when:
+
+- Tests fail and the fix isn't obvious within one or two attempts.
+- The plan turns out to be wrong mid-implementation.
+- The issue requires a dependency upgrade, a breaking API change, or a spec version bump.
+- An external system (CI, release pipeline, published artefacts) would be affected.
+- The user's working tree had uncommitted changes when the command started.
+
+## What this command never does
+
+- Push without approval.
+- Open a PR without approval.
+- Skip hooks (`--no-verify`), suppress lint warnings to make the build green, or comment out failing tests.
+- Force-push, amend published commits, or rewrite history on a shared branch.
+- Add or upgrade dependencies without surfacing the change first.
+- Touch unrelated files for "drive-by" cleanup.
+- Mark the issue closed manually — let the merged PR do it via `Closes #N`.
diff --git a/.claude/commands/research-translators.md b/.claude/commands/research-translators.md
new file mode 100644
index 0000000..3dd9c08
--- /dev/null
+++ b/.claude/commands/research-translators.md
@@ -0,0 +1,42 @@
+---
+description: Research schema-binding ecosystems for translator gaps and propose GitHub issues
+argument-hint: [optional focus area, e.g. "pyspark" or "iceberg types"]
+---
+
+# /research-translators
+
+Survey upstream schema-binding ecosystems for changes that the translators under [internal/translate/](../../internal/translate/) don't yet cover, then propose GitHub issues for the highest-value gaps.
Issue creation is gated on user approval — the command never opens issues unattended. + +## Workflow + +1. **Index current translators** + - List packages under [internal/translate/](../../internal/translate/) and read each `README.md` (the user-facing capability matrix per [.claude/rules/translators.md](../../.claude/rules/translators.md)). + - For each translator, capture: target ecosystem, supported JSON Schema features, and any explicit "not supported" notes. + - Read [internal/translate/prepare.go](../../internal/translate/prepare.go) and [internal/translate/resolver.go](../../internal/translate/resolver.go) so gap analysis is framed in terms of `TypeResolver` methods (`PrimitiveType`, `ArrayType`, `MapType`, `RefType`, `FormatDefName`, `EnrichField`) and `SchemaData`/`Constraints`, not invented vocabulary. + +2. **Research upstream changes** (use `WebSearch` then `WebFetch` for primary sources only — release notes, spec changelogs, official docs) + - Existing translator targets: Avro spec, Protobuf, PySpark / Databricks type system, Pydantic, Scala/Spark SQL, DQX, Go `time`/`encoding/json` evolutions. + - JSON Schema itself: latest draft activity (2020-12 → next), new keywords, format registrations. + - Adjacent ecosystems worth a translator: Apache Iceberg type system, Arrow, Delta Lake schema, Polars. + - Focus area from arguments: $ARGUMENTS + - Skip secondary sources (blog roundups, tutorials). Cite the primary URL for every claim. + +3. **Check for duplicates** + - Run `gh issue list --state all --search "translator"` and scan titles/bodies before proposing anything new. Note duplicates in the report rather than re-filing. + +4. **Gap analysis** + - For each candidate gap, write: target translator (or "new translator: X"), the specific schema feature or upstream change, and the concrete `TypeResolver` change it implies (e.g. 
"extend `PrimitiveType` to map `format: duration` to `IntervalType`" or "add `EnrichField` integer narrowing for `exclusiveMaximum`"). + - Rank by: (a) appears in ≥2 independent primary sources or in the target's official release notes, (b) implementation fits the existing pipeline without changing `Prepare` or `SchemaData` shape, (c) no existing open issue. Drop anything that doesn't clear (a). + +5. **Propose, don't execute** + - Print a numbered list of proposed issues to the user. For each: title, 3-5 line body preview, source links, suggested `TypeResolver`/`EnrichField` mapping. + - If zero gaps clear the bar, say so and stop. Do not manufacture issues to hit a quota. + - Ask the user which (if any) to file. Wait for explicit approval per issue or "file all". + +6. **File approved issues** + - Only after approval, run `gh issue create` for each. Title format: `translators: support {feature} in {target}`. Body must include: + - **Problem**: what JSON Schema input is currently mis-translated or rejected. + - **Proposed change**: which translator package, which `TypeResolver` method or `EnrichField` step, what `Constraints` field (if any) drives it. + - **References**: primary source URLs. + - **Test cases**: concrete schema → expected output snippets, matching the test shape in [internal/translate/pyspark/translator_test.go](../../internal/translate/pyspark/translator_test.go). + - Report the created issue numbers and URLs back to the user. diff --git a/.claude/rules/translators.md b/.claude/rules/translators.md new file mode 100644 index 0000000..53720a1 --- /dev/null +++ b/.claude/rules/translators.md @@ -0,0 +1,115 @@ +# Translator implementation rules + +Read these before adding a new translator under [internal/translate/](../../internal/translate/) or making non-trivial changes to an existing one. They encode invariants the shared pipeline relies on. 
+
+For the high-level architecture (where translators fit in the pipeline `JSON Schema → Prepare → SchemaData → text/template → []byte`), see the "Schema translation pipeline" section of [CLAUDE.md](../../CLAUDE.md).
+
+## Package layout
+
+A translator lives in its own package at `internal/translate/<name>/` with these files:
+
+- `translator.go` — exported `Translator` struct, `Translate` and `FileExtension` methods, `//go:embed` of the template, package doc comment.
+- `resolver.go` — unexported `resolver` struct implementing `translate.TypeResolver`.
+- `<name>.go.tmpl` — the `text/template` source. Suffix is `.go.tmpl` regardless of the output language so `go:embed` and tooling treat it consistently.
+- `translator_test.go` — table tests against `Translate(...)` output.
+- `README.md` — checkbox list of supported JSON Schema features (primitives, formats, composition, constraints, etc.). Keep this honest; it is the user-facing capability matrix.
+
+Register the translator in **exactly one** place: add a line to `registerTranslators` in [cmd/daco/internal/app.go](../../cmd/daco/internal/app.go). The map key is what users pass to `daco ports translate --format <key>`.
+
+## The Translate method
+
+```go
+func (t *Translator) Translate(portName string, schema *jsonschema.Schema, outputDir string) ([]byte, error) {
+	data, err := translate.Prepare(portName, schema, &resolver{})
+	if err != nil {
+		return nil, fmt.Errorf("failed to prepare schema data: %w", err)
+	}
+	// ... optionally populate data.Extra ...
+	var buf bytes.Buffer
+	if err := tmpl.ExecuteTemplate(&buf, "<name>.go.tmpl", data); err != nil {
+		return nil, fmt.Errorf("failed to execute template: %w", err)
+	}
+	return buf.Bytes(), nil
+}
+```
+
+Rules:
+
+- **Always go through `translate.Prepare`.** Do not re-walk the schema yourself. `Prepare` is the single source of truth for: `$defs` topological order, property order preservation, inline-object extraction, required→`Nullable` mapping, and `$ref` resolution.
+- **Templates are package-level**: parse once at init with `template.Must(template.ParseFS(tmplFS, "<name>.go.tmpl"))`. Never re-parse per call.
+- **`text/template`, not `html/template`.** None of these targets are HTML.
+- **`FileExtension` returns the leading dot** (`".py"`, `".go"`, `".sql"`, `".yaml"`).
+- **Emit a "do not edit" header** at the top of the generated file in the target language's comment syntax: `# Code generated by daco; DO NOT EDIT.` / `// Code generated by daco; DO NOT EDIT.` / etc.
+- **Use `data.Extra`** for translator-specific template inputs you compute *after* `Prepare`. Examples: `gotypes` sets `Extra["Package"] = filepath.Base(outputDir)` and `Extra["NeedsTimeImport"] = bool` after scanning field types. Don't add fields to `SchemaData` for one translator's needs.
+- **Wrap errors** with `fmt.Errorf("...: %w", err)`. Lint enforces `errorf` and `error-strings`.
+
+## The TypeResolver — what goes where
+
+The resolver has six methods. The split between them is deliberate.
+
+| Method | Called when | Receives | Must return |
+|---|---|---|---|
+| `PrimitiveType(schemaType, format)` | leaf types | JSON Schema `type` + `format` | target type string |
+| `ArrayType(elemType)` | `type: array` | already-resolved element type | wrapped array type |
+| `MapType(keyType, valueType)` | object with only `additionalProperties` | resolved key/value types | wrapped map type |
+| `RefType(defName)` | `$ref` to a `$def` | def name (raw, not formatted) | the type-reference string |
+| `FormatDefName(defName)` | each `$def` | def name (raw) | how the def is *named* in output |
+| `FormatRootName(portName)` | the root schema once | port name | how the root type is named |
+
+**Critical**: `RefType(defName)` and `FormatDefName(defName)` must agree — if `FormatDefName` produces `_user_address` (PySpark prefix-snake style), `RefType` must produce the same string for the same input. Otherwise references won't resolve at runtime in the generated code.
Both `pyspark` and `gotypes` implement these as the same expression for this reason. + +**Format wins over type.** In `PrimitiveType`, check `format` first (`date`, `date-time`, `uuid` typically map to dedicated types) and fall through to the `type` switch only if no format matched. Order matters — copy the `pyspark`/`gotypes` pattern. + +**Resolvers are stateless.** `PrimitiveType`/`ArrayType`/`MapType`/`RefType`/`FormatDefName`/`FormatRootName` see one node at a time and must not depend on cross-field state. Anything that needs to look at multiple constraints, the field name, or nullability goes in `EnrichField`. + +## EnrichField — the post-processing escape hatch + +`EnrichField(*translate.Field)` is called once per field after type resolution, before template execution. Order the mutations as: + +1. **Type narrowing from `Constraints`** — e.g. integer narrowing from `Minimum`/`Maximum` (pyspark `LongType` → `ByteType`/`ShortType`/`IntegerType`), or `MultipleOf` → `DecimalType(precision, scale)`. Do this before nullability wrapping so the wrapping sees the final inner type. +2. **Nullability wrapping** — e.g. gotypes prepends `*` when `f.Nullable`; pydantic/python wraps in `Optional[...]`. +3. **Tag / annotation** — set `f.Tag` (struct tags for Go, ` = None` defaults for Python, etc.). +4. **Name casing** — mutate `f.Name` last (e.g. snake_case → PascalCase for Go). The original name is the JSON property name and is what should appear in struct tags / serialization annotations, so capture it into `Tag` *before* renaming. + +Only `EnrichField` should read `f.Constraints`. Templates should not — keep constraint logic in Go. The exception is translators whose entire purpose is emitting constraints (e.g. `dqxyaml`), which can iterate `Constraints` in the template. 
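The four-step ordering can be sketched as follows, assuming a hypothetical gotypes-flavored target and pared-down local stand-ins for the real `Field`/`Constraints` types (the real ones live in `internal/translate` and carry many more keywords):

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for translate.Constraints and translate.Field,
// just enough to show the mutation order.
type Constraints struct{ Minimum, Maximum *float64 }

type Field struct {
	Name, Type, Tag string
	Nullable        bool
	Constraints     Constraints
}

// toPascal is a minimal casing helper for the sketch (the real code
// reuses the shared naming helpers).
func toPascal(s string) string {
	parts := strings.Split(s, "_")
	for i, p := range parts {
		if p != "" {
			parts[i] = strings.ToUpper(p[:1]) + p[1:]
		}
	}
	return strings.Join(parts, "")
}

// enrichField applies the four documented steps, in order.
func enrichField(f *Field) {
	// 1. Type narrowing from Constraints, before nullability wrapping,
	//    so the wrapping sees the final inner type.
	if f.Type == "int64" && f.Constraints.Minimum != nil && f.Constraints.Maximum != nil &&
		*f.Constraints.Minimum >= -128 && *f.Constraints.Maximum <= 127 {
		f.Type = "int8"
	}
	// 2. Nullability wrapping.
	if f.Nullable {
		f.Type = "*" + f.Type
	}
	// 3. Tag: capture the original JSON property name before renaming.
	f.Tag = fmt.Sprintf("json:%q", f.Name)
	// 4. Name casing last.
	f.Name = toPascal(f.Name)
}

func main() {
	lo, hi := 0.0, 100.0
	f := Field{Name: "user_age", Type: "int64", Nullable: true,
		Constraints: Constraints{Minimum: &lo, Maximum: &hi}}
	enrichField(&f)
	fmt.Println(f.Name, f.Type, f.Tag) // UserAge *int8 json:"user_age"
}
```

Reordering steps 3 and 4 is the classic bug: the struct tag would carry the renamed `UserAge` instead of the JSON name `user_age`.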
+ +## What `Prepare` already does for you + +You can rely on these without re-implementing: + +- **`$defs` arrive in topological order** in `data.Defs` — a def that references another def comes after its dependency. Templates can emit defs in slice order without forward-reference issues (in languages where that matters). +- **Property order matches the source file.** `Prepare` reads `Schema.PropertyOrder` (populated by the loader from raw bytes) or falls back to `jschema.ExtractKeyOrder` from the original YAML/JSON. Output is diff-stable across runs. Don't sort fields in your template. +- **Inline objects are extracted as named `TypeDef`s.** A nested anonymous object on a field becomes a synthetic `$def` named after the field in PascalCase, appended to `data.Defs`. Resolvers and templates only ever see named types. +- **`$ref` is dereferenced to a def name.** When `Prepare` encounters `{"$ref": "#/$defs/Address"}`, your `RefType("Address")` is called — you don't see the raw ref string. +- **`Nullable` is computed** from `schema.Required` (a property is `Nullable: true` iff it is not in `Required`). Don't second-guess it. +- **`Constraints` is fully populated** with every JSON Schema validation keyword the field had (`Enum`, `Const`, `Pattern`, `Format`, `MinLength`, `MaxLength`, `Minimum`, `Maximum`, `ExclusiveMinimum`, `ExclusiveMaximum`, `MultipleOf`, `MinItems`, `MaxItems`). Don't drop fields from `Constraints` when extending it — every translator gets the full set and decides what to use. + +## Naming helpers + +Reuse, don't reinvent: + +- `translate.ToSnakeCase(s)` — split on non-alphanumerics, lowercase, underscore-prefix if starts with a digit. +- `translate.ToPascalCase(s)` — split on `_`/`-`, uppercase each part. (Note: `gotypes` has its own `toPascalCase` that handles Go-specific acronym casing — `id`→`ID`, `url`→`URL`, etc. — because the generic version doesn't.) 
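For reference, the documented `ToSnakeCase` behavior can be sketched like this; it is a re-implementation from the description above, not the actual `translate.ToSnakeCase` source:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toSnakeCase follows the described behavior: split on non-alphanumerics,
// lowercase the parts, and underscore-prefix when the result would start
// with a digit.
func toSnakeCase(s string) string {
	parts := strings.FieldsFunc(s, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for i, p := range parts {
		parts[i] = strings.ToLower(p)
	}
	out := strings.Join(parts, "_")
	if out != "" && unicode.IsDigit(rune(out[0])) {
		out = "_" + out
	}
	return out
}

func main() {
	fmt.Println(toSnakeCase("user-address")) // user_address
	fmt.Println(toSnakeCase("2FA Code"))     // _2fa_code
}
```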
+ +## Testing + +Each `translator_test.go` should cover at minimum: + +- Simple object with primitive types — verifies `PrimitiveType` mappings. +- All primitive types together — `string`, `integer`, `number`, `boolean`. +- Date/time formats — `date`, `date-time`, `uuid` (whatever the resolver special-cases). +- Arrays of primitives and arrays of refs. +- `$defs` + `$ref` — verifies `FormatDefName`/`RefType` agreement. +- Required vs optional — verifies nullability wrapping. +- Inline nested objects — verifies the auto-extraction produces the expected synthetic def name. +- Any constraint-driven behavior the resolver implements (e.g. integer narrowing, decimal scaling, enum emission). + +Tests build the schema as a literal `*jsonschema.Schema` and assert substrings on `string(output)`. See [internal/translate/pyspark/translator_test.go](../../internal/translate/pyspark/translator_test.go) for the canonical shape. + +## Common pitfalls + +- **Adding logic to `PrimitiveType` that needs the field's constraints.** Constraints aren't passed in — move it to `EnrichField`. +- **Iterating `schema.Properties` inside the translator.** Don't. `Prepare` did it; iterate `data.Defs` / `data.Root.Fields`. +- **Sorting in the template.** Breaks property-order preservation and produces churny diffs. +- **Mutating `f.Name` before reading it for `Tag`.** The JSON name needed for the tag will be gone. +- **`RefType`/`FormatDefName` disagreement.** Generated code will reference an undefined type. +- **Re-parsing the template per call.** Parse once at package init with `template.Must`. +- **Adding fields to `SchemaData` or `Field` for one translator's need.** Use `data.Extra` instead. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..458b97c --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,103 @@ +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. 
+
+## What this is
+
+`daco` is a Go CLI for authoring and managing **data product specifications** in the OpenDPI format (`opendpi.yaml` / `opendpi.json`). It manages **ports** (typed data interfaces backed by JSON Schemas), **connections** (infrastructure endpoints), and **product** metadata, and **translates** port schemas to language/runtime targets (PySpark, Avro, Protobuf, Pydantic, Go types, Scala, Spark/Databricks SQL, DQX YAML, Markdown, etc.).
+
+A daco project on disk has two files: `daco.yaml` (a thin config pointing at the spec dir) and `opendpi.yaml`/`opendpi.json` under that dir, plus referenced JSON Schemas (`schemas/*.yaml`).
+
+## Common commands
+
+```bash
+make build                 # build ./bin/daco with version ldflags
+make run ARGS="ports list" # go run with ldflags, pass args via ARGS
+make test                  # go test -v -race ./...
+make lint                  # golangci-lint run ./...
+make format                # gofmt -s -w . && goimports -w .
+make setup                 # configure git hooks (commit-msg validates Conventional Commits)
+make release-snapshot      # local goreleaser dry run
+```
+
+Run a single test:
+
+```bash
+go test -v -race ./internal/translate/pyspark -run TestTranslate
+go test -v -race ./internal/opendpi -run TestParse/yaml_round_trip
+```
+
+CI (`.github/workflows/ci.yml`) runs lint + `go test -v -race ./...` + cross-platform build on every push/PR. Conventional Commits are enforced by the `commit-msg` hook.
+
+## Architecture
+
+### Entry point and bootstrap
+
+`cmd/daco/main.go` is intentionally tiny — it calls `cmd/daco/internal.Run`, which **registers all translators** in a `translate.Register` map and hands them to `commands.NewRootCmd`. Adding a new translator means: implement `translate.Translator` in a new `internal/translate/<name>/` package, then add one line to `registerTranslators` in [cmd/daco/internal/app.go](cmd/daco/internal/app.go). The CLI's translator list is the keys of that map; nothing else needs to know.
+ +### Command tree (cobra) + +[internal/commands/register.go](internal/commands/register.go) builds the cobra tree: top-level `init` and `describe`, plus `ports`, `connections`, and `product` groups. The three group commands all set `PersistentPreRunE: session.PreRunLoad`, which is the load-the-project hook — `init` and `describe` deliberately do not, because `init` runs before a project exists. + +Every subcommand pulls the loaded project via `session.RequireFromCommand(cmd)` rather than re-reading files. Commands that need user input use `internal/prompts/` (Charm `huh` forms); commands run non-interactively when their key flag is provided (e.g. `daco init --name X` skips the form). + +### Session / project loading + +[internal/session/context.go](internal/session/context.go) defines the loading contract used by every project-aware command: + +1. Find `daco.yaml` in CWD (else `ErrNotInitialized`). +2. Decode + `Validate()` the config (`config.CurrentConfigVersion` must match — currently `1`). +3. Resolve the spec dir from `Config.Path` and look for `opendpi.yaml` then `opendpi.json`. +4. Parse via `opendpi.YAML.Parse(f, os.DirFS(specDir))` — the `fs.FS` is required so external `$ref`s like `schemas/user.yaml` can be resolved. +5. Stash the resulting `*session.Context` (config + parsed spec) on `cmd.Context()`. + +The sentinel errors (`ErrNotInitialized`, `ErrInvalidConfig`, `ErrSpecNotFound`, `ErrInvalidSpec`) are part of the contract — wrap, don't replace, when adding new failure modes. + +### OpenDPI parsing and round-trip + +[internal/opendpi/](internal/opendpi/) parses YAML/JSON specs and **resolves all `$ref`s** (both internal `#/...` and external file refs via the supplied `fs.FS`), unifying every schema into `Spec.Schemas`. `Port.SchemaRef` preserves the **original** external ref path so the writer can round-trip the spec back to disk without inlining schemas. Treat that field as load-bearing — anything that adds/edits ports must set it correctly. 
+ +### Schema translation pipeline + +This is the architectural backbone of the codebase. All translators share one pipeline in [internal/translate/](internal/translate/): + +``` +JSON Schema ──► translate.Prepare ──► SchemaData ──► text/template ──► []byte + (uses TypeResolver) (Defs, Root, + Fields, Constraints) +``` + +`Prepare` ([internal/translate/prepare.go](internal/translate/prepare.go)) is **target-agnostic**: + +- Walks `$defs` in topological order (so a def that references another def is emitted after its dependency). +- Preserves original JSON/YAML key order via `jschema.ExtractKeyOrder` and `Schema.PropertyOrder` — this matters for deterministic, diff-friendly output. +- Extracts inline nested objects as named `TypeDef`s on the fly (named after the field in PascalCase) and appends them to `Defs`. Don't try to emit nested anonymous types in templates — by the time the template runs, every object has a name. +- Maps each property into a `Field` with `Name`, `Type`, `Nullable` (computed from `schema.Required`), `Description`, and `Constraints`. + +Per-target behavior is concentrated in a `TypeResolver` ([internal/translate/resolver.go](internal/translate/resolver.go)) implemented by each translator package. The resolver controls: + +- Primitive type mapping (`PrimitiveType(schemaType, format)` — format wins, so `"string"` + `"date-time"` can become `TimestampType()`). +- Container wrapping: `ArrayType`, `MapType`. +- Naming: `FormatDefName`, `FormatRootName`, `RefType`. +- `EnrichField(*Field)` — the post-processing escape hatch. Mutate `Name` for casing, wrap `Type` for nullability (`Optional[T]`, `*T`), set `Tag` for struct tags / Python defaults. Called once per field, after type resolution, before template execution. + +A new translator is therefore: a `Translator` struct with `Translate` + `FileExtension`, a `resolver` implementing `TypeResolver`, and an embedded `*.tmpl` file. 
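A resolver for a new target can start as small as this sketch. The method names follow the `TypeResolver` description above; the target type strings are hypothetical (loosely Iceberg-flavored), not an existing translator:

```go
package main

import "fmt"

// resolver sketches the per-target half of the pipeline.
type resolver struct{}

func (resolver) PrimitiveType(schemaType, format string) string {
	// Format wins over type: check format first, fall through to type.
	switch format {
	case "date":
		return "date"
	case "date-time":
		return "timestamp"
	case "uuid":
		return "uuid"
	}
	switch schemaType {
	case "integer":
		return "long"
	case "number":
		return "double"
	case "boolean":
		return "boolean"
	default:
		return "string"
	}
}

func (resolver) ArrayType(elemType string) string { return "list<" + elemType + ">" }

func main() {
	r := resolver{}
	fmt.Println(r.PrimitiveType("string", "date-time"))      // timestamp, because format wins
	fmt.Println(r.ArrayType(r.PrimitiveType("integer", ""))) // list<long>
}
```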
Look at `internal/translate/pyspark/` and `internal/translate/gotypes/` as canonical examples; both consume the same `SchemaData` shape. + +**Before writing or non-trivially editing a translator, read [.claude/rules/translators.md](.claude/rules/translators.md)** — it specifies the package layout, the `EnrichField` mutation order, what `Prepare` guarantees so you don't reimplement it, the `RefType`/`FormatDefName` symmetry rule, and the test matrix every translator should cover. + +### Constraints + +`translate.Constraints` carries the full set of JSON Schema validation keywords (`Enum`, `Pattern`, `Minimum`, `MultipleOf`, `MinItems`, etc.). Whether a translator emits them is target-specific (e.g. dqx-yaml maps them to data-quality checks; gotypes ignores most). Don't drop fields from `Constraints` — translators decide what to use. + +## Conventions + +- **Conventional Commits required** (`feat`, `fix`, `docs`, `chore`, `test`, `refactor`, `ci`, `build`, `perf`, `style`). The commit-msg hook installed by `make setup` enforces this. +- **License header** on every Go file: + ```go + // SPDX-License-Identifier: Apache-2.0 + // Copyright 2026 Daco Labs + ``` +- **Package comments** are required by `revive` (see [.golangci.yml](.golangci.yml)). Every package's first file should have `// Package x ...`. +- **Internal-only**: production code lives under `internal/` (private to this module) and `cmd/daco/internal/` (private to the CLI binary). There is no public API surface. +- Min Go version is **1.23.0** (see [go.mod](go.mod)). +- Generated CLI reference docs live in [docs/cli/](docs/cli/) — keep them in mind if you change command shape.