Add post-processing SQL over semantic query results#135
Conversation
Support arbitrary SQL (CASE, window functions, arithmetic, etc.) on top of semantic query results via subquery wrapping. The rewriter now walks the query tree recursively so nested subqueries and JOIN subqueries that reference semantic models are compiled correctly. Also adds a post_process parameter to compile() and query() for the Python API path, with automatic CTE hoisting.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 552c2d9091
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| has_subquery_in_joins = any(isinstance(join.this, exp.Subquery) for join in (parsed.args.get("joins") or [])) | ||
|
|
||
| if has_ctes or has_subquery_in_from: | ||
| if has_ctes or has_subquery_in_from or has_subquery_in_joins: |
There was a problem hiding this comment.
Preserve root semantic rewrite when JOIN has subquery
Routing every query with a JOIN subquery through _rewrite_with_ctes_or_subqueries skips _rewrite_simple_query for the root SELECT, because _rewrite_select_tree only rewrites child scopes. For queries whose root FROM is a semantic model (for example, FROM orders o JOIN (SELECT ...) s), metric references like o.revenue are no longer rewritten through the semantic pipeline or rejected by the explicit JOIN guard, so execution can now fail with binder errors or silently return raw row-level columns instead of semantic aggregates.
Useful? React with 👍 / 👎.
When the root FROM references a semantic model and a JOIN contains a subquery, the routing into _rewrite_with_ctes_or_subqueries must still apply _rewrite_simple_query to the root SELECT so the explicit JOIN guard and semantic rewriting are not bypassed.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c3565e5730
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
sidemantic/sql/query_rewriter.py
Outdated
| if self._references_semantic_model(parsed): | ||
| return self._rewrite_simple_query(parsed) |
There was a problem hiding this comment.
Keep CTE scope when rewriting root semantic queries
When a query has a root semantic FROM plus a WITH clause, this branch rewrites the whole root through _rewrite_simple_query, which regenerates SQL from extracted metrics/dimensions and drops the original CTE definitions. Any root predicate/expression that references those CTE names then fails at execution time (e.g., WITH allowed AS (...) SELECT ... FROM orders WHERE status IN (SELECT ... FROM allowed) now errors with missing table allowed). This regression is introduced by the new root rewrite path for CTE/subquery queries.
Useful? React with 👍 / 👎.
sidemantic/core/semantic_layer.py
Outdated
| parsed_inner.set("with", None) | ||
| body = parsed_inner.sql(dialect=target_dialect) | ||
| ctes = with_clause.sql(dialect=target_dialect) | ||
| return ctes + "\n" + post_process.replace("{inner}", body) |
There was a problem hiding this comment.
Merge hoisted CTEs with outer post-process WITH
The post-processing path prepends hoisted inner CTEs directly in front of the user-supplied post_process SQL. If post_process itself starts with WITH, the result becomes WITH ...\nWITH ..., which is invalid SQL and fails to parse/execute. Since the feature is documented as allowing arbitrary SQL wrapping, this makes valid CTE-based post-processing unusable whenever the inner semantic query emits CTEs.
Useful? React with 👍 / 👎.
Fix two issues from PR review: 1. When the root SELECT references a semantic model and has user-defined CTEs, _rewrite_simple_query was discarding them. Now user CTEs are saved before rewriting and merged back into the generated SQL so filter references like IN (SELECT ... FROM user_cte) remain valid. 2. When post_process SQL has its own WITH clause and the inner semantic query also produces CTEs, the two WITH clauses are now merged into one instead of producing invalid double-WITH SQL.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 02194ec3a6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| user_ctes = [cte.copy() for cte in original_with.expressions] | ||
| gen_with.set("expressions", user_ctes + list(gen_with.expressions)) |
There was a problem hiding this comment.
Preserve WITH RECURSIVE when merging root CTEs
When a root semantic query includes WITH RECURSIVE, this merge path copies only original_with.expressions into the generated WITH and drops the recursive flag from the original clause. The rewritten SQL becomes WITH ... instead of WITH RECURSIVE ..., so self-referencing CTEs fail at execution (e.g., DuckDB reports a circular CTE reference). I reproduced this with a root FROM orders query filtered by a recursive CTE; rewrite output omitted RECURSIVE.
Useful? React with 👍 / 👎.
Propagate the recursive flag from user-defined CTEs to the merged WITH clause so self-referencing CTEs execute correctly.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6cc3958289
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| gen_with = rewritten.args.get("with") | ||
| if gen_with: | ||
| user_ctes = [cte.copy() for cte in original_with.expressions] | ||
| gen_with.set("expressions", user_ctes + list(gen_with.expressions)) |
There was a problem hiding this comment.
Avoid duplicate CTE aliases when merging root WITH clauses
When a root semantic query already has a user CTE name that matches an internally generated CTE (for example orders_cte), this merge creates duplicate aliases and the rewritten SQL fails to parse. I reproduced this with WITH orders_cte AS (...) SELECT orders.revenue FROM orders ..., which rewrote to two orders_cte entries and raised a parser error. Since generated CTE names are implementation details, users can hit this unintentionally; the merge needs collision handling (rename or namespace).
Useful? React with 👍 / 👎.
sidemantic/core/semantic_layer.py
Outdated
| merged = list(inner_with.expressions) + list(outer_with.expressions) | ||
| outer_with.set("expressions", merged) |
There was a problem hiding this comment.
Prevent CTE name collisions in post_process WITH hoisting
The post_process hoisting path prepends inner generated CTEs to outer post_process CTEs without checking alias conflicts, so a user CTE name like orders_cte collides with the semantic generator’s orders_cte and produces invalid SQL. Calling compile(..., post_process='WITH orders_cte AS (...) SELECT ... FROM ({inner})') currently yields duplicate orders_cte definitions and fails at execution time.
Useful? React with 👍 / 👎.
Two changes: 1. post_process path: remove CTE hoisting entirely. Inner SQL (with CTEs) is placed directly in the subquery position. CTEs inside subqueries are valid in all target databases and naturally scoped, so name collisions with post_process CTEs cannot occur. 2. Root semantic + user CTEs: detect name collisions between user CTEs and generated CTEs, raising a clear error instead of producing invalid SQL. Walk-based renaming was too aggressive (renamed user CTE references inside filter subqueries).
Summary
post_processparameter tocompile()andquery()for the Python API path, with automatic CTE hoisting so the composed SQL is valid.post_processAPI.