Skip to content

SessionContext.transform_sql() does not resolve calculated fields that use relationships (MANY_TO_ONE). #1481

@SP-John

Description

@SP-John

Describe the bug

Calculated fields defined with isCalculated: true and an expression that traverses a MANY_TO_ONE relationship (e.g., "expression": "customer.name") are not resolved by the SQL planner. The column is rejected with Schema error: No field named <calculated_field>, listing only the base (non-calculated) columns as valid.

This contradicts the Calculated Field documentation which shows this exact pattern working.

Base columns on the same model work correctly. Only calculated fields that reference a relationship column fail.

To Reproduce

  1. Create orders and customers tables in PostgreSQL with sample data:
CREATE TABLE customers (
    custkey INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    orderkey INTEGER PRIMARY KEY,
    custkey INTEGER NOT NULL REFERENCES customers(custkey),
    price NUMERIC(10,2),
    order_date DATE
);

INSERT INTO customers (custkey, name) VALUES
    (1, 'Alice Johnson'), (2, 'Bob Smith'), (3, 'Carol Williams');

INSERT INTO orders (orderkey, custkey, price, order_date) VALUES
    (101, 1, 250.00, '2025-01-15'), (102, 1, 175.50, '2025-02-20'),
    (103, 2, 320.00, '2025-01-10'), (104, 3, 89.99, '2025-03-01'),
    (105, 2, 450.00, '2025-03-15');
  1. Create this MDL (mdl.json):
{
  "catalog": "wren_poc",
  "schema": "public",
  "models": [
    {
      "name": "orders",
      "tableReference": { "catalog": "wren_poc", "schema": "public", "table": "orders" },
      "primaryKey": "orderkey",
      "columns": [
        { "name": "orderkey", "type": "INTEGER" },
        { "name": "custkey", "type": "INTEGER" },
        { "name": "price", "type": "FLOAT" },
        { "name": "order_date", "type": "DATE" },
        { "name": "customer", "type": "customers", "relationship": "orders_customers" },
        {
          "name": "customer_name",
          "type": "VARCHAR",
          "isCalculated": true,
          "expression": "customer.name",
          "relationship": "orders_customers"
        }
      ]
    },
    {
      "name": "customers",
      "tableReference": { "catalog": "wren_poc", "schema": "public", "table": "customers" },
      "primaryKey": "custkey",
      "columns": [
        { "name": "custkey", "type": "INTEGER" },
        { "name": "name", "type": "VARCHAR" }
      ]
    }
  ],
  "relationships": [
    {
      "name": "orders_customers",
      "models": ["orders", "customers"],
      "joinType": "MANY_TO_ONE",
      "condition": "orders.custkey = customers.custkey"
    }
  ]
}
  1. Base64-encode the MDL and test with wren_core.SessionContext (inside the ibis container):
import base64, json, wren_core

with open("mdl.json") as f:
    mdl = json.load(f)

manifest_str = base64.b64encode(json.dumps(mdl).encode()).decode()
ctx = wren_core.SessionContext(manifest_str, None, None)

# This WORKS (base column):
result = ctx.transform_sql("SELECT orderkey FROM orders LIMIT 1")
# Returns: SELECT orders.orderkey FROM (SELECT orders.orderkey FROM
#          (SELECT __source.orderkey AS orderkey FROM wren_poc."public".orders
#          AS __source) AS orders) AS orders LIMIT 1

# This FAILS (calculated field via relationship):
result = ctx.transform_sql("SELECT customer_name FROM orders LIMIT 1")
# Raises: Schema error: No field named customer_name.
#         Valid fields are orders.orderkey, orders.custkey, orders.price, orders.order_date.
  1. Also reproducible via the ibis-server v3 REST API:
# Base column — works, returns data:
curl -X POST http://localhost:8083/v3/connector/postgres/query \
  -H "Content-Type: application/json" \
  -d '{"connectionInfo": {"host":"wren-postgres","port":5432,"database":"wren_poc","user":"wren","password":"wren_poc_pass"}, "manifestStr": "<base64>", "sql": "SELECT orderkey, custkey FROM orders LIMIT 3"}'
# Returns: {"columns":["orderkey","custkey"],"data":[[101,1],[102,1],[103,2]]}

# Calculated field — fails:
curl -X POST http://localhost:8083/v3/connector/postgres/query \
  -H "Content-Type: application/json" \
  -d '{"connectionInfo": {"host":"wren-postgres","port":5432,"database":"wren_poc","user":"wren","password":"wren_poc_pass"}, "manifestStr": "<base64>", "sql": "SELECT orderkey, customer_name FROM orders LIMIT 3"}'
# Returns HTTP 422: Schema error: No field named customer_name.
#         Valid fields are orders.orderkey, orders.custkey, orders.price, orders.order_date.

Expected behavior

transform_sql() should resolve customer_name by following the orders_customers relationship to customers.name and generate SQL with the appropriate JOIN, as described in the Calculated Field docs.

Expected planned SQL should be something like:

SELECT orders.orderkey, customers.name AS customer_name
FROM orders
JOIN customers ON orders.custkey = customers.custkey
LIMIT 3

Screenshots

N/A — text-based reproduction with verified output above.

Desktop (please complete the following information):

  • OS: Linux (Oracle Linux 8, x86_64)

Wren AI Information

  • wren-core-py: 0.1.0 (installed inside ghcr.io/canner/wren-engine-ibis:latest)
  • wren-engine: ghcr.io/canner/wren-engine:latest (pulled 2025-03-25)
  • wren-engine-ibis: ghcr.io/canner/wren-engine-ibis:latest (pulled 2025-03-25)
  • PostgreSQL 15 (Alpine), Docker Compose deployment

Additional context

Tested all three code paths:

  1. wren_core.SessionContext.transform_sql() (Python, direct): Fails at schema validation — calculated field customer_name is not in the valid fields list. Only base columns (orderkey, custkey, price, order_date) are recognized.

  2. v3 /dry-plan endpoint (embedded engine): Returns HTTP 200 but generates a stub: WITH "orders" AS (SELECT 1) SELECT customer_name FROM orders LIMIT 1 — the relationship is not resolved and the CTE is a no-op.

  3. v2 /query endpoint (Java engine): Generates a CTE with base columns only: WITH "orders" AS (SELECT "orders"."orderkey", "orders"."custkey", "orders"."price", "orders"."order_date" FROM ...) SELECT orderkey, customer_name FROM orders LIMIT 3 — the calculated column is passed through without resolution, causing a PostgreSQL error: column "customer_name" does not exist.

The relationship column (customer) and the calculated field (customer_name) are both absent from the schema that the SQL planner sees. The MDL structure follows the documented pattern exactly (relationship column + calculated field with expression referencing the relationship).

Discovery context: This bug was found while building a model that uses a MANY_TO_ONE relationship to expose columns from a related table as calculated fields on a parent model — a common pattern for creating pre-joined "wide" models that LLM agents can query as flat tables without writing manual JOINs. The simplified orders/customers example above reproduces the same issue with minimal setup.

Narrowed root cause — relationship-traversing expressions only:

Simple calculated fields (no relationship) work correctly. Adding "double_price" with "isCalculated": true, "expression": "price * 2" to the same model produces valid planned SQL and double_price appears in the valid fields list. The bug is specifically in how wren-core handles calculated fields whose expression traverses a relationship column (e.g., "expression": "customer.name" where customer is a relationship column). These fields are silently excluded from the model's schema during SQL planning.

# With double_price (simple calc) + customer_name (relationship calc) on the same model:

Test A — SELECT orderkey, double_price FROM orders LIMIT 1
  OK: planned SQL resolves double_price to (price * 2)

Test B — SELECT orderkey, customer_name FROM orders LIMIT 1
  ERROR: Schema error: No field named customer_name.
  Valid fields are orders.orderkey, orders.custkey, orders.price, orders.order_date, orders.double_price.
  # Note: double_price IS in the valid fields, customer_name is NOT

Definitive root cause — relationship columns not in schema (isCalculated is irrelevant):

Tested 6 MDL variations. Every variation that includes a relationship column fails identically — isCalculated true/false/absent makes no difference. The relationship column itself ("type": "customers", "relationship": "orders_customers") is never included in the valid fields list. The schema builder in wren-core-py 0.1.0 strips all relationship columns from the model schema.

V1: isCalculated=true + expression + relationship  → ERROR (customer_name not in schema)
V2: isCalculated=false + expression + relationship → ERROR (customer_name not in schema)
V3: no isCalculated + expression + relationship    → ERROR (customer_name not in schema)
V4: just relationship, no expression               → ERROR (customer_name not in schema)
V5: isCalculated=true, no relationship on column   → ERROR (customer.name not in schema)
V6: plain column (no relationship at all)          → OK (works as expected)

In all failing cases, valid fields = only the raw base columns (orderkey, custkey). The relationship column handle (customer) is also absent — SELECT customer FROM orders fails with the same error.

The only workaround is an explicit SQL JOIN between the two models, which bypasses the relationship mechanism entirely:

SELECT orders.orderkey, customers.name
FROM orders JOIN customers ON orders.custkey = customers.custkey
LIMIT 1
-- This works because it doesn't use the relationship column at all

Conclusion: The relationship column feature ("type": "<model>", "relationship": "<name>") is not implemented in wren-core-py 0.1.0. The schema builder ignores relationship columns entirely. This is a missing feature in the Python/Rust binding, not a bug in the MDL structure.

UPDATE — Further investigation (2026-03-25):

We cloned the wren-engine repo and found that the upstream Rust test test_rlac_on_calculated_field in wren-core/core/src/mdl/mod.rs DOES successfully resolve calculated fields via relationships. The test uses ColumnBuilder::new("customer_name", "string").calculated(true).expression("customer.c_name").build() (note: NO relationship attribute on the calculated field — only on the handle column).

However, the same MDL pattern fails when used through wren_core.SessionContext.transform_sql() in the Python binding. The Rust test calls transform_sql_with_ctx directly, while the Python binding goes through PySessionContext::new()AnalyzedWrenMDL::analyze()apply_wren_on_ctx()transform_sql_with_ctx().

The error occurs during the ModelAnalyzeRule phase: Schema error: No field named customer.name. The expression customer.name is being parsed as a qualified column reference, but the relationship handle customer isn't being resolved to the related model.

This suggests a discrepancy between how the Rust test context and the Python binding context are initialized — possibly in how the ModelAnalyzeRule is registered or how the session state handles relationship resolution.

Correct MDL pattern (from upstream tests):

  • Relationship handle column: {"name": "customer", "type": "customers", "relationship": "orders_customers"} — has relationship
  • Calculated field: {"name": "customer_name", "isCalculated": true, "expression": "customer.name"} — NO relationship attribute

The isCalculated: true + expression pattern is correct per the docs and upstream tests. The relationship attribute should only be on the handle column, not on the calculated field.


Summary for ticket update (relationship issue only):

The upstream Rust test test_rlac_on_calculated_field in wren-core/core/src/mdl/mod.rs proves that calculated fields via relationships DO work when called through transform_sql_with_ctx directly in Rust. However, the identical MDL pattern fails when used through the Python binding (wren_core.SessionContext.transform_sql()).

The correct MDL pattern (confirmed from upstream tests) is:

  • Relationship handle: {"name": "customer", "type": "customers", "relationship": "orders_customers"}
  • Calculated field: {"name": "customer_name", "isCalculated": true, "expression": "customer.name"} — NO relationship on this column

Both SELECT customer_name FROM orders and SELECT * FROM orders fail through the Python binding with: ModelAnalyzeRule caused by Schema error: No field named customer.name.

The bug appears to be in how PySessionContext::new() initializes the session context vs how the Rust tests do it. Investigation is ongoing.

Deep dive findings (2026-03-25, continued):

Traced the error through the Rust source code:

  1. SessionContext.transform_sql("SELECT customer_name FROM orders LIMIT 1") creates a logical plan: Projection: orders.customer_name → TableScan: orders

  2. The ModelAnalyzeRule processes this plan and finds customer_name is a calculated field with expression customer.name

  3. It calls create_wren_calculated_field_expr() in wren-core/core/src/mdl/utils.rs (line 79)

  4. This function looks up required_fields_map for wren_poc.public.orders.customer_namethis lookup fails with "Required fields not found"

  5. The required_fields_map is built by Lineage::collect_required_fields() in wren-core/core/src/mdl/lineage.rs (line 80)

  6. The lineage analysis iterates over source_columns_map entries. For customer_name, the source column is customer.name (a compound identifier)

  7. The filter at line 112-115 checks: if !column_ref.column.is_calculated || column_ref.column.relationship.is_some() — for customer_name (calculated=true, relationship=None), this evaluates to false || false = false, so it DOES proceed to analyze

  8. The analysis then calls to_expr_queue on the source column customer.name, splitting it into ["customer", "name"]

  9. It looks for a column reference for customer on the orders model — this IS the relationship handle column

  10. It follows the relationship to find the related model customers and looks for name

The lineage analysis appears to work correctly in the Rust code. The upstream test test_rlac_on_calculated_field proves this. The error occurs specifically when going through the Python binding's SessionContext, suggesting the issue is in how the Python binding initializes the session context or how the analyzed MDL is passed to the transform function.

Key difference identified: The Python binding creates TWO contexts — one for unparsing (Mode::Unparse) and one for execution (Mode::LocalRuntime). The Rust test only creates one. The error may be in how the second context interacts with the first, or in how the AnalyzedWrenMDL is shared between them.

Files examined:

  • wren-core/core/src/mdl/mod.rstransform_sql_with_ctx, infer_source_column, collect_source_columns
  • wren-core/core/src/mdl/lineage.rscollect_required_fields, collect_source_columns
  • wren-core/core/src/mdl/utils.rscreate_wren_calculated_field_expr, collect_identifiers
  • wren-core/core/src/mdl/context.rsapply_wren_on_ctx, register_table_with_mdl, WrenDataSource::new
  • wren-core-py/src/context.rsPySessionContext::new, transform_sql

ROOT CAUSE FOUND (2026-03-25):

The bug is triggered when the relationship handle column name differs from the related model name.

Working: handle=customer, model=customer (same name) → expression customer.c_name resolves correctly
Failing: handle=customer, model=customers (different name) → expression customer.name fails with "No field named customer.name"

The expression resolver uses the handle column name (customer) as a table reference when resolving the dot-notation expression. But the table is registered under the model name (customers). When they don't match, the lookup fails.

Workaround: Name the relationship handle column the same as the related model name. Instead of:

{"name": "customer", "type": "customers", "relationship": "orders_customers"}

Use:

{"name": "customers", "type": "customers", "relationship": "orders_customers"}

And update the expression accordingly:

{"name": "customer_name", "isCalculated": true, "expression": "customers.name"}

Verified with test:

# FAILS: handle='customer', model='customers'
# ERROR: Schema error: No field named customer.name

# WORKS: handle='customers', model='customers'  
# SUCCESS: generates JOIN SQL correctly

This is a bug in the expression resolution logic in wren-core — the handle column name should be resolved through the relationship to find the actual model name, not used directly as a table reference.

RESOLUTION — Workaround confirmed working (2026-03-25):

The relationship handle column name MUST match the related model name. This is an undocumented constraint in the expression resolver.

Applied the workaround to our production MDL (inf_condition_summary_with_demographics):

  • Renamed relationship handle from "member" to "inf_member" (matching the model name inf_member)
  • Updated 9 calculated field expressions from member.X to inf_member.X

Verified end-to-end:

  • SELECT member_person_state FROM inf_condition_summary_with_demographics → Success, T2 constrained
  • Wren Engine dry-plan shows correct JOIN SQL generated through the relationship
  • Deep Thinker E2E test returns correct results (TX: 5, CA: 1, FL: 1, OH: 1)

The bug remains: The expression resolver should resolve the handle column name through the relationship to find the actual model name, regardless of whether they match. The upstream test only works because it uses matching names (customer model + customer handle). The documentation does not mention this naming constraint.

Naming policy (workaround): When defining a relationship handle column, always name it the same as the related model:

// WORKS: handle name matches model name
{"name": "inf_member", "type": "inf_member", "relationship": "ConditionSummaryMember"}

// FAILS: handle name differs from model name  
{"name": "member", "type": "inf_member", "relationship": "ConditionSummaryMember"}

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions