SessionContext.transform_sql() does not resolve calculated fields that use relationships (MANY_TO_ONE).

**Describe the bug**

Calculated fields defined with `isCalculated: true` and an expression that traverses a MANY_TO_ONE relationship (e.g., `"expression": "customer.name"`) are not resolved by the SQL planner. The column is rejected with `Schema error: No field named <calculated_field>`, listing only the base (non-calculated) columns as valid.

This contradicts the [Calculated Field documentation](https://docs.getwren.ai/oss/engine/guide/modeling/calculated#example-calculated-field-using-a-relationship) which shows this exact pattern working.

Base columns on the same model work correctly. Only calculated fields that reference a relationship column fail.

**To Reproduce**

1. Create `orders` and `customers` tables in PostgreSQL with sample data:

```sql
CREATE TABLE customers (
    custkey INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    orderkey INTEGER PRIMARY KEY,
    custkey INTEGER NOT NULL REFERENCES customers(custkey),
    price NUMERIC(10,2),
    order_date DATE
);

INSERT INTO customers (custkey, name) VALUES
    (1, 'Alice Johnson'), (2, 'Bob Smith'), (3, 'Carol Williams');

INSERT INTO orders (orderkey, custkey, price, order_date) VALUES
    (101, 1, 250.00, '2025-01-15'), (102, 1, 175.50, '2025-02-20'),
    (103, 2, 320.00, '2025-01-10'), (104, 3, 89.99, '2025-03-01'),
    (105, 2, 450.00, '2025-03-15');
```

2. Create this MDL (`mdl.json`):

```json
{
  "catalog": "wren_poc",
  "schema": "public",
  "models": [
    {
      "name": "orders",
      "tableReference": { "catalog": "wren_poc", "schema": "public", "table": "orders" },
      "primaryKey": "orderkey",
      "columns": [
        { "name": "orderkey", "type": "INTEGER" },
        { "name": "custkey", "type": "INTEGER" },
        { "name": "price", "type": "FLOAT" },
        { "name": "order_date", "type": "DATE" },
        { "name": "customer", "type": "customers", "relationship": "orders_customers" },
        {
          "name": "customer_name",
          "type": "VARCHAR",
          "isCalculated": true,
          "expression": "customer.name",
          "relationship": "orders_customers"
        }
      ]
    },
    {
      "name": "customers",
      "tableReference": { "catalog": "wren_poc", "schema": "public", "table": "customers" },
      "primaryKey": "custkey",
      "columns": [
        { "name": "custkey", "type": "INTEGER" },
        { "name": "name", "type": "VARCHAR" }
      ]
    }
  ],
  "relationships": [
    {
      "name": "orders_customers",
      "models": ["orders", "customers"],
      "joinType": "MANY_TO_ONE",
      "condition": "orders.custkey = customers.custkey"
    }
  ]
}
```

3. Base64-encode the MDL and test with `wren_core.SessionContext` (inside the ibis container):

```python
import base64, json, wren_core

with open("mdl.json") as f:
    mdl = json.load(f)

manifest_str = base64.b64encode(json.dumps(mdl).encode()).decode()
ctx = wren_core.SessionContext(manifest_str, None, None)

# This WORKS (base column):
result = ctx.transform_sql("SELECT orderkey FROM orders LIMIT 1")
# Returns: SELECT orders.orderkey FROM (SELECT orders.orderkey FROM
#          (SELECT __source.orderkey AS orderkey FROM wren_poc."public".orders
#          AS __source) AS orders) AS orders LIMIT 1

# This FAILS (calculated field via relationship):
result = ctx.transform_sql("SELECT customer_name FROM orders LIMIT 1")
# Raises: Schema error: No field named customer_name.
#         Valid fields are orders.orderkey, orders.custkey, orders.price, orders.order_date.
```

4. Also reproducible via the ibis-server v3 REST API:

```bash
# Base column — works, returns data:
curl -X POST http://localhost:8083/v3/connector/postgres/query \
  -H "Content-Type: application/json" \
  -d '{"connectionInfo": {"host":"wren-postgres","port":5432,"database":"wren_poc","user":"wren","password":"wren_poc_pass"}, "manifestStr": "<base64>", "sql": "SELECT orderkey, custkey FROM orders LIMIT 3"}'
# Returns: {"columns":["orderkey","custkey"],"data":[[101,1],[102,1],[103,2]]}

# Calculated field — fails:
curl -X POST http://localhost:8083/v3/connector/postgres/query \
  -H "Content-Type: application/json" \
  -d '{"connectionInfo": {"host":"wren-postgres","port":5432,"database":"wren_poc","user":"wren","password":"wren_poc_pass"}, "manifestStr": "<base64>", "sql": "SELECT orderkey, customer_name FROM orders LIMIT 3"}'
# Returns HTTP 422: Schema error: No field named customer_name.
#         Valid fields are orders.orderkey, orders.custkey, orders.price, orders.order_date.
```

**Expected behavior**

`transform_sql()` should resolve `customer_name` by following the `orders_customers` relationship to `customers.name` and generate SQL with the appropriate JOIN, as described in the [Calculated Field docs](https://docs.getwren.ai/oss/engine/guide/modeling/calculated#example-calculated-field-using-a-relationship).

Expected planned SQL should be something like:
```sql
SELECT orders.orderkey, customers.name AS customer_name
FROM orders
JOIN customers ON orders.custkey = customers.custkey
LIMIT 3
```

**Screenshots**

N/A — text-based reproduction with verified output above.

**Desktop (please complete the following information):**

- OS: Linux (Oracle Linux 8, x86_64)

**Wren AI Information**

- `wren-core-py`: 0.1.0 (installed inside `ghcr.io/canner/wren-engine-ibis:latest`)
- `wren-engine`: `ghcr.io/canner/wren-engine:latest` (pulled 2025-03-25)
- `wren-engine-ibis`: `ghcr.io/canner/wren-engine-ibis:latest` (pulled 2025-03-25)
- PostgreSQL 15 (Alpine), Docker Compose deployment

**Additional context**

Tested all three code paths:

1. **`wren_core.SessionContext.transform_sql()` (Python, direct):** Fails at schema validation — calculated field `customer_name` is not in the valid fields list. Only base columns (`orderkey`, `custkey`, `price`, `order_date`) are recognized.

2. **v3 `/dry-plan` endpoint (embedded engine):** Returns HTTP 200 but generates a stub: `WITH "orders" AS (SELECT 1) SELECT customer_name FROM orders LIMIT 1` — the relationship is not resolved and the CTE is a no-op.

3. **v2 `/query` endpoint (Java engine):** Generates a CTE with base columns only: `WITH "orders" AS (SELECT "orders"."orderkey", "orders"."custkey", "orders"."price", "orders"."order_date" FROM ...) SELECT orderkey, customer_name FROM orders LIMIT 3` — the calculated column is passed through without resolution, causing a PostgreSQL error: `column "customer_name" does not exist`.

The relationship column (`customer`) and the calculated field (`customer_name`) are both absent from the schema that the SQL planner sees. The MDL structure follows the documented pattern exactly (relationship column + calculated field with expression referencing the relationship).

**Discovery context:** This bug was found while building a model that uses a MANY_TO_ONE relationship to expose columns from a related table as calculated fields on a parent model — a common pattern for creating pre-joined "wide" models that LLM agents can query as flat tables without writing manual JOINs. The simplified `orders`/`customers` example above reproduces the same issue with minimal setup.

**Narrowed root cause — relationship-traversing expressions only:**

Simple calculated fields (no relationship) work correctly. Adding `"double_price"` with `"isCalculated": true, "expression": "price * 2"` to the same model produces valid planned SQL and `double_price` appears in the valid fields list. The bug is specifically in how `wren-core` handles calculated fields whose expression traverses a relationship column (e.g., `"expression": "customer.name"` where `customer` is a relationship column). These fields are silently excluded from the model's schema during SQL planning.

```
# With double_price (simple calc) + customer_name (relationship calc) on the same model:

Test A — SELECT orderkey, double_price FROM orders LIMIT 1
  OK: planned SQL resolves double_price to (price * 2)

Test B — SELECT orderkey, customer_name FROM orders LIMIT 1
  ERROR: Schema error: No field named customer_name.
  Valid fields are orders.orderkey, orders.custkey, orders.price, orders.order_date, orders.double_price.
  # Note: double_price IS in the valid fields, customer_name is NOT
```

**Definitive root cause — relationship columns not in schema (isCalculated is irrelevant):**

Tested 6 MDL variations. Every variation that includes a relationship column fails identically — `isCalculated` true/false/absent makes no difference. The relationship column itself (`"type": "customers", "relationship": "orders_customers"`) is never included in the valid fields list. The schema builder in `wren-core-py 0.1.0` strips all relationship columns from the model schema.

```
V1: isCalculated=true + expression + relationship  → ERROR (customer_name not in schema)
V2: isCalculated=false + expression + relationship → ERROR (customer_name not in schema)
V3: no isCalculated + expression + relationship    → ERROR (customer_name not in schema)
V4: just relationship, no expression               → ERROR (customer_name not in schema)
V5: isCalculated=true, no relationship on column   → ERROR (customer.name not in schema)
V6: plain column (no relationship at all)          → OK (works as expected)
```

In all failing cases, valid fields = only the raw base columns (`orderkey`, `custkey`). The relationship column handle (`customer`) is also absent — `SELECT customer FROM orders` fails with the same error.

The only workaround is an explicit SQL JOIN between the two models, which bypasses the relationship mechanism entirely:
```sql
SELECT orders.orderkey, customers.name
FROM orders JOIN customers ON orders.custkey = customers.custkey
LIMIT 1
-- This works because it doesn't use the relationship column at all
```

**Conclusion:** The relationship column feature (`"type": "<model>", "relationship": "<name>"`) is not implemented in `wren-core-py 0.1.0`. The schema builder ignores relationship columns entirely. This is a missing feature in the Python/Rust binding, not a bug in the MDL structure.

**UPDATE — Further investigation (2026-03-25):**

We cloned the wren-engine repo and found that the upstream Rust test `test_rlac_on_calculated_field` in `wren-core/core/src/mdl/mod.rs` DOES successfully resolve calculated fields via relationships. The test uses `ColumnBuilder::new("customer_name", "string").calculated(true).expression("customer.c_name").build()` (note: NO `relationship` attribute on the calculated field — only on the handle column).

However, the same MDL pattern fails when used through `wren_core.SessionContext.transform_sql()` in the Python binding. The Rust test calls `transform_sql_with_ctx` directly, while the Python binding goes through `PySessionContext::new()` → `AnalyzedWrenMDL::analyze()` → `apply_wren_on_ctx()` → `transform_sql_with_ctx()`.

The error occurs during the `ModelAnalyzeRule` phase: `Schema error: No field named customer.name`. The expression `customer.name` is being parsed as a qualified column reference, but the relationship handle `customer` isn't being resolved to the related model.

This suggests a discrepancy between how the Rust test context and the Python binding context are initialized — possibly in how the `ModelAnalyzeRule` is registered or how the session state handles relationship resolution.

**Correct MDL pattern (from upstream tests):**
- Relationship handle column: `{"name": "customer", "type": "customers", "relationship": "orders_customers"}` — has `relationship`
- Calculated field: `{"name": "customer_name", "isCalculated": true, "expression": "customer.name"}` — NO `relationship` attribute

The `isCalculated: true` + `expression` pattern is correct per the docs and upstream tests. The `relationship` attribute should only be on the handle column, not on the calculated field.

---

**Summary for ticket update (relationship issue only):**

The upstream Rust test `test_rlac_on_calculated_field` in `wren-core/core/src/mdl/mod.rs` proves that calculated fields via relationships DO work when called through `transform_sql_with_ctx` directly in Rust. However, the identical MDL pattern fails when used through the Python binding (`wren_core.SessionContext.transform_sql()`).

The correct MDL pattern (confirmed from upstream tests) is:
- Relationship handle: `{"name": "customer", "type": "customers", "relationship": "orders_customers"}`
- Calculated field: `{"name": "customer_name", "isCalculated": true, "expression": "customer.name"}` — NO `relationship` on this column

Both `SELECT customer_name FROM orders` and `SELECT * FROM orders` fail through the Python binding with: `ModelAnalyzeRule caused by Schema error: No field named customer.name`.

The bug appears to be in how `PySessionContext::new()` initializes the session context vs how the Rust tests do it. Investigation is ongoing.


**Deep dive findings (2026-03-25, continued):**

Traced the error through the Rust source code:

1. `SessionContext.transform_sql("SELECT customer_name FROM orders LIMIT 1")` creates a logical plan: `Projection: orders.customer_name → TableScan: orders`

2. The `ModelAnalyzeRule` processes this plan and finds `customer_name` is a calculated field with expression `customer.name`

3. It calls `create_wren_calculated_field_expr()` in `wren-core/core/src/mdl/utils.rs` (line 79)

4. This function looks up `required_fields_map` for `wren_poc.public.orders.customer_name` — **this lookup fails with "Required fields not found"**

5. The `required_fields_map` is built by `Lineage::collect_required_fields()` in `wren-core/core/src/mdl/lineage.rs` (line 80)

6. The lineage analysis iterates over `source_columns_map` entries. For `customer_name`, the source column is `customer.name` (a compound identifier)

7. The filter at line 112-115 checks: `if !column_ref.column.is_calculated || column_ref.column.relationship.is_some()` — for `customer_name` (calculated=true, relationship=None), this evaluates to `false || false = false`, so it DOES proceed to analyze

8. The analysis then calls `to_expr_queue` on the source column `customer.name`, splitting it into `["customer", "name"]`

9. It looks for a column reference for `customer` on the `orders` model — this IS the relationship handle column

10. It follows the relationship to find the related model `customers` and looks for `name`

**The lineage analysis appears to work correctly in the Rust code.** The upstream test `test_rlac_on_calculated_field` proves this. The error occurs specifically when going through the Python binding's `SessionContext`, suggesting the issue is in how the Python binding initializes the session context or how the analyzed MDL is passed to the transform function.

**Key difference identified:** The Python binding creates TWO contexts — one for unparsing (`Mode::Unparse`) and one for execution (`Mode::LocalRuntime`). The Rust test only creates one. The error may be in how the second context interacts with the first, or in how the `AnalyzedWrenMDL` is shared between them.

**Files examined:**
- `wren-core/core/src/mdl/mod.rs` — `transform_sql_with_ctx`, `infer_source_column`, `collect_source_columns`
- `wren-core/core/src/mdl/lineage.rs` — `collect_required_fields`, `collect_source_columns`
- `wren-core/core/src/mdl/utils.rs` — `create_wren_calculated_field_expr`, `collect_identifiers`
- `wren-core/core/src/mdl/context.rs` — `apply_wren_on_ctx`, `register_table_with_mdl`, `WrenDataSource::new`
- `wren-core-py/src/context.rs` — `PySessionContext::new`, `transform_sql`


**ROOT CAUSE FOUND (2026-03-25):**

The bug is triggered when the relationship handle column name differs from the related model name.

Working: handle=`customer`, model=`customer` (same name) → expression `customer.c_name` resolves correctly
Failing: handle=`customer`, model=`customers` (different name) → expression `customer.name` fails with "No field named customer.name"

The expression resolver uses the handle column name (`customer`) as a table reference when resolving the dot-notation expression. But the table is registered under the model name (`customers`). When they don't match, the lookup fails.

**Workaround:** Name the relationship handle column the same as the related model name. Instead of:
```json
{"name": "customer", "type": "customers", "relationship": "orders_customers"}
```
Use:
```json
{"name": "customers", "type": "customers", "relationship": "orders_customers"}
```
And update the expression accordingly:
```json
{"name": "customer_name", "isCalculated": true, "expression": "customers.name"}
```

**Verified with test:**
```python
# FAILS: handle='customer', model='customers'
# ERROR: Schema error: No field named customer.name

# WORKS: handle='customers', model='customers'  
# SUCCESS: generates JOIN SQL correctly
```

This is a bug in the expression resolution logic in `wren-core` — the handle column name should be resolved through the relationship to find the actual model name, not used directly as a table reference.


**RESOLUTION — Workaround confirmed working (2026-03-25):**

The relationship handle column name MUST match the related model name. This is an undocumented constraint in the expression resolver.

Applied the workaround to our production MDL (`inf_condition_summary_with_demographics`):
- Renamed relationship handle from `"member"` to `"inf_member"` (matching the model name `inf_member`)
- Updated 9 calculated field expressions from `member.X` to `inf_member.X`

Verified end-to-end:
- `SELECT member_person_state FROM inf_condition_summary_with_demographics` → Success, T2 constrained
- Wren Engine dry-plan shows correct JOIN SQL generated through the relationship
- Deep Thinker E2E test returns correct results (TX: 5, CA: 1, FL: 1, OH: 1)

**The bug remains:** The expression resolver should resolve the handle column name through the relationship to find the actual model name, regardless of whether they match. The upstream test only works because it uses matching names (`customer` model + `customer` handle). The documentation does not mention this naming constraint.

**Naming policy (workaround):** When defining a relationship handle column, always name it the same as the related model:
```json
// WORKS: handle name matches model name
{"name": "inf_member", "type": "inf_member", "relationship": "ConditionSummaryMember"}

// FAILS: handle name differs from model name  
{"name": "member", "type": "inf_member", "relationship": "ConditionSummaryMember"}
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SessionContext.transform_sql() does not resolve calculated fields that use relationships (MANY_TO_ONE). #1481

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

SessionContext.transform_sql() does not resolve calculated fields that use relationships (MANY_TO_ONE). #1481

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions