Conversation
Co-authored-by: Isaac
mwojtyczka
left a comment
Review: Allow custom check messages
Clean feature addition. The inspect-based argument dispatching is flexible and the null-safety handling is correct. A few items below.
> * *rule_name* is the name of the DQX rule
> * *check_func_name* is the name of the DQX check function
> * *check_func_args* is a dictionary of DQX check function arguments as key-value pairs
MEDIUM - `inspect.signature` called at execution time: `_build_message_col` calls `inspect.signature` twice (on `check_func` and on `message`) every time a result struct is built. While this runs once per check (not per row), it is still a hot path during DataFrame construction. Consider caching the signatures on the `DQRule` or `DQRuleManager` instance.
Also, `bind_partial` can raise `TypeError` if the arguments don't match the signature. Should this be wrapped with error handling that gives a clear message about the check function signature mismatch?
> * *column_value* is the column value that failed the DQX check

> Args:
>     condition: Default DQX condition message returned by evaluating the DQX check function
When `self.check.column` is `None` but `self.check.columns` is set (e.g., a multi-column row rule), `column_value` falls back to `F.lit(None)`, so the message function receives no useful value to display.
Either document this or pass all column values as a struct. Then always pass a struct, even for single-column rules:

```python
if self.check.column:
    column_value = F.struct(F.col(self.check.column).cast("string").alias(self.check.column))
elif self.check.columns:
    column_value = F.struct(*[F.col(c).cast("string").alias(c) for c in self.check.columns])
else:
    column_value = F.lit(None)
```

Passing a struct would be more flexible, but it adds significant complexity for users when defining the message function, so maybe just documenting the fallback is enough.
> message: Callable[..., Column] | None = None

> def __post_init__(self):
>     self._validate_rule_type(self.check_func)
MEDIUM - Inconsistent type annotation: the field is `message: Callable[..., Column] | None = None` here on `DQRule`, but on `DQForEachColRule` (line 441) it is `message: Callable | None = None` without the return type. These should be consistent; use `Callable[..., Column] | None` in both places.
> * *user_metadata* (optional) - User-defined key-value pairs added to metadata generated by the check.
> * *custom_message_func* - A user-defined function that returns a message when the check fails. The function should
>   return a Spark Column and can optionally accept the following keyword arguments:
>     - *rule_name* is the name of the DQX rule
LOW - Docstring/field mismatch: the docstring says `custom_message_func` but the field added is named `message`. These should match to avoid confusion.
> assert rule.message is None

> def test_dq_dataset_rule_accepts_message_callable():
LOW - Misleading test name: `test_dq_dataset_rule_accepts_message_callable` creates a `DQRowRule`, not a `DQDatasetRule`. Either rename the test or actually test `DQDatasetRule`.
> from typing import Any

> import pyspark.sql.functions as F
Missing test: there is no test for `DQDatasetRule` with a custom message. Dataset-level rules have different column semantics (no single `column` attribute), so the `column_value` fallback to `F.lit(None)` should be verified.
> @@ -167,6 +170,52 @@ def _build_result_struct(self, condition: Column, skipped: bool = False) -> Column:
>     F.lit(skipped or None).alias("skipped"),
We should support custom messages for both programmatic and metadata definitions.

Option 1: Template strings (simplest)

Support a `message` field in YAML as a template string with placeholders:

```yaml
- name: email_not_null
  criticality: error
  check:
    function: is_not_null
    arguments:
      column: email
  message: "Rule '{rule_name}' failed: {check_func_name} on column '{column}'"
```

DQX resolves the placeholders at runtime using Python's `str.format()` with the same context args (`rule_name`, `check_func_name`, `check_func_args`, etc.) and wraps the result in `F.lit(...)`.
This covers static and semi-dynamic messages but can't include `column_value`, which is a Spark Column expression rather than a Python string at template time.
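A rough sketch of the resolution step for Option 1, assuming plain `str.format()` semantics (the function name is illustrative, not DQX API):

```python
def resolve_message_template(template: str, **context) -> str:
    """Fill {placeholder} slots from the rule context; fail loudly on unknowns."""
    try:
        return template.format(**context)
    except KeyError as e:
        raise ValueError(
            f"unknown placeholder {e.args[0]!r} in message template"
        ) from e


msg = resolve_message_template(
    "Rule '{rule_name}' failed: {check_func_name} on column '{column}'",
    rule_name="email_not_null",
    check_func_name="is_not_null",
    column="email",
)
# DQX would then wrap the resolved string in F.lit(msg)
```

Failing on unknown placeholders (rather than leaving them unresolved) surfaces YAML typos at check-definition time instead of in the output messages.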
Option 2: Template string + `column_value` via Spark `concat`

To support `column_value` in templates, use a special placeholder that gets expanded into a Spark `concat` expression: `message: "Rule '{rule_name}' failed for value=${column_value}"`

DQX parses the template, splits on `${column_value}`, and builds:

```python
F.concat(
    F.lit("Rule 'email_not_null' failed for value="),
    F.coalesce(F.col("email").cast("string"), F.lit("null")),
)
```

This gives dynamic per-row messages from YAML.
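The parsing step could look roughly like this (a stdlib-only sketch; `split_template` is a made-up name, and in DQX each segment would then be mapped to an `F.lit(...)` or coalesced-cast column part of the final `F.concat(...)`):

```python
PLACEHOLDER = "${column_value}"


def split_template(template: str) -> list[tuple[str, str]]:
    """Split a message template into ('lit', text) and ('col', marker) segments."""
    segments = []
    parts = template.split(PLACEHOLDER)
    for i, part in enumerate(parts):
        if part:
            # Literal text between placeholders becomes F.lit(part)
            segments.append(("lit", part))
        if i < len(parts) - 1:
            # Each placeholder occurrence becomes the coalesced column expression
            segments.append(("col", PLACEHOLDER))
    return segments
```

Keeping the split as a plain segment list means the same parse can back both the `F.lit()` fast path (no placeholder present) and the `F.concat()` path.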
Option 3: SQL expression (I would prefer this one)

Allow the message to be a SQL expression string: `message_expr: "concat('Failed: ', coalesce(cast(email as string), 'null'))"`

DQX wraps it with `F.expr(...)`. This is the most flexible option and could unify both definition styles.

Programmatic:

```python
DQRowRule(
    check_func=check_funcs.is_not_null,
    column="email",
    message="concat('Rule failed for value=', coalesce(cast(email as string), 'null'))",
)
```

YAML:

```yaml
- name: email_not_null
  check:
    function: is_not_null
    arguments:
      column: email
  message: "concat('Rule failed for value=', coalesce(cast(email as string), 'null'))"
```

Both resolve to `F.expr(message_str)` internally. No `inspect.signature`, no argument dispatching, no callable serialization problem.
> column="email",
> message=custom_message,
> )
> ]
It may be worth showcasing how the exact message defined in the example looks when the check fails.
> assert_df_equality(checked_df, expected_df)

> def test_apply_checks_without_custom_message_unchanged(ws, spark):
Do we need this test? We already have a lot of regression tests.
mwojtyczka
left a comment
The main point is around supporting this with a metadata definition.
Changes
Add an optional `message` callable parameter to `DQRule`, `DQRowRule`, `DQDatasetRule`, and `DQForEachColRule` that allows users to define custom check failure messages. The callable receives rule context (`rule_name`, `check_func_name`, `check_func_args`, `column_value`) and returns a Spark Column expression, enabling dynamic messages that can include column values.

When `message` is `None` (the default), the existing auto-generated message behavior is preserved. When provided, the custom message replaces the default message for failed rows while maintaining null/non-null semantics for passing rows.

Linked issues
Resolves #958
Tests