Is there an existing issue for this?
Problem statement
DQX uses chispa (assert_df_equality) as a test dependency. Migrating to the PySpark testing utilities (pyspark.testing, available from PySpark 3.5) would remove this third-party dependency.
Differences:
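- The tests pass ignore_nullable=True and ignore_column_order=True to chispa extensively. assertDataFrameEqual appears to cover both, but under different names and not in every release: the PySpark 3.5 signature seems to expose only checkRowOrder, rtol and atol, while ignoreNullable and ignoreColumnOrder look to have been added in PySpark 4.0 (need to verify the exact API against the pinned version).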
Proposed Solution
The migration would be:
- Replace chispa import with from pyspark.testing.utils import assertDataFrameEqual
- Update assert_df_equality_ignore_fingerprints to use assertDataFrameEqual internally
- Map parameter names (ignore_nullable → PySpark equivalent)
- Remove chispa from pyproject.toml
It's straightforward but touches every test file.
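
A minimal sketch of the parameter mapping, to make the scope concrete. It is not the actual DQX helper: `translate_kwargs` and `assert_df_equality_compat` are hypothetical names, and the PySpark keyword names (`ignoreNullable`, `ignoreColumnOrder`, `checkRowOrder`) are assumed from the PySpark 4.0 signature and still need to be checked against the pinned version. One difference to watch for: chispa checks row order unless `ignore_row_order=True`, while `assertDataFrameEqual` defaults to `checkRowOrder=False`, so the flag has to be inverted to keep the current behavior.

```python
def translate_kwargs(ignore_nullable=False, ignore_column_order=False,
                     ignore_row_order=False):
    """Translate chispa-style flags into assertDataFrameEqual keyword arguments.

    The target names are assumptions based on the PySpark 4.0 signature.
    """
    return {
        "ignoreNullable": ignore_nullable,
        "ignoreColumnOrder": ignore_column_order,
        # chispa expresses "ignore", PySpark expresses "check": invert the flag.
        "checkRowOrder": not ignore_row_order,
    }


def assert_df_equality_compat(actual, expected, **chispa_kwargs):
    """Hypothetical drop-in replacement for chispa's assert_df_equality."""
    # Imported lazily so translate_kwargs can be unit-tested without Spark.
    from pyspark.testing.utils import assertDataFrameEqual

    assertDataFrameEqual(actual, expected, **translate_kwargs(**chispa_kwargs))
```

With a wrapper like this, most test files would only need their import changed, and assert_df_equality_ignore_fingerprints could call the wrapper instead of chispa.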
Additional Context
No response