@neilconway
Contributor

@neilconway neilconway commented Feb 5, 2026

Tested briefly by compiling datafusion-sqlancer and running it against the latest DataFusion code in git master. Without this PR, this does not work; with this PR, it works as expected.

@alamb alamb requested a review from 2010YOUY01 February 8, 2026 14:34
@2010YOUY01
Collaborator

Thank you! This should be good to go once CI passes.

However, I don’t think we’re going to continue maintaining this repo. It’s built on top of an existing Java SQL testing framework, which I’ve found hard to maintain due to (1) the cross-language setup and (2) a lot of unnecessary abstractions in the implementation.

I’ve put together a Rust re-implementation that should be much simpler. It can reuse DataFusion utilities for most major testing components, and avoids re-implementing things in Java:

The core framework is already in place; it mainly needs some cleanup and ongoing maintenance. I’m happy to help maintain it together if there’s interest.

@2010YOUY01 2010YOUY01 merged commit 9e18aef into datafusion-contrib:main Feb 9, 2026
3 checks passed
@neilconway
Contributor Author

Thanks for the review and for the explanation.

Building our own fuzzer sounds like a fun project, although personally I wonder if DF's requirements for a fuzzer are particularly unique. That is, in an ideal world, we would want an upstream project that develops the SQL fuzzer and just supports DF as one target SQL system among others, no? It would be nice to avoid needing to implement new oracles and similar generic fuzzer machinery ourselves.

@2010YOUY01
Collaborator

> Thanks for the review and for the explanation.
>
> Building our own fuzzer sounds like a fun project, although personally I wonder if DF's requirements for a fuzzer are particularly unique. That is, in an ideal world, we would want an upstream project that develops the SQL fuzzer and just supports DF as one target SQL system among others, no? It would be nice to avoid needing to implement new oracles and similar generic fuzzer machinery ourselves.

I was thinking along the same lines: relying on an existing SQL fuzzing framework (specifically SQLancer) sounds easier in theory, since we declare the syntax and it should just work.

However, after implementing it on the SQLancer Java framework, I found that in practice you still need to build almost everything from scratch, including query tree generation, query transformations for testing oracles, and expression rendering. The utilities provided by the framework are mostly generic infrastructure such as CLI argument handling and logging.
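To make those three pieces concrete, here is a minimal Rust sketch of what each one involves. It is an illustration only: the `Expr` type, `render`, and `tlp_partitions` are hypothetical names invented for this example and are not taken from SQLancer, DataFusion, or the Rust re-implementation mentioned above. The oracle shown is a ternary-logic-partitioning (TLP) style check, chosen as one example of a testing oracle; the thread does not say which oracles are in use.

```rust
// Hypothetical miniature of the three pieces named above. None of these
// types or functions come from SQLancer or DataFusion.

/// A tiny expression tree, standing in for "query tree generation".
#[derive(Clone, Debug)]
enum Expr {
    Column(String),
    Int(i64),
    Gt(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
    IsNull(Box<Expr>),
}

/// Expression rendering: turn the tree into SQL text.
fn render(e: &Expr) -> String {
    match e {
        Expr::Column(name) => name.clone(),
        Expr::Int(v) => v.to_string(),
        Expr::Gt(l, r) => format!("({} > {})", render(l), render(r)),
        Expr::Not(inner) => format!("(NOT {})", render(inner)),
        Expr::IsNull(inner) => format!("({} IS NULL)", render(inner)),
    }
}

/// Query transformation for a TLP-style oracle: split a base query into
/// three queries filtered by `p`, `NOT p`, and `p IS NULL`. Every row
/// satisfies exactly one of the three, so the combined results of the
/// partitions should equal the result of the unfiltered base query.
fn tlp_partitions(table: &str, pred: &Expr) -> (String, [String; 3]) {
    let base = format!("SELECT * FROM {table}");
    let variants = [
        pred.clone(),
        Expr::Not(Box::new(pred.clone())),
        Expr::IsNull(Box::new(pred.clone())),
    ];
    let parts = variants.map(|p| format!("{base} WHERE {}", render(&p)));
    (base, parts)
}

fn main() {
    // Predicate `a > 5` over a hypothetical table `t`.
    let pred = Expr::Gt(
        Box::new(Expr::Column("a".to_string())),
        Box::new(Expr::Int(5)),
    );
    let (base, parts) = tlp_partitions("t", &pred);
    println!("{base}");
    for p in &parts {
        println!("{p}");
    }
}
```

A real fuzzer would generate `Expr` trees randomly, run the base query and the three partitions against the engine, and report a bug when the combined partition rows differ from the base result. That execution and comparison step is omitted here.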

Meanwhile, much of this “generic” fuzzer machinery already exists in DataFusion itself:

Given this, implementing a Rust-native fuzzer tailored to DataFusion may actually be simpler than integrating with an external framework. I hope to find time to continue the Rust version.

@neilconway
Contributor Author

@2010YOUY01 Interesting, thanks for the additional context!

@neilconway neilconway deleted the neilc/df-update-versions branch February 11, 2026 16:45