DuckDB-NSQL

Numbers Station Text to SQL model for DuckDB.

NSQL is a family of autoregressive open-source foundational models (FMs) that are particularly designed for SQL generation tasks. We are thrilled to introduce DuckDB-NSQL in this repository, an FM tailored for local DuckDB SQL analytics tasks. All model weights can be found on HuggingFace.

Model Name	Size	Link
motherduckdb/DuckDB-NSQL-7B-v0.1	7B	link
motherduckdb/DuckDB-NSQL-7B-v0.1-GGUF	7B	link

Setup

To install all the necessary dependencies, please run

pip install -r requirements.txt

Usage

Please refer to the examples in the examples/ folder to learn how to connect to a local DuckDB and directly query your data. A simple notebook is provided in the examples/ directory for reference.

To host the model with llama.cpp, please execute the following:

# Import necessary modules
from llama_cpp import Llama
from wurlitzer import pipes

# Set up client with model path and context size
with pipes() as (out, err):
    client = Llama(
        model_path="DuckDB-NSQL-7B-v0.1-q8_0.gguf",
        n_ctx=2048,
    )

To load the DuckDB database and query against it, please execute the following:

# Import necessary modules
import duckdb
from utils import generate_sql

# Connect to DuckDB database
con = duckdb.connect("nyc.duckdb")

# Sample question for SQL generation
question = "alter taxi table and add struct column with name test and keys a:int, b:double"

# Generate SQL, check validity, and print
sql = generate_sql(question, con, client)
print(sql)

Training Data

The training data for this model consists of two parts: 1) 200k synthetically generated DuckDB SQL queries, based on the DuckDB v.0.9.2 documentation, and 2) labeled text-to-SQL pairs from NSText2SQL transpiled to DuckDB SQL using sqlglot.

Evaluate the benchmark

Please refer to the eval/ folder to check the details for evaluating the model against our proposed DuckDB benchmark.

Contributors

Vishal Motwani — Founding Product Manager, Numbers Station AI
Sen Wu — Co-founder, Numbers Station AI
Laurel Orr — Principal Developer, Numbers Station AI
Till Döhmen — DuckDB
Jordan Tigani — DuckDB

Acknowledgement

We would like to express our appreciation to all authors of the evaluation scripts. Their work made this project possible.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DuckDB-NSQL

Setup

Usage

Training Data

Evaluate the benchmark

Contributors

Acknowledgement

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
eval		eval
examples		examples
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

DuckDB-NSQL

Setup

Usage

Training Data

Evaluate the benchmark

Contributors

Acknowledgement

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages