Numbers Station Text to SQL model for DuckDB.
NSQL is a family of autoregressive open-source foundational models (FMs) that are particularly designed for SQL generation tasks. We are thrilled to introduce DuckDB-NSQL in this repository, an FM tailored for local DuckDB SQL analytics tasks. All model weights can be found on HuggingFace.
| Model Name | Size | Link |
|---|---|---|
| motherduckdb/DuckDB-NSQL-7B-v0.1 | 7B | link |
| motherduckdb/DuckDB-NSQL-7B-v0.1-GGUF | 7B | link |
To install all the necessary dependencies, please run
pip install -r requirements.txt
Please refer to the examples in the examples/ folder to learn how to connect to a local DuckDB and directly query your data. A simple notebook is provided in the examples/ directory for reference.
To host the model with llama.cpp, please execute the following:
# Import necessary modules
from llama_cpp import Llama
from wurlitzer import pipes
# Set up client with model path and context size
with pipes() as (out, err):
client = Llama(
model_path="DuckDB-NSQL-7B-v0.1-q8_0.gguf",
n_ctx=2048,
)To load the DuckDB database and query against it, please execute the following:
# Import necessary modules
import duckdb
from utils import generate_sql
# Connect to DuckDB database
con = duckdb.connect("nyc.duckdb")
# Sample question for SQL generation
question = "alter taxi table and add struct column with name test and keys a:int, b:double"
# Generate SQL, check validity, and print
sql = generate_sql(question, con, client)
print(sql)The training data for this model consists of two parts: 1) 200k synthetically generated DuckDB SQL queries, based on the DuckDB v.0.9.2 documentation, and 2) labeled text-to-SQL pairs from NSText2SQL transpiled to DuckDB SQL using sqlglot.
Please refer to the eval/ folder to check the details for evaluating the model against our proposed DuckDB benchmark.
- Vishal Motwani — Founding Product Manager, Numbers Station AI
- Sen Wu — Co-founder, Numbers Station AI
- Laurel Orr — Principal Developer, Numbers Station AI
- Till Döhmen — DuckDB
- Jordan Tigani — DuckDB
We would like to express our appreciation to all authors of the evaluation scripts. Their work made this project possible.