Skip to content

Latest commit

 

History

History
166 lines (121 loc) · 4.07 KB

File metadata and controls

166 lines (121 loc) · 4.07 KB

SQL Query Engine — Implementation Plan

Overview

A CLI tool that runs real SQL queries against CSV/JSON flat files. No database server needed — files are the tables. Fully offline, measurable, and naturally extensible.

csv-sql query "SELECT name, age FROM users.csv WHERE age > 25 ORDER BY age DESC"

Tech Stack

  • Language: TypeScript (Node.js)
  • No external runtime dependencies (pure implementation)
  • Input: CSV / JSON files as tables
  • Output: Formatted table results in terminal

Directory Structure

src/
├── index.ts              # CLI entry point
├── tokenizer/
│   ├── tokenizer.ts      # Breaks SQL string into tokens
│   └── tokens.ts         # Token type definitions
├── parser/
│   ├── parser.ts         # Builds AST from token stream
│   └── ast.ts            # AST node type definitions
├── executor/
│   ├── executor.ts       # Walks AST and executes query
│   └── filter.ts         # WHERE clause evaluation
├── io/
│   ├── csvReader.ts      # Reads CSV into row objects
│   └── jsonReader.ts     # Reads JSON array into row objects
└── formatter/
    └── tableFormatter.ts # Prints results as aligned table

Phase 1 — Build Here (Foundation)

What gets built:

A working SQL engine that handles basic SELECT queries with WHERE filtering.

Step 1: Token Definitions (tokenizer/tokens.ts)

Define token types:

SELECT, FROM, WHERE, ORDER, BY, AS, AND, OR, NOT,
STAR, COMMA, LPAREN, RPAREN,
EQ, NEQ, LT, GT, LTE, GTE,
IDENTIFIER, STRING, NUMBER,
EOF

Step 2: Tokenizer (tokenizer/tokenizer.ts)

  • Reads raw SQL string character by character
  • Emits typed tokens with value and position
  • Handles: keywords (case-insensitive), identifiers, string literals ('...'), numbers, operators
  • Throws descriptive error with position on unknown character

Step 3: AST Node Types (parser/ast.ts)

Define nodes:

SelectStatement { columns, from, where?, orderBy?, limit? }
Column         { name, alias? } | Wildcard
TableRef       { filename, alias? }
BinaryExpr     { left, op, right }
Identifier     { name }
Literal        { value: string | number }

Step 4: Parser (parser/parser.ts)

Recursive descent parser. Phase 1 supports:

  • SELECT * FROM file.csv
  • SELECT col1, col2 FROM file.csv
  • SELECT col AS alias FROM file.csv
  • WHERE expr AND/OR expr
  • Comparison operators: =, !=, <, >, <=, >=
  • String and number literals

Step 5: CSV Reader (io/csvReader.ts)

  • Parse CSV with header row
  • Return Row[] where Row = Record<string, string>
  • Handle quoted fields, escaped commas
  • Support both , and ; delimiters

Step 6: Executor (executor/executor.ts)

  • Takes AST + file reader
  • Evaluates WHERE clause against each row
  • Projects selected columns
  • Returns Row[]

Step 7: Table Formatter (formatter/tableFormatter.ts)

  • Auto-sizes columns based on max content width
  • Prints header row + separator + data rows
  • Shows row count at bottom

Step 8: CLI Entry Point (index.ts)

csv-sql query "<SQL>"
csv-sql query "<SQL>" --format json
csv-sql query "<SQL>" --limit 20

Acceptance Criteria for Phase 1

# Basic select
csv-sql query "SELECT * FROM data/users.csv"

# Column projection
csv-sql query "SELECT name, age FROM data/users.csv"

# WHERE filtering
csv-sql query "SELECT name FROM data/users.csv WHERE age > 25"

# AND/OR conditions
csv-sql query "SELECT * FROM data/users.csv WHERE age > 18 AND city = 'NYC'"

# Aliases
csv-sql query "SELECT name AS full_name FROM data/users.csv"

All must produce correct, formatted output. Wrong results = not done.


Test Data to Include

Create data/users.csv:

id,name,age,city,salary
1,Alice,30,NYC,95000
2,Bob,25,LA,72000
3,Carol,35,NYC,110000
4,Dave,28,Chicago,85000
5,Eve,22,LA,65000

Create data/orders.csv:

order_id,user_id,product,amount,date
101,1,Laptop,1200,2024-01-15
102,2,Phone,800,2024-01-16
103,1,Monitor,400,2024-01-17
104,3,Laptop,1200,2024-01-18