pip install wowdata

For local development:
git clone https://github.com/sci2pro/wowdata.git
cd wowdata
pip install -e .[dev]

from wowdata import Pipeline, Sink, Source, Transform
pipe = (
    Pipeline(Source("people.csv"))
    .then(Transform("cast", params={"types": {"age": "integer"}, "on_error": "null"}))
    .then(Transform("filter", params={"where": "age >= 18"}))
    .then(Sink("adults.csv"))
)
pipe.run()

wowdata: 0
pipeline:
  start:
    uri: people.csv
    type: csv
  steps:
    - transform:
        op: filter
        params:
          where: "age >= 18"
    - sink:
        uri: adults.csv
        type: csv

Run it:
wow run pipeline.yaml

Fallback command:
wowdata run pipeline.yaml

The repository includes ready-to-run sample pipelines and data files in examples/.
From the repo root:
wow run examples/climate_heat_events.yaml --base-dir examples
wow run examples/climate_rainfall_alerts.yaml --base-dir examples

Or run from inside the directory:
cd examples
wow run climate_heat_events.yaml

If you want a quick string-cleaning example, the string transform can normalize messy text before cast:
- transform:
    op: string
    params:
      column: Price
      action: regex_replace
      pattern: "[^0-9.]+"
      repl: ""

For more examples covering strip, replace, split, format, encode, and zfill, see
String Operation Examples.
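
To make the effect of the regex_replace step concrete, here is a minimal sketch in plain Python. It assumes the transform behaves like Python's re.sub applied to each value of the column, which this README does not confirm; clean_price and the sample values are hypothetical names and data used only for illustration.

```python
import re

# Same pattern and replacement as the YAML step above.
PATTERN = r"[^0-9.]+"
REPL = ""


def clean_price(raw: str) -> str:
    """Hypothetical helper: drop every run of characters that is not a digit or a dot."""
    return re.sub(PATTERN, REPL, raw)


# Made-up sample values, not taken from the repository's data files.
samples = ["$1,299.00", " 45.5 USD", "N/A"]
cleaned = [clean_price(s) for s in samples]
print(cleaned)  # ['1299.00', '45.5', '']
```

Note that a value with no digits at all, such as "N/A", is reduced to an empty string, so a later cast step still has to decide what to do with it.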
WowData™ includes a CLI for running YAML-serialized pipelines.
After installing the package, use:
wow --help

If wow conflicts with another tool in your environment, use the fallback command:
wowdata --help

- wow run pipeline.yaml (fallback: wowdata run pipeline.yaml)
  - Executes the pipeline end-to-end.
  - Returns non-zero on runtime failures.
- wow validate pipeline.yaml (fallback: wowdata validate pipeline.yaml)
  - Parses YAML + IR and runs preflight checks on source/sink paths.
- wow schema pipeline.yaml (fallback: wowdata schema pipeline.yaml)
  - Infers output schema without full pipeline execution.
- wow lock-schema pipeline.yaml -o pipeline.locked.yaml (fallback: wowdata lock-schema ...)
  - Writes a schema-locked YAML by embedding per-transform output_schema.

- --base-dir PATH: resolve relative paths in YAML from a specific directory.
- --json: print machine-readable JSON output.
- --sample-rows N: used by schema and lock-schema for bounded inference.
- --force: recompute schema inference even if cached.
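
As a rough illustration of what resolving relative paths against --base-dir means, the sketch below joins a relative uri onto a base directory with pathlib. It is not WowData's implementation: resolve_uri is a hypothetical helper, and leaving absolute paths unchanged is an assumption about the behavior.

```python
from pathlib import Path


def resolve_uri(uri: str, base_dir: str) -> Path:
    """Hypothetical helper: join a relative path onto base_dir.

    Absolute paths are returned unchanged (an assumption, not documented behavior).
    """
    path = Path(uri)
    return path if path.is_absolute() else Path(base_dir) / path


# A uri such as "people.csv" in a YAML file, resolved against "examples".
print(resolve_uri("people.csv", "examples"))
```

Under this reading, running from the repo root with --base-dir examples and running from inside examples/ without the flag point at the same files.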
# Run a serialized pipeline
wow run pipeline.yaml
# Run a repository example from the repo root
wow run examples/climate_heat_events.yaml --base-dir examples
# Validate structure and file paths before execution
wow validate pipeline.yaml
# Print inferred output schema as JSON
wow schema pipeline.yaml --json
# Save a locked pipeline snapshot
wow lock-schema pipeline.yaml -o pipeline.locked.yaml

- 0: success
- 2: CLI usage error
- 3: pipeline parse/validation error
- 4: pipeline runtime execution error
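
When the CLI is called from a script, the exit codes listed above can be mapped back to their meanings. The following is a minimal sketch using the standard subprocess module; describe_exit_code and run_pipeline are hypothetical helper names, and the code assumes the wow (or wowdata) command is installed and on PATH.

```python
import subprocess

# Exit codes as documented in this README.
EXIT_MEANINGS = {
    0: "success",
    2: "CLI usage error",
    3: "pipeline parse/validation error",
    4: "pipeline runtime execution error",
}


def describe_exit_code(code: int) -> str:
    """Translate a CLI exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_pipeline(path: str, executable: str = "wow") -> int:
    """Hypothetical wrapper: run `<executable> run <path>` and report the outcome."""
    result = subprocess.run([executable, "run", path])
    print(f"{path}: {describe_exit_code(result.returncode)}")
    return result.returncode
```

Calling run_pipeline("pipeline.yaml", executable="wowdata") would use the fallback command instead of wow.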