
Releases: VynFi/VynFi-python

v1.8.0 — SAP + SAF-T export (DS 4.4.3 / API 4.4)

23 Apr 06:50


Adopts DataSynth 4.4.3 + VynFi API 4.4. SAP Integration Pack and SAF-T export both live end-to-end; 6 of 8 DS 4.4.x baseline fixes picked up. Verified against the production API — see docs/ds-4.4.3-verification.md for the scorecard.

New resources

SapExportConfig (Scale+)

Typed helper for output.sap. Defaults: HANA dialect, client 200, ledger 0L, source system DATASYNTH, portal-default 8-table set. Serialise with .to_dict().

SaftExportConfig (Scale+)

Typed helper for output.saft. Jurisdictions pt / pl / ro / no / lu; optional company_tax_id / company_name.

JobArchive readers

  • sap_tables() — lowercase stems of CSVs in sap_export/
  • sap_table(name) — raw CSV bytes (UTF-8 BOM preserved)
  • sap_table_dataframe(name) — pandas DataFrame with HANA dialect handling
  • saft_file(jurisdiction) — raw XML bytes (tries root first, falls back to saft/)
  • coa_meta() — DS 4.4.1's chart_of_accounts_meta.json sidecar

Constants

  • SAP_DEFAULT_TABLES — 8 portal-default SAP tables
  • SAP_ALL_TABLES — 27+ superset
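
As a sketch of how the two export helpers compose, here is a hand-rolled equivalent of what the `.to_dict()` output plausibly looks like. The key names (`sourceSystem`, `companyTaxId`, ...) are assumptions based on the defaults listed above, not the SDK's exact serialisation:

```python
# Hand-rolled stand-ins for SapExportConfig / SaftExportConfig output.
# Key names are illustrative assumptions, not the SDK's exact schema.

def sap_export_config(dialect="hana", client=200, ledger="0L",
                      source_system="DATASYNTH", tables=None):
    """Build an output.sap section; tables=None keeps the 8-table portal default."""
    return {
        "dialect": dialect,
        "client": client,
        "ledger": ledger,
        "sourceSystem": source_system,
        "tables": tables,
    }

def saft_export_config(jurisdiction, company_tax_id=None, company_name=None):
    """Build an output.saft section for one of the supported jurisdictions."""
    if jurisdiction not in {"pt", "pl", "ro", "no", "lu"}:
        raise ValueError(f"unsupported SAF-T jurisdiction: {jurisdiction}")
    section = {"jurisdiction": jurisdiction}
    if company_tax_id is not None:
        section["companyTaxId"] = company_tax_id
    if company_name is not None:
        section["companyName"] = company_name
    return section

output = {"sap": sap_export_config(), "saft": saft_export_config("pt")}
```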

Live-verified signals

  • SAP sap_export/ — 27 CSVs, BSEG→BKPF FK clean
  • SAF-T saft_pt.xml — 3.0–3.9 MB
  • is_fraud_propagated — 15.6 %
  • AML typology coverage — 1.000
  • DocRef.from_type — 443/443 populated
  • ShellLink share — 0.9 %
  • chart_of_accounts_meta.json — present
  • data_quality_stats.total_records — populated

Still pending upstream (v1.8.1 target)

  • AML customer risk_level (0/620 populated)
  • OCEL object_refs.object_type (0/753 populated)

Both are field-null fills; the underlying values exist as risk_tier and object_type_id.

Examples

Full changelog: CHANGELOG.md

v1.7.0 — full example coverage for DS 4.1.x / API 4.1.x

21 Apr 21:31


Minor release that fills in the example and notebook coverage for every v1.6.x resource. Every script and notebook (except neural_diffusion.py and gnn_vendor_networks.ipynb, both of which require torch) now runs end-to-end against DS 4.1.x / API 4.1.x.

New example scripts

  • audit_optimizer.py — drives three of the six optimizer endpoints (risk_scope, portfolio, monte_carlo) with a sketched audit engagement. Keeps working across DS 4.1.x stub → 4.2.x real-analytics migrations because OptimizerResponse.report is opaque.
  • template_packs_crud.py — create → upsert vendor_names → validate → fetch → list → cleanup. Shows enrichment and pack-linking patterns as comments.
  • nl_config.py — configs.from_description(...) walkthrough plus a commented-out configs.from_company(...) example. Dry-run submission gated behind VYNFI_RUN_NL=1.
  • ds_41_features.py — single generation job that exercises every new DS 4.1 config surface: analyticsMetadata, audit, complianceRegulations, accountingStandards, interconnectivity, llm. Reports what actually landed in the archive rather than predicting, so it keeps working as DS fills in the pipelines.

Notebook fixes

  • 01_quickstart.ipynb — switch from generate_quick(tables=...) + journal-entries (hyphens) to the async generate_config pattern that emits journal_entries.json.
  • counterfactual_simulation.ipynb — reindex baseline/counterfactual period series to the union of their indexes before plotting.
  • sox_compliance_testing.ipynb — trim scenario scope to 300 rows × 2 companies × 2 periods so the paired generation fits the 500 s cell timeout.

Regression status

  • 26 / 26 example scripts pass (1 skip: neural_diffusion needs torch)
  • 9 / 9 notebooks pass (1 skip: gnn_vendor_networks needs torch)

Full changelog: CHANGELOG.md

v1.6.1 — regression-run fixes from live DS 4.1.x exercise

21 Apr 20:58


Patch release from a live regression run of the full examples suite against DS 4.1.x / API 4.1.x. 26/26 scripts now pass (plus 1 skip: neural_diffusion needs torch).

SDK fixes

  • scenarios.create() — stop sending the legacy top-level interventions field that DS 3.1 removed. Interventions are now folded exclusively into generationConfig.scenarios.interventions (the backward-compat mapping was already there; only the stray top-level key was tripping server validation).
  • jobs.list_files() — retry on 404 up to ~4.5s. The managed_blob file index can lag a second or two behind job completion; a single 404 right after wait() is almost always a race, not a real miss.
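
The retry behaviour can be pictured as a generic backoff loop. `NotFoundError` and `flaky_index` below are illustrative stand-ins, not the SDK's actual internals:

```python
import time

# Generic version of the list_files() 404 retry: treat an immediate 404
# after wait() as a race with the managed_blob file index, not a miss.

class NotFoundError(Exception):
    pass

def retry_on_404(fn, attempts=4, delay=0.0):  # the SDK waits ~1.5 s per retry
    for attempt in range(attempts):
        try:
            return fn()
        except NotFoundError:
            if attempt == attempts - 1:
                raise          # still 404 after ~4.5 s: a real miss
            time.sleep(delay)

calls = {"n": 0}

def flaky_index():
    calls["n"] += 1
    if calls["n"] < 3:          # index lags job completion for two polls
        raise NotFoundError()
    return ["journal_entries.json", "banking/banking_customers.json"]

files = retry_on_404(flaky_index)
```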

Example fixes

  • quickstart.py / pandas_workflow.py — switch from generate_quick (30s server cap, overruns on DS 4.1.x full-domain retail) to the async generate_config + wait pattern. Fix journal-entries → journal_entries.
  • native_mode.py — drop exportLayout: "flat" (still hanging upstream per docs/ds-3.1.1-verification.md § D); flatten nested output in-script and coerce amounts before summing.
  • multi_period_sessions.py — 1000 rows × 5 companies → 300 × 2 to fit a 5-minute budget.
  • streaming_aggregator.py — 500 → 100 envelopes to fit Scale-tier NDJSON rate.
  • streaming_anomaly_detection.py / streaming_fraud_monitor.py — iterate completed jobs to find one with a live NDJSON stream (not every managed_blob archive exposes it).
  • quality_monitoring.py — iterate jobs for a live archive instead of failing on a GC'd most-recent job.
  • fingerprint_synthesis.py — exit 0 when VYNFI_FINGERPRINT is unset.
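
The in-script flattening that native_mode.py falls back on looks roughly like this (a minimal sketch; the sample record is invented):

```python
# One output row per journal-entry line, header fields merged in,
# amounts coerced to float before any summing.

def flatten_entries(entries):
    rows = []
    for header in entries:
        base = {k: v for k, v in header.items() if k != "lines"}
        for line in header.get("lines", []):
            row = {**base, **line}
            row["amount"] = float(row.get("amount", 0))
            rows.append(row)
    return rows

sample = [{"id": "JE-1", "posting_date": "2024-03-31",
           "lines": [{"account": "4000", "amount": "125.50"},
                     {"account": "1200", "amount": "-125.50"}]}]
rows = flatten_entries(sample)
```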

Full changelog: CHANGELOG.md

v1.6.0 — DS 4.1.x / VynFi API 4.1.x adoption

21 Apr 17:37

Choose a tag to compare

Adopts DataSynth 4.1.x + VynFi API 4.1.x. Surfaces the portal's new audit-optimizer CLI wrappers, user-uploaded template packs, natural-language config generation, and aggregated audit artifacts.

New resources

client.optimizer — audit optimizer CLI wrappers (Scale+ tier)

Six wrappers for POST /v1/optimizer/*. Each returns a typed OptimizerResponse whose .report carries the CLI's opaque JSON so the SDK doesn't have to track every DS 4.1.x stub → real-analytics migration:

  • risk_scope(engagement, top_n=None)
  • portfolio(candidates, budget_hours)
  • resources(schedule)
  • conformance(trace, blueprint)
  • monte_carlo(engagement, runs=None, seed=None) (defaults: runs=1000, seed=42)
  • calibration(findings)
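
Because .report is opaque, a defensive reader pattern keeps callers stable across the stub-to-real migrations. The payload below is invented for illustration, not the CLI's actual output:

```python
# Probe the opaque OptimizerResponse.report for known keys instead of
# assuming a schema; unknown keys are simply ignored.

def summarize_report(report, keys=("recommendation", "expected_hours", "runs")):
    """Extract only the fields we rely on, tolerating their absence."""
    return {k: report[k] for k in keys if k in report}

# Invented stand-in for the report of a monte_carlo() call:
fake_report = {
    "runs": 1000,
    "seed": 42,
    "recommendation": "expand_scope",
    "percentiles": {"p50": 120.0, "p95": 310.0},
}

summary = summarize_report(fake_report)
```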

client.template_packs — user-uploaded template packs (Team+ tier, DS 3.2+)

  • list() / create() / get() / update() / delete()
  • categories() — supported category keys
  • get_category() / upsert_category() / delete_category()
  • validate() — re-validate every category
  • enrich_category() — LLM-enrich (Scale+)
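
A hypothetical local pre-check for a vendor_names category, mirroring the kind of issues validate() reports. The entry shape (value, weight) is an assumption for illustration, not the documented payload:

```python
# Catch empty values, case-insensitive duplicates, and non-positive
# weights locally before round-tripping through upsert/validate.

def precheck_vendor_names(entries):
    """Return a list of issue strings; empty when the category looks sane."""
    issues = []
    seen = set()
    for i, entry in enumerate(entries):
        name = entry.get("value", "").strip()
        if not name:
            issues.append(f"entry {i}: empty value")
        elif name.lower() in seen:
            issues.append(f"entry {i}: duplicate '{name}'")
        seen.add(name.lower())
        if entry.get("weight", 1) <= 0:
            issues.append(f"entry {i}: non-positive weight")
    return issues

ok = precheck_vendor_names([{"value": "Acme GmbH"},
                            {"value": "Nordwind AG", "weight": 2}])
bad = precheck_vendor_names([{"value": ""}, {"value": "Acme"}, {"value": "acme"}])
```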

New methods

  • client.configs.from_description(description) — natural-language → validated PortalGenerationConfig (Scale+).
  • client.configs.from_company(uid=..., name=..., periods=None, fraud_rate=None) — Swiss VynCo company profile → config (Scale+).
  • client.jobs.audit_artifacts(job_id) — aggregated reader for audit/audit_opinions.json, audit/key_audit_matters.json, and anomaly_labels.json.

New models

OptimizerResponse, TemplatePack, TemplatePackList, TemplatePackCategorySummary, TemplatePackCategoryContent, TemplatePackValidation, TemplatePackValidationIssue, TemplatePackEnrichResponse, NlConfigResponse, CompanyConfigResponse, BatchCompanyResponse, AuditArtifacts.

DS 4.1 config fields supported via existing passthrough

config.analyticsMetadata, config.audit, config.complianceRegulations, config.accountingStandards, config.interconnectivity, config.templates.packId, config.llm.* — all pass through Jobs.generate_config(config=...) unchanged.
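
An illustrative config carrying those fields; the nested values are placeholders, not a schema reference:

```python
# All of these keys ride through generate_config(config=...) untouched.
config = {
    "sector": "retail",
    "rows": 1000,
    "analyticsMetadata": {"enabled": True},
    "audit": {"enabled": True},
    "complianceRegulations": ["SOX"],
    "accountingStandards": ["IFRS"],
    "interconnectivity": {"enabled": True},
    "templates": {"packId": "pack-123"},
    "llm": {"enabled": True},
}
# client.jobs.generate_config(config=config)  # submitted unchanged
```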

Full changelog: CHANGELOG.md

v1.5.1 — DS 3.1.1 adoption + Jobs.fraud_split()

19 Apr 07:41


Adopts DataSynth 3.1.1 + VynFi API 3.1.1, which decisively fixed 7 of the 10 findings from the 3.1.0 semantics review. See docs/ds-3.1.1-verification.md for the full scorecard.

Added

  • Jobs.fraud_split(job_id) wraps the new GET /v1/jobs/{id}/fraud-split endpoint. Returns a typed FraudSplit with scheme-propagated vs direct-injection counts, propagation rate, and a by_fraud_type dict of FraudTypeSplit entries. Useful for stratified ML training.
  • New exported models: FraudSplit, FraudTypeSplit.
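
For stratified training, the split reduces to per-type sampling weights. A sketch against a FraudSplit-style payload (field names mirror the description above; the counts are invented):

```python
# Turn scheme-propagated vs direct-injection counts into strata weights.

split = {
    "propagated": 12,
    "direct": 21,
    "by_fraud_type": {
        "ghost_employee": {"propagated": 5, "direct": 3},
        "round_tripping": {"propagated": 7, "direct": 18},
    },
}

propagation_rate = split["propagated"] / (split["propagated"] + split["direct"])

def strata_weights(by_type):
    """Per-type share of all fraud entries, for stratified train/test splits."""
    total = sum(t["propagated"] + t["direct"] for t in by_type.values())
    return {name: (t["propagated"] + t["direct"]) / total
            for name, t in by_type.items()}

weights = strata_weights(split["by_fraud_type"])
```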

Verified live (DS 3.1.1)

  • round_dollar_bias — 0× → 170× lift
  • is_weekend bias — 1.83× → 32× lift
  • is_post_close bias — ∞ → ~3,106× lift
  • is_fraud_propagated — now populated (12/33 fraud entries)
  • process_variant_summary.json — 162 variants, 55 % happy-path concentration
  • audit/audit_opinions.json + key_audit_matters.json materialize
  • ✓ AML typology coverage — 0.000 → 0.857 (≥ 0.80 threshold)

Still open (upstream)

  • off_hours_bias bias path not wired
  • ⚠ AML relationship mix still dominated by TransactionCounterparty
  • /v1/scenarios/templates still returns 1 (portal gap)
  • exportLayout: "flat" still hangs (writer bug)

Full changelog: CHANGELOG.md

v1.5.0 — DataSynth 3.1 + Quality-of-Life Helpers

18 Apr 08:48


Adapts the SDK to DataSynth 3.1, which addressed 5 of the 12 findings from the insights doc. Also ships 3 QoL helpers that emerged as patterns across the example suite.

New SDK helpers

DataSynthQualityReport

One-call aggregator of every quality signal a job produces:

from vynfi import VynFi, DataSynthQualityReport

client = VynFi(api_key="...")
archive = client.jobs.download_archive(job_id)
report = DataSynthQualityReport.from_job(client, job_id, archive)
print(report.to_markdown())

Jobs.wait_for_many()

Parallel waiter for paired jobs (baseline + counterfactual, session periods):

scenario = client.scenarios.run(scenario.id)
jobs = client.jobs.wait_for_many([scenario.baseline_job_id, scenario.counterfactual_job_id])

JobArchive.dataframes()

Archive-to-DataFrames with automatic numeric/datetime coercion of common
financial columns (*_amount, *_date, *_at, timestamp*, …):

archive = client.jobs.download_archive(job_id)
frames = archive.dataframes(include=["banking/*"])
# amounts already numeric, timestamps already UTC datetime
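
The column classification behind that coercion can be approximated with glob matching over the pattern list above (only the listed patterns are used here; the real helper may carry more):

```python
from fnmatch import fnmatch

# Decide which columns get numeric vs datetime coercion by name.
NUMERIC_PATTERNS = ("*_amount",)
DATETIME_PATTERNS = ("*_date", "*_at", "timestamp*")

def coercion_for(column):
    """Return 'numeric', 'datetime', or None for a column name."""
    if any(fnmatch(column, p) for p in NUMERIC_PATTERNS):
        return "numeric"
    if any(fnmatch(column, p) for p in DATETIME_PATTERNS):
        return "datetime"
    return None

plan = {c: coercion_for(c)
        for c in ("line_amount", "posting_date", "created_at",
                  "timestamp_utc", "vendor_name")}
```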

JobArchive.audit_opinions() / .key_audit_matters()

DS 3.1 writes ISA 700 + ISA 701 outputs to the archive:

opinions = archive.audit_opinions()     # list[AuditOpinion]
matters = archive.key_audit_matters()   # list[KeyAuditMatter]

New models

  • AuditOpinion (ISA 700), KeyAuditMatter (ISA 701)
  • VariantAnalysis extended with rework_rate / skipped_step_rate /
    out_of_order_rate for DS 3.1's realistic process imperfections

DS 3.1 config fields (usable via existing endpoints, no new SDK methods)

  • fraud.documentFraudRate, fraud.propagateToLines, fraud.propagateToDocument
  • businessProcesses.{o2c,p2p,r2r,h2r,a2r}Weight
  • scenarios.causalModel.preset — manufacturing / retail / financial_services / custom
  • scenarios.causalModel.nodes + edges for BYO DAGs
  • banking.typologies.networkTypologyRate
  • diffusion.neural.{hybridWeight, hybridStrategy, neuralColumns}
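
An illustrative generationConfig fragment exercising those fields; every numeric value here is a placeholder, not a recommendation:

```python
# DS 3.1 config surface, usable through the existing endpoints as-is.
config = {
    "fraud": {
        "documentFraudRate": 0.02,
        "propagateToLines": True,
        "propagateToDocument": True,
    },
    "businessProcesses": {"o2cWeight": 0.4, "p2pWeight": 0.3, "r2rWeight": 0.3},
    "scenarios": {"causalModel": {"preset": "retail"}},
    "banking": {"typologies": {"networkTypologyRate": 0.05}},
    "diffusion": {"neural": {"hybridWeight": 0.5}},
}
```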

New examples

Verified live on DS 3.1

  • ✅ AML network density 0.0014 → 0.053 (38× richer); 0 → 35 mule_links
  • ✅ OCPM coverage on JE headers 47 % → 100 %
  • ✅ Timestamp parsing 95 % row loss → 0 % (microsecond truncation)
  • download_file works for managed_blob backed jobs
  • banking_evaluation.json now in archive

Deploy lag — SDK ready, server not yet

  • process_variant_summary.json not yet in archive
  • ⚠ Behavioral fraud biases (weekend, round, post-close) not yet active
  • is_fraud_propagated flag present but always False in current deploy
  • audit_opinions.json / key_audit_matters.json not yet produced

SDK code is in place for all of these; they'll start returning signal once the server rollout completes.

Full changelog

v1.4.1...v1.5.0

v1.4.0 — DataSynth 3.0 + VynFi API Adoption

16 Apr 18:30


Minor release adding support for DataSynth 3.0 features: scenario packs,
fingerprint synthesis, adversarial ONNX probing, AI-assisted config tuning,
and the dashboard co-pilot. All features verified end-to-end against the live API.

New Endpoints

Scenario Packs (client.scenarios.packs())

Eleven built-in counterfactual simulations across four categories:

  • Fraud — vendor_collusion_ring, management_override, ghost_employee, procurement_kickback, channel_stuffing
  • Control failures — sox_material_weakness, it_control_breakdown
  • Macro — recession_2008_replay, supply_chain_disruption_q3, interest_rate_shock_300bp
  • Operational — erp_migration_cutover

packs = client.scenarios.packs()
scenario = client.scenarios.create(
    name="Q3 revenue stress",
    generation_config={
        "sector": "retail", "rows": 10000,
        "scenarios": {"enabled": True, "packs": ["channel_stuffing"]},
    },
)
client.scenarios.run(scenario.id)
diff = client.scenarios.diff(scenario.id)

AI Tuning (client.jobs.tune(), Scale+)

suggestion = client.jobs.tune(job_id, target_scores={"overall": 0.95})
print(suggestion.explanation)
# -> {original_config, suggested_config, explanation, quality_summary}

Dashboard Co-pilot (client.ai.chat(), Scale+)

reply = client.ai.chat("Which fraud packs are right for audit training?")

Fingerprint Synthesis (client.fingerprint.synthesize(), Team+)

# Privacy-preserving synthesis from a .dsf fingerprint
submission = client.fingerprint.synthesize(
    "./private_data.dsf",
    rows=10000,
    backend="statistical",  # or "neural"/"hybrid" (Scale+)
)

Adversarial Probing (client.adversarial.probe(), Enterprise)

# Probe an ONNX fraud detector for decision-boundary weaknesses
probe = client.adversarial.probe("./model.onnx", n_probes=10000)
results = client.adversarial.results(probe.id)

Config-side DS 3.0 features (no SDK changes needed)

  • Neural diffusion: diffusion.backend = "neural" | "hybrid" with neural.* subsection (Scale+)
  • Quality gates: qualityGates.profile = "standard" | "strict" | "audit" (Team+)
  • Custom interventions: scenarios.interventions[].target/value/timing (Scale+)

Upstream DataSynth fixes now live

  • OCPM fields populated on JE headers — ocpm_event_ids, ocpm_object_ids, ocpm_case_id now carry full process mining metadata (was empty in 2.3.x). Verified 209/300 entries on a sample retail job.
  • is_fraud on document flow records, display_name on banking customers, numericMode: native, analytics/labels/process_mining output dirs — all confirmed.

Still upstream

  • exportLayout: flat hangs the DataSynth binary — use the default nested layout until upstream fix lands.

Other changes

  • Default client timeout bumped 30s → 60s (generate_quick server-side limit is 30s; 30s default was too tight with network latency).
  • Scenarios.create() contract updated to DS 3.0 shape ({name, generation_config}). Legacy template_id/interventions kwargs still work — auto-folded into config.

Four new examples

Full Changelog

v1.3.0...v1.4.0

v1.3.0 — DataSynth 2.3 + VynFi API 2.0 Features

13 Apr 07:46


Major release adding support for DataSynth 2.3 + VynFi API 2.0 features.
All features verified end-to-end against the live API.

New Endpoints

# Pre-built statistical analytics for a completed job
a = client.jobs.analytics(job_id)
print(f"Benford MAD: {a.benford_analysis.mad:.4f}")
print(f"AML coverage: {a.banking_evaluation.aml.typology_coverage:.2%}")

# Rate-controlled NDJSON streaming for TB-scale jobs (Scale tier+)
for envelope in client.jobs.stream_ndjson(job_id, rate=500, progress_interval=1000):
    if envelope.get("type") == "_progress":
        print(f"  {envelope['lines_emitted']:,} lines emitted")
    else:
        my_pipeline.send(envelope)

# Storage quota validation for TB-scale jobs
size = client.configs.estimate_size(config=my_config)
print(f"~{size.estimated_files} files, ~{size.estimated_bytes / 1e9:.1f} GB")
print(f"Tier quota: {size.tier_quota_bytes / 1e12:.1f} TB")

# Raw DataSynth YAML config submission (Scale tier+)
result = client.configs.submit_raw(yaml="rows: 1000\nsector: retail")

Transparent Archive Backends

JobArchive now seamlessly handles both legacy zip archives and the new TB-scale managed_blob manifests with presigned URLs:

archive = client.jobs.download_archive(job_id)
print(archive.backend)          # "zip" or "managed_blob"
entries = archive.json("journal_entries.json")  # lazy fetch via presigned URL if blob

DataSynth 2.3 Output Modes

job = client.jobs.generate_config(config={
    "sector": "retail",
    "rows": 1000,
    "output": {
        "exportLayout": "flat",    # one row per line, header merged ✓ verified live
        "numericMode": "native",   # JSON numbers (upstream DataSynth bug pending)
    },
})

New Models

  • Analytics (15 models): JobAnalytics, BenfordAnalysis, AmountDistributionAnalysis, VariantAnalysis, BankingEvaluation, KycCompletenessAnalysis, AmlDetectabilityAnalysis, CrossLayerCoherenceAnalysis, VelocityQualityAnalysis, FalsePositiveAnalysis, TypologyDetection
  • Sizing: EstimateSizeResponse, SizeBucket
  • Raw config: RawConfigResponse

Bug Fixes

  • CamelCase deserialization for JobFileList, JobFile, EstimateSizeResponse — were silently returning defaults when API actually had data
  • Download timeout extended from 30s → 5min (was breaking on large archive downloads)

Process Mining Notebook Enhanced

05_process_mining_ocel.ipynb now covers:

  • All 8 DataSynth processes (O2C, P2P, S2C, H2R, MFG, Banking, Audit, BankRecon)
  • OCEL 2.0 readiness section
  • Cross-process traceability via cross_process_links.json

New Examples

  • analytics_export.py — pre-built analytics workflow
  • ndjson_streaming.py — rate-controlled streaming for TB-scale
  • native_mode.py — DataSynth 2.3 native + flat layout

New Output Categories (DataSynth 2.3)

  • analytics/ — Pre-built statistical evaluations (Benford, distributions, variants, banking)
  • labels/ — Anomaly labels + fraud red flags (CSV/JSON/JSONL formats)
  • process_mining/ — Full OCEL 2.0 event log + objects + relationships (19,974 events + 7,381 objects in a sample retail job)

Verification

10 of 11 server-side fixes verified live. See docs/v1.3.0-verification-report.md for details.

Full Changelog

v1.2.0...v1.3.0

v1.2.0 — File Listing, Output Estimates, Per-File Download

11 Apr 11:23


What's New

Ships support for 3 API features deployed today by the API team.

File listing with schemas

List all files in a completed job's archive without downloading the full zip:

file_list = client.jobs.list_files(job_id)
print(f"{file_list.total_files} files, {file_list.total_size_bytes / 1e6:.0f} MB")

for f in file_list.files:
    cols = ", ".join(s.name for s in f.schema_[:3])
    print(f"  {f.path} ({f.size_bytes:,} bytes) [{cols}, ...]")

Output size estimates

estimate_cost() now returns expected output dimensions before you run a job:

est = client.configs.estimate_cost(config=my_config)
print(f"Credits: {est.total_credits}")
print(f"Output: ~{est.output.estimated_files} files, ~{est.output.estimated_size_bytes / 1e6:.0f} MB")
print(f"Note: {est.output.note}")

Per-file download (now working)

Download individual files from a job without pulling the full archive:

data = client.jobs.download_file(job_id, "journal_entries.json")
# Also supports subdirectory paths:
data = client.jobs.download_file(job_id, "banking/banking_customers.json")

New types

  • JobFileList, JobFile, FileSchema -- file listing response models
  • OutputEstimate -- output size estimate on EstimateCostResponse.output

Full Changelog

v1.1.0...v1.2.0

v1.1.0 — JobArchive, Examples Suite, Endpoint Fixes

11 Apr 11:01


What's New

JobArchive — ergonomic archive access

Downloaded job archives are now wrapped in a JobArchive class for easy file access:

archive = client.jobs.download_archive(job_id)
archive.files()            # list all 80+ files
archive.categories()       # ['banking', 'document_flows', 'esg', ...]
archive.json("journal_entries.json")  # parse JSON directly
archive.find("esg/*")     # glob-style search
archive.summary()          # file counts and sizes by category
archive.extract_to("./output")  # extract to disk

pandas: archive_to_dataframes()

Convert all JSON files in an archive to DataFrames in one call, with automatic header/lines flattening for journal entries:

from vynfi.integrations.pandas import archive_to_dataframes

frames = archive_to_dataframes(archive)
# {'journal_entries.json': DataFrame(95881 rows), 'banking/banking_customers.json': DataFrame(620 rows), ...}

14 examples — notebooks + scripts

  • 01_quickstart — 5-minute getting started
  • 02_audit_data_deep_dive — Benford's law, debit/credit validation, SOX controls
  • 03_fraud_detection_lab — Labeled fraud data, RF classifier (98.3% accuracy)
  • 04_document_flow_audit_trail — P2P/O2C chains, three-way matching, gap analysis
  • 05_process_mining_ocel — Event log reconstruction, variant analysis
  • 06_esg_sustainability_reporting — Emissions, energy, diversity, materiality matrix
  • 07_aml_compliance_testing — KYC, transaction monitoring, risk scoring, SAR

Plus 7 standalone scripts: quickstart, streaming, pandas workflow, config management, multi-period sessions, what-if scenarios, quality monitoring.

Bug fixes

  • Config endpoints used wrong URL path (/v1/configs/... → /v1/config/... for validate, estimate-cost, compose)
  • Sessions generate_next() used wrong URL path (/generate-next → /generate)

All fixes confirmed against the Rust SDK reference and live API.

Full Changelog

v1.0.0...v1.1.0