Releases: VynFi/VynFi-python
v1.8.0 — SAP + SAF-T export (DS 4.4.3 / API 4.4)
Adopts DataSynth 4.4.3 + VynFi API 4.4. SAP Integration Pack and SAF-T export both live end-to-end; 6 of 8 DS 4.4.x baseline fixes picked up. Verified against the production API — see docs/ds-4.4.3-verification.md for the scorecard.
New resources
SapExportConfig (Scale+)
Typed helper for output.sap. Defaults: HANA dialect, client 200, ledger 0L, source system DATASYNTH, portal-default 8-table set. Serialise with .to_dict().
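For orientation, the defaults above serialise to something of this shape (a sketch: the field names and the table subset shown are assumptions for illustration, not the SDK's documented contract):

```python
# Hypothetical sketch of the payload SapExportConfig().to_dict() might emit
# for output.sap; every field name here is an assumption, not the SDK contract.
sap_defaults = {
    "dialect": "hana",           # HANA dialect default
    "client": "200",             # SAP client
    "ledger": "0L",              # leading ledger
    "sourceSystem": "DATASYNTH",
    "tables": ["bkpf", "bseg"],  # illustrative subset of the 8-table portal default
}

config = {"output": {"sap": sap_defaults}}
print(config["output"]["sap"]["dialect"])  # hana
```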
SaftExportConfig (Scale+)
Typed helper for output.saft. Jurisdictions pt / pl / ro / no / lu; optional company_tax_id / company_name.
JobArchive readers
- `sap_tables()` — lowercase stems of CSVs in `sap_export/`
- `sap_table(name)` — raw CSV bytes (UTF-8 BOM preserved)
- `sap_table_dataframe(name)` — pandas DataFrame with HANA dialect handling
- `saft_file(jurisdiction)` — raw XML bytes (tries root first, falls back to `saft/`)
- `coa_meta()` — DS 4.4.1's `chart_of_accounts_meta.json` sidecar
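Because the raw CSV bytes keep their UTF-8 BOM, decode with `utf-8-sig` before parsing (the sample bytes below stand in for what `sap_table("bkpf")` might return):

```python
import csv
import io

# Simulated raw bytes with a UTF-8 BOM prefix, as the reader preserves them.
raw = b"\xef\xbb\xbfBELNR,BUKRS\n0100000001,1000\n"

text = raw.decode("utf-8-sig")  # strips the BOM; plain "utf-8" would keep it
rows = list(csv.DictReader(io.StringIO(text)))
print(rows[0]["BELNR"])  # 0100000001 -- first header is clean, no '\ufeff' prefix
```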
Constants
- `SAP_DEFAULT_TABLES` — 8 portal-default SAP tables
- `SAP_ALL_TABLES` — 27+ superset
Live-verified signals
| Signal | Live |
|---|---|
| SAP `sap_export/` | 27 CSVs, BSEG→BKPF FK clean |
| SAF-T `saft_pt.xml` | 3.0–3.9 MB |
| `is_fraud_propagated` | 15.6 % |
| AML typology coverage | 1.000 |
| `DocRef.from_type` | 443/443 populated |
| `ShellLink` share | 0.9 % |
| `chart_of_accounts_meta.json` | present |
| `data_quality_stats.total_records` | populated |
Still pending upstream (v1.8.1 target)
- AML customer `risk_level` (0/620 populated)
- OCEL `object_refs.object_type` (0/753 populated)

Both are field-null fills; the underlying values exist as `risk_tier` and `object_type_id`.
Examples
- `examples/sap_export.py` — generate → parse BKPF + BSEG → FK verify
- `examples/saft_export.py` — PT SAF-T + XSD-validation hint
Full changelog: CHANGELOG.md
v1.7.0 — full example coverage for DS 4.1.x / API 4.1.x
Minor release that fills in the example and notebook coverage for every v1.6.x resource. Every script and notebook (except neural_diffusion.py and gnn_vendor_networks.ipynb, both of which require torch) now runs end-to-end against DS 4.1.x / API 4.1.x.
New example scripts
- `audit_optimizer.py` — drives three of the six optimizer endpoints (`risk_scope`, `portfolio`, `monte_carlo`) with a sketched audit engagement. Keeps working across DS 4.1.x stub → 4.2.x real-analytics migrations because `OptimizerResponse.report` is opaque.
- `template_packs_crud.py` — create → upsert `vendor_names` → validate → fetch → list → cleanup. Shows enrichment and pack-linking patterns as comments.
- `nl_config.py` — `configs.from_description(...)` walkthrough plus a commented-out `configs.from_company(...)` example. Gates dry-run submission behind `VYNFI_RUN_NL=1`.
- `ds_41_features.py` — single generation job that exercises every new DS 4.1 config surface: `analyticsMetadata`, `audit`, `complianceRegulations`, `accountingStandards`, `interconnectivity`, `llm`. Reports what actually landed in the archive rather than predicting, so it keeps working as DS fills in the pipelines.
Notebook fixes
- `01_quickstart.ipynb` — switch from `generate_quick(tables=...)` + `journal-entries` (hyphens) to the async `generate_config` pattern that emits `journal_entries.json`.
- `counterfactual_simulation.ipynb` — reindex baseline/counterfactual period series to the union of their indexes before plotting.
- `sox_compliance_testing.ipynb` — trim scenario scope to 300 rows × 2 companies × 2 periods so the paired generation fits the 500 s cell timeout.
Regression status
- 26 / 26 example scripts pass (1 skip: neural_diffusion needs torch)
- 9 / 9 notebooks pass (1 skip: gnn_vendor_networks needs torch)
Full changelog: CHANGELOG.md
v1.6.1 — regression-run fixes from live DS 4.1.x exercise
Patch release from a live regression run of the full examples suite against DS 4.1.x / API 4.1.x. 26/26 scripts now pass (plus 1 skip: neural_diffusion needs torch).
SDK fixes
- `scenarios.create()` — stop sending the legacy top-level `interventions` field that DS 3.1 removed. Interventions are now folded exclusively into `generationConfig.scenarios.interventions` (the backward-compat mapping was already there; only the stray top-level key was tripping server validation).
- `jobs.list_files()` — retry on 404 for up to ~4.5 s. The managed_blob file index can lag a second or two behind job completion; a single 404 right after `wait()` is almost always a race, not a real miss.
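The retry shape can be sketched generically (the delay schedule and exception type below are illustrative, not the SDK's internals):

```python
import time

def retry_on_race(fn, delays=(0.5, 1.0, 1.5, 1.5)):
    """Call fn(); on failure, sleep through `delays` (~4.5 s total) and retry."""
    for delay in delays:
        try:
            return fn()
        except LookupError:  # stand-in for the SDK's 404 error type
            time.sleep(delay)
    return fn()              # final attempt propagates any remaining error

# Demo: a file index that only "appears" on the third call.
attempts = {"n": 0}
def flaky_list_files():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("404: file index not ready")
    return ["journal_entries.json"]

files = retry_on_race(flaky_list_files, delays=(0, 0, 0))
print(files)  # ['journal_entries.json']
```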
Example fixes
- `quickstart.py` / `pandas_workflow.py` — switch from `generate_quick` (30 s server cap, overruns on DS 4.1.x full-domain retail) to the async `generate_config` + `wait` pattern. Fix `journal-entries` → `journal_entries`.
- `native_mode.py` — drop `exportLayout: "flat"` (still hanging upstream per `docs/ds-3.1.1-verification.md` § D); flatten nested output in-script and coerce amounts before summing.
- `multi_period_sessions.py` — 1000 rows × 5 companies → 300 × 2 to fit a 5-minute budget.
- `streaming_aggregator.py` — 500 → 100 envelopes to fit the Scale-tier NDJSON rate.
- `streaming_anomaly_detection.py` / `streaming_fraud_monitor.py` — iterate completed jobs to find one with a live NDJSON stream (not every managed_blob archive exposes it).
- `quality_monitoring.py` — iterate jobs for a live archive instead of failing on a GC'd most-recent job.
- `fingerprint_synthesis.py` — exit 0 when `VYNFI_FINGERPRINT` is unset.
Full changelog: CHANGELOG.md
v1.6.0 — DS 4.1.x / VynFi API 4.1.x adoption
Adopts DataSynth 4.1.x + VynFi API 4.1.x. Surfaces the portal's new audit-optimizer CLI wrappers, user-uploaded template packs, natural-language config generation, and aggregated audit artifacts.
New resources
client.optimizer — audit optimizer CLI wrappers (Scale+ tier)
Six wrappers for POST /v1/optimizer/*. Each returns a typed OptimizerResponse whose .report carries the CLI's opaque JSON so the SDK doesn't have to track every DS 4.1.x stub → real-analytics migration:
- `risk_scope(engagement, top_n=None)`
- `portfolio(candidates, budget_hours)`
- `resources(schedule)`
- `conformance(trace, blueprint)`
- `monte_carlo(engagement, runs=None, seed=None)` (defaults: runs=1000, seed=42)
- `calibration(findings)`
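The wrappers take plain dicts. As an illustration, an engagement payload for `risk_scope` or `monte_carlo` might look like the following (every field name here is an assumption, not the documented schema):

```python
# Hypothetical engagement payload; field names are illustrative only.
engagement = {
    "client": "Acme Retail AG",
    "materiality": 250_000,
    "accounts": [
        {"id": "4000", "balance": 1_200_000, "inherent_risk": "high"},
        {"id": "5100", "balance": 480_000, "inherent_risk": "low"},
    ],
}

# response = client.optimizer.risk_scope(engagement, top_n=5)
# Since response.report is opaque JSON, inspect its keys before relying on
# any shape, e.g. print(sorted(response.report.keys())).
```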
client.template_packs — user-uploaded template packs (Team+ tier, DS 3.2+)
- `list()` / `create()` / `get()` / `update()` / `delete()`
- `categories()` — supported category keys
- `get_category()` / `upsert_category()` / `delete_category()`
- `validate()` — re-validate every category
- `enrich_category()` — LLM-enrich (Scale+)
New methods
- `client.configs.from_description(description)` — natural-language → validated PortalGenerationConfig (Scale+).
- `client.configs.from_company(uid=..., name=..., periods=None, fraud_rate=None)` — Swiss VynCo company profile → config (Scale+).
- `client.jobs.audit_artifacts(job_id)` — aggregated reader for `audit/audit_opinions.json`, `audit/key_audit_matters.json`, and `anomaly_labels.json`.
New models
`OptimizerResponse`, `TemplatePack`, `TemplatePackList`, `TemplatePackCategorySummary`, `TemplatePackCategoryContent`, `TemplatePackValidation`, `TemplatePackValidationIssue`, `TemplatePackEnrichResponse`, `NlConfigResponse`, `CompanyConfigResponse`, `BatchCompanyResponse`, `AuditArtifacts`.
DS 4.1 config fields supported via existing passthrough
`config.analyticsMetadata`, `config.audit`, `config.complianceRegulations`, `config.accountingStandards`, `config.interconnectivity`, `config.templates.packId`, `config.llm.*` — all pass through `Jobs.generate_config(config=...)` unchanged.
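Since these ride the existing passthrough, exercising them is just a matter of building the dict (only the top-level field names below come from the release notes; the nested values are illustrative):

```python
# Illustrative values; only the top-level field names are from the release notes.
config = {
    "sector": "retail",
    "rows": 1000,
    "analyticsMetadata": {"enabled": True},
    "audit": {"enabled": True},
    "complianceRegulations": ["SOX"],
    "accountingStandards": ["IFRS"],
    "interconnectivity": {"enabled": True},
    "templates": {"packId": "my-pack-id"},
    "llm": {"enabled": False},
}

# job = client.jobs.generate_config(config=config)  # passed through unchanged
```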
Full changelog: CHANGELOG.md
v1.5.1 — DS 3.1.1 adoption + Jobs.fraud_split()
Adopts DataSynth 3.1.1 + VynFi API 3.1.1, which decisively fixed 7 of the 10 findings from the 3.1.0 semantics review. See docs/ds-3.1.1-verification.md for the full scorecard.
Added
- `Jobs.fraud_split(job_id)` wraps the new `GET /v1/jobs/{id}/fraud-split` endpoint. Returns a typed `FraudSplit` with scheme-propagated vs direct-injection counts, propagation rate, and a `by_fraud_type` dict of `FraudTypeSplit` entries. Useful for stratified ML training.
- New exported models: `FraudSplit`, `FraudTypeSplit`.
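The arithmetic such a split carries can be sketched with a stand-in dataclass (field names assumed; the 12-of-33 figure is the one verified below):

```python
from dataclasses import dataclass

@dataclass
class FraudSplitSketch:
    """Illustrative stand-in for the SDK's FraudSplit; field names assumed."""
    propagated: int  # entries flagged via scheme propagation
    direct: int      # entries flagged by direct injection

    @property
    def propagation_rate(self) -> float:
        total = self.propagated + self.direct
        return self.propagated / total if total else 0.0

split = FraudSplitSketch(propagated=12, direct=21)
print(f"{split.propagation_rate:.1%}")  # 36.4% (12 of 33 fraud entries)
```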
Verified live (DS 3.1.1)
- ✓ `round_dollar_bias` — 0× → 170× lift
- ✓ `is_weekend` bias — 1.83× → 32× lift
- ✓ `is_post_close` bias — ∞ → ~3,106× lift
- ✓ `is_fraud_propagated` — now populated (12/33 fraud entries)
- ✓ `process_variant_summary.json` — 162 variants, 55 % happy-path concentration
- ✓ `audit/audit_opinions.json` + `key_audit_matters.json` materialize
- ✓ AML typology coverage — 0.000 → 0.857 (≥ 0.80 threshold)
Still open (upstream)
- ⚠ `off_hours_bias` bias path not wired
- ⚠ AML relationship mix still dominated by `TransactionCounterparty`
- ⚠ `/v1/scenarios/templates` still returns 1 (portal gap)
- ❌ `exportLayout: "flat"` still hangs (writer bug)
Full changelog: CHANGELOG.md
v1.5.0 — DataSynth 3.1 + Quality-of-Life Helpers
Adapts the SDK to DataSynth 3.1, which addressed 5 of the 12 findings
from the insights doc.
Also ships 3 QoL helpers that emerged as patterns across the example suite.
New SDK helpers
DataSynthQualityReport
One-call aggregator of every quality signal a job produces:
```python
from vynfi import VynFi, DataSynthQualityReport

client = VynFi(api_key="...")
archive = client.jobs.download_archive(job_id)
report = DataSynthQualityReport.from_job(client, job_id, archive)
print(report.to_markdown())
```

Jobs.wait_for_many()
Parallel waiter for paired jobs (baseline + counterfactual, session periods):
```python
scenario = client.scenarios.run(scenario.id)
jobs = client.jobs.wait_for_many([scenario.baseline_job_id, scenario.counterfactual_job_id])
```

JobArchive.dataframes()
Archive-to-DataFrames with automatic numeric/datetime coercion of common financial columns (`*_amount`, `*_date`, `*_at`, `timestamp*`, …):
```python
archive = client.jobs.download_archive(job_id)
frames = archive.dataframes(include=["banking/*"])
# amounts already numeric, timestamps already UTC datetime
```

JobArchive.audit_opinions() / .key_audit_matters()
DS 3.1 writes ISA 700 + ISA 701 outputs to the archive:
```python
opinions = archive.audit_opinions()    # list[AuditOpinion]
matters = archive.key_audit_matters()  # list[KeyAuditMatter]
```

New models
- `AuditOpinion` (ISA 700), `KeyAuditMatter` (ISA 701)
- `VariantAnalysis` extended with `rework_rate` / `skipped_step_rate` / `out_of_order_rate` for DS 3.1's realistic process imperfections
DS 3.1 config fields (usable via existing endpoints, no new SDK methods)
- `fraud.documentFraudRate`, `fraud.propagateToLines`, `fraud.propagateToDocument`
- `businessProcesses.{o2c,p2p,r2r,h2r,a2r}Weight`
- `scenarios.causalModel.preset` — `manufacturing` / `retail` / `financial_services` / `custom`
- `scenarios.causalModel.nodes` + `edges` for BYO DAGs
- `banking.typologies.networkTypologyRate`
- `diffusion.neural.{hybridWeight, hybridStrategy, neuralColumns}`
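Pulled together into one config dict, the field paths above look like this (the concrete values are illustrative only):

```python
# Field paths from the bullets above; values are illustrative, not defaults.
config = {
    "fraud": {
        "documentFraudRate": 0.02,
        "propagateToLines": True,
        "propagateToDocument": True,
    },
    "businessProcesses": {"o2cWeight": 0.4, "p2pWeight": 0.3, "r2rWeight": 0.3},
    "scenarios": {"causalModel": {"preset": "retail"}},
    "banking": {"typologies": {"networkTypologyRate": 0.05}},
    "diffusion": {"neural": {"hybridWeight": 0.5}},
}
```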
New examples
Verified live on DS 3.1
- ✅ AML network density 0.0014 → 0.053 (38× richer); 0 → 35 mule_links
- ✅ OCPM coverage on JE headers 47 % → 100 %
- ✅ Timestamp parsing 95 % row loss → 0 % (microsecond truncation)
- ✅ `download_file` works for `managed_blob`-backed jobs
- ✅ `banking_evaluation.json` now in archive
Deploy lag — SDK ready, server not yet
- ⚠ `process_variant_summary.json` not yet in archive
- ⚠ Behavioral fraud biases (weekend, round, post-close) not yet active
- ⚠ `is_fraud_propagated` flag present but always False in current deploy
- ⚠ `audit_opinions.json` / `key_audit_matters.json` not yet produced
SDK code is in place for all of these; they'll start returning signal once the server rollout completes.
Full changelog
v1.4.0 — DataSynth 3.0 + VynFi API Adoption
Minor release adding support for DataSynth 3.0 features: scenario packs,
fingerprint synthesis, adversarial ONNX probing, AI-assisted config tuning,
and the dashboard co-pilot. All features verified end-to-end against the live API.
New Endpoints
Scenario Packs (client.scenarios.packs())
Eleven built-in counterfactual simulations across four categories:
| Category | Packs |
|---|---|
| Fraud | vendor_collusion_ring, management_override, ghost_employee, procurement_kickback, channel_stuffing |
| Control failures | sox_material_weakness, it_control_breakdown |
| Macro | recession_2008_replay, supply_chain_disruption_q3, interest_rate_shock_300bp |
| Operational | erp_migration_cutover |
```python
packs = client.scenarios.packs()
scenario = client.scenarios.create(
    name="Q3 revenue stress",
    generation_config={
        "sector": "retail", "rows": 10000,
        "scenarios": {"enabled": True, "packs": ["channel_stuffing"]},
    },
)
client.scenarios.run(scenario.id)
diff = client.scenarios.diff(scenario.id)
```

AI Tuning (client.jobs.tune(), Scale+)
```python
suggestion = client.jobs.tune(job_id, target_scores={"overall": 0.95})
print(suggestion.explanation)
# -> {original_config, suggested_config, explanation, quality_summary}
```

Dashboard Co-pilot (client.ai.chat(), Scale+)
```python
reply = client.ai.chat("Which fraud packs are right for audit training?")
```

Fingerprint Synthesis (client.fingerprint.synthesize(), Team+)
```python
# Privacy-preserving synthesis from a .dsf fingerprint
submission = client.fingerprint.synthesize(
    "./private_data.dsf",
    rows=10000,
    backend="statistical",  # or "neural"/"hybrid" (Scale+)
)
```

Adversarial Probing (client.adversarial.probe(), Enterprise)
```python
# Probe an ONNX fraud detector for decision-boundary weaknesses
probe = client.adversarial.probe("./model.onnx", n_probes=10000)
results = client.adversarial.results(probe.id)
```

Config-side DS 3.0 features (no SDK changes needed)
- Neural diffusion: `diffusion.backend = "neural" | "hybrid"` with `neural.*` subsection (Scale+)
- Quality gates: `qualityGates.profile = "standard" | "strict" | "audit"` (Team+)
- Custom interventions: `scenarios.interventions[].target/value/timing` (Scale+)
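In config form, those three features sit side by side (field paths from the bullets above; the concrete values are illustrative only):

```python
# Field paths from the bullets above; the values shown are illustrative.
config = {
    "diffusion": {"backend": "hybrid"},       # Scale+; neural.* subsection omitted
    "qualityGates": {"profile": "strict"},    # Team+
    "scenarios": {
        "interventions": [                    # Scale+
            {"target": "revenue", "value": -0.15, "timing": "2024-Q3"},
        ],
    },
}
```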
Upstream DataSynth fixes now live
- OCPM fields populated on JE headers — `ocpm_event_ids`, `ocpm_object_ids`, `ocpm_case_id` now carry full process mining metadata (was empty in 2.3.x). Verified 209/300 entries on a sample retail job.
- `is_fraud` on document flow records, `display_name` on banking customers, `numericMode: native`, analytics/labels/process_mining output dirs — all confirmed.
Still upstream
`exportLayout: flat` hangs the DataSynth binary — use the default nested layout until the upstream fix lands.
Other changes
- Default client timeout bumped 30 s → 60 s (the `generate_quick` server-side limit is 30 s; a 30 s client default was too tight with network latency).
- `Scenarios.create()` contract updated to the DS 3.0 shape (`{name, generation_config}`). Legacy `template_id` / `interventions` kwargs still work — auto-folded into the config.
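The auto-folding can be pictured like this (a sketch of the compatibility shim, not the SDK's actual code; the `templateId` key name is assumed):

```python
def fold_legacy_kwargs(generation_config: dict, template_id=None, interventions=None) -> dict:
    """Fold legacy Scenarios.create() kwargs into the DS 3.0 config shape."""
    config = dict(generation_config)
    if template_id is not None:
        config.setdefault("templateId", template_id)  # key name assumed
    if interventions is not None:
        scenarios = dict(config.get("scenarios", {}))
        scenarios["interventions"] = interventions    # nested only: no top-level copy
        config["scenarios"] = scenarios
    return config

folded = fold_legacy_kwargs({"sector": "retail"}, interventions=[{"target": "revenue"}])
print(folded["scenarios"]["interventions"])  # [{'target': 'revenue'}]
```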
Four new examples
Full Changelog
v1.3.0 — DataSynth 2.3 + VynFi API 2.0 Features
Major release adding support for DataSynth 2.3 + VynFi API 2.0 features.
All features verified end-to-end against the live API.
New Endpoints
```python
# Pre-built statistical analytics for a completed job
a = client.jobs.analytics(job_id)
print(f"Benford MAD: {a.benford_analysis.mad:.4f}")
print(f"AML coverage: {a.banking_evaluation.aml.typology_coverage:.2%}")

# Rate-controlled NDJSON streaming for TB-scale jobs (Scale tier+)
for envelope in client.jobs.stream_ndjson(job_id, rate=500, progress_interval=1000):
    if envelope.get("type") == "_progress":
        print(f"  {envelope['lines_emitted']:,} lines emitted")
    else:
        my_pipeline.send(envelope)

# Storage quota validation for TB-scale jobs
size = client.configs.estimate_size(config=my_config)
print(f"~{size.estimated_files} files, ~{size.estimated_bytes / 1e9:.1f} GB")
print(f"Tier quota: {size.tier_quota_bytes / 1e12:.1f} TB")

# Raw DataSynth YAML config submission (Scale tier+)
result = client.configs.submit_raw(yaml="rows: 1000\nsector: retail")
```

Transparent Archive Backends
JobArchive now seamlessly handles both legacy zip archives and the new TB-scale managed_blob manifests with presigned URLs:
```python
archive = client.jobs.download_archive(job_id)
print(archive.backend)  # "zip" or "managed_blob"
entries = archive.json("journal_entries.json")  # lazy fetch via presigned URL if blob
```

DataSynth 2.3 Output Modes
```python
job = client.jobs.generate_config(config={
    "sector": "retail",
    "rows": 1000,
    "output": {
        "exportLayout": "flat",   # one row per line, header merged ✓ verified live
        "numericMode": "native",  # JSON numbers (upstream DataSynth bug pending)
    },
})
```

New Models
- Analytics (15 models): `JobAnalytics`, `BenfordAnalysis`, `AmountDistributionAnalysis`, `VariantAnalysis`, `BankingEvaluation`, `KycCompletenessAnalysis`, `AmlDetectabilityAnalysis`, `CrossLayerCoherenceAnalysis`, `VelocityQualityAnalysis`, `FalsePositiveAnalysis`, `TypologyDetection`
- Sizing: `EstimateSizeResponse`, `SizeBucket`
- Raw config: `RawConfigResponse`
Bug Fixes
- CamelCase deserialization for `JobFileList`, `JobFile`, `EstimateSizeResponse` — were silently returning defaults when the API actually had data
- Download timeout extended from 30 s → 5 min (was breaking on large archive downloads)
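The underlying mapping is ordinary camelCase → snake_case; a minimal converter of the kind involved (not the SDK's own implementation):

```python
import re

def camel_to_snake(name: str) -> str:
    """totalSizeBytes -> total_size_bytes"""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

def snake_keys(payload: dict) -> dict:
    """Re-key an API response so it lines up with model field names."""
    return {camel_to_snake(k): v for k, v in payload.items()}

print(snake_keys({"totalFiles": 83, "totalSizeBytes": 41_500_000}))
# {'total_files': 83, 'total_size_bytes': 41500000}
```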
Process Mining Notebook Enhanced
05_process_mining_ocel.ipynb now covers:
- All 8 DataSynth processes (O2C, P2P, S2C, H2R, MFG, Banking, Audit, BankRecon)
- OCEL 2.0 readiness section
- Cross-process traceability via `cross_process_links.json`
New Examples
- `analytics_export.py` — pre-built analytics workflow
- `ndjson_streaming.py` — rate-controlled streaming for TB-scale
- `native_mode.py` — DataSynth 2.3 native + flat layout
New Output Categories (DataSynth 2.3)
| Category | Description |
|---|---|
| `analytics/` | Pre-built statistical evaluations (Benford, distributions, variants, banking) |
| `labels/` | Anomaly labels + fraud red flags (CSV/JSON/JSONL formats) |
| `process_mining/` | Full OCEL 2.0 event log + objects + relationships (19,974 events + 7,381 objects in a sample retail job) |
Verification
10 of 11 server-side fixes verified live. See docs/v1.3.0-verification-report.md for details.
Full Changelog
v1.2.0 — File Listing, Output Estimates, Per-File Download
What's New
Ships support for 3 API features deployed today by the API team.
File listing with schemas
List all files in a completed job's archive without downloading the full zip:
```python
file_list = client.jobs.list_files(job_id)
print(f"{file_list.total_files} files, {file_list.total_size_bytes / 1e6:.0f} MB")
for f in file_list.files:
    cols = ", ".join(s.name for s in f.schema_[:3])
    print(f"  {f.path} ({f.size_bytes:,} bytes) [{cols}, ...]")
```

Output size estimates
estimate_cost() now returns expected output dimensions before you run a job:
```python
est = client.configs.estimate_cost(config=my_config)
print(f"Credits: {est.total_credits}")
print(f"Output: ~{est.output.estimated_files} files, ~{est.output.estimated_size_bytes / 1e6:.0f} MB")
print(f"Note: {est.output.note}")
```

Per-file download (now working)
Download individual files from a job without pulling the full archive:
```python
data = client.jobs.download_file(job_id, "journal_entries.json")
# Also supports subdirectory paths:
data = client.jobs.download_file(job_id, "banking/banking_customers.json")
```

New types
- `JobFileList`, `JobFile`, `FileSchema` — file listing response models
- `OutputEstimate` — output size estimate on `EstimateCostResponse.output`
Full Changelog
v1.1.0 — JobArchive, Examples Suite, Endpoint Fixes
What's New
JobArchive — ergonomic archive access
Downloaded job archives are now wrapped in a JobArchive class for easy file access:
```python
archive = client.jobs.download_archive(job_id)
archive.files()                       # list all 80+ files
archive.categories()                  # ['banking', 'document_flows', 'esg', ...]
archive.json("journal_entries.json")  # parse JSON directly
archive.find("esg/*")                 # glob-style search
archive.summary()                     # file counts and sizes by category
archive.extract_to("./output")        # extract to disk
```

pandas: archive_to_dataframes()
Convert all JSON files in an archive to DataFrames in one call, with automatic header/lines flattening for journal entries:
```python
from vynfi.integrations.pandas import archive_to_dataframes

frames = archive_to_dataframes(archive)
# {'journal_entries.json': DataFrame(95881 rows), 'banking/banking_customers.json': DataFrame(620 rows), ...}
```

14 examples — notebooks + scripts
| Notebook | Use Case |
|---|---|
| `01_quickstart` | 5-minute getting started |
| `02_audit_data_deep_dive` | Benford's law, debit/credit validation, SOX controls |
| `03_fraud_detection_lab` | Labeled fraud data, RF classifier (98.3% accuracy) |
| `04_document_flow_audit_trail` | P2P/O2C chains, three-way matching, gap analysis |
| `05_process_mining_ocel` | Event log reconstruction, variant analysis |
| `06_esg_sustainability_reporting` | Emissions, energy, diversity, materiality matrix |
| `07_aml_compliance_testing` | KYC, transaction monitoring, risk scoring, SAR |
Plus 7 standalone scripts: quickstart, streaming, pandas workflow, config management, multi-period sessions, what-if scenarios, quality monitoring.
Bug fixes
- Config endpoints used wrong URL path (`/v1/configs/...` → `/v1/config/...` for validate, estimate-cost, compose)
- Sessions `generate_next()` used wrong URL path (`/generate-next` → `/generate`)
All fixes confirmed against the Rust SDK reference and live API.