42 real-world projects audited. 21 hard baselines with recorded regression/improvement/precision counts. Zero crashes, zero timeouts, zero illegal keys.

`python scripts/diff_v1_v2.py tests/fixtures/tested_projects`

- Projects with a baseline JSON in `tests/fixtures/diff_baselines/` must have `R <= baseline` and `illegal=0`. Exceeding the baseline fails the gate.
- Projects without a baseline display `[PENDING]` and are informational only (not a hard failure).
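
The gate rule above can be sketched as a small function. This is a minimal illustration, not the code in `scripts/diff_v1_v2.py`: the function name, its arguments, and the `"FAIL"` label are assumptions, while `[OK]` and `[PENDING]` are the statuses the report prints.

```python
from typing import Optional


def gate_status(regressions: int, illegal: int, baseline: Optional[int]) -> str:
    """Return the status for one project under the gate rule.

    Hypothetical helper: `baseline` is the recorded regression count,
    or None when the project has no baseline JSON.
    """
    if baseline is None:
        # No baseline recorded: informational only, never a hard failure.
        return "PENDING"
    if illegal == 0 and regressions <= baseline:
        return "OK"
    # Either R exceeded the baseline or an illegal key appeared.
    return "FAIL"


print(gate_status(regressions=20, illegal=0, baseline=20))
print(gate_status(regressions=21, illegal=0, baseline=20))
print(gate_status(regressions=5, illegal=0, baseline=None))
```

Under these assumptions, a project passes only when both conditions hold, so a single illegal key fails the gate even if `R` is below the baseline.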

`python scripts/audit_tested_projects.py`

Full report: `reports/tested-projects-audit.json` / `reports/tested-projects-audit.md`.

- Projects audited: 42
- Crashed/timeout: 0
- Illegal key projects: 0
- Hard baselines: 21 (all `[OK]`)
- Total API calls: ~5,700
- Unique libraries: 104
- Regressions: 303
- Improvements: 147
- Precision changes: 29
- Illegal keys: 0

| Project | Calls | Libs | R | I | P | Ecosystem |
|---|---|---|---|---|---|---|
| AIBO | 647 | 22 | 20 | 26 | 0 | scientific |
| allnews | 986 | 30 | 80 | 22 | 14 | NLP |
| barcoded_yeast_reanalysis | 328 | 11 | 2 | 6 | 0 | scientific |
| click1 | 5 | 1 | 0 | 0 | 0 | CLI |
| covid19 | 89 | 9 | 0 | 1 | 3 | data |
| Deep-Graph-Kernels | 79 | 5 | 0 | 0 | 0 | ML |
| django | 43 | 7 | 0 | 4 | 0 | web |
| ex_4_2 | 207 | 8 | 0 | 0 | 0 | scientific |
| final | 314 | 9 | 1 | 10 | 0 | web |
| flask1 | 7 | 1 | 0 | 0 | 0 | web |
| greenbenchmark | 489 | 13 | 150 | 3 | 0 | data |
| hfhd | 444 | 7 | 4 | 20 | 1 | scientific |
| MAHE_OD_DATASET | 480 | 17 | 19 | 5 | 10 | ML/vision |
| polire | 421 | 16 | 5 | 17 | 1 | scientific |
| political-polarisation | 69 | 6 | 12 | 2 | 0 | data |
| Python-Workshop | 172 | 4 | 0 | 3 | 0 | edu |
| qho | 71 | 4 | 0 | 0 | 0 | scientific |
| scrapping | 110 | 3 | 0 | 16 | 0 | web |
| SDOML | 94 | 10 | 2 | 0 | 0 | ML |
| tensorflow1 | 15 | 1 | 0 | 0 | 0 | ML |
| Youtube | 104 | 12 | 2 | 2 | 0 | web |

- TOTAL regressions: 303
  - third_party_api_loss: 285
    - third-party -> local: 262
    - third-party -> unknown: 23
  - local -> unknown: 18
- TOTAL improvements: 147
  - local -> third-party: 147
- TOTAL precision changes: 29
- TOTAL illegal keys: 0

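
The breakdown groups each v1 -> v2 classification change by its transition. A minimal sketch of that grouping follows. The `classify` and `tally` helpers and the `"unchanged"`/`"other"` labels are hypothetical, and precision changes are left out because their definition is not given in this section.

```python
from collections import Counter

# Transitions named in the breakdown above. Treating them as plain
# (v1_kind, v2_kind) string pairs is an assumption for illustration.
REGRESSIONS = {
    ("third-party", "local"),
    ("third-party", "unknown"),
    ("local", "unknown"),
}
IMPROVEMENTS = {("local", "third-party")}


def classify(v1_kind: str, v2_kind: str) -> str:
    """Map one v1 -> v2 transition to a category name."""
    if (v1_kind, v2_kind) in REGRESSIONS:
        return "regression"
    if (v1_kind, v2_kind) in IMPROVEMENTS:
        return "improvement"
    if v1_kind == v2_kind:
        return "unchanged"
    return "other"  # transitions this section does not describe


def tally(pairs) -> Counter:
    """Count categories over an iterable of (v1_kind, v2_kind) pairs."""
    return Counter(classify(v1, v2) for v1, v2 in pairs)


sample = [
    ("third-party", "local"),
    ("third-party", "unknown"),
    ("local", "third-party"),
    ("local", "local"),
]
print(tally(sample))
```

The reported figures reconcile the same way: 262 + 23 = 285 for `third_party_api_loss`, and 285 + 18 = 303 total regressions.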
v1 leaked function-local bindings into module-level symbol tables;
v2 correctly isolates them. Method chains on local variables that
v1 classified as third-party only through scope pollution are now
classified as local or unknown.
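
The difference between a polluted and an isolated symbol table can be shown with a toy example built on the standard `ast` module. This is not the project's v1 or v2 code: both helper functions are hypothetical, and the isolated version is simplified to top-level statements only.

```python
import ast

SOURCE = '''
import os

def load():
    import json as js      # function-local binding
    return js.loads("{}")
'''


def imports_leaky(tree: ast.Module) -> set:
    """Collect import bindings from anywhere in the file.

    Function-local imports end up in the same set as module-level ones,
    which is the kind of scope pollution described above.
    """
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                names.add(alias.asname or alias.name.split(".")[0])
    return names


def imports_module_level(tree: ast.Module) -> set:
    """Collect import bindings from top-level statements only.

    Simplification: imports nested in module-level if/try blocks are
    also module scope but are ignored here.
    """
    names = set()
    for node in tree.body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                names.add(alias.asname or alias.name.split(".")[0])
    return names


tree = ast.parse(SOURCE)
print(sorted(imports_leaky(tree)))         # ['js', 'os']
print(sorted(imports_module_level(tree)))  # ['os']
```

With the leaky table, a module-level reference to `js` would appear to resolve to a library even though the name only exists inside `load()`.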

Subscript and chained method calls on local variables that hold third-party values cannot always be traced through the full propagation path. In those cases, SourceSet alternatives preserve the candidates.

Functions returning different third-party objects from different
branches produce conservative primaries with complete alternatives
in library_usage.
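
To see why a single function can carry more than one candidate, the sketch below parses a function with two return branches and lists the library behind each one. The helpers are hypothetical and use only the standard `ast` module. The sample source is parsed, never executed, so `numpy` and `pandas` do not need to be installed. How the tool chooses its primary is not described here, so the sketch stops at collecting candidates.

```python
import ast

SOURCE = '''
import numpy as np
import pandas as pd

def load(path, as_frame):
    if as_frame:
        return pd.read_csv(path)
    return np.loadtxt(path)
'''


def import_aliases(tree: ast.Module) -> dict:
    """Map each module-level `import x as y` binding to its top-level package."""
    aliases = {}
    for node in tree.body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                package = alias.name.split(".")[0]
                aliases[alias.asname or package] = package
    return aliases


def root_name(expr):
    """Follow calls, attributes and subscripts down to the leftmost name."""
    while isinstance(expr, (ast.Call, ast.Attribute, ast.Subscript)):
        expr = expr.func if isinstance(expr, ast.Call) else expr.value
    return expr.id if isinstance(expr, ast.Name) else None


def return_candidates(func: ast.FunctionDef, aliases: dict) -> list:
    """Sorted libraries that any return expression in `func` is rooted in.

    Simplification: nested functions are walked as well.
    """
    found = set()
    for node in ast.walk(func):
        if isinstance(node, ast.Return) and node.value is not None:
            root = root_name(node.value)
            if root in aliases:
                found.add(aliases[root])
    return sorted(found)


tree = ast.parse(SOURCE)
func = next(n for n in tree.body if isinstance(n, ast.FunctionDef))
print(return_candidates(func, import_aliases(tree)))  # ['numpy', 'pandas']
```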
v2 correctly resolves provenance that v1 missed: local function return-value propagation, constructor-arg → self.attr → method call, and decorator evidence tracking.
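
The three patterns can be pictured with a short snippet. It is an illustrative reading of the sentence above, not a fixture from the audited projects. The standard-library `json` and `functools` modules stand in for third-party libraries so the snippet runs anywhere, and all function and class names are made up.

```python
import functools
import json  # stand-in for a third-party library


def make_decoder():
    # 1. Local function return value: callers receive an object that
    #    originates in `json`, although they never name the module.
    return json.JSONDecoder()


class Parser:
    def __init__(self, decoder):
        # 2a. Constructor argument stored on the instance ...
        self.decoder = decoder

    def parse(self, text):
        # 2b. ... and later used in a method call: `self.decoder.decode`
        #     traces back to `json` through the constructor argument.
        return self.decoder.decode(text)


@functools.lru_cache(maxsize=None)  # 3. Decorator taken from an imported module
def parse_cached(text):
    return Parser(make_decoder()).parse(text)


print(parse_cached('{"a": 1}'))  # {'a': 1}
```

In each case the library is only reachable by following a value across a function boundary, an attribute, or a decorator, which is the provenance the sentence above says v1 missed.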