Brazilian benchmark series + Agente Pedometrista (v0.9.60 -> v0.9.65) #16
Open
HugoMachadoRodrigues wants to merge 10 commits into
…files)
- New R/benchmark-bdsolos.R:
* benchmark_bdsolos_sibcs(pedons): runs classify_sibcs() and computes
confusion matrix + per-Ordem recall vs reference_sibcs (the pedologist
truth from BDsolos Classe de Solos Nivel 1/2/3)
* .bdsolos_normalize_ordem(): maps modern ("ARGISSOLO" -> "Argissolos")
and pre-1999 legacy names (PODZOLICO, LATOSOL, GLEI, BRUNIZEM, ALUVIAL,
AREIA, RENDZINA -> SiBCS Ordens), diacritic-aware
- R/bdsolos.R loader extension:
* Captures Classe de Solos Nivel 1/2/3 columns into
site$reference_nivel_1/2/3
* Bug fix .bdsolos_find_header_line(): switched from which.max() to
first line with >= 5 fields. Real BDsolos data rows can have MORE
semicolons than the header because free-text "Descricao Original"
fields contain embedded ';'
- Tests: 10 new tests / 42 expectations
* normalize_ordem mapping (modern + legacy + diacritics + edge cases)
* benchmark schema (predictions/confusion/per_ordem/summary)
* accuracy bounds, max_n, error handling, n_unmapped reporting
* Integration: load_bdsolos_csv captures niveis 1/2/3
- Smoke test on 100 RJ pedons: 34% Ordem accuracy
* Argissolos: 67.6% recall (largest class, healthy baseline)
* 0% recall on Latossolos, Gleissolos, Espodossolos -> v0.9.61 targets
R CMD check Status: OK (3717 / 0 / 20)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
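The header-detection fix in this commit can be illustrated with a small base-R sketch. The helper names and the three toy lines below are hypothetical, not the package's code or real BDsolos data. They only show why picking the line with the most separators can land on a data row whose free-text field contains embedded ';', while "first line with >= 5 fields" still finds the header.

```r
# Hypothetical helpers for illustration; not soilKey's implementation.
count_fields <- function(lines, sep = ";") {
  # Fields per line = number of separators + 1
  lengths(regmatches(lines, gregexpr(sep, lines, fixed = TRUE))) + 1L
}

# New rule: the first line that has at least `min_fields` fields.
find_header_first <- function(lines, min_fields = 5L) {
  hits <- which(count_fields(lines) >= min_fields)
  if (length(hits) == 0L) NA_integer_ else hits[1L]
}

# Old rule: the line with the most fields.
find_header_widest <- function(lines) {
  which.max(count_fields(lines))
}

# Toy input: a preamble line, a 5-field header, and a data row whose
# free-text column carries extra semicolons.
lines <- c(
  "BDsolos export",
  "Codigo PA;Municipio;UF;Classe;Descricao Original",
  "101;Itaguai;RJ;ARGISSOLO;perfil em corte; relevo ondulado; boa drenagem"
)

header_first  <- find_header_first(lines)   # index of the real header
header_widest <- find_header_widest(lines)  # index of the wide data row
```

On this toy input the two rules disagree, which is the failure mode the commit describes.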
…rule
- New R/sibcs-color-tuning.R:
* .classify_b_color(hue, value, chroma): one of VERMELHO /
VERMELHO_AMARELO / AMARELO / BRUNO_ACINZENTADO / ACINZENTADO
* .dominant_b_color(pedon): walks all B horizons, returns
thickness-weighted dominant color category. Tie-break order:
BRUNO_ACINZ > ACINZ > AMARELO > VERMELHO > V_AMARELO
* .dominant_b_color_subordem(pedon, ordem_code): ordem-aware
mapping to SiBCS subordem code (P -> PV/PA/PVA/PBAC/PAC,
L -> LV/LA/LVA/LB/LVA, N -> NV/NX/NX/NB/NX). Other ordens
return NA (no override).
* .apply_color_dominant_override(): post-processor, swaps
YAML-assigned subordem when dominant disagrees with first-match.
- R/key-sibcs.R: classify_sibcs() wires the override between
subordem assignment and the v0.9.45 "cor a determinar" fallback
detection. Override evidence ends up in
result$trace$color_dominant_override and a warning fires whenever
the swap happens.
- R/benchmark-bdsolos.R:
* .bdsolos_normalize_subordem(): maps any case / language form of
a subordem name to the canonical 2-3 letter SiBCS code
(PV / PBAC / LVA / etc.). Diacritic-aware. Handles compound
names (BRUNO-ACINZENTADO, VERMELHO-AMARELO).
* benchmark_bdsolos_sibcs() now also reports subordem-level
metrics: predictions$predicted_subordem_code /
reference_subordem_code / agree_subordem;
accuracy_subordem (top-level); summary$n_in_scope_sub /
n_matched_sub.
* tests/test-v0960-bdsolos-benchmark.R schema test updated for
new fields.
- Tests: 14 new tests / 37 expectations
* .classify_b_color mapping for all 5 categories + NA inputs
* .dominant_b_color thickness-weighted dominant + NA fallback
* .dominant_b_color_subordem for P/L Ordens + non-color Ordens
* .apply_color_dominant_override flip + no-op + non-color +
missing-Munsell paths
* classify_sibcs() end-to-end: override exposed in trace +
Cambissolos untouched
Smoke results (RJ benchmark, 100 pedons): 9 / 100 pedons had
their subordem overridden by the dominant-color rule. Ordem
accuracy unchanged (33%) since the override is a 2nd-level rule.
R CMD check Status: OK (3722 / 0 / 20).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
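The thickness-weighted dominance rule with its tie-break order can be sketched as follows. This is a minimal illustration that assumes each B horizon has already been assigned a color category. The function name and the toy vectors are hypothetical, and the real .dominant_b_color() may differ in its details.

```r
# Tie-break order as listed in the commit message (earlier entries win).
tie_order <- c("BRUNO_ACINZENTADO", "ACINZENTADO", "AMARELO",
               "VERMELHO", "VERMELHO_AMARELO")

# Hypothetical helper: sum thickness per category, pick the largest total,
# and break exact ties by position in `tie_order`.
dominant_category <- function(category, thickness_cm) {
  keep <- !is.na(category) & !is.na(thickness_cm)
  if (!any(keep)) return(NA_character_)
  totals <- tapply(thickness_cm[keep], category[keep], sum)
  best <- names(totals)[totals == max(totals)]
  best[order(match(best, tie_order))][1L]
}

# Toy profile: VERMELHO totals 40 cm, AMARELO totals 40 cm, and one
# horizon without a usable color is ignored.
dom_tie   <- dominant_category(c("VERMELHO", "AMARELO", "VERMELHO", NA),
                               c(20, 40, 20, 30))
dom_clear <- dominant_category(c("VERMELHO", "AMARELO"), c(50, 10))
dom_none  <- dominant_category(NA_character_, 25)
```

In the tied case AMARELO wins because it precedes VERMELHO in the listed order. With no classifiable horizon the sketch returns NA, mirroring the "NA fallback" mentioned in the tests.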
…set)
Joins the BDsolos and FEBR PedonRecord lists by site$sisb_id to
dedupe historic Embrapa pedons that appear in both corpuses.
- New R/merge-brazilian.R:
* merge_brazilian_pedons(bdsolos, febr, prefer = c("bdsolos",
"febr"), verbose = TRUE): joins two PedonRecord lists by
site$sisb_id, drops duplicates from the non-preferred side.
Each surviving pedon is tagged with site$merge_decision
("kept_bdsolos" / "kept_febr" / "unique") and site$merge_source.
* summarize_brazilian_overlap(bdsolos, febr): diagnostic counts
(n_bdsolos, n_febr, n_shared, n_bdsolos_only, n_febr_only,
n_unmatchable). Useful before committing to the merge.
* .get_sisb_id(pedon): NA-safe centralised lookup. Backwards-
compatible with PedonRecord objects pre-v0.9.62.
- R/bdsolos.R: load_bdsolos_csv() now also assigns
site$sisb_id <- Codigo PA (BDsolos historical pedon ID,
identical numbering to FEBR's observacao$sisb_id).
- R/febr.R: read_febr_pedons() now captures observacao$sisb_id
into site$sisb_id (character, NA-safe).
Empirical RJ overlap scan (722 BDsolos x 884 FEBR):
590 shared sisb_ids, 132 BD-only, 239 FEBR-only, 55 unmatchable.
Naive concat 1606 -> after merge 1016 distinct pedons.
Tests: 12 new tests / 28 expectations.
R CMD check Status: OK (3760 / 0 / 20).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
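A rough sketch of the join described above, using plain lists in place of PedonRecord objects. The helper names and toy IDs are hypothetical. The sketch assumes pedons without a usable sisb_id are kept rather than dropped, which is consistent with the reported 1016 = 590 + 132 + 239 + 55.

```r
# Toy stand-in for a PedonRecord: only site$sisb_id matters here.
make_pedon <- function(id) list(site = list(sisb_id = id))

# Hypothetical sketch of a sisb_id-keyed dedup merge.
merge_by_sisb_id <- function(bdsolos, febr, prefer = c("bdsolos", "febr")) {
  prefer <- match.arg(prefer)
  ids <- function(x) {
    vapply(x, function(p) as.character(p$site$sisb_id), character(1))
  }
  bd_ids <- ids(bdsolos)
  fe_ids <- ids(febr)
  # NA ids never count as shared, so unmatchable pedons survive as unique.
  shared <- intersect(bd_ids[!is.na(bd_ids)], fe_ids[!is.na(fe_ids)])
  tag <- function(p, decision) {
    p$site$merge_decision <- decision
    p
  }
  kept <- if (prefer == "bdsolos") {
    lapply(bdsolos[bd_ids %in% shared], tag, "kept_bdsolos")
  } else {
    lapply(febr[fe_ids %in% shared], tag, "kept_febr")
  }
  # Order: chosen-from-overlap, then unique-to-bdsolos, then unique-to-febr.
  c(kept,
    lapply(bdsolos[!(bd_ids %in% shared)], tag, "unique"),
    lapply(febr[!(fe_ids %in% shared)], tag, "unique"))
}

bd <- lapply(c("1", "2", "3"), make_pedon)
fe <- lapply(list("2", "3", "4", NA), make_pedon)

merged    <- merge_by_sisb_id(bd, fe, prefer = "bdsolos")
decisions <- vapply(merged, function(p) p$site$merge_decision, character(1))

# Arithmetic check of the RJ figures quoted in the commit message.
stopifnot(722 + 884 == 1606, 590 + 132 + 239 + 55 == 1016)
```

The toy inputs share two IDs, so seven pedons collapse to five, with the two overlapping ones taken from the preferred side.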
…eries
- Version badge 0.9.40 -> 0.9.63
- Tests-passing badge 3137 -> 3760
- New "What's new in v0.9.62" section: load_bdsolos_csv,
read_febr_pedons, benchmark_bdsolos_sibcs, dominant-color-in-B
override, merge_brazilian_pedons (with the 590/722 RJ overlap
empirical result)
- Status footer merges Brazilian highlights with USDA/WRB summary
- NEWS.md entry for v0.9.63
No code changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR extends soilKey’s Brazilian SiBCS pipeline by adding: (1) a BDsolos surveyor-reference benchmark, (2) a dominant-color-in-B post-processor to improve color-driven SiBCS subordem assignment, and (3) a BDsolos×FEBR dedup merge keyed on site$sisb_id, alongside documentation and test-suite updates.
Changes:
- Add BDsolos benchmark tooling (benchmark_bdsolos_sibcs() + BDsolos SiBCS normalization helpers).
- Add SiBCS dominant-color-in-B override logic and wire it into classify_sibcs() trace/warnings.
- Add BDsolos×FEBR merge + overlap diagnostics via site$sisb_id, plus README/NEWS/version bumps and new tests/man pages.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/testthat/test-v0962-merge-brazilian.R | Adds tests for sisb_id extraction, merge behavior, overlap summary, and BDsolos loader integration. |
| tests/testthat/test-v0961-sibcs-color-tuning.R | Adds unit and end-to-end tests for dominant-color-in-B override behavior. |
| tests/testthat/test-v0960-bdsolos-benchmark.R | Adds tests for BDsolos Ordem normalization and benchmark schema/behavior. |
| README.md | Adds Brazilian benchmark series narrative + updates badges/status text. |
| R/sibcs-color-tuning.R | Implements dominant B-horizon color categorization, dominance calculation, and override application. |
| R/merge-brazilian.R | Implements merge_brazilian_pedons() and summarize_brazilian_overlap() plus internal helpers. |
| R/key-sibcs.R | Wires dominant-color override into classify_sibcs() and exposes it in trace/warnings. |
| R/febr.R | Captures FEBR observacao$sisb_id into site$sisb_id. |
| R/benchmark-bdsolos.R | Adds BDsolos benchmark and normalization helpers for Ordem/Subordem. |
| R/bdsolos.R | Captures BDsolos Classe de Solos Nivel 1/2/3 and sets site$sisb_id from Codigo PA; improves header detection. |
| NEWS.md | Documents v0.9.60–v0.9.63 Brazilian series and associated tests/metrics. |
| NAMESPACE | Exports benchmark_bdsolos_sibcs(), merge_brazilian_pedons(), summarize_brazilian_overlap(). |
| man/summarize_brazilian_overlap.Rd | Roxygen output for summarize_brazilian_overlap(). |
| man/merge_brazilian_pedons.Rd | Roxygen output for merge_brazilian_pedons(). |
| man/dot-tag_merge_decision.Rd | Roxygen output for internal .tag_merge_decision(). |
| man/dot-get_sisb_id.Rd | Roxygen output for internal .get_sisb_id(). |
| man/dot-dominant_b_color.Rd | Roxygen output for internal .dominant_b_color(). |
| man/dot-dominant_b_color_subordem.Rd | Roxygen output for internal .dominant_b_color_subordem(). |
| man/dot-classify_b_color.Rd | Roxygen output for internal .classify_b_color(). |
| man/dot-BDSOLOS_SITE_PATTERNS.Rd | Updates documented size of .BDSOLOS_SITE_PATTERNS. |
| man/dot-bdsolos_normalize_subordem.Rd | Roxygen output for internal .bdsolos_normalize_subordem(). |
| man/dot-bdsolos_normalize_ordem.Rd | Roxygen output for internal .bdsolos_normalize_ordem(). |
| man/dot-apply_color_dominant_override.Rd | Roxygen output for internal .apply_color_dominant_override(). |
| man/benchmark_bdsolos_sibcs.Rd | Roxygen output for benchmark_bdsolos_sibcs(). |
| DESCRIPTION | Bumps package version to 0.9.63. |
Comment on lines +39 to +40
| #' \item \code{PODZOLICO}, \code{PODZOLCIO}, \code{LATOSOL} | ||
| #' -> \code{Argissolos} (the 1999 SiBCS rename) |
Comment on lines +140 to +144
| toks <- strsplit(ascii, "[ ,;]+")[[1L]] | ||
| if (length(toks) < 1L) return(NA_character_) | ||
| ord_word <- toks[1L] | ||
| sub_word <- if (length(toks) >= 2L) toks[2L] else "" | ||
| # Some BDsolos rows use compound names (BRUNO-ACINZENTADO, |
Comment on lines +295 to +299
| predicted_ordem <- character(length(pedons)) | ||
| predicted_subordem <- character(length(pedons)) | ||
| predicted_gg <- character(length(pedons)) | ||
| predicted_sg <- character(length(pedons)) | ||
| reference_raw <- character(length(pedons)) |
Comment on lines +63 to +65
| pedon$site$reference_source <- paste0(prev_src, " | merged:", decision) | ||
| } else { | ||
| pedon$site$reference_source <- paste0(prev_src, " | ", decision) |
| #' tagged via \code{site$merge_decision} (\code{"kept_bdsolos"}, | ||
| #' \code{"kept_febr"}, or \code{"unique"}) and \code{site$merge_source}. | ||
| #' Pedons appear in the order: chosen-from-overlap first, then | ||
| #' unique-to-bdsolos, then unique-to-febr. |
| # * VERMELHO -- hue <= 2.5YR (10R, 7.5R, 5R, 2.5R, 2.5YR) | ||
| # * VERMELHO_AMARELO -- hue == 5YR (intermediate) | ||
| # * AMARELO -- hue >= 7.5YR with chroma >= 4 | ||
| # * BRUNO_ACINZENTADO -- value <= 4 AND chroma <= 4 (dark, regardless hue) |
Comment on lines +51 to +56
| # 1. BRUNO_ACINZENTADO: dark (value <= 4, chroma <= 4) and at least | ||
| # moderately yellow (hue >= 5YR) -- catches the dark-brown / dark-grey | ||
| # end of the B color spectrum. | ||
| if (value <= 4 && chroma <= 4 && | ||
| grepl("^(5YR|7\\.5YR|10YR|2\\.5Y|5Y|10Y)\\b", hu)) { | ||
| return("BRUNO_ACINZENTADO") |
Comment on lines +251 to +253
| "dominante-de-cor em B (categoria %s, espessura ", | ||
| "%.0f cm de %d horizonte(s) classificado(s)/", | ||
| "%d horizonte(s) B)."), |
…ist persona
One-call bootstrap of the local VLM stack so v0.9.65's agent_app
can offer a single "Configurar Gemma local" button.
- New R/setup-local-vlm.R:
* setup_local_vlm(model = "balanced"): idempotent. Detects Ollama,
starts the daemon if needed, pulls the chosen model. Catalog:
light = gemma4:e2b (~1.5 GB), balanced = gemma4:e4b (~3 GB),
best = gemma4:31b (~19 GB). Accepts arbitrary model identifiers
as well. Returns status list (ready, model, ollama_url,
installed, running, pulled, hint) for direct rendering in a
Shiny status card.
* ollama_is_installed(): detects ollama on PATH.
* ollama_ensure_running(timeout_s = 30): starts ollama serve in
background and polls until /api/tags answers.
* ollama_pull_model(model): wraps `ollama pull`; no-op when the
model is already on disk; rejects empty / NA input.
* ollama_list_local_models(): queries /api/tags; never throws.
* .print_ollama_install_hint(): OS-specific install instructions
(Homebrew / curl-pipe-sh / winget) when Ollama is missing.
- R/vlm-prompts.R:
* pedologist_system_prompt(language = c("pt-BR", "en")): canonical
persona installed in every chat session (and exposed for any
user-built vlm_provider(..., system_prompt = ...)). Trained
pedometrist; SiBCS 5a + WRB 2022 + KST 13ed; explicit "NEVER
classify, only extract" + per-attribute confidence + source_quote.
- R/vlm-providers.R:
* Default Ollama model lowered from gemma4:e4b to gemma4:e2b
(laptop-friendly default; users opt into bigger via
setup_local_vlm presets).
* Documentation updated to point at setup_local_vlm() as the
one-shot bootstrap path.
- Tests: 13 new tests / ~30 expectations
* Catalog resolution (light/balanced/best -> model names)
* Status schema verified (7 documented fields)
* Error paths (no Ollama on PATH, invalid model, empty / NA input)
* Daemon lifecycle short-circuits
* Persona content & language switching (PT-BR / EN)
CRAN-friendly: ships the downloader, NOT the weights. The user runs
setup_local_vlm() once after install; Ollama caches the model in
~/.ollama/models/. No network calls happen at package install time.
R CMD check Status: OK (3821 / 0 / 21).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
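The preset resolution described for setup_local_vlm() (catalog names map to model tags, arbitrary identifiers pass through, empty / NA input is rejected) could look roughly like this. The function name resolve_model() and the pass-through identifier used below are hypothetical. Only the three catalog entries come from the commit message.

```r
# Catalog entries as listed in the commit message.
catalog <- c(light = "gemma4:e2b", balanced = "gemma4:e4b", best = "gemma4:31b")

# Hypothetical resolver: preset name -> model tag, anything else unchanged.
resolve_model <- function(model = "balanced") {
  if (length(model) != 1L || is.na(model) || !nzchar(trimws(model))) {
    stop("`model` must be a single non-empty string", call. = FALSE)
  }
  if (model %in% names(catalog)) unname(catalog[[model]]) else model
}

preset_model <- resolve_model("light")
default_model <- resolve_model()
# "some-model:tag" is a made-up identifier used only to show pass-through.
custom_model <- resolve_model("some-model:tag")
```

Validating the input before touching Ollama keeps the error path testable without a daemon, in line with the "Error paths" tests listed above.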
…ma pedometrist
End-to-end soil profile classification driven by the v0.9.64 local
Gemma 4 stack: photo + PDF + field-sheet image + Vis-NIR spectrum ->
deterministic taxonomic key (WRB 2022 + SiBCS 5a + USDA Tax 13).
- New inst/shiny/agent_app/app.R (bslib page_navbar, 8 nav_panels):
* Foto Munsell -> extract_munsell_from_photo()
* PDF / Texto -> extract_horizons_from_pdf()
* Ficha Campo -> extract_site_from_fieldsheet()
* Espectros -> fill_from_spectra() (OSSL local-band library)
* Tabela -> editable DT for manual correction
* Classificar -> classify_all() -> 3 bslib::value_box() cards
* Trace -> per-system trace + provenance browser
* Pedometrista -> free-form chat (ellmer chat session preserved)
Persistent 320 px sidebar with provider/model selector, real-time
Ollama status badges, "Configurar Gemma local" button (modal
progress), language toggle (PT-BR / EN), session reset.
- New R/run-agent-app.R: run_agent_app(port, launch.browser, ...)
launcher; soft-fails on missing Suggests with actionable
install.packages() hint.
- New vignettes/v10_agente_pedometrista.Rmd: walkthrough of setup,
persona, all 8 tabs, classify_from_documents() programmatic
equivalent, privacy / data sovereignty rationale, known limits.
- README.md:
* Version badge 0.9.62 -> 0.9.65
* Tests-passing badge 3 760 -> 3 821
* New "What's new in v0.9.65 -- Agente Pedometrista" section
* Status footer rewritten
- DESCRIPTION: adds bslib + bsicons to Suggests.
- Tests (test-v0965-agent-app.R): 4 verifying launcher exports,
app.R parseability, all 8 nav_panels wired, persona referenced.
Principle: the LLM never classifies. It only extracts
schema-validated JSON. The taxonomic key is 100 % R, deterministic,
YAML-versioned.
R CMD check Status: OK (3821 / 0 / 21).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l-VLM hint
Adds the harness that measures the local Gemma 4 baseline on real
soilKey extraction tasks, so Phase 2 (few-shot) and Phase 3 (LoRA)
decisions are informed by data.
- New R/zzz.R: .onAttach() interactive hint about local Gemma stack:
* Silent when Ollama not installed.
* Hint to start daemon when installed but stopped.
* Hint to run setup_local_vlm("light") when daemon up but
gemma4:e2b missing.
* Auto-pull only with options(soilKey.auto_setup_vlm = TRUE) or
Sys.setenv(SOILKEY_AUTO_SETUP_VLM = "1") (CRAN policy 1.1
forbids auto-modification of system without explicit consent).
* Suppress all hints with options(soilKey.suggest_local_vlm = FALSE).
* .suggest_local_vlm_message() factored out for testability.
- New R/benchmark-vlm-extraction.R:
* benchmark_vlm_extraction(providers, tasks, fixtures_dir,
max_per_task): provider-agnostic. 3 tasks (horizons / site /
munsell) x per-task metrics. Returns long predictions df +
summary df. Accepts MockVLMProvider for unit tests.
* list_vlm_fixtures(task): paired (input, golden.json) discovery.
* make_synthetic_horizons_fixture(pedon, fixture_id): renders any
PedonRecord into a Markdown profile description and emits the
structured horizons as the golden answer. Scales fixture set
from BDsolos / FEBR / KSSL.
* .metric_horizons_overlap: precision + recall (depth-overlap >=
80 %) + per-attribute match rate (10 % numeric tolerance).
* .metric_site_iou: field-level IoU + value-accuracy + recall.
* .munsell_delta_e: prefers Nickerson Color Difference Index;
falls back to Lab Euclidean (DeltaE 1976).
- New inst/prompts/extract_site_from_text.md: text-mode companion
to extract_site_metadata.md. Required because the original prompt
says "Supplied as an image content block", causing local Gemma
to return all-null when fed text. Inlines {document_text} and
explicitly forbids null for visible fields.
- inst/fixtures/vlm_extraction/ -- 4 bundled paired fixtures:
* horizons/perfil_RJ_argissolo (4-horizon Argissolo Vermelho-Amarelo)
* horizons/perfil_MG_latossolo (4-horizon Latossolo Vermelho)
* site/ficha_RJ_001, ficha_MG_002 (field-sheet text + golden site)
* munsell/README.md -- format spec for user-supplied photo fixtures
(CRAN size + licence policy).
- vignettes/v11_vlm_extraction_benchmark.Rmd: walkthrough.
- 47 tests / ~70 expectations in test-v0966-benchmark-vlm-extraction.R
covering fixture discovery, metric correctness on synthetic ground
truths, end-to-end with MockVLMProvider, and
.suggest_local_vlm_message shape on all Ollama states.
- README.md: version 0.9.65 -> 0.9.66, tests 3821 -> 3868, new
"What's new" section with the baseline table.
## Baseline measured (gemma4 8B local, MacBook M1)
| task | fixture | precision/iou | recall/value-acc | attr-match |
|----------|----------------|---------------|------------------|-----------|
| horizons | Latossolo MG | 1.00 | 1.00 | 1.00 |
| horizons | Argissolo RJ | 1.00 | 1.00 | 1.00 |
| site | Ficha MG | 0.79 | 1.00 | 0.79 |
| site | Ficha RJ | 0.87 | 0.92 | 0.87 |
Horizons extraction is solved on clean PT-BR profiles. Site is
~83 % IoU with ~96 % value-accuracy on matched fields -- gaps are
inferred fields (e.g. country: BR from a Brazilian state) the 8B
model misses. This baseline is the input for Phase 2 / Phase 3.
R CMD check Status: OK (3868 / 0 / 21).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
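The horizon precision / recall metric with an 80 % depth-overlap threshold might be computed along these lines. This is a sketch under stated assumptions: overlap is taken here as intersection-over-union of the two depth intervals and matching is greedy one-to-one, neither of which is confirmed by the commit message. The function names and toy depths are hypothetical.

```r
# Assumed overlap definition: intersection / union of two depth intervals.
depth_iou <- function(top_a, bottom_a, top_b, bottom_b) {
  inter <- max(0, min(bottom_a, bottom_b) - max(top_a, top_b))
  union <- max(bottom_a, bottom_b) - min(top_a, top_b)
  if (union <= 0) 0 else inter / union
}

# Greedy one-to-one matching of predicted horizons against golden horizons.
horizon_precision_recall <- function(pred, gold, threshold = 0.8) {
  matched_gold <- logical(nrow(gold))
  n_pred_matched <- 0L
  for (i in seq_len(nrow(pred))) {
    ov <- vapply(seq_len(nrow(gold)), function(j) {
      depth_iou(pred$top_cm[i], pred$bottom_cm[i],
                gold$top_cm[j], gold$bottom_cm[j])
    }, numeric(1))
    ov[matched_gold] <- 0  # each golden horizon can be matched only once
    if (length(ov) > 0L && max(ov) >= threshold) {
      matched_gold[which.max(ov)] <- TRUE
      n_pred_matched <- n_pred_matched + 1L
    }
  }
  c(precision = n_pred_matched / nrow(pred),
    recall    = sum(matched_gold) / nrow(gold))
}

# Toy data: three golden horizons, four predicted (the last one spurious).
gold <- data.frame(top_cm = c(0, 20, 50),      bottom_cm = c(20, 50, 100))
pred <- data.frame(top_cm = c(0, 20, 55, 100), bottom_cm = c(20, 48, 100, 150))

pr <- horizon_precision_recall(pred, gold)
```

Here the spurious fourth horizon lowers precision while recall stays complete, which is the distinction the two numbers are meant to capture.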
The on-disk size figures shipped in v0.9.64 - v0.9.66 for the local
Gemma 4 catalog were wrong: I had documented gemma4:e2b at "~1.5 GB"
assuming bare 2B parameters at 4-bit quantisation, but the multimodal
Gemma 4 builds bundle a vision encoder + tokenizers that add ~5 GB on
top. Confirmed locally after the v0.9.66 pull completed:
$ ollama list
gemma4:e2b 6.67 GB
gemma4 8.95 GB (alias of latest, 8B)
Corrigendum scope (no code logic changes):
- R/setup-local-vlm.R .SOILKEY_OLLAMA_CATALOG -- corrected size_gb
fields to 6.7 / 8.0 / 19.0; new docstring explaining the ~5 GB
vision-encoder overhead.
- R/zzz.R .suggest_local_vlm_message() -- "(~1.5 GB)" replaced with
"(~6.7 GB on disk)".
- R/vlm-providers.R -- both vlm_provider() docstrings updated with
corrected sizes + multimodal-overhead note.
- vignettes/v10_agente_pedometrista.Rmd -- corrected sizes plus a
corrigendum callout pointing back to v0.9.67.
- vignettes/v11_vlm_extraction_benchmark.Rmd -- corrected sizes AND
added a fresh head-to-head benchmark comparing gemma4:e2b vs the 8B
'gemma4' alias on the four bundled text fixtures.
- README.md -- corrected sizes everywhere; status footer updated.
New baseline finding (e2b vs 8B head-to-head):
| task | gemma4:e2b | gemma4 (8B) |
|----------|-----------|-------------|
| horizons | 1.00 / 1.00 / 1.00 (both fixtures) | 1.00 / 1.00 / 1.00 (both) |
| site | 50% ok rate; value-acc 1.00 on matched | 0% ok rate (JSON validation errors) |
- Horizons (text) is solved at both sizes -- locks in gemma4:e2b as
the soilKey default.
- Site (text) is unstable on both sizes; failure mode is JSON
validation, not wrong content. When extraction succeeds,
value-accuracy on matched fields is 100%. This is exactly what
Phase 2 (few-shot demos in the prompt) targets.
R CMD check Status: OK. No tests changed; no API changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…risation
Adds schema-correct worked-example prompts for the 3 extraction
tasks (horizons / site-from-text / Munsell-from-photo), opt-in
use_fewshot parameter on every extractor, n_repeats parameter on the
benchmark, and a harder bundled fixture (multi-horizon Chernossolo
BA with PT-BR comma decimals + mixed Munsell umida/seca + CaCO3).
- New inst/prompts/ (3 few-shot variants):
* extract_horizons_fewshot.md -- 2 worked examples in the SCHEMA-
CORRECT mixed shape: top_cm / bottom_cm / designation / boundary_*
are RAW values; munsell_moist / munsell_dry are SINGLE wrapped
objects (hue + value + chroma + confidence + source_quote);
everything else (clay_pct, ph_h2o, etc.) is wrapped {value,
confidence, source_quote}. Earlier draft (separate
munsell_hue_moist wrappers) caused 0% schema validation -- this
is the corrected v0.9.68 shape.
* extract_site_from_text_fewshot.md -- 2 PT-BR + EN examples; id /
crs raw, everything else wrapped; country inferred from state.
* extract_munsell_from_photo_fewshot.md -- 2 examples (with /
without Munsell card; confidence calibration baked in).
- R/vlm-extract.R:
* extract_horizons_from_pdf(use_fewshot = TRUE) -- new arg, default
TRUE. Switches prompt to *_fewshot variant.
* extract_munsell_from_photo(use_fewshot = TRUE) -- same.
* extract_site_from_fieldsheet(use_fewshot = TRUE) -- accepted but
image-mode prompt unchanged (text-mode goes through
.run_one_extraction).
- R/benchmark-vlm-extraction.R:
* benchmark_vlm_extraction(use_fewshot = TRUE, n_repeats = 1L) --
two new args. n_repeats runs each fixture N times so the summary
can report metric_*_mean AND metric_*_sd. Required to distinguish
real lift from stochastic LLM noise on a small fixture set.
* .run_one_extraction(use_fewshot) -- forwarded.
- inst/fixtures/vlm_extraction/horizons/perfil_BA_chernossolo_messy:
4-horizon Chernossolo Argiluvico Carbonatico from a Bahia survey,
with PT-BR comma decimals, UTM coords noted then converted, mixed
Munsell umida/seca, CaCO3 equivalents. Smoke at v0.9.68:
precision = 1.00, recall = 1.00, attr_match = 0.79 with gemma4:e2b
+ few-shot.
- vignettes/v11 + README + NEWS: new "Phase 2" section + status
footer.
## Honest measurement
Few-shot did NOT move metrics on the 4 simple fixtures because
vanilla gemma4:e2b already nails them. The 50% ok-rate observed
in v0.9.66 was stochastic variance, not a real failure mode -- which
is exactly what the new n_repeats parameter exposes. Few-shot
DOES NOT regress quality, and the harder Chernossolo BA fixture
confirms the system handles non-toy PT-BR profiles cleanly. Real
lift will surface from harder fixtures or smaller models -- not
from the existing toy suite.
R CMD check Status: OK (3868 / 0 / 21 unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
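Why repeated runs matter on a small fixture set can be shown with a short aggregation sketch. The fixture names and the 0/1 outcomes below are made-up illustration values, not measured results. The point is only that a per-fixture mean and standard deviation over n_repeats separate a stable result from a noisy one, which a single run cannot do.

```r
# Illustrative outcomes only (1 = extraction validated, 0 = it did not);
# three repeats per hypothetical fixture.
runs <- data.frame(
  fixture = rep(c("fixture_a", "fixture_b"), each = 3),
  ok      = c(1, 1, 1,   1, 0, 1)
)

# Per-fixture mean and standard deviation across repeats.
metric_mean <- tapply(runs$ok, runs$fixture, mean)
metric_sd   <- tapply(runs$ok, runs$fixture, sd)

summary_df <- data.frame(
  fixture = names(metric_mean),
  ok_mean = as.numeric(metric_mean),
  ok_sd   = as.numeric(metric_sd)
)
```

A fixture with zero spread is stable across repeats, while a non-zero standard deviation flags run-to-run variance that a single pass would report as either a clean success or a failure.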
…+ polish
Three coherent improvements that close out the Phase 2 roadmap.
(A) 8 BDsolos hard fixtures
inst/fixtures/vlm_extraction/horizons/bdsolos_RJ_*.{txt,golden.json}
-- one per SiBCS Ordem (Argissolo, Cambissolo, Chernossolo,
Espodossolo, Gleissolo, Latossolo, Neossolo, Planossolo).
Generated via make_synthetic_horizons_fixture() from real RJ
pedons. Stress-test target: gemma4:e2b + few-shot vs baseline,
n_repeats = 3, ~12 min on a laptop CPU.
(B) ellmer chat_structured() bridge
- New R/vlm-types.R:
* vlm_type_from_soilkey_schema(name): wraps
ellmer::type_from_schema() reading inst/schemas/<name>.json.
Returns the ellmer type tree the provider needs for
chat_structured(type = ...).
* .provider_supports_structured(provider): TRUE when provider
exposes chat_structured as a method.
- validate_or_retry(use_structured = FALSE): new param. When
TRUE AND provider supports it, replaces the chat-and-parse-
and-retry loop with a single chat_structured() call that
returns a schema-validated R list directly. Removes JSON
validation errors at the protocol level (Anthropic tool
calls / OpenAI response_format=json_schema / Ollama 0.5+
format=json_schema / Gemini structured output).
- extract_horizons_from_pdf(), extract_munsell_from_photo(),
extract_site_from_fieldsheet(), benchmark_vlm_extraction():
all accept use_structured (default FALSE for back-compat).
(C) Production polish
- extract_horizons_from_pdf(): cli::cli_progress_bar() for
multi-chunk PDFs (no-op for single-chunk).
- inst/shiny/agent_app/app.R: new sidebar section "Estrategia
de extracao" with checkboxes for use_fewshot (default TRUE)
and use_structured (default FALSE). Both flags propagate to
every extract_*() call inside the app.
- Model preset labels corrected to v0.9.67 measured sizes
(light = ~6.7 GB, balanced = ~8 GB, best = ~19 GB).
Tests: 20 new tests / ~45 expectations (test-v0970-structured-
outputs.R) covering type builder, capability probe, structured
fast path, fallback path, and parameter propagation through the
extractor family. Total: 3 888 / 0 / 21.
R CMD check Status: OK.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
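The structured fast path with its parse-and-retry fallback can be sketched with mock providers. Everything here is hypothetical scaffolding: the providers are plain lists, the validity check is a crude stand-in for JSON-schema validation, and the function names do not mirror validate_or_retry() beyond the use_structured flag.

```r
# Capability probe: does the provider expose chat_structured as a function?
provider_supports_structured <- function(provider) {
  is.function(provider[["chat_structured"]])
}

# Stand-in for schema validation: accept anything shaped like "{...}".
looks_valid <- function(raw) grepl("^\\{.*\\}$", raw)

extract_with_fallback <- function(provider, prompt,
                                  use_structured = FALSE, max_retries = 2L) {
  # Fast path: one structured call, no parsing or retry loop.
  if (use_structured && provider_supports_structured(provider)) {
    return(list(path = "structured", value = provider$chat_structured(prompt)))
  }
  # Fallback: chat, validate, retry until valid or attempts run out.
  for (attempt in seq_len(max_retries + 1L)) {
    raw <- provider$chat(prompt)
    if (looks_valid(raw)) {
      return(list(path = "parse_retry", value = raw, attempts = attempt))
    }
  }
  stop("no valid response after retries", call. = FALSE)
}

# Mock provider without structured support; its first reply is invalid.
plain_provider <- local({
  calls <- 0L
  list(chat = function(prompt) {
    calls <<- calls + 1L
    if (calls == 1L) "not json" else "{\"designation\": \"Bt\"}"
  })
})

# Mock provider with structured support.
structured_provider <- list(
  chat = function(prompt) "{}",
  chat_structured = function(prompt) list(designation = "Bt")
)

r_structured <- extract_with_fallback(structured_provider, "p", use_structured = TRUE)
r_fallback   <- extract_with_fallback(plain_provider, "p", use_structured = TRUE)
r_optout     <- extract_with_fallback(structured_provider, "p", use_structured = FALSE)
```

Requesting use_structured on a provider that lacks the method silently falls back to the retry loop, matching the "TRUE AND provider supports it" condition in the commit message.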
Summary
This PR bundles two release series on top of v0.9.59:
Brazilian benchmark series (v0.9.60 -> v0.9.63)
- benchmark_bdsolos_sibcs() + .bdsolos_normalize_ordem() + loader extension to capture BDsolos pre-parsed Classe de Solos Nivel 1/2/3.
- merge_brazilian_pedons() deduplicates BDsolos × FEBR via site$sisb_id (BDsolos Codigo PA ≡ FEBR observacao$sisb_id).
Agente Pedometrista series (v0.9.64 -> v0.9.65)
- setup_local_vlm() (Ollama + Gemma 4 one-call bootstrap, presets light / balanced / best); pedologist_system_prompt(language) persona PT-BR / EN; default Ollama model lowered to gemma4:e2b (documented as ~1.5 GB at the time; the later corrigendum commit measures ~6.7 GB on disk).
- run_agent_app() modern bslib Shiny UI with 8 tabs (foto / PDF / ficha / espectro / tabela / classificar / trace / chat) wired to the local Gemma. Vignette v10_agente_pedometrista.Rmd. README + status footer updated.
Test plan
- R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
- setup_local_vlm() + run_agent_app() validated by parsing the Shiny app.R + checking all 8 nav_panels are wired
Tags
v0.9.60, v0.9.61, v0.9.62, v0.9.63, v0.9.64, v0.9.65
GitHub Releases: v0.9.63 · v0.9.65
🤖 Generated with Claude Code