Releases · HugoMachadoRodrigues/soilKey

09 May 19:56

v0.9.94-data

a0d0df3

soilKey lazy-fetch data (v0.9.94) Latest

Latest

Benchmark caches downloaded on demand by load_*_sample() functions and download_extdata_cache() since soilKey v0.9.94.

File	n pedons	Source	Size
`afsp_sample.rds`	120	ISRIC Africa Soil Profiles Database v1.2	1.2 MB
`kssl_sample.rds`	99	NCSS Lab Data Mart (KSSL gpkg)	1.0 MB
`kssl_nasis_sample.rds`	99	NCSS Lab Data Mart + NASIS Morphological	1.0 MB
`wosis_stratified_sample.rds`	130	ISRIC WoSIS GraphQL (5 per RSG × 26 RSGs)	1.3 MB

Usage

# Eager prefetch all four caches into the user cache directory:
soilKey::download_extdata_cache("all")

# Or download lazily on first call:
length(soilKey::load_afsp_sample()$pedons)
length(soilKey::load_kssl_sample()$pedons)
length(soilKey::load_kssl_nasis_sample()$pedons)
length(soilKey::load_wosis_stratified_sample()$pedons)

The cache directory is tools::R_user_dir("soilKey", "data") (typically ~/Library/Application Support/.../soilKey/data on macOS, ~/.local/share/.../soilKey/data on Linux, %LOCALAPPDATA%/.../soilKey/data on Windows).

These files are under the same MIT license as the soilKey R package; the underlying datasets retain their respective upstream licenses (ISRIC AfSP / WoSIS public-domain, NCSS Lab Data Mart public-domain US Federal data).

Assets 6

06 May 23:36

HugoMachadoRodrigues

v0.9.71

153b24c

v0.9.71 — Phase 2 done: BDsolos fixtures + structured outputs + polish

Bundles three coherent improvements that close out the Phase 2 roadmap.

(A) 8 BDsolos hard fixtures

Generated via `make_synthetic_horizons_fixture()` from real RJ pedons selected by SiBCS Ordem (Argissolo, Cambissolo, Chernossolo, Espodossolo, Gleissolo, Latossolo, Neossolo, Planossolo). Each fixture is a real BDsolos pedon's full horizon table rendered as Markdown — non-toy, multi-horizon, mixed Munsell úmida/seca, varied attribute coverage.

Reproduce locally:

```r
benchmark_vlm_extraction(
providers = list(gemma_e2b = list(name = "ollama", model = "gemma4:e2b")),
tasks = "horizons",
use_fewshot = TRUE,
n_repeats = 3L
)$summary
```

(8 fixtures × 3 reps × ~30 s = 12 min on a laptop CPU. Empirical numbers from a fully-completed run will land in a follow-up release.)

(B) ellmer `chat_structured()` bridge

`vlm_type_from_soilkey_schema(name)` — wraps `ellmer::type_from_schema()` reading `inst/schemas/.json` directly.
`validate_or_retry(..., use_structured = TRUE)` — short-circuits the chat-and-parse-and-retry loop when the provider supports it. Provider receives the ellmer type tree built from the soilKey schema and returns a structurally-valid R list directly. Removes the entire class of "model returned prose / wrong shape" failures at the protocol level (Anthropic tool calls / OpenAI `response_format = json_schema` / Ollama 0.5+ `format = json_schema` / Gemini structured output).
All extractors (`extract_horizons_from_pdf`, `extract_munsell_from_photo`, `extract_site_from_fieldsheet`) and `benchmark_vlm_extraction()` accept `use_structured = FALSE` (default for back-compat).

(C) Production polish

`extract_horizons_from_pdf()` — `cli::cli_progress_bar()` for multi-chunk PDFs (no-op for single-chunk, the common case).
`agent_app()` — sidebar adds "Estratégia de extração" section with checkboxes for `use_fewshot` (default TRUE) and `use_structured` (default FALSE). Both propagate to every `extract_*()` call inside the app.
Model preset labels corrected to v0.9.67 measured sizes (`light` = ~6.7 GB, `balanced` = ~8 GB, `best` = ~19 GB).

Tests

20 new tests / ~45 expectations in `test-v0970-structured-outputs.R`:

type builder (`vlm_type_from_soilkey_schema`) input validation + ellmer integration
capability probe (`.provider_supports_structured`) for ellmer Chat / Mock / NULL
`validate_or_retry` structured fast path (skip parse + retry when provider supports)
fallback path (use_structured=TRUE on Mock falls through to legacy loop)
parameter propagation through the entire `extract_*()` family

Total: 3 888 passing / 0 failing / 21 skipped.

Status

`R CMD check` Status: OK (0 errors / 0 warnings / 0 notes)
8 BDsolos fixtures (1 per SiBCS Ordem) + structured-output infra + polish
20 new tests
README + NEWS + status footer updated

🤖 Generated with Claude Code

Assets 2

06 May 23:01

HugoMachadoRodrigues

v0.9.68

ea00e22

v0.9.68 — Phase 2: few-shot demonstrations + variance characterisation

Phase 2 of the local-Gemma roadmap. Adds schema-correct worked-example prompts for the 3 extraction tasks, opt-in use_fewshot parameter, n_repeats for variance characterisation, and a harder bundled fixture.

What's shipped

Few-shot prompts (3 new)

inst/prompts/extract_horizons_fewshot.md — 2 worked examples in the schema-correct mixed shape: top_cm / bottom_cm / designation / boundary_* are RAW values; munsell_moist / munsell_dry are SINGLE wrapped objects holding hue + value + chroma + confidence + source_quote; everything else (clay_pct, ph_h2o, etc.) is wrapped {value, confidence, source_quote}.
inst/prompts/extract_site_from_text_fewshot.md — 2 PT-BR + EN examples; id / crs raw, everything else wrapped; country inferred from state.
inst/prompts/extract_munsell_from_photo_fewshot.md — 2 examples (with / without Munsell card; confidence calibration baked into the demos).

`use_fewshot` parameter

Opt-in on extract_horizons_from_pdf(), extract_munsell_from_photo(), and benchmark_vlm_extraction(). Default TRUE from v0.9.68. Set FALSE to run the bare-instructions baseline for an A/B.

`n_repeats` parameter

New on benchmark_vlm_extraction(). Runs each (provider × task × fixture) cell N times. Summary reports metric_*_mean AND metric_*_sd. Required to distinguish real lift from stochastic LLM noise.

Harder bundled fixture

perfil_BA_chernossolo_messy.{txt,golden.json} — 4-horizon Chernossolo Argilúvico Carbonático from a real Bahia survey: PT-BR comma decimal pH = 6,8, UTM coordinates noted then converted, mixed Munsell úmida/seca, CaCO3 equivalents. Smoke result with gemma4:e2b + few-shot: precision = 1.00, recall = 1.00, attr_match = 0.79.

Honest measurement (this is not a marketing release)

Few-shot did NOT move the metrics on the 4 simple bundled fixtures because vanilla gemma4:e2b already nails them:

Task	Fixture	Baseline	Few-shot
horizons	Latossolo MG	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
horizons	Argissolo RJ	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
site	Ficha MG	0.79 / 1.00 / 0.79	0.79 / 1.00 / 0.79
site	Ficha RJ	0.80 / 0.92 / 0.80	0.80 / 0.92 / 0.80

The 50% ok-rate observed in v0.9.66 was stochastic variance, not a real failure mode — which is exactly what the new n_repeats parameter exposes. Few-shot does not regress quality and the harder Chernossolo BA fixture confirms the pipeline handles non-toy PT-BR profiles. Real lift will surface only on harder fixtures or smaller models — not on the existing toy suite.

Status

R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
Test suite: 3 868 passing / 0 failing / 21 skipped (unchanged; few-shot is opt-in)
3 new few-shot prompts + 1 new harder fixture
n_repeats API + metric_*_sd columns in benchmark summary

🤖 Generated with Claude Code

Assets 2

06 May 21:26

HugoMachadoRodrigues

v0.9.67

528ca52

v0.9.67 — Corrigendum: gemma4:e2b on-disk size + e2b vs 8B head-to-head

Doc + measurement corrigendum. No code logic changes.

What was wrong

Docs in v0.9.64 → v0.9.66 said gemma4:e2b was "~1.5 GB", which is the bare 2B-parameter weight at 4-bit quantisation. The actual on-disk footprint is ~6.7 GB: the multimodal Gemma 4 builds bundle a vision encoder + tokenizers that add ~5 GB above the bare parameter weights. Confirmed locally after the v0.9.66 pull completed.

Corrected catalog

Preset	Tag	On-disk
`light`	`gemma4:e2b`	~6.7 GB (was ~1.5 GB)
`balanced`	`gemma4:e4b`	~8 GB (approx)
`best`	`gemma4:31b`	~19 GB
(8B default alias)	`gemma4` (`gemma4:latest`)	~9 GB

Files updated: R/setup-local-vlm.R, R/zzz.R, R/vlm-providers.R, vignettes/v10_agente_pedometrista.Rmd, vignettes/v11_vlm_extraction_benchmark.Rmd, README.md.

New head-to-head benchmark (gemma4:e2b vs gemma4 8B)

Re-ran benchmark_vlm_extraction() with both sizes on the four bundled text fixtures:

Task	Fixture	`e2b`	`gemma4` (8B)
horizons	Latossolo MG	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
horizons	Argissolo RJ	1.00 / 1.00 / 1.00	1.00 / 1.00 / 1.00
site	Ficha MG	✓ (IoU 0.71, value-acc 1.00)	✗ (JSON validation error)
site	Ficha RJ	✗ (JSON validation error)	✗ (JSON validation error)

Key reads:

Horizons (text) is solved at both sizes. The 2B model matches the 8B model on clean PT-BR profile descriptions — this locks in gemma4:e2b as the soilKey default for the agent app.
Site (text) is unstable on both sizes (50 % ok rate at e2b; 0 % at 8B in this 2-fixture sample). The failures are JSON validation errors, not wrong content. When extraction succeeds, value-accuracy on matched fields is 100 % — the model knows the right answer, it just doesn't always return it in valid JSON shape.

This is exactly what Phase 2 (few-shot demonstration pairs in the prompt) targets: insert 2-3 examples of correctly-shaped JSON before each call to discipline the model into the schema. No GPU required.

Status

R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
Test suite: 3 868 passing / 0 failing / 21 skipped (unchanged from v0.9.66)
No API changes; no tests changed
README + 2 vignettes refreshed

🤖 Generated with Claude Code

Assets 2

06 May 20:52

HugoMachadoRodrigues

v0.9.66

eea6fb3

v0.9.66 — Phase 1: VLM extraction benchmark

A measurable baseline for the local Gemma 4 stack — the input we needed before deciding whether to invest in few-shot demos (Phase 2) or LoRA fine-tuning (Phase 3).

What's new

`benchmark_vlm_extraction()`

Provider-agnostic harness over 3 tasks × per-task metrics:

Task	Input	Metrics
`horizons`	Markdown / text profile description	precision + recall + per-attribute match (higher better)
`site`	Field-sheet text	IoU + value-accuracy + recall (higher better)
`munsell`	Profile photo (with Munsell card)	mean Nickerson Color Difference (lower better)

Returns long-format predictions + summary data frames. Accepts MockVLMProvider for unit tests.

Bundled fixtures

horizons/: 4-horizon Argissolo RJ + 4-horizon Latossolo MG (paired text + golden JSON)
site/: 2 Brazilian field-sheet text fixtures + golden JSON
munsell/: README format spec — users supply their own photo fixtures (CRAN size + licence policy forbids shipping photos)

`make_synthetic_horizons_fixture(pedon)`

Renders any PedonRecord back into a Markdown profile description and emits the structured horizons as the golden answer. Lets you scale the horizons fixture set from BDsolos / FEBR / KSSL.

`.onAttach()` local-VLM hint

CRAN-compliant interactive prompt: detects Ollama state, suggests setup_local_vlm("light") when gemma4:e2b is missing. Auto-pull only with explicit opt-in (options(soilKey.auto_setup_vlm = TRUE) or env var SOILKEY_AUTO_SETUP_VLM=1). Suppress all hints with options(soilKey.suggest_local_vlm = FALSE).

`extract_site_from_text.md` prompt

Text-mode companion to the image-mode site prompt — required because the original explicitly said "Supplied as an image content block" and Gemma returned all-null when fed text.

Baseline measured (gemma4 8B local, MacBook M1)

Task	Fixture	precision / IoU	recall / value-acc	attr-match
`horizons`	Latossolo MG	1.00	1.00	1.00
`horizons`	Argissolo RJ	1.00	1.00	1.00
`site`	Ficha MG	0.79	1.00	0.79
`site`	Ficha RJ	0.87	0.92	0.87

Read: horizons extraction is solved (vanilla Gemma 4 + the pedologist_system_prompt() persona is enough for clean PT-BR profile descriptions). Site extraction is ~83 % IoU with ~96 % value-accuracy on matched fields — gaps are inferred fields (e.g. country: BR from a Brazilian state) that the 8B model misses but a 32B/Claude would catch.

This baseline is the input for the Phase 2 / Phase 3 decision:

Phase 2 (few-shot): inject 2-3 demonstration pairs per call. ~2 days human work; no GPU. Targets the site-task gap.
Phase 3 (LoRA): adapter fine-tune on (input, golden) pairs from BDsolos + FEBR. Needs ~1k labelled pairs and ~6 h on H100. Only justified if Phase 2 plateaus.

Status

R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
Test suite: 3 868 passing / 0 failing / 21 expected skips
47 new tests / ~70 expectations
Vinheta: v11_vlm_extraction_benchmark

🤖 Generated with Claude Code

Assets 2

06 May 19:53

HugoMachadoRodrigues

v0.9.65

a944fde

v0.9.65 — Agente Pedometrista (bslib Shiny + local Gemma 4)

A modern bslib-themed Shiny UI that wires the v0.9.64 local Gemma 4 stack to the deterministic taxonomic key. Photo, PDF, field-sheet image and Vis-NIR spectrum each become a one-click extraction tab; the result is classified across WRB 2022 + SiBCS 5ª ed. + USDA Soil Taxonomy 13ed in the same session.

Quick start

# One-call setup of the local stack (downloads Gemma 4 e2b, ~1.5 GB):
soilKey::setup_local_vlm("light")

# Launch the agent:
soilKey::run_agent_app()

v0.9.64 — `setup_local_vlm()` + Ollama lifecycle + pedologist persona

setup_local_vlm(model = "balanced") — idempotent bootstrap. Detects Ollama, starts the daemon, pulls the chosen model. Catalog: light = gemma4:e2b (~1.5 GB), balanced = gemma4:e4b (~3 GB), best = gemma4:31b (~19 GB).
ollama_is_installed() / ollama_ensure_running() / ollama_pull_model() / ollama_list_local_models() — composable helpers, never throw, return logical / character.
pedologist_system_prompt(language = "pt-BR" | "en") — canonical persona installed in every chat session. Trained pedometrist (SiBCS 5ª + WRB 2022 + KST 13ed); explicit "NEVER classify, only extract"; per-attribute confidence + source_quote contract.
Default Ollama model lowered from gemma4:e4b (~3 GB) to gemma4:e2b (~1.5 GB) so the package "just works" on a developer laptop after setup_local_vlm("light").
13 new tests / ~30 expectations.

v0.9.65 — `agent_app()` Shiny UI

8 nav_panels:

Tab	Wires
📷 Foto Munsell	`extract_munsell_from_photo()`
📄 PDF / Texto	`extract_horizons_from_pdf()`
📋 Ficha de Campo	`extract_site_from_fieldsheet()`
🌈 Espectros	`fill_from_spectra()` (OSSL local-band library)
📊 Tabela	Editable `DT` for manual correction
🌱 Classificar	`classify_all()` → 3 `bslib::value_box()` cards (WRB / SiBCS / USDA)
🔍 Trace	Per-system trace + provenance browser
💬 Pedometrista	Free-form chat with the local Gemma using `pedologist_system_prompt()`

Persistent 320 px sidebar with provider/model selector, real-time Ollama status badges, "Configurar Gemma local" button (modal progress), language toggle (PT-BR / EN), session reset.

run_agent_app() — launcher, soft-fails on missing Suggests with actionable install.packages() hint.
New vignette v10_agente_pedometrista.Rmd — full walkthrough.
README rewritten: version badge 0.9.62 → 0.9.65, tests 3 760 → 3 821.

Privacy / data sovereignty

By default the agent prefers ollama in the auto-fallback chain. Sensitive photos, fieldsheets with precise geolocation, and internal PDFs never leave the machine. The cloud fallback (Anthropic / OpenAI / Google) only fires when Ollama is not running AND the user has set an API key — an explicit property, not a silent default. Recommended for governmental surveys, indigenous land studies, pre-publication research.

Principle

The LLM never classifies. It only extracts schema-validated JSON with per-attribute confidence and source_quote. The taxonomic key remains 100 % R deterministic, with versioned YAML rules.

Status

R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
Test suite: 3 821 passing / 0 failing / 21 expected skips
17 new tests across test-v0964-setup-local-vlm.R + test-v0965-agent-app.R

CRAN-friendly

Ships the downloader, NOT the weights. The user runs setup_local_vlm() once after install; Ollama caches the model in ~/.ollama/models/. No network calls happen at package install time.

🤖 Generated with Claude Code

Assets 2

06 May 19:17

HugoMachadoRodrigues

v0.9.63

c8be273

v0.9.63 — Brazilian benchmark series (v0.9.55–v0.9.62)

The v0.9.55 → v0.9.63 release series wires the Brazilian SiBCS classifier to the two canonical pedologist-curated corpuses (Embrapa BDsolos and FEBR), validates the classifier against ~9 000 surveyor-labelled profiles, and consolidates the two repositories into a single deduplicated super-dataset.

Highlights

v0.9.55 — load_bdsolos_csv(), inspect_bdsolos_csv(), download_bdsolos() (BDsolos full-export ingestion: ~9 000 perfis from 27 UFs, semicolon-delimited, preamble + 222+ columns).
v0.9.57 — read_febr_pedons() + febr_index_munsell() (FEBR ~10k perfis; ~6 distinct Munsell column conventions; 200 / 249 datasets carry colour data, 36 275 horizons total).
v0.9.58–v0.9.59 — full BDsolos export schema support (DMS coordinates, read.csv2 fallback for malformed UTF-8 in 7 of 27 state CSVs).
v0.9.60 — benchmark_bdsolos_sibcs(): surveyor-reference benchmark mirroring benchmark_lucas_2018() for Brazil. .bdsolos_normalize_ordem() maps modern + pre-1999 legacy SiBCS Ordem names. Smoke test (RJ 100 pedons): 34 % Ordem accuracy, Argissolos 67.6 % recall.
v0.9.61 — R/sibcs-color-tuning.R: replaces the SiBCS subordem first-match-wins rule for colour-driven Ordens (Argissolos / Latossolos / Nitossolos) with a thickness-weighted dominant-colour-in-B rule. Wired into classify_sibcs() between subordem assignment and the v0.9.45 cor a determinar fallback. Benchmark also reports accuracy_subordem over canonical 2-3 letter SiBCS codes.
v0.9.62 — merge_brazilian_pedons(bdsolos, febr, prefer) deduplicates via site$sisb_id (BDsolos Codigo PA ≡ FEBR observacao$sisb_id). RJ overlap: 590 of 722 BDsolos pedons (65 %) match a FEBR sisb_id, naïve concat of 1 606 → after merge 1 016 distinct pedons. summarize_brazilian_overlap() is a dry-run diagnostic.
v0.9.63 — README documents the v0.9.55 → v0.9.62 trajectory; status footer rewritten to merge the Brazilian highlights with the existing USDA / WRB summary.

Tests / status

R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
Test suite: 3 760 passing / 0 failing / 20 expected skips
12 new tests in test-v0962-merge-brazilian.R (28 expectations)
14 new tests in test-v0961-sibcs-color-tuning.R (37 expectations)
10 new tests in test-v0960-bdsolos-benchmark.R (42 expectations)

Per-version detail

See NEWS.md for the full per-release diff.

🤖 Generated with Claude Code

Assets 2

02 May 04:33

HugoMachadoRodrigues

v0.9.23

41c8fc5

soilKey v0.9.23 -- canonical eluvial-illuvial argic (SiBCS +14pp, KSSL Ultisols +12pp)

The "argic clay-increase canonicalisation" release. Single bug fix in test_clay_increase_argic with paper-sized impact across all three classification systems.

Root cause

test_clay_increase_argic (the predicate that gates the argic horizon, the argillic horizon, and every Order / RSG that depends on either) was comparing each candidate horizon's clay only against its immediate predecessor. KST 13ed Ch 3 (argillic horizon, p 4) and WRB 2022 Ch 3.1.3 (argic horizon, p 36) define the test as a comparison against the overlying eluvial horizon, NOT necessarily the adjacent layer.

Profiles where clay rises gradually through a thick A / E / Bw / Bt sequence (e.g. KSSL Hapludalfs with clay 13 -> 15 -> 22 -> 27 -> 31, or many FEBR Argissolos) were being silently rejected because no two adjacent layers passed the +6 pp / 1.4-ratio thresholds, even though the canonical A-vs-Bt jump 13 -> 31 obviously satisfies argic.

Fix

test_clay_increase_argic now evaluates the rule against:

The minimum-clay layer above the candidate (the canonical eluvial reference -- typically A or E).
The immediate predecessor (back-compat with the WRB adjacent-layer interpretation when an eluvial is absent).

Either trigger accepts the candidate. Strictly additive -- no candidate that passed before now fails.

Real-data benchmark impact

Embrapa FEBR (apples-to-apples)

System	v0.9.22	v0.9.23	Δ
SiBCS Order	40.6 %	54.7 %	+14.1 pp
USDA Order	47.6 %	51.1 %	+3.5 pp
WRB Order	32.7 %	33.7 %	+1.0 pp

The SiBCS +14.1 pp jump is the biggest single-version gain in the project to date. Most of the v0.9.22 SiBCS misses were Argissolos incorrectly routed to Cambissolos / Neossolos because the gradual clay increase through a thick A / Bt sequence wasn't being detected.

KSSL + NASIS (n = 998, apples-to-apples)

Order	v0.9.22	v0.9.23	Δ
Vertisols	65.2 %	68.8 %	+3.6 pp
Aridisols	53.1 %	55.4 %	+2.3 pp
Ultisols	26.3 %	38.9 %	+12.6 pp
Alfisols	20.9 %	31.2 %	+10.3 pp
Spodosols	29.9 %	37.9 %	+8.0 pp
Mollisols	21.8 %	22.9 %	+1.1 pp
Inceptisols	47.2 %	41.5 %	-5.7 pp
Entisols	53.1 %	46.9 %	-6.2 pp
Oxisols	60.0 %	60.0 %	(=)
TOTAL Order	32.7 %	36.0 %	+3.3 pp
TOTAL Subgroup	2.4 %	2.7 %	+0.3 pp

The Alfisol / Ultisol / Spodosol gains (+8 to +13 pp each) are where the v0.9.22 -> v0.9.23 fix delivers the most: profiles with gradual A → E → Bt clay sequences now correctly route to argillic-bearing Orders. Inceptisol / Entisol drops (-5 to -6 pp) are correct: profiles previously routed to those catch-all Orders are now properly classified as Alfisols / Ultisols.

Why scientifically defensible

The test changes from

above <- h$clay_pct[i - 1L]   # adjacent only -- BUG

above_min <- min(h$clay_pct[1:(i-1)], na.rm = TRUE)   # canonical eluvial reference
above_adj <- h$clay_pct[i - 1L]                         # adjacent fallback
# Either passes -> candidate accepted.

The min-above reference matches:

KST 13ed Ch 3 p 4: "the increase in clay content with depth must be ... compared to a lighter-textured eluvial horizon above"
WRB 2022 Ch 3.1.3 p 36: "clay percent increases compared to the overlying horizon by ..."

Both canonical sources reference "the overlying eluvial horizon" / "the overlying horizon" without specifying adjacency. Pre-v0.9.23 we were applying a stricter-than-canonical interpretation.

Quality

R CMD check --as-cran with PROJ env: Status: OK (0 ERR / 0 WARN / 0 NOTE)
2 850 testthat expectations passing, 0 failed (no regression -- the new min-above path is strictly additive)
31/31 canonical fixtures still classify correctly to their intended RSG / Order

What's NOT yet done (next priorities)

Subgroup machinery completion -- subgroup top-1 still 2.7 % (n=998). The argic fix lifted Order, but the qualified-subgroup permutations (Cumulic / Pachic / Aquic / Oxyaquic / Mollic / Ultic / etc.) need full coverage across Mollisols + Alfisols + Inceptisols where most KSSL refs sit. Conditional Subgroup (within correctly-Ordered profiles) is now ~ 7.5 %.
EU-LUCAS WRB benchmark -- the bundled ESDBv2 archive ships schema-only Excel files; the actual WRB-coded SGDBE database is locked in autorun.exe (Windows installer). Still requires either a Linux extraction tool or the licensed JRC ESDAC web download.
WoSIS GraphQL refresh -- v0.9.13's 13 % WRB baseline was measured against WoSIS 2024-10. Re-running with the current v0.9.23 deterministic key + NASIS / pediagfeatures features would expose how much of the v0.9.13 → v0.9.23 trajectory is reproducible on the WoSIS sample. Deferred to v0.9.24+.
Brazilian Munsell -- the Embrapa FEBR archive lacks Munsell data, capping SiBCS Subordem benchmark. A NASIS-equivalent for the Brazilian context (IBGE soil-survey volumes, Embrapa BDsolos curated) would unlock Subordem benchmark from ~ 8 % to estimated 25-40 %.

Trajectory v0.9.13 -> v0.9.23 (definitive)

Version	Embrapa USDA	Embrapa WRB	Embrapa SiBCS	KSSL USDA
v0.9.13 (WoSIS)	n/a	13 %	n/a	n/a
v0.9.18	47.6 %	32.7 %	40.6 %	21.4 %
v0.9.20 (NASIS)	47.6 %	32.7 %	40.6 %	32.2 %
v0.9.21 (+ tie-breaker)	47.6 %	32.7 %	40.6 %	33.1 %
v0.9.23 (+ argic canonical)	51.1 %	33.7 %	54.7 %	36.0 %

Assets 2

01 May 21:27

HugoMachadoRodrigues

v0.9.22

15a54e8

soilKey v0.9.22 -- subgroup-level USDA + Subordem-level SiBCS benchmarks

The "deeper-than-Order benchmark" release. Two scientific extensions to the benchmark runner that probe one taxonomy level deeper than the previous Order-only validation.

What's new

benchmark_run_classification gains level = "subgroup" and level = "subordem". Comparison is case-insensitive with qualifier-paren stripping; Subordem truncates the predicted name to its first two tokens to match FEBR-style labels.
load_kssl_pedons_gpkg now extracts samp_taxsubgrp, samp_taxgrtgroup, samp_taxsuborder from the KSSL gpkg into site$reference_usda_subgroup / _grtgroup / _suborder. The benchmark reads reference_usda_subgroup automatically when level = "subgroup".
normalise_kssl_subgroup(x) -- exported helper for arbitrary KSSL-format subgroup string normalisation.

Definitive real-data benchmark (KSSL + NASIS, n = 2 002, apples-to-apples vs v0.9.21)

Level	top-1	CI 95 %
Order	31.3 %	[29.0 %, 33.5 %]
Subgroup	2.1 %	[1.5 %, 2.7 %]

Per-Order Subgroup accuracy (within each reference Order)

Reference Order	n	correct	accuracy
Aridisols	277	14	5.1 % (best)
Ultisols	259	8	3.1 %
Entisols	62	2	3.2 %
Alfisols	417	7	1.7 %
Spodosols	150	2	1.3 %
Inceptisols	243	3	1.2 %
Mollisols	511	5	1.0 %
Vertisols / Oxisols / Andisols	61	0	0 %

The pattern confirms the partial Path C machinery diagnosis: the largest absolute misses are Mollisols (506 misses) and Alfisols (410 misses) where the qualified subgroup permutations (Cumulic / Pachic / Aquic / Oxyaquic / Mollic / Ultic / etc.) are incomplete in the current implementation.

Aridisols at 5.1 % is informative: the Aridic moisture regime is unambiguous from the dryness of the profile, so the Subgroup machinery within Aridisols (Typic / Calcic / Argic / Salic) gets enough of them right.

Conditional Subgroup accuracy (within correctly-Ordered profiles): **~ 7 %**. Examples of correct subgroup hits: typic hapludults, typic dystrudepts, typic calciaquolls, typic endoaquolls, oxyaquic haplorthods, aridic argiustolls.

Embrapa FEBR SiBCS Subordem (n = 128)

Level	n=128
Order	40.6 %
Subordem	7.8 % [3.1, 14.1]

The Subordem drop is dominated by Munsell-colour disagreement (Vermelho / Amarelo / Bruno) on profiles where FEBR records the surveyor's colour judgement but the lab gpkg lacks Munsell. 26 of 57 reference Argissolos are correctly Order'd as Argissolos but classified to a different colour Subordem.

Critical scientific finding -- FEBR Subordem ceiling

FEBR (the open Brazilian soil-data archive used as soilKey's benchmark source) ships SiBCS labels at the 2nd-level (Subordem) maximum -- 31 unique strings total across the 50 485 horizon rows, e.g. "LATOSSOLO VERMELHO", "ARGISSOLO BRUNO-ACINZENTADO". The 5th-level (Familia, Cap 18) is therefore not benchmarkable with FEBR alone.

This release pivots from "Familia validation" (the user's original request) to "Subordem validation" as the deepest level FEBR actually supports. Future Familia validation requires a different reference dataset (IBGE soil-survey volumes, Embrapa BDsolos curated, or similar).

What this benchmark tells us

The Order-level numbers (31.3 % USDA / 40.6 % SiBCS / 32.7 % WRB / 47.6 % USDA Embrapa) remain the most-defensible "soilKey on real data" headline. The Subgroup / Subordem numbers expose the next two failure modes:

Subgroup machinery completeness (USDA): the Path C code paths for non-Typic subgroups (Aquic / Vertic / Oxyaquic / Mollic / Cumulic / Pachic / Inceptic / Ultic) need full coverage across all 12 Orders. Currently only the Typic subgroups consistently fire, with Aridisols subgroups (Calcic / Argic / Salic) being the next-best-covered set.
Munsell-colour assignment (SiBCS): the Vermelho / Amarelo / Bruno discrimination requires reliable Munsell hue/value/chroma data which is sparse in lab-only datasets. NASIS provides 99 % Munsell coverage for KSSL but FEBR has none; NASIS-equivalent for Brazilian context would be an Embrapa survey database we don't yet have access to.

Code

`benchmark_run_classification(level)` -- new values

"order" (default) -- compares cls$rsg_or_order.
"subgroup" (NEW) -- compares cls$name (case-insensitive, qualifier-paren-stripped). For USDA, automatically reads reference_usda_subgroup.
"subordem" (NEW) -- SiBCS 2nd-level. Truncates both reference and prediction to the first two tokens before comparison.

`normalise_kssl_subgroup(x)` (NEW exported)

Lowercases + collapses whitespace in KSSL samp_taxsubgrp strings so "TYPIC HAPLUDALFS" and "Typic Hapludalfs" compare equal.

`load_kssl_pedons_gpkg` -- expanded reference fields

site$reference_usda (Order, unchanged)
site$reference_usda_subgroup (NEW from samp_taxsubgrp)
site$reference_usda_grtgroup (NEW from samp_taxgrtgroup)
site$reference_usda_suborder (NEW from samp_taxsuborder)

Quality

R CMD check --as-cran with PROJ env: Status: OK (0 ERR / 0 WARN / 0 NOTE)
2 850 testthat expectations passing, 0 failed (+8 from new test-benchmark-subgroup-subordem.R)
31/31 canonical fixtures still classify correctly (Order-level)
Embrapa Order benchmark unchanged at 40.6 % (regression-safe)

Trajectory v0.9.13 -> v0.9.22

Version	Embrapa Order	KSSL Order (n=2 000-3 000)	KSSL Subgroup	Embrapa Subordem
v0.9.13 (WoSIS)	13 % WRB	n/a	n/a	n/a
v0.9.18	47.6 / 32.7	21.4 %	n/a	n/a
v0.9.20 (NASIS)	47.6 / 32.7	32.2 %	n/a	n/a
v0.9.21 (+ tie-breaker)	47.6 / 32.7	33.1 % (n=3 218)	n/a	n/a
v0.9.22 (+ deeper levels)	47.6 / 32.7	31.3 % (n=2 002)	2.1 %	7.8 %

Full benchmark reports

Assets 2

01 May 20:31

HugoMachadoRodrigues

v0.9.21

5a14e49

soilKey v0.9.21 -- NASIS pediagfeatures as scientific tie-breaker (Spodosols +16pp)

The "surveyor's diagnostic identification as scientific tie-breaker" release. Wires the NASIS pediagfeatures.featkind table (64 169 records of field-surveyor-identified diagnostic horizons) into the USDA Order gates as a TIE-BREAKER ONLY: when the canonical lab + morphology gate returns passed = NA (insufficient data), the surveyor's identification flips it to TRUE. When the canonical gate returns TRUE / FALSE, the tag is recorded as evidence but does NOT override -- preserving the deterministic-key-on-data invariant.

DEFINITIVE benchmark (KSSL+NASIS apples-to-apples, n = 3 218)

Order	v0.9.19 lab-only	v0.9.20 + NASIS morphology	v0.9.21 + tie-breaker
Spodosols	17.8 % (49/276)	29.0 % (80/276)	38.0 % (105/276)
Vertisols	58.7 % (37/63)	70.8 % (46/65)	73.8 % (48/65)
Mollisols	19.9 % (145/727)	25.0 % (182/727)	25.7 % (187/727)
Inceptisols	23.1 % (107/463)	46.3 % (215/464)	46.3 % (215/464)
Aridisols	42.4 % (189/446)	46.6 % (208/446)	46.6 % (208/446)
Alfisols	21.4 % (142/663)	22.6 % (150/665)	22.6 % (150/665)
Ultisols	21.9 % (90/411)	21.7 % (89/411)	21.7 % (89/411)
Entisols	46.3 % (50/108)	36.1 % (39/108)	35.2 % (38/108)
Oxisols	49.0 % (24/49)	49.0 % (24/49)	49.0 % (24/49)
Histosols	66.7 % (2/3)	66.7 % (2/3)	66.7 % (2/3)
TOTAL	26.0 %	32.2 %	33.1 %

USDA top-1: 33.1 % (CI [31.7 %, 34.6 %], n = 3 218).

Cumulative improvement v0.9.19 -> v0.9.21: +7.1 pp. The Spodosol gain alone is +20.2 pp through v0.9.20 NASIS morphology (17.8 -> 29.0) and v0.9.21 tie-breaker (29.0 -> 38.0).

The Entisol drop (-11.1 pp v0.9.19 -> v0.9.21) is correct: profiles previously falling into the Entisol catch-all because Inceptisol / Spodosol / Vertisol gates couldn't fire (no morphology, no lab oxalate, no surveyor tag) now route to those Orders.

Replication across three samples

The Spodosol per-Order gain replicates cleanly:

Sample	n	Spodosol n	v0.9.20	v0.9.21	Δ
5 000-head	3 218	276	29.0 % (80)	38.0 % (105)	+9.0 pp
3 000-head	2 002	150	26.0 % (39)	42.0 % (63)	+16.0 pp
2 500-head	1 679	139	26.6 % (37)	43.2 % (60)	+16.6 pp

The smaller-n samples show a larger relative gain because the residual NA cases (where the tie-breaker fires) are a higher fraction of the sample. The 5 000-head number is the most precise.

Embrapa benchmark unchanged (USDA 47.6 %, WRB 32.7 %, SiBCS 40.6 %); all 31 canonical fixtures still classify correctly.

What pediagfeatures provides

NASIS pediagfeatures.featkind distribution:

featkind	n	Order it disambiguates
Argillic horizon	13 501	Alfisols / Ultisols
Mollic epipedon	6 860	Mollisols
Cambic horizon	4 970	Inceptisols
Lithic contact	2 193	Entisols (lithic subgroup)
Albic horizon	1 415	Spodosol / Alfisol disambiguation
Spodic horizon	829	Spodosols
Slickensides	519	Vertisols
Andic soil properties	494	Andisols
Histic epipedon	201	Histosols

Code

Internal helpers

.has_nasis_feature(pedon, pattern) -- checks pedon$site$nasis_diagnostic_features (populated by load_kssl_pedons_with_nasis()) for a regex match.
.apply_nasis_tiebreaker(result, pedon, pattern, feature_label) -- applied at the start of each USDA Order gate. If result$passed == NA AND surveyor identified the matching feature, flips passed to TRUE and records provenance. Does NOT override TRUE / FALSE.

USDA Order gates with tie-breaker

Gate	Tie-breaker pattern
`histosol_usda`	Histic / Folistic / Hemic / Sapric / Fibric / Limnic / Coprogenous
`spodosol_usda`	Spodic horizon / Spodic materials / Ortstein / Placic
`andisol_usda`	Andic soil properties / Vitric / Volcanic glass
`vertisol_usda`	Slickensides / Vertic features / Gilgai
`ultisol_usda`	Argillic horizon / Kandic horizon
`mollisol_usda`	Mollic epipedon
`alfisol_usda`	Argillic / Kandic / Natric horizon
`inceptisol_usda`	Cambic horizon

Why scientifically defensible

The tie-breaker fires ONLY when the canonical gate returns NA, i.e., when the deterministic key has insufficient data to decide. In that case, the field surveyor's identification (recorded in NASIS by NRCS pedologists) is the most authoritative source short of re-running the field survey. When chemistry + morphology IS available and conclusive, the canonical gate's TRUE / FALSE stands unmodified -- the tie-breaker is strictly additive on missing-data cases.

Package-level invariant preserved: deterministic key on lab + morphology data always wins; surveyor tag is fallback when key is silent.

Quality

R CMD check --as-cran with PROJ env: Status: OK (0 ERR / 0 WARN / 0 NOTE)
2 842 testthat expectations passing, 0 failed (+13 new tie-breaker contract tests)
31/31 canonical fixtures still classify correctly
No new dependencies (uses existing Suggests: DBI + RSQLite)

EU-LUCAS update

The new EU-LUCAS files (EU_LUCAS_2022_updated.xlsx + l2022_survey_cop_radpoly_attr.gpkg) are the LUCAS land-cover / Copernicus radial-polygon products. They still do NOT have WRB classifications or full lab data. The required dataset is the JRC ESDAC LUCAS-2018-Soil module (LUCAS_TOPSOIL_2018.csv + ESDB join with WRB labels), which is a separate ESDAC release available at https://esdac.jrc.ec.europa.eu/projects/lucas (download the "LUCAS 2018 Soil module" not the 2022 land-cover update).

Trajectory v0.9.13 -> v0.9.21

Version	Embrapa USDA	Embrapa WRB	KSSL USDA
v0.9.13 (WoSIS)	n/a	13 %	n/a
v0.9.16	34.0 %	21.6 %	n/a
v0.9.17	46.4 %	25.5 %	n/a
v0.9.18	47.6 %	32.7 %	21.4 %
v0.9.19	47.6 %	32.7 %	26.0 % (n=3 213)
v0.9.20 (NASIS lab+morphology)	47.6 %	32.7 %	32.2 % (n=3 218)
v0.9.21 (NASIS + tie-breaker)	47.6 %	32.7 %	33.1 % (n=3 218, CI [31.7, 34.6])

Assets 2

Releases: HugoMachadoRodrigues/soilKey

soilKey lazy-fetch data (v0.9.94)

Contents

Usage

Uh oh!

v0.9.71 — Phase 2 done: BDsolos fixtures + structured outputs + polish

(A) 8 BDsolos hard fixtures

(B) ellmer `chat_structured()` bridge

(C) Production polish

Tests

Status

Uh oh!

v0.9.68 — Phase 2: few-shot demonstrations + variance characterisation

What's shipped

Few-shot prompts (3 new)

use_fewshot parameter

n_repeats parameter

Harder bundled fixture

Honest measurement (this is not a marketing release)

Status

Uh oh!

v0.9.67 — Corrigendum: gemma4:e2b on-disk size + e2b vs 8B head-to-head

What was wrong

Corrected catalog

New head-to-head benchmark (gemma4:e2b vs gemma4 8B)

Status

Uh oh!

v0.9.66 — Phase 1: VLM extraction benchmark

What's new

benchmark_vlm_extraction()

Bundled fixtures

make_synthetic_horizons_fixture(pedon)

.onAttach() local-VLM hint

extract_site_from_text.md prompt

Baseline measured (gemma4 8B local, MacBook M1)

Status

Uh oh!

v0.9.65 — Agente Pedometrista (bslib Shiny + local Gemma 4)

Quick start

v0.9.64 — setup_local_vlm() + Ollama lifecycle + pedologist persona

v0.9.65 — agent_app() Shiny UI

Privacy / data sovereignty

Principle

Status

CRAN-friendly

Uh oh!

v0.9.63 — Brazilian benchmark series (v0.9.55–v0.9.62)

Highlights

Tests / status

Per-version detail

Uh oh!

soilKey v0.9.23 -- canonical eluvial-illuvial argic (SiBCS +14pp, KSSL Ultisols +12pp)

Root cause

Fix

Real-data benchmark impact

Embrapa FEBR (apples-to-apples)

KSSL + NASIS (n = 998, apples-to-apples)

Why scientifically defensible

Quality

What's NOT yet done (next priorities)

Trajectory v0.9.13 -> v0.9.23 (definitive)

Uh oh!

soilKey v0.9.22 -- subgroup-level USDA + Subordem-level SiBCS benchmarks

What's new

Definitive real-data benchmark (KSSL + NASIS, n = 2 002, apples-to-apples vs v0.9.21)

Per-Order Subgroup accuracy (within each reference Order)

Embrapa FEBR SiBCS Subordem (n = 128)

Critical scientific finding -- FEBR Subordem ceiling

What this benchmark tells us

Code

benchmark_run_classification(level) -- new values

normalise_kssl_subgroup(x) (NEW exported)

load_kssl_pedons_gpkg -- expanded reference fields

Quality

Trajectory v0.9.13 -> v0.9.22

Full benchmark reports

Uh oh!

soilKey v0.9.21 -- NASIS pediagfeatures as scientific tie-breaker (Spodosols +16pp)

DEFINITIVE benchmark (KSSL+NASIS apples-to-apples, n = 3 218)

Replication across three samples

`use_fewshot` parameter

`n_repeats` parameter

`benchmark_vlm_extraction()`

`make_synthetic_horizons_fixture(pedon)`

`.onAttach()` local-VLM hint

`extract_site_from_text.md` prompt

v0.9.64 — `setup_local_vlm()` + Ollama lifecycle + pedologist persona

v0.9.65 — `agent_app()` Shiny UI

`benchmark_run_classification(level)` -- new values

`normalise_kssl_subgroup(x)` (NEW exported)

`load_kssl_pedons_gpkg` -- expanded reference fields