Releases: HugoMachadoRodrigues/soilKey
soilKey lazy-fetch data (v0.9.94)
Benchmark caches downloaded on demand by load_*_sample() functions and download_extdata_cache() since soilKey v0.9.94.
Contents
| File | n pedons | Source | Size |
|---|---|---|---|
afsp_sample.rds |
120 | ISRIC Africa Soil Profiles Database v1.2 | 1.2 MB |
kssl_sample.rds |
99 | NCSS Lab Data Mart (KSSL gpkg) | 1.0 MB |
kssl_nasis_sample.rds |
99 | NCSS Lab Data Mart + NASIS Morphological | 1.0 MB |
wosis_stratified_sample.rds |
130 | ISRIC WoSIS GraphQL (5 per RSG × 26 RSGs) | 1.3 MB |
Usage
# Eager prefetch all four caches into the user cache directory:
soilKey::download_extdata_cache("all")
# Or download lazily on first call:
length(soilKey::load_afsp_sample()$pedons)
length(soilKey::load_kssl_sample()$pedons)
length(soilKey::load_kssl_nasis_sample()$pedons)
length(soilKey::load_wosis_stratified_sample()$pedons)The cache directory is tools::R_user_dir("soilKey", "data") (typically ~/Library/Application Support/.../soilKey/data on macOS, ~/.local/share/.../soilKey/data on Linux, %LOCALAPPDATA%/.../soilKey/data on Windows).
These files are under the same MIT license as the soilKey R package; the underlying datasets retain their respective upstream licenses (ISRIC AfSP / WoSIS public-domain, NCSS Lab Data Mart public-domain US Federal data).
v0.9.71 — Phase 2 done: BDsolos fixtures + structured outputs + polish
Bundles three coherent improvements that close out the Phase 2 roadmap.
(A) 8 BDsolos hard fixtures
Generated via `make_synthetic_horizons_fixture()` from real RJ pedons selected by SiBCS Ordem (Argissolo, Cambissolo, Chernossolo, Espodossolo, Gleissolo, Latossolo, Neossolo, Planossolo). Each fixture is a real BDsolos pedon's full horizon table rendered as Markdown — non-toy, multi-horizon, mixed Munsell úmida/seca, varied attribute coverage.
Reproduce locally:
```r
benchmark_vlm_extraction(
providers = list(gemma_e2b = list(name = "ollama", model = "gemma4:e2b")),
tasks = "horizons",
use_fewshot = TRUE,
n_repeats = 3L
)$summary
```
(8 fixtures × 3 reps × ~30 s = 12 min on a laptop CPU. Empirical numbers from a fully-completed run will land in a follow-up release.)
(B) ellmer `chat_structured()` bridge
- `vlm_type_from_soilkey_schema(name)` — wraps `ellmer::type_from_schema()` reading `inst/schemas/.json` directly.
- `validate_or_retry(..., use_structured = TRUE)` — short-circuits the chat-and-parse-and-retry loop when the provider supports it. Provider receives the ellmer type tree built from the soilKey schema and returns a structurally-valid R list directly. Removes the entire class of "model returned prose / wrong shape" failures at the protocol level (Anthropic tool calls / OpenAI `response_format = json_schema` / Ollama 0.5+ `format = json_schema` / Gemini structured output).
- All extractors (`extract_horizons_from_pdf`, `extract_munsell_from_photo`, `extract_site_from_fieldsheet`) and `benchmark_vlm_extraction()` accept `use_structured = FALSE` (default for back-compat).
(C) Production polish
- `extract_horizons_from_pdf()` — `cli::cli_progress_bar()` for multi-chunk PDFs (no-op for single-chunk, the common case).
- `agent_app()` — sidebar adds "Estratégia de extração" section with checkboxes for `use_fewshot` (default TRUE) and `use_structured` (default FALSE). Both propagate to every `extract_*()` call inside the app.
- Model preset labels corrected to v0.9.67 measured sizes (`light` = ~6.7 GB, `balanced` = ~8 GB, `best` = ~19 GB).
Tests
20 new tests / ~45 expectations in `test-v0970-structured-outputs.R`:
- type builder (`vlm_type_from_soilkey_schema`) input validation + ellmer integration
- capability probe (`.provider_supports_structured`) for ellmer Chat / Mock / NULL
- `validate_or_retry` structured fast path (skip parse + retry when provider supports)
- fallback path (use_structured=TRUE on Mock falls through to legacy loop)
- parameter propagation through the entire `extract_*()` family
Total: 3 888 passing / 0 failing / 21 skipped.
Status
- `R CMD check` Status: OK (0 errors / 0 warnings / 0 notes)
- 8 BDsolos fixtures (1 per SiBCS Ordem) + structured-output infra + polish
- 20 new tests
- README + NEWS + status footer updated
🤖 Generated with Claude Code
v0.9.68 — Phase 2: few-shot demonstrations + variance characterisation
Phase 2 of the local-Gemma roadmap. Adds schema-correct worked-example prompts for the 3 extraction tasks, opt-in use_fewshot parameter, n_repeats for variance characterisation, and a harder bundled fixture.
What's shipped
Few-shot prompts (3 new)
inst/prompts/extract_horizons_fewshot.md— 2 worked examples in the schema-correct mixed shape:top_cm/bottom_cm/designation/boundary_*are RAW values;munsell_moist/munsell_dryare SINGLE wrapped objects holdinghue + value + chroma + confidence + source_quote; everything else (clay_pct,ph_h2o, etc.) is wrapped{value, confidence, source_quote}.inst/prompts/extract_site_from_text_fewshot.md— 2 PT-BR + EN examples;id/crsraw, everything else wrapped; country inferred from state.inst/prompts/extract_munsell_from_photo_fewshot.md— 2 examples (with / without Munsell card; confidence calibration baked into the demos).
use_fewshot parameter
Opt-in on extract_horizons_from_pdf(), extract_munsell_from_photo(), and benchmark_vlm_extraction(). Default TRUE from v0.9.68. Set FALSE to run the bare-instructions baseline for an A/B.
n_repeats parameter
New on benchmark_vlm_extraction(). Runs each (provider × task × fixture) cell N times. Summary reports metric_*_mean AND metric_*_sd. Required to distinguish real lift from stochastic LLM noise.
Harder bundled fixture
perfil_BA_chernossolo_messy.{txt,golden.json} — 4-horizon Chernossolo Argilúvico Carbonático from a real Bahia survey: PT-BR comma decimal pH = 6,8, UTM coordinates noted then converted, mixed Munsell úmida/seca, CaCO3 equivalents. Smoke result with gemma4:e2b + few-shot: precision = 1.00, recall = 1.00, attr_match = 0.79.
Honest measurement (this is not a marketing release)
Few-shot did NOT move the metrics on the 4 simple bundled fixtures because vanilla gemma4:e2b already nails them:
| Task | Fixture | Baseline | Few-shot | Δ |
|---|---|---|---|---|
| horizons | Latossolo MG | 1.00 / 1.00 / 1.00 | 1.00 / 1.00 / 1.00 | 0 |
| horizons | Argissolo RJ | 1.00 / 1.00 / 1.00 | 1.00 / 1.00 / 1.00 | 0 |
| site | Ficha MG | 0.79 / 1.00 / 0.79 | 0.79 / 1.00 / 0.79 | 0 |
| site | Ficha RJ | 0.80 / 0.92 / 0.80 | 0.80 / 0.92 / 0.80 | 0 |
The 50% ok-rate observed in v0.9.66 was stochastic variance, not a real failure mode — which is exactly what the new n_repeats parameter exposes. Few-shot does not regress quality and the harder Chernossolo BA fixture confirms the pipeline handles non-toy PT-BR profiles. Real lift will surface only on harder fixtures or smaller models — not on the existing toy suite.
Status
R CMD checkStatus: OK (0 errors / 0 warnings / 0 notes)- Test suite: 3 868 passing / 0 failing / 21 skipped (unchanged; few-shot is opt-in)
- 3 new few-shot prompts + 1 new harder fixture
n_repeatsAPI +metric_*_sdcolumns in benchmark summary
🤖 Generated with Claude Code
v0.9.67 — Corrigendum: gemma4:e2b on-disk size + e2b vs 8B head-to-head
Doc + measurement corrigendum. No code logic changes.
What was wrong
Docs in v0.9.64 → v0.9.66 said gemma4:e2b was "~1.5 GB", which is the bare 2B-parameter weight at 4-bit quantisation. The actual on-disk footprint is ~6.7 GB: the multimodal Gemma 4 builds bundle a vision encoder + tokenizers that add ~5 GB above the bare parameter weights. Confirmed locally after the v0.9.66 pull completed.
Corrected catalog
| Preset | Tag | On-disk |
|---|---|---|
light |
gemma4:e2b |
~6.7 GB (was ~1.5 GB) |
balanced |
gemma4:e4b |
~8 GB (approx) |
best |
gemma4:31b |
~19 GB |
| (8B default alias) | gemma4 (gemma4:latest) |
~9 GB |
Files updated: R/setup-local-vlm.R, R/zzz.R, R/vlm-providers.R, vignettes/v10_agente_pedometrista.Rmd, vignettes/v11_vlm_extraction_benchmark.Rmd, README.md.
New head-to-head benchmark (gemma4:e2b vs gemma4 8B)
Re-ran benchmark_vlm_extraction() with both sizes on the four bundled text fixtures:
| Task | Fixture | e2b |
gemma4 (8B) |
|---|---|---|---|
| horizons | Latossolo MG | 1.00 / 1.00 / 1.00 | 1.00 / 1.00 / 1.00 |
| horizons | Argissolo RJ | 1.00 / 1.00 / 1.00 | 1.00 / 1.00 / 1.00 |
| site | Ficha MG | ✓ (IoU 0.71, value-acc 1.00) | ✗ (JSON validation error) |
| site | Ficha RJ | ✗ (JSON validation error) | ✗ (JSON validation error) |
Key reads:
- Horizons (text) is solved at both sizes. The 2B model matches the 8B model on clean PT-BR profile descriptions — this locks in
gemma4:e2bas the soilKey default for the agent app. - Site (text) is unstable on both sizes (50 %
okrate at e2b; 0 % at 8B in this 2-fixture sample). The failures are JSON validation errors, not wrong content. When extraction succeeds, value-accuracy on matched fields is 100 % — the model knows the right answer, it just doesn't always return it in valid JSON shape.
This is exactly what Phase 2 (few-shot demonstration pairs in the prompt) targets: insert 2-3 examples of correctly-shaped JSON before each call to discipline the model into the schema. No GPU required.
Status
R CMD checkStatus: OK (0 errors / 0 warnings / 0 notes)- Test suite: 3 868 passing / 0 failing / 21 skipped (unchanged from v0.9.66)
- No API changes; no tests changed
- README + 2 vignettes refreshed
🤖 Generated with Claude Code
v0.9.66 — Phase 1: VLM extraction benchmark
A measurable baseline for the local Gemma 4 stack — the input we needed before deciding whether to invest in few-shot demos (Phase 2) or LoRA fine-tuning (Phase 3).
What's new
benchmark_vlm_extraction()
Provider-agnostic harness over 3 tasks × per-task metrics:
| Task | Input | Metrics |
|---|---|---|
horizons |
Markdown / text profile description | precision + recall + per-attribute match (higher better) |
site |
Field-sheet text | IoU + value-accuracy + recall (higher better) |
munsell |
Profile photo (with Munsell card) | mean Nickerson Color Difference (lower better) |
Returns long-format predictions + summary data frames. Accepts MockVLMProvider for unit tests.
Bundled fixtures
horizons/: 4-horizon Argissolo RJ + 4-horizon Latossolo MG (paired text + golden JSON)site/: 2 Brazilian field-sheet text fixtures + golden JSONmunsell/: README format spec — users supply their own photo fixtures (CRAN size + licence policy forbids shipping photos)
make_synthetic_horizons_fixture(pedon)
Renders any PedonRecord back into a Markdown profile description and emits the structured horizons as the golden answer. Lets you scale the horizons fixture set from BDsolos / FEBR / KSSL.
.onAttach() local-VLM hint
CRAN-compliant interactive prompt: detects Ollama state, suggests setup_local_vlm("light") when gemma4:e2b is missing. Auto-pull only with explicit opt-in (options(soilKey.auto_setup_vlm = TRUE) or env var SOILKEY_AUTO_SETUP_VLM=1). Suppress all hints with options(soilKey.suggest_local_vlm = FALSE).
extract_site_from_text.md prompt
Text-mode companion to the image-mode site prompt — required because the original explicitly said "Supplied as an image content block" and Gemma returned all-null when fed text.
Baseline measured (gemma4 8B local, MacBook M1)
| Task | Fixture | precision / IoU | recall / value-acc | attr-match |
|---|---|---|---|---|
horizons |
Latossolo MG | 1.00 | 1.00 | 1.00 |
horizons |
Argissolo RJ | 1.00 | 1.00 | 1.00 |
site |
Ficha MG | 0.79 | 1.00 | 0.79 |
site |
Ficha RJ | 0.87 | 0.92 | 0.87 |
Read: horizons extraction is solved (vanilla Gemma 4 + the pedologist_system_prompt() persona is enough for clean PT-BR profile descriptions). Site extraction is ~83 % IoU with ~96 % value-accuracy on matched fields — gaps are inferred fields (e.g. country: BR from a Brazilian state) that the 8B model misses but a 32B/Claude would catch.
This baseline is the input for the Phase 2 / Phase 3 decision:
- Phase 2 (few-shot): inject 2-3 demonstration pairs per call. ~2 days human work; no GPU. Targets the site-task gap.
- Phase 3 (LoRA): adapter fine-tune on (input, golden) pairs from BDsolos + FEBR. Needs ~1k labelled pairs and ~6 h on H100. Only justified if Phase 2 plateaus.
Status
R CMD checkStatus: OK (0 errors / 0 warnings / 0 notes)- Test suite: 3 868 passing / 0 failing / 21 expected skips
- 47 new tests / ~70 expectations
- Vinheta:
v11_vlm_extraction_benchmark
🤖 Generated with Claude Code
v0.9.65 — Agente Pedometrista (bslib Shiny + local Gemma 4)
A modern bslib-themed Shiny UI that wires the v0.9.64 local Gemma 4 stack to the deterministic taxonomic key. Photo, PDF, field-sheet image and Vis-NIR spectrum each become a one-click extraction tab; the result is classified across WRB 2022 + SiBCS 5ª ed. + USDA Soil Taxonomy 13ed in the same session.
Quick start
# One-call setup of the local stack (downloads Gemma 4 e2b, ~1.5 GB):
soilKey::setup_local_vlm("light")
# Launch the agent:
soilKey::run_agent_app()v0.9.64 — setup_local_vlm() + Ollama lifecycle + pedologist persona
setup_local_vlm(model = "balanced")— idempotent bootstrap. Detects Ollama, starts the daemon, pulls the chosen model. Catalog:light=gemma4:e2b(~1.5 GB),balanced=gemma4:e4b(~3 GB),best=gemma4:31b(~19 GB).ollama_is_installed()/ollama_ensure_running()/ollama_pull_model()/ollama_list_local_models()— composable helpers, never throw, return logical / character.pedologist_system_prompt(language = "pt-BR" | "en")— canonical persona installed in every chat session. Trained pedometrist (SiBCS 5ª + WRB 2022 + KST 13ed); explicit "NEVER classify, only extract"; per-attributeconfidence+source_quotecontract.- Default Ollama model lowered from
gemma4:e4b(~3 GB) togemma4:e2b(~1.5 GB) so the package "just works" on a developer laptop aftersetup_local_vlm("light"). - 13 new tests / ~30 expectations.
v0.9.65 — agent_app() Shiny UI
8 nav_panels:
| Tab | Wires |
|---|---|
| 📷 Foto Munsell | extract_munsell_from_photo() |
| 📄 PDF / Texto | extract_horizons_from_pdf() |
| 📋 Ficha de Campo | extract_site_from_fieldsheet() |
| 🌈 Espectros | fill_from_spectra() (OSSL local-band library) |
| 📊 Tabela | Editable DT for manual correction |
| 🌱 Classificar | classify_all() → 3 bslib::value_box() cards (WRB / SiBCS / USDA) |
| 🔍 Trace | Per-system trace + provenance browser |
| 💬 Pedometrista | Free-form chat with the local Gemma using pedologist_system_prompt() |
Persistent 320 px sidebar with provider/model selector, real-time Ollama status badges, "Configurar Gemma local" button (modal progress), language toggle (PT-BR / EN), session reset.
run_agent_app()— launcher, soft-fails on missing Suggests with actionableinstall.packages()hint.- New vignette
v10_agente_pedometrista.Rmd— full walkthrough. - README rewritten: version badge 0.9.62 → 0.9.65, tests 3 760 → 3 821.
Privacy / data sovereignty
By default the agent prefers ollama in the auto-fallback chain. Sensitive photos, fieldsheets with precise geolocation, and internal PDFs never leave the machine. The cloud fallback (Anthropic / OpenAI / Google) only fires when Ollama is not running AND the user has set an API key — an explicit property, not a silent default. Recommended for governmental surveys, indigenous land studies, pre-publication research.
Principle
The LLM never classifies. It only extracts schema-validated JSON with per-attribute confidence and source_quote. The taxonomic key remains 100 % R deterministic, with versioned YAML rules.
Status
R CMD checkStatus: OK (0 errors / 0 warnings / 0 notes)- Test suite: 3 821 passing / 0 failing / 21 expected skips
- 17 new tests across
test-v0964-setup-local-vlm.R+test-v0965-agent-app.R
CRAN-friendly
Ships the downloader, NOT the weights. The user runs setup_local_vlm() once after install; Ollama caches the model in ~/.ollama/models/. No network calls happen at package install time.
🤖 Generated with Claude Code
v0.9.63 — Brazilian benchmark series (v0.9.55–v0.9.62)
The v0.9.55 → v0.9.63 release series wires the Brazilian SiBCS classifier to the two canonical pedologist-curated corpuses (Embrapa BDsolos and FEBR), validates the classifier against ~9 000 surveyor-labelled profiles, and consolidates the two repositories into a single deduplicated super-dataset.
Highlights
- v0.9.55 —
load_bdsolos_csv(),inspect_bdsolos_csv(),download_bdsolos()(BDsolos full-export ingestion: ~9 000 perfis from 27 UFs, semicolon-delimited, preamble + 222+ columns). - v0.9.57 —
read_febr_pedons()+febr_index_munsell()(FEBR ~10k perfis; ~6 distinct Munsell column conventions; 200 / 249 datasets carry colour data, 36 275 horizons total). - v0.9.58–v0.9.59 — full BDsolos export schema support (DMS coordinates,
read.csv2fallback for malformed UTF-8 in 7 of 27 state CSVs). - v0.9.60 —
benchmark_bdsolos_sibcs(): surveyor-reference benchmark mirroringbenchmark_lucas_2018()for Brazil..bdsolos_normalize_ordem()maps modern + pre-1999 legacy SiBCS Ordem names. Smoke test (RJ 100 pedons): 34 % Ordem accuracy, Argissolos 67.6 % recall. - v0.9.61 —
R/sibcs-color-tuning.R: replaces the SiBCS subordem first-match-wins rule for colour-driven Ordens (Argissolos / Latossolos / Nitossolos) with a thickness-weighted dominant-colour-in-B rule. Wired intoclassify_sibcs()between subordem assignment and the v0.9.45 cor a determinar fallback. Benchmark also reportsaccuracy_subordemover canonical 2-3 letter SiBCS codes. - v0.9.62 —
merge_brazilian_pedons(bdsolos, febr, prefer)deduplicates viasite$sisb_id(BDsolosCodigo PA≡ FEBRobservacao$sisb_id). RJ overlap: 590 of 722 BDsolos pedons (65 %) match a FEBR sisb_id, naïve concat of 1 606 → after merge 1 016 distinct pedons.summarize_brazilian_overlap()is a dry-run diagnostic. - v0.9.63 — README documents the v0.9.55 → v0.9.62 trajectory; status footer rewritten to merge the Brazilian highlights with the existing USDA / WRB summary.
Tests / status
R CMD checkStatus: OK (0 errors / 0 warnings / 0 notes)- Test suite: 3 760 passing / 0 failing / 20 expected skips
- 12 new tests in
test-v0962-merge-brazilian.R(28 expectations) - 14 new tests in
test-v0961-sibcs-color-tuning.R(37 expectations) - 10 new tests in
test-v0960-bdsolos-benchmark.R(42 expectations)
Per-version detail
See NEWS.md for the full per-release diff.
🤖 Generated with Claude Code
soilKey v0.9.23 -- canonical eluvial-illuvial argic (SiBCS +14pp, KSSL Ultisols +12pp)
The "argic clay-increase canonicalisation" release. Single bug fix in test_clay_increase_argic with paper-sized impact across all three classification systems.
Root cause
test_clay_increase_argic (the predicate that gates the argic horizon, the argillic horizon, and every Order / RSG that depends on either) was comparing each candidate horizon's clay only against its immediate predecessor. KST 13ed Ch 3 (argillic horizon, p 4) and WRB 2022 Ch 3.1.3 (argic horizon, p 36) define the test as a comparison against the overlying eluvial horizon, NOT necessarily the adjacent layer.
Profiles where clay rises gradually through a thick A / E / Bw / Bt sequence (e.g. KSSL Hapludalfs with clay 13 -> 15 -> 22 -> 27 -> 31, or many FEBR Argissolos) were being silently rejected because no two adjacent layers passed the +6 pp / 1.4-ratio thresholds, even though the canonical A-vs-Bt jump 13 -> 31 obviously satisfies argic.
Fix
test_clay_increase_argic now evaluates the rule against:
- The minimum-clay layer above the candidate (the canonical eluvial reference -- typically A or E).
- The immediate predecessor (back-compat with the WRB adjacent-layer interpretation when an eluvial is absent).
Either trigger accepts the candidate. Strictly additive -- no candidate that passed before now fails.
Real-data benchmark impact
Embrapa FEBR (apples-to-apples)
| System | v0.9.22 | v0.9.23 | Δ |
|---|---|---|---|
| SiBCS Order | 40.6 % | 54.7 % | +14.1 pp |
| USDA Order | 47.6 % | 51.1 % | +3.5 pp |
| WRB Order | 32.7 % | 33.7 % | +1.0 pp |
The SiBCS +14.1 pp jump is the biggest single-version gain in the project to date. Most of the v0.9.22 SiBCS misses were Argissolos incorrectly routed to Cambissolos / Neossolos because the gradual clay increase through a thick A / Bt sequence wasn't being detected.
KSSL + NASIS (n = 998, apples-to-apples)
| Order | v0.9.22 | v0.9.23 | Δ |
|---|---|---|---|
| Vertisols | 65.2 % | 68.8 % | +3.6 pp |
| Aridisols | 53.1 % | 55.4 % | +2.3 pp |
| Ultisols | 26.3 % | 38.9 % | +12.6 pp |
| Alfisols | 20.9 % | 31.2 % | +10.3 pp |
| Spodosols | 29.9 % | 37.9 % | +8.0 pp |
| Mollisols | 21.8 % | 22.9 % | +1.1 pp |
| Inceptisols | 47.2 % | 41.5 % | -5.7 pp |
| Entisols | 53.1 % | 46.9 % | -6.2 pp |
| Oxisols | 60.0 % | 60.0 % | (=) |
| TOTAL Order | 32.7 % | 36.0 % | +3.3 pp |
| TOTAL Subgroup | 2.4 % | 2.7 % | +0.3 pp |
The Alfisol / Ultisol / Spodosol gains (+8 to +13 pp each) are where the v0.9.22 -> v0.9.23 fix delivers the most: profiles with gradual A → E → Bt clay sequences now correctly route to argillic-bearing Orders. Inceptisol / Entisol drops (-5 to -6 pp) are correct: profiles previously routed to those catch-all Orders are now properly classified as Alfisols / Ultisols.
Why scientifically defensible
The test changes from
above <- h$clay_pct[i - 1L] # adjacent only -- BUGto
above_min <- min(h$clay_pct[1:(i-1)], na.rm = TRUE) # canonical eluvial reference
above_adj <- h$clay_pct[i - 1L] # adjacent fallback
# Either passes -> candidate accepted.The min-above reference matches:
- KST 13ed Ch 3 p 4: "the increase in clay content with depth must be ... compared to a lighter-textured eluvial horizon above"
- WRB 2022 Ch 3.1.3 p 36: "clay percent increases compared to the overlying horizon by ..."
Both canonical sources reference "the overlying eluvial horizon" / "the overlying horizon" without specifying adjacency. Pre-v0.9.23 we were applying a stricter-than-canonical interpretation.
Quality
R CMD check --as-cranwith PROJ env: Status: OK (0 ERR / 0 WARN / 0 NOTE)- 2 850 testthat expectations passing, 0 failed (no regression -- the new min-above path is strictly additive)
- 31/31 canonical fixtures still classify correctly to their intended RSG / Order
What's NOT yet done (next priorities)
-
Subgroup machinery completion -- subgroup top-1 still 2.7 % (n=998). The argic fix lifted Order, but the qualified-subgroup permutations (Cumulic / Pachic / Aquic / Oxyaquic / Mollic / Ultic / etc.) need full coverage across Mollisols + Alfisols + Inceptisols where most KSSL refs sit. Conditional Subgroup (within correctly-Ordered profiles) is now ~ 7.5 %.
-
EU-LUCAS WRB benchmark -- the bundled ESDBv2 archive ships schema-only Excel files; the actual WRB-coded SGDBE database is locked in
autorun.exe(Windows installer). Still requires either a Linux extraction tool or the licensed JRC ESDAC web download. -
WoSIS GraphQL refresh -- v0.9.13's 13 % WRB baseline was measured against WoSIS 2024-10. Re-running with the current v0.9.23 deterministic key + NASIS / pediagfeatures features would expose how much of the v0.9.13 → v0.9.23 trajectory is reproducible on the WoSIS sample. Deferred to v0.9.24+.
-
Brazilian Munsell -- the Embrapa FEBR archive lacks Munsell data, capping SiBCS Subordem benchmark. A NASIS-equivalent for the Brazilian context (IBGE soil-survey volumes, Embrapa BDsolos curated) would unlock Subordem benchmark from ~ 8 % to estimated 25-40 %.
Trajectory v0.9.13 -> v0.9.23 (definitive)
| Version | Embrapa USDA | Embrapa WRB | Embrapa SiBCS | KSSL USDA |
|---|---|---|---|---|
| v0.9.13 (WoSIS) | n/a | 13 % | n/a | n/a |
| v0.9.18 | 47.6 % | 32.7 % | 40.6 % | 21.4 % |
| v0.9.20 (NASIS) | 47.6 % | 32.7 % | 40.6 % | 32.2 % |
| v0.9.21 (+ tie-breaker) | 47.6 % | 32.7 % | 40.6 % | 33.1 % |
| v0.9.23 (+ argic canonical) | 51.1 % | 33.7 % | 54.7 % | 36.0 % |
soilKey v0.9.22 -- subgroup-level USDA + Subordem-level SiBCS benchmarks
The "deeper-than-Order benchmark" release. Two scientific extensions to the benchmark runner that probe one taxonomy level deeper than the previous Order-only validation.
What's new
-
benchmark_run_classificationgainslevel = "subgroup"andlevel = "subordem". Comparison is case-insensitive with qualifier-paren stripping; Subordem truncates the predicted name to its first two tokens to match FEBR-style labels. -
load_kssl_pedons_gpkgnow extractssamp_taxsubgrp,samp_taxgrtgroup,samp_taxsuborderfrom the KSSL gpkg intosite$reference_usda_subgroup/_grtgroup/_suborder. The benchmark readsreference_usda_subgroupautomatically whenlevel = "subgroup". -
normalise_kssl_subgroup(x)-- exported helper for arbitrary KSSL-format subgroup string normalisation.
Definitive real-data benchmark (KSSL + NASIS, n = 2 002, apples-to-apples vs v0.9.21)
| Level | top-1 | CI 95 % |
|---|---|---|
| Order | 31.3 % | [29.0 %, 33.5 %] |
| Subgroup | 2.1 % | [1.5 %, 2.7 %] |
Per-Order Subgroup accuracy (within each reference Order)
| Reference Order | n | correct | accuracy |
|---|---|---|---|
| Aridisols | 277 | 14 | 5.1 % (best) |
| Ultisols | 259 | 8 | 3.1 % |
| Entisols | 62 | 2 | 3.2 % |
| Alfisols | 417 | 7 | 1.7 % |
| Spodosols | 150 | 2 | 1.3 % |
| Inceptisols | 243 | 3 | 1.2 % |
| Mollisols | 511 | 5 | 1.0 % |
| Vertisols / Oxisols / Andisols | 61 | 0 | 0 % |
The pattern confirms the partial Path C machinery diagnosis: the largest absolute misses are Mollisols (506 misses) and Alfisols (410 misses) where the qualified subgroup permutations (Cumulic / Pachic / Aquic / Oxyaquic / Mollic / Ultic / etc.) are incomplete in the current implementation.
Aridisols at 5.1 % is informative: the Aridic moisture regime is unambiguous from the dryness of the profile, so the Subgroup machinery within Aridisols (Typic / Calcic / Argic / Salic) gets enough of them right.
Conditional Subgroup accuracy (within correctly-Ordered profiles): **~ 7 %**. Examples of correct subgroup hits: typic hapludults, typic dystrudepts, typic calciaquolls, typic endoaquolls, oxyaquic haplorthods, aridic argiustolls.
Embrapa FEBR SiBCS Subordem (n = 128)
| Level | n=128 |
|---|---|
| Order | 40.6 % |
| Subordem | 7.8 % [3.1, 14.1] |
The Subordem drop is dominated by Munsell-colour disagreement (Vermelho / Amarelo / Bruno) on profiles where FEBR records the surveyor's colour judgement but the lab gpkg lacks Munsell. 26 of 57 reference Argissolos are correctly Order'd as Argissolos but classified to a different colour Subordem.
Critical scientific finding -- FEBR Subordem ceiling
FEBR (the open Brazilian soil-data archive used as soilKey's benchmark source) ships SiBCS labels at the 2nd-level (Subordem) maximum -- 31 unique strings total across the 50 485 horizon rows, e.g. "LATOSSOLO VERMELHO", "ARGISSOLO BRUNO-ACINZENTADO". The 5th-level (Familia, Cap 18) is therefore not benchmarkable with FEBR alone.
This release pivots from "Familia validation" (the user's original request) to "Subordem validation" as the deepest level FEBR actually supports. Future Familia validation requires a different reference dataset (IBGE soil-survey volumes, Embrapa BDsolos curated, or similar).
What this benchmark tells us
The Order-level numbers (31.3 % USDA / 40.6 % SiBCS / 32.7 % WRB / 47.6 % USDA Embrapa) remain the most-defensible "soilKey on real data" headline. The Subgroup / Subordem numbers expose the next two failure modes:
-
Subgroup machinery completeness (USDA): the Path C code paths for non-Typic subgroups (Aquic / Vertic / Oxyaquic / Mollic / Cumulic / Pachic / Inceptic / Ultic) need full coverage across all 12 Orders. Currently only the Typic subgroups consistently fire, with Aridisols subgroups (Calcic / Argic / Salic) being the next-best-covered set.
-
Munsell-colour assignment (SiBCS): the Vermelho / Amarelo / Bruno discrimination requires reliable Munsell hue/value/chroma data which is sparse in lab-only datasets. NASIS provides 99 % Munsell coverage for KSSL but FEBR has none; NASIS-equivalent for Brazilian context would be an Embrapa survey database we don't yet have access to.
Code
benchmark_run_classification(level) -- new values
"order"(default) -- comparescls$rsg_or_order."subgroup"(NEW) -- comparescls$name(case-insensitive, qualifier-paren-stripped). For USDA, automatically readsreference_usda_subgroup."subordem"(NEW) -- SiBCS 2nd-level. Truncates both reference and prediction to the first two tokens before comparison.
normalise_kssl_subgroup(x) (NEW exported)
Lowercases + collapses whitespace in KSSL samp_taxsubgrp strings so "TYPIC HAPLUDALFS" and "Typic Hapludalfs" compare equal.
load_kssl_pedons_gpkg -- expanded reference fields
site$reference_usda(Order, unchanged)site$reference_usda_subgroup(NEW fromsamp_taxsubgrp)site$reference_usda_grtgroup(NEW fromsamp_taxgrtgroup)site$reference_usda_suborder(NEW fromsamp_taxsuborder)
Quality
R CMD check --as-cranwith PROJ env: Status: OK (0 ERR / 0 WARN / 0 NOTE)- 2 850 testthat expectations passing, 0 failed (+8 from new
test-benchmark-subgroup-subordem.R) - 31/31 canonical fixtures still classify correctly (Order-level)
- Embrapa Order benchmark unchanged at 40.6 % (regression-safe)
Trajectory v0.9.13 -> v0.9.22
| Version | Embrapa Order | KSSL Order (n=2 000-3 000) | KSSL Subgroup | Embrapa Subordem |
|---|---|---|---|---|
| v0.9.13 (WoSIS) | 13 % WRB | n/a | n/a | n/a |
| v0.9.18 | 47.6 / 32.7 | 21.4 % | n/a | n/a |
| v0.9.20 (NASIS) | 47.6 / 32.7 | 32.2 % | n/a | n/a |
| v0.9.21 (+ tie-breaker) | 47.6 / 32.7 | 33.1 % (n=3 218) | n/a | n/a |
| v0.9.22 (+ deeper levels) | 47.6 / 32.7 | 31.3 % (n=2 002) | 2.1 % | 7.8 % |
Full benchmark reports
soilKey v0.9.21 -- NASIS pediagfeatures as scientific tie-breaker (Spodosols +16pp)
The "surveyor's diagnostic identification as scientific tie-breaker" release. Wires the NASIS pediagfeatures.featkind table (64 169 records of field-surveyor-identified diagnostic horizons) into the USDA Order gates as a TIE-BREAKER ONLY: when the canonical lab + morphology gate returns passed = NA (insufficient data), the surveyor's identification flips it to TRUE. When the canonical gate returns TRUE / FALSE, the tag is recorded as evidence but does NOT override -- preserving the deterministic-key-on-data invariant.
DEFINITIVE benchmark (KSSL+NASIS apples-to-apples, n = 3 218)
| Order | v0.9.19 lab-only | v0.9.20 + NASIS morphology | v0.9.21 + tie-breaker |
|---|---|---|---|
| Spodosols | 17.8 % (49/276) | 29.0 % (80/276) | 38.0 % (105/276) |
| Vertisols | 58.7 % (37/63) | 70.8 % (46/65) | 73.8 % (48/65) |
| Mollisols | 19.9 % (145/727) | 25.0 % (182/727) | 25.7 % (187/727) |
| Inceptisols | 23.1 % (107/463) | 46.3 % (215/464) | 46.3 % (215/464) |
| Aridisols | 42.4 % (189/446) | 46.6 % (208/446) | 46.6 % (208/446) |
| Alfisols | 21.4 % (142/663) | 22.6 % (150/665) | 22.6 % (150/665) |
| Ultisols | 21.9 % (90/411) | 21.7 % (89/411) | 21.7 % (89/411) |
| Entisols | 46.3 % (50/108) | 36.1 % (39/108) | 35.2 % (38/108) |
| Oxisols | 49.0 % (24/49) | 49.0 % (24/49) | 49.0 % (24/49) |
| Histosols | 66.7 % (2/3) | 66.7 % (2/3) | 66.7 % (2/3) |
| TOTAL | 26.0 % | 32.2 % | 33.1 % |
USDA top-1: 33.1 % (CI [31.7 %, 34.6 %], n = 3 218).
Cumulative improvement v0.9.19 -> v0.9.21: +7.1 pp. The Spodosol gain alone is +20.2 pp through v0.9.20 NASIS morphology (17.8 -> 29.0) and v0.9.21 tie-breaker (29.0 -> 38.0).
The Entisol drop (-11.1 pp v0.9.19 -> v0.9.21) is correct: profiles previously falling into the Entisol catch-all because Inceptisol / Spodosol / Vertisol gates couldn't fire (no morphology, no lab oxalate, no surveyor tag) now route to those Orders.
Replication across three samples
The Spodosol per-Order gain replicates cleanly:
| Sample | n | Spodosol n | v0.9.20 | v0.9.21 | Δ |
|---|---|---|---|---|---|
| 5 000-head | 3 218 | 276 | 29.0 % (80) | 38.0 % (105) | +9.0 pp |
| 3 000-head | 2 002 | 150 | 26.0 % (39) | 42.0 % (63) | +16.0 pp |
| 2 500-head | 1 679 | 139 | 26.6 % (37) | 43.2 % (60) | +16.6 pp |
The smaller-n samples show a larger relative gain because the residual NA cases (where the tie-breaker fires) are a higher fraction of the sample. The 5 000-head number is the most precise.
Embrapa benchmark unchanged (USDA 47.6 %, WRB 32.7 %, SiBCS 40.6 %); all 31 canonical fixtures still classify correctly.
What pediagfeatures provides
NASIS pediagfeatures.featkind distribution:
| featkind | n | Order it disambiguates |
|---|---|---|
| Argillic horizon | 13 501 | Alfisols / Ultisols |
| Mollic epipedon | 6 860 | Mollisols |
| Cambic horizon | 4 970 | Inceptisols |
| Lithic contact | 2 193 | Entisols (lithic subgroup) |
| Albic horizon | 1 415 | Spodosol / Alfisol disambiguation |
| Spodic horizon | 829 | Spodosols |
| Slickensides | 519 | Vertisols |
| Andic soil properties | 494 | Andisols |
| Histic epipedon | 201 | Histosols |
Code
Internal helpers
.has_nasis_feature(pedon, pattern)-- checkspedon$site$nasis_diagnostic_features(populated byload_kssl_pedons_with_nasis()) for a regex match..apply_nasis_tiebreaker(result, pedon, pattern, feature_label)-- applied at the start of each USDA Order gate. Ifresult$passed == NAAND surveyor identified the matching feature, flipspassedto TRUE and records provenance. Does NOT override TRUE / FALSE.
USDA Order gates with tie-breaker
| Gate | Tie-breaker pattern |
|---|---|
histosol_usda |
Histic / Folistic / Hemic / Sapric / Fibric / Limnic / Coprogenous |
spodosol_usda |
Spodic horizon / Spodic materials / Ortstein / Placic |
andisol_usda |
Andic soil properties / Vitric / Volcanic glass |
vertisol_usda |
Slickensides / Vertic features / Gilgai |
ultisol_usda |
Argillic horizon / Kandic horizon |
mollisol_usda |
Mollic epipedon |
alfisol_usda |
Argillic / Kandic / Natric horizon |
inceptisol_usda |
Cambic horizon |
Why scientifically defensible
The tie-breaker fires ONLY when the canonical gate returns NA, i.e., when the deterministic key has insufficient data to decide. In that case, the field surveyor's identification (recorded in NASIS by NRCS pedologists) is the most authoritative source short of re-running the field survey. When chemistry + morphology IS available and conclusive, the canonical gate's TRUE / FALSE stands unmodified -- the tie-breaker is strictly additive on missing-data cases.
Package-level invariant preserved: deterministic key on lab + morphology data always wins; surveyor tag is fallback when key is silent.
Quality
R CMD check --as-cranwith PROJ env: Status: OK (0 ERR / 0 WARN / 0 NOTE)- 2 842 testthat expectations passing, 0 failed (+13 new tie-breaker contract tests)
- 31/31 canonical fixtures still classify correctly
- No new dependencies (uses existing
Suggests: DBI + RSQLite)
EU-LUCAS update
The new EU-LUCAS files (EU_LUCAS_2022_updated.xlsx + l2022_survey_cop_radpoly_attr.gpkg) are the LUCAS land-cover / Copernicus radial-polygon products. They still do NOT have WRB classifications or full lab data. The required dataset is the JRC ESDAC LUCAS-2018-Soil module (LUCAS_TOPSOIL_2018.csv + ESDB join with WRB labels), which is a separate ESDAC release available at https://esdac.jrc.ec.europa.eu/projects/lucas (download the "LUCAS 2018 Soil module" not the 2022 land-cover update).
Trajectory v0.9.13 -> v0.9.21
| Version | Embrapa USDA | Embrapa WRB | KSSL USDA |
|---|---|---|---|
| v0.9.13 (WoSIS) | n/a | 13 % | n/a |
| v0.9.16 | 34.0 % | 21.6 % | n/a |
| v0.9.17 | 46.4 % | 25.5 % | n/a |
| v0.9.18 | 47.6 % | 32.7 % | 21.4 % |
| v0.9.19 | 47.6 % | 32.7 % | 26.0 % (n=3 213) |
| v0.9.20 (NASIS lab+morphology) | 47.6 % | 32.7 % | 32.2 % (n=3 218) |
| v0.9.21 (NASIS + tie-breaker) | 47.6 % | 32.7 % | 33.1 % (n=3 218, CI [31.7, 34.6]) |