
Releases: HugoMachadoRodrigues/soilKey

soilKey lazy-fetch data (v0.9.94)

09 May 19:56
a0d0df3


Since soilKey v0.9.94, benchmark caches are downloaded on demand by the load_*_sample() functions and download_extdata_cache().

Contents

| File | n pedons | Source | Size |
| --- | --- | --- | --- |
| afsp_sample.rds | 120 | ISRIC Africa Soil Profiles Database v1.2 | 1.2 MB |
| kssl_sample.rds | 99 | NCSS Lab Data Mart (KSSL gpkg) | 1.0 MB |
| kssl_nasis_sample.rds | 99 | NCSS Lab Data Mart + NASIS Morphological | 1.0 MB |
| wosis_stratified_sample.rds | 130 | ISRIC WoSIS GraphQL (5 per RSG × 26 RSGs) | 1.3 MB |

Usage

```r
# Eager prefetch all four caches into the user cache directory:
soilKey::download_extdata_cache("all")

# Or download lazily on first call:
length(soilKey::load_afsp_sample()$pedons)
length(soilKey::load_kssl_sample()$pedons)
length(soilKey::load_kssl_nasis_sample()$pedons)
length(soilKey::load_wosis_stratified_sample()$pedons)
```

The cache directory is tools::R_user_dir("soilKey", "data") (typically ~/Library/Application Support/.../soilKey/data on macOS, ~/.local/share/.../soilKey/data on Linux, %LOCALAPPDATA%/.../soilKey/data on Windows).
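The cache location can be inspected with base R alone, no soilKey call needed; `tools::R_user_dir()` ships with R ≥ 4.0:

```r
# Resolve the per-user data directory soilKey uses for its caches
cache_dir <- tools::R_user_dir("soilKey", which = "data")

# List any benchmark caches already downloaded (empty before the first fetch)
if (dir.exists(cache_dir)) {
  list.files(cache_dir, pattern = "\\.rds$")
} else {
  character(0)
}
```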

These files are under the same MIT license as the soilKey R package; the underlying datasets retain their respective upstream licenses (ISRIC AfSP / WoSIS public-domain, NCSS Lab Data Mart public-domain US Federal data).

v0.9.71 — Phase 2 done: BDsolos fixtures + structured outputs + polish

06 May 23:36


Bundles three coherent improvements that close out the Phase 2 roadmap.

(A) 8 BDsolos hard fixtures

Generated via `make_synthetic_horizons_fixture()` from real RJ pedons selected by SiBCS Ordem (Argissolo, Cambissolo, Chernossolo, Espodossolo, Gleissolo, Latossolo, Neossolo, Planossolo). Each fixture is a real BDsolos pedon's full horizon table rendered as Markdown — non-toy, multi-horizon, mixed Munsell úmida/seca (moist/dry) colours, varied attribute coverage.

Reproduce locally:

```r
benchmark_vlm_extraction(
  providers = list(gemma_e2b = list(name = "ollama", model = "gemma4:e2b")),
  tasks = "horizons",
  use_fewshot = TRUE,
  n_repeats = 3L
)$summary
```

(8 fixtures × 3 reps × ~30 s = 12 min on a laptop CPU. Empirical numbers from a fully-completed run will land in a follow-up release.)

(B) ellmer `chat_structured()` bridge

  • `vlm_type_from_soilkey_schema(name)` — wraps `ellmer::type_from_schema()` reading `inst/schemas/.json` directly.
  • `validate_or_retry(..., use_structured = TRUE)` — short-circuits the chat-and-parse-and-retry loop when the provider supports it. Provider receives the ellmer type tree built from the soilKey schema and returns a structurally-valid R list directly. Removes the entire class of "model returned prose / wrong shape" failures at the protocol level (Anthropic tool calls / OpenAI `response_format = json_schema` / Ollama 0.5+ `format = json_schema` / Gemini structured output).
  • All extractors (`extract_horizons_from_pdf`, `extract_munsell_from_photo`, `extract_site_from_fieldsheet`) and `benchmark_vlm_extraction()` accept `use_structured = FALSE` (default for back-compat).
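The fast-path branch can be sketched in plain R — closures stand in for a real ellmer provider, and the actual `validate_or_retry()` signature is richer than this:

```r
# Hedged sketch of the structured-output fast path: when the provider
# supports schema-typed responses, skip the parse-and-retry loop entirely.
validate_or_retry_sketch <- function(provider_supports_structured,
                                     request_structured,     # returns a schema-valid list
                                     legacy_parse_retry_loop # chat -> parse -> validate -> retry
                                     ) {
  if (isTRUE(provider_supports_structured)) {
    request_structured()       # protocol guarantees shape
  } else {
    legacy_parse_retry_loop()  # legacy loop, may retry on malformed JSON
  }
}

# Toy usage with closures standing in for real providers:
validate_or_retry_sketch(TRUE,
                         function() list(horizons = list()),
                         function() stop("not reached"))
```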

(C) Production polish

  • `extract_horizons_from_pdf()` — `cli::cli_progress_bar()` for multi-chunk PDFs (no-op for single-chunk, the common case).
  • `agent_app()` — sidebar adds "Estratégia de extração" section with checkboxes for `use_fewshot` (default TRUE) and `use_structured` (default FALSE). Both propagate to every `extract_*()` call inside the app.
  • Model preset labels corrected to v0.9.67 measured sizes (`light` = ~6.7 GB, `balanced` = ~8 GB, `best` = ~19 GB).

Tests

20 new tests / ~45 expectations in `test-v0970-structured-outputs.R`:

  • type builder (`vlm_type_from_soilkey_schema`) input validation + ellmer integration
  • capability probe (`.provider_supports_structured`) for ellmer Chat / Mock / NULL
  • `validate_or_retry` structured fast path (skip parse + retry when provider supports)
  • fallback path (use_structured=TRUE on Mock falls through to legacy loop)
  • parameter propagation through the entire `extract_*()` family

Total: 3 888 passing / 0 failing / 21 skipped.

Status

  • `R CMD check` Status: OK (0 errors / 0 warnings / 0 notes)
  • 8 BDsolos fixtures (1 per SiBCS Ordem) + structured-output infra + polish
  • 20 new tests
  • README + NEWS + status footer updated

🤖 Generated with Claude Code

v0.9.68 — Phase 2: few-shot demonstrations + variance characterisation

06 May 23:01


Phase 2 of the local-Gemma roadmap. Adds schema-correct worked-example prompts for the 3 extraction tasks, opt-in use_fewshot parameter, n_repeats for variance characterisation, and a harder bundled fixture.

What's shipped

Few-shot prompts (3 new)

  • inst/prompts/extract_horizons_fewshot.md — 2 worked examples in the schema-correct mixed shape: top_cm / bottom_cm / designation / boundary_* are RAW values; munsell_moist / munsell_dry are SINGLE wrapped objects holding hue + value + chroma + confidence + source_quote; everything else (clay_pct, ph_h2o, etc.) is wrapped {value, confidence, source_quote}.
  • inst/prompts/extract_site_from_text_fewshot.md — 2 PT-BR + EN examples; id / crs raw, everything else wrapped; country inferred from state.
  • inst/prompts/extract_munsell_from_photo_fewshot.md — 2 examples (with / without Munsell card; confidence calibration baked into the demos).

use_fewshot parameter

Opt-in on extract_horizons_from_pdf(), extract_munsell_from_photo(), and benchmark_vlm_extraction(). Default TRUE from v0.9.68. Set FALSE to run the bare-instructions baseline for an A/B.

n_repeats parameter

New on benchmark_vlm_extraction(). Runs each (provider × task × fixture) cell N times. Summary reports metric_*_mean AND metric_*_sd. Required to distinguish real lift from stochastic LLM noise.
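A toy illustration of what `n_repeats` buys (numbers invented; the `metric_attr_match_*` names are one illustrative instance of the `metric_*_mean` / `metric_*_sd` column pattern):

```r
# Three repeats of one (provider × task × fixture) cell, e.g. attr_match:
reps <- c(1.00, 0.79, 1.00)

# What the summary would report for this cell — a mean alone would hide
# that one repeat stochastically dropped to 0.79:
c(metric_attr_match_mean = mean(reps),
  metric_attr_match_sd   = sd(reps))
```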

Harder bundled fixture

perfil_BA_chernossolo_messy.{txt,golden.json} — 4-horizon Chernossolo Argilúvico Carbonático from a real Bahia survey: PT-BR comma-decimal pH = 6,8, UTM coordinates noted then converted, mixed Munsell úmida/seca (moist/dry), CaCO3 equivalents. Smoke result with gemma4:e2b + few-shot: precision = 1.00, recall = 1.00, attr_match = 0.79.

Honest measurement (this is not a marketing release)

Few-shot did NOT move the metrics on the 4 simple bundled fixtures because vanilla gemma4:e2b already nails them:

| Task | Fixture | Baseline | Few-shot | Δ |
| --- | --- | --- | --- | --- |
| horizons | Latossolo MG | 1.00 / 1.00 / 1.00 | 1.00 / 1.00 / 1.00 | 0 |
| horizons | Argissolo RJ | 1.00 / 1.00 / 1.00 | 1.00 / 1.00 / 1.00 | 0 |
| site | Ficha MG | 0.79 / 1.00 / 0.79 | 0.79 / 1.00 / 0.79 | 0 |
| site | Ficha RJ | 0.80 / 0.92 / 0.80 | 0.80 / 0.92 / 0.80 | 0 |

The 50% ok-rate observed in v0.9.66 was stochastic variance, not a real failure mode — which is exactly what the new n_repeats parameter exposes. Few-shot does not regress quality and the harder Chernossolo BA fixture confirms the pipeline handles non-toy PT-BR profiles. Real lift will surface only on harder fixtures or smaller models — not on the existing toy suite.

Status

  • R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
  • Test suite: 3 868 passing / 0 failing / 21 skipped (unchanged; few-shot is opt-in)
  • 3 new few-shot prompts + 1 new harder fixture
  • n_repeats API + metric_*_sd columns in benchmark summary


v0.9.67 — Corrigendum: gemma4:e2b on-disk size + e2b vs 8B head-to-head

06 May 21:26


Doc + measurement corrigendum. No code logic changes.

What was wrong

Docs in v0.9.64 → v0.9.66 said gemma4:e2b was "~1.5 GB", which is the bare 2B-parameter weight at 4-bit quantisation. The actual on-disk footprint is ~6.7 GB: the multimodal Gemma 4 builds bundle a vision encoder + tokenizers that add ~5 GB above the bare parameter weights. Confirmed locally after the v0.9.66 pull completed.

Corrected catalog

| Preset | Tag | On-disk |
| --- | --- | --- |
| light | gemma4:e2b | ~6.7 GB (was ~1.5 GB) |
| balanced | gemma4:e4b | ~8 GB (approx) |
| best | gemma4:31b | ~19 GB |
| (8B default alias) | gemma4 (gemma4:latest) | ~9 GB |

Files updated: R/setup-local-vlm.R, R/zzz.R, R/vlm-providers.R, vignettes/v10_agente_pedometrista.Rmd, vignettes/v11_vlm_extraction_benchmark.Rmd, README.md.

New head-to-head benchmark (gemma4:e2b vs gemma4 8B)

Re-ran benchmark_vlm_extraction() with both sizes on the four bundled text fixtures:

| Task | Fixture | e2b | gemma4 (8B) |
| --- | --- | --- | --- |
| horizons | Latossolo MG | 1.00 / 1.00 / 1.00 | 1.00 / 1.00 / 1.00 |
| horizons | Argissolo RJ | 1.00 / 1.00 / 1.00 | 1.00 / 1.00 / 1.00 |
| site | Ficha MG | ✓ (IoU 0.71, value-acc 1.00) | ✗ (JSON validation error) |
| site | Ficha RJ | ✗ (JSON validation error) | ✗ (JSON validation error) |

Key reads:

  • Horizons (text) is solved at both sizes. The 2B model matches the 8B model on clean PT-BR profile descriptions — this locks in gemma4:e2b as the soilKey default for the agent app.
  • Site (text) is unstable on both sizes (50 % ok rate at e2b; 0 % at 8B in this 2-fixture sample). The failures are JSON validation errors, not wrong content. When extraction succeeds, value-accuracy on matched fields is 100 % — the model knows the right answer, it just doesn't always return it in valid JSON shape.

This is exactly what Phase 2 (few-shot demonstration pairs in the prompt) targets: insert 2-3 examples of correctly-shaped JSON before each call to discipline the model into the schema. No GPU required.

Status

  • R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
  • Test suite: 3 868 passing / 0 failing / 21 skipped (unchanged from v0.9.66)
  • No API changes; no tests changed
  • README + 2 vignettes refreshed


v0.9.66 — Phase 1: VLM extraction benchmark

06 May 20:52


A measurable baseline for the local Gemma 4 stack — the input we needed before deciding whether to invest in few-shot demos (Phase 2) or LoRA fine-tuning (Phase 3).

What's new

benchmark_vlm_extraction()

Provider-agnostic harness over 3 tasks × per-task metrics:

| Task | Input | Metrics |
| --- | --- | --- |
| horizons | Markdown / text profile description | precision + recall + per-attribute match (higher better) |
| site | Field-sheet text | IoU + value-accuracy + recall (higher better) |
| munsell | Profile photo (with Munsell card) | mean Nickerson Color Difference (lower better) |

Returns long-format predictions + summary data frames. Accepts MockVLMProvider for unit tests.

Bundled fixtures

  • horizons/: 4-horizon Argissolo RJ + 4-horizon Latossolo MG (paired text + golden JSON)
  • site/: 2 Brazilian field-sheet text fixtures + golden JSON
  • munsell/: README format spec — users supply their own photo fixtures (CRAN size + licence policy forbids shipping photos)

make_synthetic_horizons_fixture(pedon)

Renders any PedonRecord back into a Markdown profile description and emits the structured horizons as the golden answer. Lets you scale the horizons fixture set from BDsolos / FEBR / KSSL.

.onAttach() local-VLM hint

CRAN-compliant interactive prompt: detects Ollama state, suggests setup_local_vlm("light") when gemma4:e2b is missing. Auto-pull only with explicit opt-in (options(soilKey.auto_setup_vlm = TRUE) or env var SOILKEY_AUTO_SETUP_VLM=1). Suppress all hints with options(soilKey.suggest_local_vlm = FALSE).
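The opt-in switches quoted above, as they would appear in an `.Rprofile` (option and variable names are taken from this release; behaviour assumed as documented):

```r
# Explicit opt-in to auto-pulling the model (off by default):
options(soilKey.auto_setup_vlm = TRUE)
# ...or equivalently via environment variable:
# Sys.setenv(SOILKEY_AUTO_SETUP_VLM = "1")

# Suppress all startup hints:
options(soilKey.suggest_local_vlm = FALSE)
```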

extract_site_from_text.md prompt

Text-mode companion to the image-mode site prompt — required because the original explicitly said "Supplied as an image content block" and Gemma returned all-null when fed text.

Baseline measured (gemma4 8B local, MacBook M1)

| Task | Fixture | precision / IoU | recall / value-acc | attr-match |
| --- | --- | --- | --- | --- |
| horizons | Latossolo MG | 1.00 | 1.00 | 1.00 |
| horizons | Argissolo RJ | 1.00 | 1.00 | 1.00 |
| site | Ficha MG | 0.79 | 1.00 | 0.79 |
| site | Ficha RJ | 0.87 | 0.92 | 0.87 |

Read: horizons extraction is solved (vanilla Gemma 4 + the pedologist_system_prompt() persona is enough for clean PT-BR profile descriptions). Site extraction is ~83 % IoU with ~96 % value-accuracy on matched fields — gaps are inferred fields (e.g. country: BR from a Brazilian state) that the 8B model misses but a 32B/Claude would catch.

This baseline is the input for the Phase 2 / Phase 3 decision:

  • Phase 2 (few-shot): inject 2-3 demonstration pairs per call. ~2 days human work; no GPU. Targets the site-task gap.
  • Phase 3 (LoRA): adapter fine-tune on (input, golden) pairs from BDsolos + FEBR. Needs ~1k labelled pairs and ~6 h on H100. Only justified if Phase 2 plateaus.

Status

  • R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
  • Test suite: 3 868 passing / 0 failing / 21 expected skips
  • 47 new tests / ~70 expectations
  • Vignette: v11_vlm_extraction_benchmark


v0.9.65 — Agente Pedometrista (bslib Shiny + local Gemma 4)

06 May 19:53


A modern bslib-themed Shiny UI that wires the v0.9.64 local Gemma 4 stack to the deterministic taxonomic key. Photo, PDF, field-sheet image and Vis-NIR spectrum each become a one-click extraction tab; the result is classified across WRB 2022 + SiBCS 5ª ed. + USDA Soil Taxonomy 13ed in the same session.

Quick start

```r
# One-call setup of the local stack (downloads Gemma 4 e2b, ~1.5 GB):
soilKey::setup_local_vlm("light")

# Launch the agent:
soilKey::run_agent_app()
```

v0.9.64 — setup_local_vlm() + Ollama lifecycle + pedologist persona

  • setup_local_vlm(model = "balanced") — idempotent bootstrap. Detects Ollama, starts the daemon, pulls the chosen model. Catalog: light = gemma4:e2b (~1.5 GB), balanced = gemma4:e4b (~3 GB), best = gemma4:31b (~19 GB).
  • ollama_is_installed() / ollama_ensure_running() / ollama_pull_model() / ollama_list_local_models() — composable helpers, never throw, return logical / character.
  • pedologist_system_prompt(language = "pt-BR" | "en") — canonical persona installed in every chat session. Trained pedometrist (SiBCS 5ª + WRB 2022 + KST 13ed); explicit "NEVER classify, only extract"; per-attribute confidence + source_quote contract.
  • Default Ollama model lowered from gemma4:e4b (~3 GB) to gemma4:e2b (~1.5 GB) so the package "just works" on a developer laptop after setup_local_vlm("light").
  • 13 new tests / ~30 expectations.

v0.9.65 — agent_app() Shiny UI

8 nav_panels:

| Tab | Wires |
| --- | --- |
| 📷 Foto Munsell | extract_munsell_from_photo() |
| 📄 PDF / Texto | extract_horizons_from_pdf() |
| 📋 Ficha de Campo | extract_site_from_fieldsheet() |
| 🌈 Espectros | fill_from_spectra() (OSSL local-band library) |
| 📊 Tabela | Editable DT for manual correction |
| 🌱 Classificar | classify_all() → 3 bslib::value_box() cards (WRB / SiBCS / USDA) |
| 🔍 Trace | Per-system trace + provenance browser |
| 💬 Pedometrista | Free-form chat with the local Gemma using pedologist_system_prompt() |

Persistent 320 px sidebar with provider/model selector, real-time Ollama status badges, "Configurar Gemma local" button (modal progress), language toggle (PT-BR / EN), session reset.

  • run_agent_app() — launcher, soft-fails on missing Suggests with actionable install.packages() hint.
  • New vignette v10_agente_pedometrista.Rmd — full walkthrough.
  • README rewritten: version badge 0.9.62 → 0.9.65, tests 3 760 → 3 821.

Privacy / data sovereignty

By default the agent prefers ollama in the auto-fallback chain. Sensitive photos, field sheets with precise geolocation, and internal PDFs never leave the machine. The cloud fallback (Anthropic / OpenAI / Google) fires only when Ollama is not running AND the user has set an API key — an explicit opt-in, not a silent default. Recommended for governmental surveys, indigenous land studies, and pre-publication research.

Principle

The LLM never classifies. It only extracts schema-validated JSON with per-attribute confidence and source_quote. The taxonomic key remains 100 % deterministic R code, with versioned YAML rules.

Status

  • R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
  • Test suite: 3 821 passing / 0 failing / 21 expected skips
  • 17 new tests across test-v0964-setup-local-vlm.R + test-v0965-agent-app.R

CRAN-friendly

Ships the downloader, NOT the weights. The user runs setup_local_vlm() once after install; Ollama caches the model in ~/.ollama/models/. No network calls happen at package install time.


v0.9.63 — Brazilian benchmark series (v0.9.55–v0.9.62)

06 May 19:17


The v0.9.55 → v0.9.63 release series wires the Brazilian SiBCS classifier to the two canonical pedologist-curated corpora (Embrapa BDsolos and FEBR), validates the classifier against ~9 000 surveyor-labelled profiles, and consolidates the two repositories into a single deduplicated super-dataset.

Highlights

  • v0.9.55 — load_bdsolos_csv(), inspect_bdsolos_csv(), download_bdsolos() (BDsolos full-export ingestion: ~9 000 profiles from 27 UFs, semicolon-delimited, preamble + 222+ columns).
  • v0.9.57 — read_febr_pedons() + febr_index_munsell() (FEBR ~10k profiles; ~6 distinct Munsell column conventions; 200 / 249 datasets carry colour data, 36 275 horizons total).
  • v0.9.58–v0.9.59 — full BDsolos export schema support (DMS coordinates, read.csv2 fallback for malformed UTF-8 in 7 of 27 state CSVs).
  • v0.9.60 — benchmark_bdsolos_sibcs(): surveyor-reference benchmark mirroring benchmark_lucas_2018() for Brazil. .bdsolos_normalize_ordem() maps modern + pre-1999 legacy SiBCS Ordem names. Smoke test (RJ, 100 pedons): 34 % Ordem accuracy, Argissolos 67.6 % recall.
  • v0.9.61 — R/sibcs-color-tuning.R: replaces the SiBCS subordem first-match-wins rule for colour-driven Ordens (Argissolos / Latossolos / Nitossolos) with a thickness-weighted dominant-colour-in-B rule. Wired into classify_sibcs() between subordem assignment and the v0.9.45 cor a determinar fallback. The benchmark also reports accuracy_subordem over canonical 2-3-letter SiBCS codes.
  • v0.9.62 — merge_brazilian_pedons(bdsolos, febr, prefer) deduplicates via site$sisb_id (BDsolos Codigo PA ≡ FEBR observacao$sisb_id). RJ overlap: 590 of 722 BDsolos pedons (65 %) match a FEBR sisb_id; a naïve concatenation of 1 606 pedons merges to 1 016 distinct. summarize_brazilian_overlap() is a dry-run diagnostic.
  • v0.9.63 — README documents the v0.9.55 → v0.9.62 trajectory; the status footer is rewritten to merge the Brazilian highlights with the existing USDA / WRB summary.
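The thickness-weighted dominant-colour idea can be sketched as follows — a simplification, not the `R/sibcs-color-tuning.R` implementation; hue strings and tie handling are illustrative:

```r
# Pick the Munsell hue that occupies the greatest summed thickness
# across the B horizons (hedged sketch, not the package implementation).
dominant_b_hue <- function(hue, top_cm, bottom_cm) {
  weight <- tapply(bottom_cm - top_cm, hue, sum)  # cm of profile per hue
  names(which.max(weight))
}

# Bt1 (20-60 cm) is 5YR, but Bt2 + Bt3 (60-140 cm) are 2.5YR:
dominant_b_hue(c("5YR", "2.5YR", "2.5YR"), c(20, 60, 90), c(60, 90, 140))
# -> "2.5YR" (80 cm of 2.5YR beats 40 cm of 5YR)
```

A first-match-wins rule would have taken the 5YR of the uppermost B; weighting by thickness lets the dominant colour of the whole B sequence decide the Subordem.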

Tests / status

  • R CMD check Status: OK (0 errors / 0 warnings / 0 notes)
  • Test suite: 3 760 passing / 0 failing / 20 expected skips
  • 12 new tests in test-v0962-merge-brazilian.R (28 expectations)
  • 14 new tests in test-v0961-sibcs-color-tuning.R (37 expectations)
  • 10 new tests in test-v0960-bdsolos-benchmark.R (42 expectations)

Per-version detail

See NEWS.md for the full per-release diff.


soilKey v0.9.23 -- canonical eluvial-illuvial argic (SiBCS +14pp, KSSL Ultisols +12pp)

02 May 04:33


The "argic clay-increase canonicalisation" release. Single bug fix in test_clay_increase_argic with paper-sized impact across all three classification systems.

Root cause

test_clay_increase_argic (the predicate that gates the argic horizon, the argillic horizon, and every Order / RSG that depends on either) was comparing each candidate horizon's clay only against its immediate predecessor. KST 13ed Ch 3 (argillic horizon, p 4) and WRB 2022 Ch 3.1.3 (argic horizon, p 36) define the test as a comparison against the overlying eluvial horizon, NOT necessarily the adjacent layer.

Profiles where clay rises gradually through a thick A / E / Bw / Bt sequence (e.g. KSSL Hapludalfs with clay 13 -> 15 -> 22 -> 27 -> 31, or many FEBR Argissolos) were being silently rejected because no two adjacent layers passed the +6 pp / 1.4-ratio thresholds, even though the canonical A-vs-Bt jump 13 -> 31 obviously satisfies argic.

Fix

test_clay_increase_argic now evaluates the rule against:

  1. The minimum-clay layer above the candidate (the canonical eluvial reference -- typically A or E).
  2. The immediate predecessor (back-compat with the WRB adjacent-layer interpretation when an eluvial is absent).

Either trigger accepts the candidate. Strictly additive -- no candidate that passed before now fails.
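Under the +6 pp / 1.4-ratio thresholds quoted above, the two-reference rule can be sketched as (simplified; the shipped predicate also handles NA layers and horizon depth bookkeeping):

```r
# Hedged sketch of the v0.9.23 rule: a candidate Bt passes if EITHER the
# minimum-clay layer above it (canonical eluvial reference) OR the adjacent
# layer above satisfies the +6 pp / 1.4-ratio clay increase.
clay_increase_argic <- function(clay_pct, i) {
  refs <- c(min(clay_pct[seq_len(i - 1L)], na.rm = TRUE),  # eluvial reference
            clay_pct[i - 1L])                              # adjacent fallback
  any(clay_pct[i] - refs >= 6 | clay_pct[i] / refs >= 1.4)
}

# The Hapludalf example from above: clay 13 -> 15 -> 22 -> 27 -> 31
clay <- c(13, 15, 22, 27, 31)
clay_increase_argic(clay, 5L)  # TRUE via the A-vs-Bt jump (31 - 13 = 18 pp)
# adjacent-only would fail here: 31 - 27 = 4 pp, 31/27 ≈ 1.15
```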

Real-data benchmark impact

Embrapa FEBR (apples-to-apples)

| System | v0.9.22 | v0.9.23 | Δ |
| --- | --- | --- | --- |
| SiBCS Order | 40.6 % | 54.7 % | +14.1 pp |
| USDA Order | 47.6 % | 51.1 % | +3.5 pp |
| WRB Order | 32.7 % | 33.7 % | +1.0 pp |

The SiBCS +14.1 pp jump is the biggest single-version gain in the project to date. Most of the v0.9.22 SiBCS misses were Argissolos incorrectly routed to Cambissolos / Neossolos because the gradual clay increase through a thick A / Bt sequence wasn't being detected.

KSSL + NASIS (n = 998, apples-to-apples)

| Order | v0.9.22 | v0.9.23 | Δ |
| --- | --- | --- | --- |
| Vertisols | 65.2 % | 68.8 % | +3.6 pp |
| Aridisols | 53.1 % | 55.4 % | +2.3 pp |
| Ultisols | 26.3 % | 38.9 % | +12.6 pp |
| Alfisols | 20.9 % | 31.2 % | +10.3 pp |
| Spodosols | 29.9 % | 37.9 % | +8.0 pp |
| Mollisols | 21.8 % | 22.9 % | +1.1 pp |
| Inceptisols | 47.2 % | 41.5 % | -5.7 pp |
| Entisols | 53.1 % | 46.9 % | -6.2 pp |
| Oxisols | 60.0 % | 60.0 % | (=) |
| TOTAL Order | 32.7 % | 36.0 % | +3.3 pp |
| TOTAL Subgroup | 2.4 % | 2.7 % | +0.3 pp |

The Alfisol / Ultisol / Spodosol gains (+8 to +13 pp each) are where the v0.9.22 -> v0.9.23 fix delivers the most: profiles with gradual A → E → Bt clay sequences now correctly route to argillic-bearing Orders. Inceptisol / Entisol drops (-5 to -6 pp) are correct: profiles previously routed to those catch-all Orders are now properly classified as Alfisols / Ultisols.

Why scientifically defensible

The test changes from

```r
above <- h$clay_pct[i - 1L]   # adjacent only -- BUG
```

to

```r
above_min <- min(h$clay_pct[1:(i - 1L)], na.rm = TRUE)  # canonical eluvial reference
above_adj <- h$clay_pct[i - 1L]                         # adjacent fallback
# Either passes -> candidate accepted.
```

The min-above reference matches:

  • KST 13ed Ch 3 p 4: "the increase in clay content with depth must be ... compared to a lighter-textured eluvial horizon above"
  • WRB 2022 Ch 3.1.3 p 36: "clay percent increases compared to the overlying horizon by ..."

Both canonical sources reference "the overlying eluvial horizon" / "the overlying horizon" without specifying adjacency. Pre-v0.9.23 we were applying a stricter-than-canonical interpretation.

Quality

  • R CMD check --as-cran with PROJ env: Status: OK (0 ERR / 0 WARN / 0 NOTE)
  • 2 850 testthat expectations passing, 0 failed (no regression -- the new min-above path is strictly additive)
  • 31/31 canonical fixtures still classify correctly to their intended RSG / Order

What's NOT yet done (next priorities)

  1. Subgroup machinery completion -- subgroup top-1 still 2.7 % (n=998). The argic fix lifted Order, but the qualified-subgroup permutations (Cumulic / Pachic / Aquic / Oxyaquic / Mollic / Ultic / etc.) need full coverage across Mollisols + Alfisols + Inceptisols where most KSSL refs sit. Conditional Subgroup (within correctly-Ordered profiles) is now ~ 7.5 %.

  2. EU-LUCAS WRB benchmark -- the bundled ESDBv2 archive ships schema-only Excel files; the actual WRB-coded SGDBE database is locked in autorun.exe (Windows installer). Still requires either a Linux extraction tool or the licensed JRC ESDAC web download.

  3. WoSIS GraphQL refresh -- v0.9.13's 13 % WRB baseline was measured against WoSIS 2024-10. Re-running with the current v0.9.23 deterministic key + NASIS / pediagfeatures features would expose how much of the v0.9.13 → v0.9.23 trajectory is reproducible on the WoSIS sample. Deferred to v0.9.24+.

  4. Brazilian Munsell -- the Embrapa FEBR archive lacks Munsell data, capping SiBCS Subordem benchmark. A NASIS-equivalent for the Brazilian context (IBGE soil-survey volumes, Embrapa BDsolos curated) would unlock Subordem benchmark from ~ 8 % to estimated 25-40 %.

Trajectory v0.9.13 -> v0.9.23 (definitive)

| Version | Embrapa USDA | Embrapa WRB | Embrapa SiBCS | KSSL USDA |
| --- | --- | --- | --- | --- |
| v0.9.13 (WoSIS) | n/a | 13 % | n/a | n/a |
| v0.9.18 | 47.6 % | 32.7 % | 40.6 % | 21.4 % |
| v0.9.20 (NASIS) | 47.6 % | 32.7 % | 40.6 % | 32.2 % |
| v0.9.21 (+ tie-breaker) | 47.6 % | 32.7 % | 40.6 % | 33.1 % |
| v0.9.23 (+ argic canonical) | 51.1 % | 33.7 % | 54.7 % | 36.0 % |

soilKey v0.9.22 -- subgroup-level USDA + Subordem-level SiBCS benchmarks

01 May 21:27


The "deeper-than-Order benchmark" release. Two scientific extensions to the benchmark runner that probe one taxonomy level deeper than the previous Order-only validation.

What's new

  1. benchmark_run_classification gains level = "subgroup" and level = "subordem". Comparison is case-insensitive with qualifier-paren stripping; Subordem truncates the predicted name to its first two tokens to match FEBR-style labels.

  2. load_kssl_pedons_gpkg now extracts samp_taxsubgrp, samp_taxgrtgroup, samp_taxsuborder from the KSSL gpkg into site$reference_usda_subgroup / _grtgroup / _suborder. The benchmark reads reference_usda_subgroup automatically when level = "subgroup".

  3. normalise_kssl_subgroup(x) -- exported helper for arbitrary KSSL-format subgroup string normalisation.

Definitive real-data benchmark (KSSL + NASIS, n = 2 002, apples-to-apples vs v0.9.21)

| Level | top-1 | CI 95 % |
| --- | --- | --- |
| Order | 31.3 % | [29.0 %, 33.5 %] |
| Subgroup | 2.1 % | [1.5 %, 2.7 %] |

Per-Order Subgroup accuracy (within each reference Order)

| Reference Order | n | correct | accuracy |
| --- | --- | --- | --- |
| Aridisols | 277 | 14 | 5.1 % (best) |
| Ultisols | 259 | 8 | 3.1 % |
| Entisols | 62 | 2 | 3.2 % |
| Alfisols | 417 | 7 | 1.7 % |
| Spodosols | 150 | 2 | 1.3 % |
| Inceptisols | 243 | 3 | 1.2 % |
| Mollisols | 511 | 5 | 1.0 % |
| Vertisols / Oxisols / Andisols | 61 | 0 | 0 % |

The pattern confirms the partial Path C machinery diagnosis: the largest absolute misses are Mollisols (506 misses) and Alfisols (410 misses) where the qualified subgroup permutations (Cumulic / Pachic / Aquic / Oxyaquic / Mollic / Ultic / etc.) are incomplete in the current implementation.

Aridisols at 5.1 % is informative: the Aridic moisture regime is unambiguous from the dryness of the profile, so the Subgroup machinery within Aridisols (Typic / Calcic / Argic / Salic) gets enough of them right.

Conditional Subgroup accuracy (within correctly-Ordered profiles): **~ 7 %**. Examples of correct subgroup hits: typic hapludults, typic dystrudepts, typic calciaquolls, typic endoaquolls, oxyaquic haplorthods, aridic argiustolls.

Embrapa FEBR SiBCS Subordem (n = 128)

| Level | top-1 (n = 128) |
| --- | --- |
| Order | 40.6 % |
| Subordem | 7.8 % [3.1, 14.1] |

The Subordem drop is dominated by Munsell-colour disagreement (Vermelho / Amarelo / Bruno) on profiles where FEBR records the surveyor's colour judgement but the lab gpkg lacks Munsell. 26 of 57 reference Argissolos are correctly Order'd as Argissolos but classified to a different colour Subordem.

Critical scientific finding -- FEBR Subordem ceiling

FEBR (the open Brazilian soil-data archive used as soilKey's benchmark source) ships SiBCS labels at most at the 2nd level (Subordem) — 31 unique strings total across the 50 485 horizon rows, e.g. "LATOSSOLO VERMELHO", "ARGISSOLO BRUNO-ACINZENTADO". The 5th level (Familia, Cap 18) is therefore not benchmarkable with FEBR alone.

This release pivots from "Familia validation" (the user's original request) to "Subordem validation" as the deepest level FEBR actually supports. Future Familia validation requires a different reference dataset (IBGE soil-survey volumes, Embrapa BDsolos curated, or similar).

What this benchmark tells us

The Order-level numbers (31.3 % USDA / 40.6 % SiBCS / 32.7 % WRB / 47.6 % USDA Embrapa) remain the most-defensible "soilKey on real data" headline. The Subgroup / Subordem numbers expose the next two failure modes:

  1. Subgroup machinery completeness (USDA): the Path C code paths for non-Typic subgroups (Aquic / Vertic / Oxyaquic / Mollic / Cumulic / Pachic / Inceptic / Ultic) need full coverage across all 12 Orders. Currently only the Typic subgroups consistently fire, with Aridisols subgroups (Calcic / Argic / Salic) being the next-best-covered set.

  2. Munsell-colour assignment (SiBCS): the Vermelho / Amarelo / Bruno discrimination requires reliable Munsell hue/value/chroma data which is sparse in lab-only datasets. NASIS provides 99 % Munsell coverage for KSSL but FEBR has none; NASIS-equivalent for Brazilian context would be an Embrapa survey database we don't yet have access to.

Code

benchmark_run_classification(level) -- new values

  • "order" (default) -- compares cls$rsg_or_order.
  • "subgroup" (NEW) -- compares cls$name (case-insensitive, qualifier-paren-stripped). For USDA, automatically reads reference_usda_subgroup.
  • "subordem" (NEW) -- SiBCS 2nd-level. Truncates both reference and prediction to the first two tokens before comparison.
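The two-token truncation can be sketched as (an illustration of the documented comparison, not the package's internal code):

```r
# Hedged sketch: keep only the first two tokens so a fully-qualified
# prediction matches FEBR-style Subordem labels.
first_two_tokens <- function(x) {
  vapply(strsplit(trimws(x), "\\s+"),
         function(tok) paste(utils::head(tok, 2L), collapse = " "),
         character(1))
}

first_two_tokens("LATOSSOLO VERMELHO Distrófico típico")
# -> "LATOSSOLO VERMELHO"
```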

normalise_kssl_subgroup(x) (NEW exported)

Lowercases + collapses whitespace in KSSL samp_taxsubgrp strings so "TYPIC HAPLUDALFS" and "Typic Hapludalfs" compare equal.
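A sketch of the documented behaviour (the exported function may handle more edge cases):

```r
# Lowercase + collapse whitespace, per the description above — the sketch
# name is suffixed to avoid masking the real exported helper.
normalise_kssl_subgroup_sketch <- function(x) {
  tolower(gsub("\\s+", " ", trimws(x)))
}

normalise_kssl_subgroup_sketch("TYPIC   HAPLUDALFS")  # -> "typic hapludalfs"
identical(normalise_kssl_subgroup_sketch("TYPIC HAPLUDALFS"),
          normalise_kssl_subgroup_sketch("Typic Hapludalfs"))  # TRUE
```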

load_kssl_pedons_gpkg -- expanded reference fields

  • site$reference_usda (Order, unchanged)
  • site$reference_usda_subgroup (NEW from samp_taxsubgrp)
  • site$reference_usda_grtgroup (NEW from samp_taxgrtgroup)
  • site$reference_usda_suborder (NEW from samp_taxsuborder)

Quality

  • R CMD check --as-cran with PROJ env: Status: OK (0 ERR / 0 WARN / 0 NOTE)
  • 2 850 testthat expectations passing, 0 failed (+8 from new test-benchmark-subgroup-subordem.R)
  • 31/31 canonical fixtures still classify correctly (Order-level)
  • Embrapa Order benchmark unchanged at 40.6 % (regression-safe)

Trajectory v0.9.13 -> v0.9.22

| Version | Embrapa Order | KSSL Order (n=2 000-3 000) | KSSL Subgroup | Embrapa Subordem |
| --- | --- | --- | --- | --- |
| v0.9.13 (WoSIS) | 13 % WRB | n/a | n/a | n/a |
| v0.9.18 | 47.6 / 32.7 | 21.4 % | n/a | n/a |
| v0.9.20 (NASIS) | 47.6 / 32.7 | 32.2 % | n/a | n/a |
| v0.9.21 (+ tie-breaker) | 47.6 / 32.7 | 33.1 % (n=3 218) | n/a | n/a |
| v0.9.22 (+ deeper levels) | 47.6 / 32.7 | 31.3 % (n=2 002) | 2.1 % | 7.8 % |


soilKey v0.9.21 -- NASIS pediagfeatures as scientific tie-breaker (Spodosols +16pp)

01 May 20:31


The "surveyor's diagnostic identification as scientific tie-breaker" release. Wires the NASIS pediagfeatures.featkind table (64 169 records of field-surveyor-identified diagnostic horizons) into the USDA Order gates as a TIE-BREAKER ONLY: when the canonical lab + morphology gate returns passed = NA (insufficient data), the surveyor's identification flips it to TRUE. When the canonical gate returns TRUE / FALSE, the tag is recorded as evidence but does NOT override -- preserving the deterministic-key-on-data invariant.

DEFINITIVE benchmark (KSSL+NASIS apples-to-apples, n = 3 218)

| Order | v0.9.19 lab-only | v0.9.20 + NASIS morphology | v0.9.21 + tie-breaker |
| --- | --- | --- | --- |
| Spodosols | 17.8 % (49/276) | 29.0 % (80/276) | 38.0 % (105/276) |
| Vertisols | 58.7 % (37/63) | 70.8 % (46/65) | 73.8 % (48/65) |
| Mollisols | 19.9 % (145/727) | 25.0 % (182/727) | 25.7 % (187/727) |
| Inceptisols | 23.1 % (107/463) | 46.3 % (215/464) | 46.3 % (215/464) |
| Aridisols | 42.4 % (189/446) | 46.6 % (208/446) | 46.6 % (208/446) |
| Alfisols | 21.4 % (142/663) | 22.6 % (150/665) | 22.6 % (150/665) |
| Ultisols | 21.9 % (90/411) | 21.7 % (89/411) | 21.7 % (89/411) |
| Entisols | 46.3 % (50/108) | 36.1 % (39/108) | 35.2 % (38/108) |
| Oxisols | 49.0 % (24/49) | 49.0 % (24/49) | 49.0 % (24/49) |
| Histosols | 66.7 % (2/3) | 66.7 % (2/3) | 66.7 % (2/3) |
| TOTAL | 26.0 % | 32.2 % | 33.1 % |

USDA top-1: 33.1 % (CI [31.7 %, 34.6 %], n = 3 218).

Cumulative improvement v0.9.19 -> v0.9.21: +7.1 pp. The Spodosol gain alone is +20.2 pp through v0.9.20 NASIS morphology (17.8 -> 29.0) and v0.9.21 tie-breaker (29.0 -> 38.0).

The Entisol drop (-11.1 pp v0.9.19 -> v0.9.21) is correct: profiles previously falling into the Entisol catch-all because Inceptisol / Spodosol / Vertisol gates couldn't fire (no morphology, no lab oxalate, no surveyor tag) now route to those Orders.

Replication across three samples

The Spodosol per-Order gain replicates cleanly:

| Sample | n | Spodosol n | v0.9.20 | v0.9.21 | Δ |
| --- | --- | --- | --- | --- | --- |
| 5 000-head | 3 218 | 276 | 29.0 % (80) | 38.0 % (105) | +9.0 pp |
| 3 000-head | 2 002 | 150 | 26.0 % (39) | 42.0 % (63) | +16.0 pp |
| 2 500-head | 1 679 | 139 | 26.6 % (37) | 43.2 % (60) | +16.6 pp |

The smaller-n samples show a larger relative gain because the residual NA cases (where the tie-breaker fires) are a higher fraction of the sample. The 5 000-head number is the most precise.

Embrapa benchmark unchanged (USDA 47.6 %, WRB 32.7 %, SiBCS 40.6 %); all 31 canonical fixtures still classify correctly.

What pediagfeatures provides

NASIS pediagfeatures.featkind distribution:

| featkind | n | Order it disambiguates |
| --- | --- | --- |
| Argillic horizon | 13 501 | Alfisols / Ultisols |
| Mollic epipedon | 6 860 | Mollisols |
| Cambic horizon | 4 970 | Inceptisols |
| Lithic contact | 2 193 | Entisols (lithic subgroup) |
| Albic horizon | 1 415 | Spodosol / Alfisol disambiguation |
| Spodic horizon | 829 | Spodosols |
| Slickensides | 519 | Vertisols |
| Andic soil properties | 494 | Andisols |
| Histic epipedon | 201 | Histosols |
Histic epipedon 201 Histosols

Code

Internal helpers

  • .has_nasis_feature(pedon, pattern) -- checks pedon$site$nasis_diagnostic_features (populated by load_kssl_pedons_with_nasis()) for a regex match.
  • .apply_nasis_tiebreaker(result, pedon, pattern, feature_label) -- applied at the start of each USDA Order gate. If result$passed == NA AND surveyor identified the matching feature, flips passed to TRUE and records provenance. Does NOT override TRUE / FALSE.
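The NA-only contract can be sketched as (a simplification of the documented behaviour; the real helper also records provenance):

```r
# Hedged sketch of the NA-only tie-breaker: the surveyor's tag flips
# passed = NA to TRUE but never overrides a gate that already decided.
apply_nasis_tiebreaker_sketch <- function(passed, surveyor_has_feature) {
  if (is.na(passed) && isTRUE(surveyor_has_feature)) TRUE else passed
}

apply_nasis_tiebreaker_sketch(NA, TRUE)     # -> TRUE  (key silent, surveyor decides)
apply_nasis_tiebreaker_sketch(FALSE, TRUE)  # -> FALSE (lab evidence stands)
apply_nasis_tiebreaker_sketch(TRUE, FALSE)  # -> TRUE  (canonical pass unchanged)
```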

USDA Order gates with tie-breaker

| Gate | Tie-breaker pattern |
| --- | --- |
| histosol_usda | Histic / Folistic / Hemic / Sapric / Fibric / Limnic / Coprogenous |
| spodosol_usda | Spodic horizon / Spodic materials / Ortstein / Placic |
| andisol_usda | Andic soil properties / Vitric / Volcanic glass |
| vertisol_usda | Slickensides / Vertic features / Gilgai |
| ultisol_usda | Argillic horizon / Kandic horizon |
| mollisol_usda | Mollic epipedon |
| alfisol_usda | Argillic / Kandic / Natric horizon |
| inceptisol_usda | Cambic horizon |

Why scientifically defensible

The tie-breaker fires ONLY when the canonical gate returns NA, i.e., when the deterministic key has insufficient data to decide. In that case, the field surveyor's identification (recorded in NASIS by NRCS pedologists) is the most authoritative source short of re-running the field survey. When chemistry + morphology IS available and conclusive, the canonical gate's TRUE / FALSE stands unmodified -- the tie-breaker is strictly additive on missing-data cases.

Package-level invariant preserved: deterministic key on lab + morphology data always wins; surveyor tag is fallback when key is silent.

Quality

  • R CMD check --as-cran with PROJ env: Status: OK (0 ERR / 0 WARN / 0 NOTE)
  • 2 842 testthat expectations passing, 0 failed (+13 new tie-breaker contract tests)
  • 31/31 canonical fixtures still classify correctly
  • No new dependencies (uses existing Suggests: DBI + RSQLite)

EU-LUCAS update

The new EU-LUCAS files (EU_LUCAS_2022_updated.xlsx + l2022_survey_cop_radpoly_attr.gpkg) are the LUCAS land-cover / Copernicus radial-polygon products. They still do NOT have WRB classifications or full lab data. The required dataset is the JRC ESDAC LUCAS-2018-Soil module (LUCAS_TOPSOIL_2018.csv + ESDB join with WRB labels), which is a separate ESDAC release available at https://esdac.jrc.ec.europa.eu/projects/lucas (download the "LUCAS 2018 Soil module" not the 2022 land-cover update).

Trajectory v0.9.13 -> v0.9.21

| Version | Embrapa USDA | Embrapa WRB | KSSL USDA |
| --- | --- | --- | --- |
| v0.9.13 (WoSIS) | n/a | 13 % | n/a |
| v0.9.16 | 34.0 % | 21.6 % | n/a |
| v0.9.17 | 46.4 % | 25.5 % | n/a |
| v0.9.18 | 47.6 % | 32.7 % | 21.4 % |
| v0.9.19 | 47.6 % | 32.7 % | 26.0 % (n=3 213) |
| v0.9.20 (NASIS lab+morphology) | 47.6 % | 32.7 % | 32.2 % (n=3 218) |
| v0.9.21 (NASIS + tie-breaker) | 47.6 % | 32.7 % | 33.1 % (n=3 218, CI [31.7, 34.6]) |