
Release v0.2.0: salary extension, WAR loaders, MCP config #2

Merged
luceydav merged 9 commits into main from dev on Mar 30, 2026

@luceydav (Owner)

lahmanTools 0.2.0

What's new

Salary data extended to 2025

  • Spotrac player-level actuals (2017–2021) via data-raw/salaries.R
  • USA Today player-level actuals (2022–2025) via scrape_salaries()
  • Unified SalariesAll view unions all three sources (the existing Lahman salaries plus the two above); filter on is_actual = TRUE to keep only actuals

FanGraphs WAR loaders (1985–present)

  • load_fangraphs_war() — batting + pitching WAR via baseballr
  • load_chadwick_ids() — Chadwick Bureau player ID crosswalk (ODC-BY 1.0)
  • load_statcast() — Baseball Savant pitch-level data (2015+)
  • New views: PlayerIDs, PlayerWAR, SalaryPerWAR (dollars/WAR by era)

Multi-pass player name matcher

  • match_player_ids() — 4-pass matching (exact → normalised → year-constrained → team-constrained)
  • normalise_player_name(), team_name_map()

MCP config for AI-assisted querying

  • write_mcp_config() — generates config to connect GitHub Copilot CLI or Claude to baseball.duckdb via DuckDB MCP server

New analytical views + macro

  • PlayerAcquisitionType, LeagueMedianSalary, TeamPayroll (now implemented)
  • era_label(yr) SQL macro — replaces repeated CASE WHEN era blocks

Tests: 227 passing, 0 failures

Attribution: expanded to cover Lahman (CC BY-SA 3.0), Chadwick (ODC-BY 1.0), FanGraphs, Statcast, and baseballr

David Lucey and others added 9 commits March 29, 2026 21:27
Add normalise_player_name() and match_player_ids() to R/utils.R.
Three-pass matching strategy:
  Pass 1: exact 'Last, First' match
  Pass 2: normalised names (strips accents, suffixes, asterisks,
           expands initials JD->J D, fixes UTF-8 mojibake)
  Pass 3: year-active disambiguation for ambiguous names

Improves USA Today match rate from ~77% to 95.5% and Spotrac from
~83% to 95.4%. Stars like Harper, Acuna, Altuve, Realmuto, Tatis
now correctly matched.
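
The three passes described above can be sketched roughly as follows. This is an illustrative Python sketch, not the package's R code: the function names, the suffix list, and the record keys (id, name, first_year, last_year) are assumptions, the sample players are made up, and the UTF-8 mojibake repair step is omitted.

```python
import re
import unicodedata

# Assumed suffix list; the real matcher may strip a different set.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def normalise_name(name):
    """Strip accents, suffixes, asterisks and punctuation; expand 'JD' to 'J D'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_marks = no_accents.replace("*", " ").replace(".", " ").replace(",", " ")
    tokens = []
    for tok in no_marks.split():
        if tok.lower() in SUFFIXES:
            continue  # drop Jr, Sr, II, ...
        if re.fullmatch(r"[A-Z]{2}", tok):
            tokens.extend(tok)  # run-together initials: "JD" -> "J", "D"
        else:
            tokens.append(tok)
    return " ".join(tokens).lower()


def match_player(name, year, players):
    """Return a player id, or None when the name cannot be resolved uniquely."""
    # Pass 1: exact string match
    exact = [p for p in players if p["name"] == name]
    if len(exact) == 1:
        return exact[0]["id"]
    # Pass 2: compare normalised forms
    key = normalise_name(name)
    candidates = [p for p in players if normalise_name(p["name"]) == key]
    if len(candidates) == 1:
        return candidates[0]["id"]
    # Pass 3: keep only candidates active in the given year
    active = [p for p in candidates if p["first_year"] <= year <= p["last_year"]]
    if len(active) == 1:
        return active[0]["id"]
    return None


# Made-up records for illustration only
players = [
    {"id": "a1", "name": "Jose Pena", "first_year": 1990, "last_year": 1998},
    {"id": "a2", "name": "Jose Pena", "first_year": 2015, "last_year": 2024},
    {"id": "b1", "name": "J D Marsh", "first_year": 2010, "last_year": 2022},
]
print(match_player("José Peña Jr.*", 2020, players))
print(match_player("JD Marsh", 2018, players))
```

Each pass only runs when the previous one fails to produce a single candidate, so an ambiguous exact match falls through to the year constraint rather than returning an arbitrary player.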

Update R/scrape.R and data-raw/salaries.R to use new matcher.
Add 13 new tests in test-utils.R (35 total, 147 suite-wide).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add team_name_map() -- maps 60+ team display names (USA Today, Spotrac,
standard abbreviations) to Lahman teamID codes.

Add Pass 4 to match_player_ids(): when team column present, constrain
candidates to team-year roster (~50 players). Within a team-year,
last name alone resolves 96.4% and last+initial resolves 99.6% --
no nickname table or complex normalization needed.
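
A minimal Python sketch of the team-constrained idea, assuming the roster for one team-year has already been selected. The tuple layout (player_id, first, last) and the function name are hypothetical, and the roster entries are made up; the package's R implementation may differ.

```python
def match_within_roster(name, roster):
    """Resolve a display name against a single team-year roster.

    roster: list of (player_id, first, last) tuples (hypothetical shape).
    Returns a player id, or None if the name stays ambiguous or is absent.
    """
    parts = name.replace(".", " ").split()
    if not parts:
        return None
    first, last = parts[0], parts[-1]
    # Step a: last name alone, within the roster
    same_last = [r for r in roster if r[2].lower() == last.lower()]
    if len(same_last) == 1:
        return same_last[0][0]
    # Step b: last name plus first initial for same-name teammates
    same_initial = [r for r in same_last if r[1][:1].lower() == first[:1].lower()]
    if len(same_initial) == 1:
        return same_initial[0][0]
    return None


# Made-up roster for illustration only
roster = [
    ("x1", "Luis", "Garza"),
    ("x2", "Andre", "Garza"),
    ("x3", "Max", "Keller"),
]
print(match_within_roster("M. Keller", roster))
print(match_within_roster("Luis Garza", roster))
```

Because the candidate pool is a single roster rather than every player in the database, a last name is usually enough, and a name from the wrong team simply returns None instead of a false match.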

Results:
  USA Today: 95.5% -> 99.0% rows, 97.4% -> 99.6% payroll
  Spotrac:   95.4% -> 98.2% rows, 97.6% -> 99.5% payroll

Remaining ~1% are genuine edge cases: Jr. in last name position,
hyphenated names (Kepler-Rozycki), two same-name teammates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove years >= 2002 restriction on FanGraphs pitching WAR fetch
  (API works back to 1985, adding 8,481 pitcher-seasons of WAR data)
- Extract loaders.R from utils.R (WAR + ChadwickIDs loading)
- Add write_mcp_config() helper for AI tool database access
- Add analytical views: PlayerAcquisitionType, TeamPayroll,
  LeagueMedianSalary, SalaryPerWAR, PlayerWAR, era_label() macro
- Update BattingStats/PitchingStats/FieldingStats with COALESCE fixes
- Add AGENTS.md, update CONTRIBUTING.md, README.md, NEWS.md
- All 147 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
roxygenise() adds 4 missing exports: load_chadwick_ids,
load_fangraphs_war, load_statcast, write_mcp_config.
Generates .Rd man pages for all exported functions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- BattingStats: verify AVG, OBP, SLG, OPS, ISO, BABIP, BB%, K% with
  hand-calculated values; test zero-AB edge case returns NULL
- PitchingStats: verify IP, WHIP, K/9, BB/9, HR/9, K/BB, Win%, FIP
  with era-adjusted constant; test zero-IPouts edge case
- FieldingStats: verify FPCT, RF/9, RF/G
- match_player_ids Pass 4a: team + last name resolution
- match_player_ids Pass 4b: same-lastname teammates disambiguated
  by first initial
- match_player_ids Pass 4: teamID column path + wrong-team failure
- team_name_map: 30 franchises, no duplicates, common abbreviation
  mappings (NYM->NYN, CHC->CHN, etc.)
- scrape_salaries: input validation for unknown year slugs
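
For reference, the kind of hand calculation the BattingStats tests check can be sketched in Python using the standard definitions of AVG, OBP, SLG, OPS and ISO. The function and argument names are assumptions, the stat line is made up, and the package's SQL views may define or round these differently; returning None on a zero denominator mirrors the NULL behaviour mentioned for the zero-AB case.

```python
def safe_div(num, den):
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def batting_rates(ab, h, doubles, triples, hr, bb, hbp, sf):
    """Standard rate stats from counting stats; None where undefined."""
    total_bases = h + doubles + 2 * triples + 3 * hr
    avg = safe_div(h, ab)
    obp = safe_div(h + bb + hbp, ab + bb + hbp + sf)
    slg = safe_div(total_bases, ab)
    ops = obp + slg if obp is not None and slg is not None else None
    iso = slg - avg if slg is not None and avg is not None else None
    return {"AVG": avg, "OBP": obp, "SLG": slg, "OPS": ops, "ISO": iso}


# Made-up stat line: 150 hits in 500 AB, 30 doubles, 5 triples, 20 HR
rates = batting_rates(ab=500, h=150, doubles=30, triples=5, hr=20, bb=50, hbp=5, sf=5)
print(rates)
```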

147 -> 227 tests (0 failures)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Intro now highlights salary extension (2025), WAR (1985+), and MCP querying
- Added WAR views section (PlayerIDs, PlayerWAR, SalaryPerWAR) to derived views
- Fixed FangraphsPitchingWAR date range: 2002 -> 1985
- Updated war_reliable note (now always TRUE for salary era)
- Fixed view count: 8 -> 10; table+view count: 3+2 -> 3+3
- Added mcp_config.R to package structure listing
- Updated NEWS.md pitching WAR date range

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Attribution section now covers all data sources with license/obligations:
  Lahman (CC BY-SA 3.0), Chadwick (ODC-BY 1.0), FanGraphs, Statcast, scrapers
- Clarifies package is a tooling layer that does not bundle third-party data
- Credits baseballr (MIT, Bill Petti) as data-fetching layer
- DESCRIPTION updated to mention FanGraphs WAR, Chadwick, MCP config, and
  that no data is bundled

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New since 0.1.0:
- Extended salary coverage 1985-2025 (Spotrac + USA Today)
- FanGraphs WAR loaders (batting + pitching, 1985+)
- Chadwick Bureau player ID crosswalk
- Multi-pass player name matcher (4 passes, team-constrained)
- Statcast pitch-level data loader
- 6 new analytical views + era_label() macro
- write_mcp_config() for GitHub Copilot CLI / Claude integration
- 227 tests (0 failures)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
# Conflicts:
#	.github/copilot-instructions.md
#	.gitignore
#	AGENTS.md
#	R/loaders.R
#	R/setup_db.R
#	README.md
#	tests/testthat/test-connect.R
#	tests/testthat/test-loaders.R
@luceydav luceydav merged commit 517052a into main Mar 30, 2026
1 check passed