feat: add MolarityConverter and TapeStation CompactRegionTable parser#79

Merged
simbig merged 15 commits into master from concentration-and-tapestation on Mar 16, 2026


simbig commented Mar 10, 2026

Summary

  • Add MolarityConverter for converting between mass concentration (ng/µL) and molar concentration (nmol/L) for dsDNA fragments
  • Add CompactRegionTableParser to parse Agilent TapeStation "Compact Region Table" CSV exports with delimiter detection and µ-encoding resilience
  • Add CompactRegionTableRecord as value object for parsed records
  • Defensive rejection of High Sensitivity assays (pg/µl + pmol/l) — prevents silent 1000× unit errors
  • $from is nullable for exports where the "From" column is absent
  • PHPDoc aligned with the official Agilent 4200 TapeStation System Manual
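The MolarityConverter arithmetic is the standard mass-to-molar conversion. A worked derivation, assuming the average mass of 660 g/mol (660 Da) per base pair that this PR uses as its dsDNA constant, a fragment of $N$ bp, and a mass concentration $c$ in ng/µL:

```math
\begin{aligned}
c\ \mathrm{ng}/\mu\mathrm{L} &= c \times 10^{-3}\ \mathrm{g/L} \\
\mathrm{MW} &= 660\,N\ \mathrm{g/mol} \\
M &= \frac{c \times 10^{-3}}{660\,N}\ \mathrm{mol/L} = \frac{c \times 10^{6}}{660\,N}\ \mathrm{nmol/L}
\end{aligned}
```

The reverse direction is $c = M \times 660\,N / 10^{6}$. For example, 10 ng/µL of a 500 bp fragment gives $10 \times 10^{6} / (660 \times 500) \approx 30.3$ nmol/L.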

Test plan

  • 10 tests covering semicolon/comma delimiters, bp/nt units, Latin-1 µ encoding, missing From column, HS assay rejection
  • 6 tests for MolarityConverter
  • PHPStan level max passes

🤖 Generated with Claude Code

simbig requested review from KingKong1213 and Copilot and removed request for Copilot March 10, 2026 14:57
simbig force-pushed the concentration-and-tapestation branch from 081a444 to 343f764 March 13, 2026 09:00
simbig and others added 13 commits March 16, 2026 10:32
MolarityConverter converts between mass concentration (ng/µL) and
molar concentration (nmol/L) for double-stranded DNA fragments.

CompactRegionTableParser reads Agilent TapeStation Compact Region Table
CSV exports, handling delimiter detection and encoding-corrupted µ characters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
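Delimiter detection can be as simple as comparing separator counts in the header line, without keeping any parsing state between calls. A minimal Python sketch, not the PHP implementation: `detect_delimiter` is a hypothetical name, and only the `WellId` and `From [bp/nt]` column names come from this PR.

```python
def detect_delimiter(csv_text: str) -> str:
    """Guess ';' or ',' by counting separators in the header line.

    Hypothetical helper. Assumes column names never contain a literal
    ';' or ',' themselves, so the header line is unambiguous.
    """
    # Inspect only the header (first) line; nothing is remembered between calls.
    lines = csv_text.splitlines()
    header = lines[0] if lines else ""
    semicolons = header.count(";")
    commas = header.count(",")
    if semicolons == 0 and commas == 0:
        raise ValueError("Could not detect a delimiter in the header line")
    return ";" if semicolons > commas else ","
```

Counting on the header alone keeps the check independent of how numbers are formatted in the data rows.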
- Remove `readonly class` (PHP 8.2+) from CompactRegionTableRecord
- Remove named arguments (PHP 8.0+) from constructor call
- Replace `str_starts_with()` (PHP 8.0+) with `strpos() === 0`
- Add `@dataProvider` annotations alongside `#[DataProvider]` attributes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MolarityConverter:
- Replace assert() with \InvalidArgumentException for zero/negative
  fragment sizes — assert() is disabled in production (zend.assertions=-1),
  which would silently produce INF/NAN on division by zero
- Accept float for fragment size to support weighted averages from pooling

CompactRegionTableRecord:
- Rename fromBp/toBp/averageSizeBp to from/to/averageSize — these fields
  hold nucleotides (nt) for RNA TapeStation data, not base pairs (bp)

Tests:
- Fix corrupted µ test: use an actual Latin-1 byte (0xB5) instead of the
  valid UTF-8 sequence (0xC2 0xB5), which is not corrupted
- Add edge case tests: zero/negative fragment size, headers-only CSV

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
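The test fix hinges on this distinction: 0xC2 0xB5 is the valid UTF-8 encoding of µ (U+00B5 MICRO SIGN), while a lone 0xB5 is how Latin-1 (ISO-8859-1) encodes the same character and is invalid as UTF-8. A minimal Python sketch of one way to tolerate both, by falling back to Latin-1 when UTF-8 decoding fails; this is an assumed approach, not necessarily what the PHP parser does, and the `Conc. [ng/µl]` header is an illustrative example.

```python
def decode_export(raw: bytes) -> str:
    """Decode a CSV export that may be UTF-8 or Latin-1 (hypothetical helper)."""
    try:
        # b"\xc2\xb5" decodes to U+00B5 MICRO SIGN.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # A lone b"\xb5" is not valid UTF-8, but in Latin-1 it is U+00B5.
        # Latin-1 maps every byte to a character, so this branch never fails.
        return raw.decode("latin-1")
```

Both encodings of the header then compare equal after decoding.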
The From [bp/nt] column is absent in some TapeStation Compact Region
Table exports. Return null instead of 0 to distinguish "missing" from
"starts at position 0".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
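Returning null instead of 0 keeps "column absent" and "region starts at 0" distinguishable. A Python sketch using `None`, assuming rows arrive as dicts keyed by header and that From values are integers; `parse_from` is hypothetical, and the column name is the one quoted in the commit message above.

```python
from typing import Optional

FROM_COLUMN = "From [bp/nt]"

def parse_from(row: dict) -> Optional[int]:
    """Return the region start, or None if the export has no From column."""
    if FROM_COLUMN not in row:
        return None  # column absent: unknown, not zero
    # Strict conversion: int() raises ValueError on non-numeric input.
    return int(row[FROM_COLUMN].strip())
```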
HSD1000 exports use pg/µl and pmol/l, units 1000× smaller than the ng/µl
and nmol/l of standard D1000. Silently parsing these values would produce
dangerously wrong pooling volumes.
Throw early with a descriptive message instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
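Rejecting High Sensitivity exports up front amounts to scanning the header row for the HS units. A Python sketch; `reject_high_sensitivity_assay` is a hypothetical name, and the header strings in the usage below are assumed examples, not the exact Agilent column titles.

```python
def reject_high_sensitivity_assay(headers: list) -> None:
    """Raise if any header carries High Sensitivity units (pg/µl or pmol/l)."""
    for header in headers:
        normalized = header.lower()
        # Match "pg/" rather than "pg/µl" so a mis-decoded µ still trips the check.
        if "pg/" in normalized or "pmol/l" in normalized:
            raise ValueError(
                f"High Sensitivity assay detected in column {header!r}: "
                "pg/µl and pmol/l are 1000× smaller units than ng/µl and nmol/l"
            )
```

A standard header row such as `["WellId", "From [bp/nt]", "Conc. [ng/µl]", "Region Molarity [nmol/l]"]` passes silently; any column in pg/µl or pmol/l raises before a single record is built.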
Rename concentrationNgPerUl → concentration and
regionMolarityNmolPerL → regionMolarity.

The Compact Region Table format is shared across assay types (D1000,
HSD1000, RNA) — only the units in column headers differ. Embedding
ng/µl in the property name would force a breaking change when HS
support is added later.

Document the design decision on both parser and record: the caller
knows which assay it is parsing and interprets units accordingly.

BREAKING CHANGE: Property names on CompactRegionTableRecord changed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All supported assays (D1000, D5000, RNA) use ng/µl + nmol/l.
The property names are truthful for every value the parser returns.

High Sensitivity assays (pg/µl, pmol/l) are rejected by the parser,
so a future HSD1000 integration would force a deliberate breaking
change — ensuring all consumers revisit their unit assumptions.

Size columns (from, to, averageSize) remain unit-agnostic because
bp and nt are numerically equivalent within a workflow context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- List all standard assays (D1000, D5000, RNA, Genomic DNA, Cell-free DNA)
- Document averageSize as center of mass (Region view), not peak maximum

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ant comments

- CompactRegionTableRecord: typed properties replace @var annotations
- CompactRegionTableParser: extract rejectHighSensitivityAssay(), use Collection::map()
- MolarityConverter: remove PHPDoc that restated method names

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TapeStations always use 96-well plates (12x8). Parsing the WellId
from CSV into a validated Coordinates object catches invalid positions
at import time instead of downstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
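Validating a well ID against the 96-well layout (rows A to H, columns 1 to 12) is a small parsing step. A Python sketch; the PR's Coordinates class is PHP, and this `parse_well_id` with its tuple return is hypothetical. Accepting both `A1` and zero-padded `A01`, and lowercase input, is an assumption about the export format.

```python
import re

WELL_ID = re.compile(r"([A-H])(\d{1,2})")

def parse_well_id(well_id: str) -> tuple:
    """Parse a 96-well ID like 'B7' into (row, column), rejecting off-plate wells."""
    # Normalize whitespace and case before matching rows A-H.
    match = WELL_ID.fullmatch(well_id.strip().upper())
    if match is None:
        raise ValueError(f"Not a 96-well plate position: {well_id!r}")
    row, column = match.group(1), int(match.group(2))
    if not 1 <= column <= 12:
        raise ValueError(f"Column out of range 1-12: {well_id!r}")
    return row, column
```

An invalid position such as `I1` or `A13` fails at import time rather than surfacing later in the workflow.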
Delegate string-to-float casting to SafeCast::tryFloat(), removing the
private parseFloat() method that reimplemented the same logic.
Rename wellID to coordinates to match the property type.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract the dsDNA-specific constant (660 Da per base pair) as a parameter
so the converter becomes pure chemistry math. Add constants for dsDNA,
ssDNA, and RNA on the class for convenient co-location. Rename
methods to concentrationToMolarity / molarityToConcentration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
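With the mass per base pair (or nucleotide) passed in, both directions reduce to one formula. A Python sketch of that parameterized shape; the function names mirror the renamed PHP methods in snake_case, 660 for dsDNA comes from this PR, while 330 (ssDNA) and 340 (RNA) are assumed commonly quoted averages rather than values confirmed here. Fragment sizes are floats so that weighted averages work.

```python
DSDNA_DA_PER_BP = 660.0  # from this PR
SSDNA_DA_PER_NT = 330.0  # assumed typical average
RNA_DA_PER_NT = 340.0    # assumed typical average

def _check_size(fragment_size: float) -> None:
    # Explicit check instead of assert: Python strips asserts under -O,
    # much as PHP skips assert() with zend.assertions=-1.
    if fragment_size <= 0:
        raise ValueError(f"Fragment size must be positive, got {fragment_size}")

def concentration_to_molarity(ng_per_ul: float, fragment_size: float,
                              da_per_unit: float = DSDNA_DA_PER_BP) -> float:
    """Mass concentration (ng/µL) to molarity (nmol/L)."""
    _check_size(fragment_size)
    return ng_per_ul * 1e6 / (da_per_unit * fragment_size)

def molarity_to_concentration(nmol_per_l: float, fragment_size: float,
                              da_per_unit: float = DSDNA_DA_PER_BP) -> float:
    """Molarity (nmol/L) to mass concentration (ng/µL)."""
    _check_size(fragment_size)
    return nmol_per_l * da_per_unit * fragment_size / 1e6
```

For example, `concentration_to_molarity(10, 500)` is about 30.3 nmol/L, and passing `RNA_DA_PER_NT` with a size in nt switches the chemistry without touching the math.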
simbig force-pushed the concentration-and-tapestation branch from 3ea0888 to 5d9ce11 March 16, 2026 09:33
…aults

Replace silent 0/0.0 defaults with strict SafeCast::toInt/toFloat that
throw on invalid input. Remove parseInt helper (inlined). Replace
strtok with stateless explode for delimiter detection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
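The difference between the lenient and strict casts is whether bad input becomes a fallback value or an error. A Python sketch of that split; `try_float`, `to_float` and `to_int` are hypothetical stand-ins, not SafeCast's actual signatures or edge-case rules (rejecting NaN and infinity here is an assumption).

```python
import math
from typing import Optional

def try_float(value: str) -> Optional[float]:
    """Lenient: return None for anything that is not a finite number."""
    try:
        result = float(value.strip())
    except ValueError:
        return None
    return result if math.isfinite(result) else None

def to_float(value: str) -> float:
    """Strict: raise instead of silently defaulting to 0.0."""
    result = try_float(value)
    if result is None:
        raise ValueError(f"Not a float: {value!r}")
    return result

def to_int(value: str) -> int:
    """Strict: int() already rejects '', 'abc' and '1.5' with ValueError."""
    return int(value.strip())
```

With strict casts, a malformed cell stops the import instead of flowing downstream as a plausible-looking 0.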
simbig requested a review from KingKong1213 March 16, 2026 09:46
"Concentration" is ambiguous since molarity itself is a concentration.
Use the precise chemistry term "mass concentration" (ng/µl) to
eliminate ambiguity.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
simbig requested a review from KingKong1213 March 16, 2026 10:01
simbig merged commit 98a9600 into master Mar 16, 2026
29 checks passed
simbig deleted the concentration-and-tapestation branch March 16, 2026 10:43
@github-actions

🎉 This PR is included in version 6.8.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
