Skip to content

Add ArrayDiscretization strategy for vectorized interior equations#531

Open
ctessum-claude wants to merge 38 commits into
SciML:masterfrom
ctessum-claude:array-discretization-phase1
Open

Add ArrayDiscretization strategy for vectorized interior equations#531
ctessum-claude wants to merge 38 commits into
SciML:masterfrom
ctessum-claude:array-discretization-phase1

Conversation

@ctessum-claude

@ctessum-claude ctessum-claude commented Mar 3, 2026

Copy link
Copy Markdown

Summary

This PR introduces ArrayDiscretization, a new discretization strategy that generates interior PDE equations as single symbolic ArrayOp expressions instead of one scalar equation per grid point. This dramatically reduces symbolic processing time during discretization while producing numerically identical solutions (mostly).

  • 22 commits, ~8,300 lines added across 19 files (2,770-line template engine + 5,437-line test suite)
  • Fully backwards-compatible: existing ScalarizedDiscretization behavior is unchanged
  • Opt-in via MOLFiniteDifference([x => dx], t; discretization_strategy = ArrayDiscretization())

How It Works

Original approach (ScalarizedDiscretization)

For N interior grid points, the discretizer calls discretize_equation_at_point() N times, building fresh symbolic substitution rules at each point. This produces N independent scalar equations and has super-linear symbolic overhead.

New approach (ArrayDiscretization)

  1. Pre-compute stencil weights and offsets once for all derivative orders
  2. Introduce symbolic index variables (_i1, _i2, ...) — one per spatial dimension
  3. Build finite-difference substitution rules using these symbolic indices
  4. Apply rules to the PDE once to get a single template expression
  5. Wrap in a SymbolicUtils.ArrayOp with range parameters (e.g., _i ∈ 1:n_interior)

Boundary equations are handled by the existing scalar infrastructure unchanged.

Two operating modes

  • Standard mode: ArrayOp covers the "centered interior" (excluding boundary-proximity frame points which fall back to per-point scalar discretization)
  • Full-interior mode: Pre-expands weight/offset matrices to cover ALL interior points using position-dependent Const-wrapped matrices, eliminating the frame entirely

Automatic fallback

If the PDE contains patterns the template engine cannot handle (e.g., FunctionalScheme), or if template validation against the scalar path fails, the system automatically falls back to per-point scalar discretization.

Supported PDE Patterns

Pattern Uniform Grid Non-Uniform Grid
Centered (even-order) derivatives
Upwind (odd-order) derivatives
Mixed cross-derivatives Dxy(u)
Nonlinear Laplacian Dx(f·Dx(u))
Spherical Laplacian r⁻²Dr(r²Dr(u))
WENO5 (Jiang-Shu) first-order
Staggered grid (odd-order)
Periodic BCs
Algebraic equations (PDAE)

Performance Benchmarks

Benchmarks were run across 7 PDE systems (1D and 2D, linear and nonlinear) at multiple grid resolutions, measuring both discretization time and solve time.

Discretization Time Speedup (Scalar / Array)

System Smallest Grid Largest Grid
1D Linear Diffusion 1.9× (10 pts) 38.5× (200 pts)
1D Burgers (Upwind) 1.6× (10 pts) 29.4× (200 pts)
1D Nonlinear Diffusion 1.7× (10 pts) 17.7× (200 pts)
1D Linear Convection (Periodic) 2.0× (20 pts) 8.2× (160 pts)
1D Wave (Staggered) 0.7× (20 pts) 7.3× (128 pts)
2D Linear Diffusion 1.7× (5×5) 4.9× (20×20)
2D Brusselator 3.8× (8×8) 4.5× (24×24)

Absolute Discretization Times at Largest Grid Sizes

System Grid Pts Scalar (ms) Array (ms)
1D Linear Diffusion 200 9,025 234
1D Burgers (Upwind) 200 8,404 286
1D Nonlinear Diffusion 200 8,191 463
2D Brusselator 576 20,226 4,451
2D Linear Diffusion 400 8,621 1,763
1D Wave (Staggered) 128 5,595 761
1D Linear Convection 160 1,275 156

Solve Time

Solve times are essentially unchanged (speedup ratio ≈ 1.0× across all systems and grid sizes).

Test Coverage

The new test file (test/pde_systems/array_disc_tests.jl, 5,437 lines) covers 15 phases:

  • Phases 1–3: Basic 1D/2D/3D diffusion, structural verification, fallback behavior
  • Phases 4–6: Upwind, mixed derivatives, nonlinear Laplacian, spherical Laplacian
  • Phases 7–9: Non-uniform grids (all derivative types)
  • Phase 10: WENO5 scheme
  • Phase 11: Periodic BCs, multi-variable systems, Neumann/Robin BCs
  • Phases 12–13: Complex PDEs (KdV, Brusselator, Schrödinger, beam equation, PDAEs)
  • Phases 14–15: Full-interior ArrayOp mode, staggered grids

Primary validation: Array vs. Scalar numerical comparison at rtol=1e-10 in most cases, plus analytical solution verification and structural checks.

🤖 Generated with Claude Code

ctessum-claude and others added 22 commits February 27, 2026 05:13
Store Symbolics.Arr directly in discvars instead of materializing to
Vector{BasicSymbolic} via unwrap.(collect(...)). This preserves the
symbolic array structure through discretization.

Since Symbolics.Arr does not support fancy indexing with
Vector{CartesianIndex}, add a _disc_gather utility that handles both
scalar and vector index types via comprehensions. Also add a
RefCartesianIndex method for interface boundary stencils.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement ArrayDiscretization as a new discretization strategy that
pre-computes stencil information once and applies it per-point, bypassing
the expensive per-point scheme-detection logic used by ScalarizedDiscretization.
Supports 1D uniform grids with centered differences and Dirichlet BCs,
including higher-order stencils (approx_order=4) with boundary proximity
handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…x template

Replace per-point equation generation with a template-based approach:
- Build FD stencil and variable/grid rules once using a symbolic index
  _i from SymbolicUtils.idxs_for_arrayop(SymReal)
- Transform the PDE once to produce a template expression
- Instantiate at each interior point via pde_substitute(_i => k)

This gives O(1) symbolic construction cost for the centered-stencil
region instead of O(N). Near-boundary interior points and non-uniform
grids fall back to the per-point approach.

Add tests for uniform grid with parameter and symbolic structure
verification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hand-rolled centered-difference logic in _per_point_eqs with
delegation to discretize_equation_at_point from the scalar path. This
unlocks ALL scheme types for ArrayDiscretization: upwind, nonlinear
Laplacian, spherical, mixed derivatives, WENO, callbacks, and multi-D.

Key changes:
- _per_point_eqs now calls discretize_equation_at_point (12 lines vs 35)
- generate_array_interior_eqs adds can_template gate: uniform + even
  orders + 1D required for template path; otherwise full per-point
- Template validation: compares first template eq against scalar path
  to catch nonlinear Laplacian and other compound derivative forms
- New tests: upwind convection (Burgers), nonlinear diffusion, 2D
  diffusion — all using ArrayDiscretization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… differences

Remove the length(interior_ranges) == 1 guard so 2D/3D uniform-grid PDEs with
even-order derivatives use the template path instead of per-point fallback.
Key changes:
- Per-dimension boundary-point-count vectors instead of scalars
- N symbolic indices (_i1, _i2, ...) with per-depvar dimension mapping
- CartesianIndices iteration for boundary frame and centered region
- Add 2D template-vs-scalar, 2D structural, and 3D comparison tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the template-instantiate approach (which produced N scalar equations
via pde_substitute loop) with actual SymbolicUtils.ArrayOp expressions. The
centered-stencil region now produces a single array-valued equation:
  D(u_arrayop) ~ stencil_arrayop

Key changes:
- Build RHS as ArrayOp parameterized by symbolic indices
- Build LHS as D(u_arrayop) with derivative outside the ArrayOp
  (scalarize/substitute cannot penetrate Differential nodes)
- Extract spatial RHS from rearranged PDE (lhs-rhs~0 form) by
  subtracting the known time-derivative template
- Validation still uses pde_substitute (which penetrates Differential)
- MTK currently scalarizes ArrayOps via flatten_equations, producing
  identical scalar equations; when MTK adds native array support,
  this will compile to vectorized loops

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add UpwindStencilInfo and precompute_upwind_stencils to cache both
wind-direction operators for odd-order derivatives. Implement
_build_upwind_rules using @rule pattern matching (mirroring the scalar
path's generate_winding_rules) to produce IfElse.ifelse wind-switching
expressions with ArrayOp-parameterized stencils. Add
_build_mixed_derivative_rules for cross-derivatives using Cartesian
products of 1D centered stencils. Relax can_template guard to allow
upwind (UpwindScheme) and mixed derivatives on uniform grids, with
upwind BPC included in boundary-proximity calculation.

Three new tests: upwind Burgers structural (ArrayOp produced), upwind
Burgers ArrayOp-vs-scalar match, and 2D mixed derivative ArrayOp-vs-scalar
match. All 15 array_disc_tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…itution

Phase 3: Add ArrayOp template support for the spherical Laplacian pattern
r^{-2} * Dr(r^2 * Dr(u)). This builds on the nonlinear Laplacian
infrastructure, combining centered D1 derivatives with the existing
half-offset nonlinlap scheme.

Key changes:
- _detect_spherical_terms: Uses split_additive_terms (not split_terms)
  to preserve the complete spherical expression as a single matchable term
- _spherical_template: Builds ArrayOp-indexed discretization using linear
  grid formula (avoids Const-wrapped Float64 vector indexing issues)
- _substitute_terms: New helper that processes additive terms individually,
  applying termlevel_dict to matched terms and rdict to unmatched terms.
  This avoids passing template values (containing Const-wrapped arrays)
  through pde_substitute, whose maketerm reconstruction tries to literally
  index concrete arrays with symbolic indices.
- Handles PDE pipeline rearrangement to lhs ~ 0 form by processing both
  sides of the equation via _substitute_terms.

Tests: 4 new spherical ArrayOp tests (basic, symbolic structure, with
coefficient, higher order approx_order=4). All existing tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend the ArrayOp template path to handle non-uniform grids for centered
(even-order) derivatives. Previously, any non-uniform grid triggered a full
per-point fallback. Now, per-point stencil weights are stored in a
Const-wrapped matrix indexed by symbolic grid indices, producing a single
ArrayOp equation instead of N scalar equations.

- Add weight_matrix field to StencilInfo for non-uniform weight storage
- Convert Vector{SVector} stencil weights to Matrix in precompute_stencils
- Relax can_template guard: only upwind/nonlinlap/spherical require uniform
- Index weight matrix with symbolic point index aligned to boundary frame
- Add 5 non-uniform grid tests (1D, 2D, coefficient, higher order, structure)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix a prerequisite bug where the non-uniform CompleteUpwindDifference
constructor overrode the caller's offside parameter to 0, causing the
positive-wind operator to store incorrect offside in the
DerivativeOperator. This was invisible on the scalar path due to a
compensating assertion, but blocked correct ArrayOp indexing.

Changes:
- upwind_diff_weights.jl: Remove offside=0 override (line 154)
- upwind_difference.jl: Remove @Assert D.offside==0, fix stencil_coefs
  indexing to use II[j] - D.offside for the !ispositive branch
- generate_array_fd_rules.jl: Extend UpwindStencilInfo with weight
  matrices, populate them for non-uniform grids, update
  _upwind_stencil_expr to index into Const-wrapped weight matrices
  using symbolic grid indices, fix boundary frame to use per-side
  boundary point counts, remove upwind from schemes_needing_uniform
- array_disc_tests.jl: Add 5 Phase 8 tests verifying scalar path fix,
  ArrayOp-vs-scalar match, symbolic structure, parameter support, and
  advection-diffusion combo on non-uniform grids

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Build the WENO5 (Jiang-Shu) formula as a native symbolic ArrayOp
template, enabling vectorized computation for WENO first-order
derivatives on uniform grids. The template directly transcribes
weno_f using Const-wrapped array taps — no branching, all substencil
reconstructions computed and soft-blended via nonlinear weights.

Changes:
- Add WENOStencilInfo struct and precompute_weno_stencils
- Add _weno_template (WENO5 formula as symbolic expression)
- Add _build_weno_rules (term-level pattern matching for Dx(u))
- Relax precompute_upwind_stencils guard to also populate for
  FunctionalScheme windmap operators (odd orders >= 3)
- Update generate_array_interior_eqs with weno_cache, has_weno
  in can_template, WENO BPC in boundary frame
- Update _build_interior_arrayop to build and include WENO rules
- Add 5 Phase 10 tests: basic convection, coefficient-multiplied,
  mixed advection+diffusion, symbolic structure, 3rd-order coexistence

After this, only generic user-defined FunctionalScheme falls back
to per-point computation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…able, Neumann/Robin BCs

Remove !haslower/!hasupper guards from BPC computation in
generate_array_interior_eqs so boundary_point_count is always accumulated
regardless of BC type. This fixes ArrayDiscretization for periodic BCs
where haslower/hasupper returns (true, true).

Add Phase 11 tests (11a-11k): periodic BC centered/WENO/upwind, coupled
multi-variable diffusion and advection-diffusion, Neumann and Robin BCs,
and symbolic structure verification for each category.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Part A: Validate N-D ArrayOp with 2D upwind advection-diffusion,
2D coupled diffusion, and 2D symbolic structure tests.

Part B: Add periodic BC support to ArrayOp template so periodic PDEs
use vectorized computation instead of per-point fallback. Adds
_wrap_periodic_idx helper, periodic dimension detection, and wrapping
across all 6 scheme types (centered, upwind, WENO, nonlinear Laplacian,
spherical Laplacian, mixed cross-derivatives).

All 63 test sets pass with 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tests for ArrayOp

Fix _build_interior_arrayop to handle algebraic equations (PDAE) by detecting
absence of time derivatives and falling back to per-point discretization.
Refactor ODE coefficient extraction to use a fast path for standard Dt
coefficient (+1) that avoids pde_substitute on Const-wrapped arrays, with
fallback to substitution-based extraction for non-standard forms (e.g. v ~ Dt(u)).

Add 8 validation tests (Phase 13): PDAE diffusion+algebraic, KdV 3rd-order
upwind, mixed advection+diffusion+dispersion, beam equation (4th-order coupled),
Brusselator 2D periodic, Schrödinger complex-valued PDE, and symbolic structure
tests for higher-order upwind and 4th-order spatial derivatives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eliminate per-point scalar fallbacks for boundary-proximity frame points
and algebraic (PDAE) equations:

- Algebraic equations now produce ArrayOp directly instead of returning
  nothing and falling back to per-point discretization
- New FullInteriorStencilInfo/FullInteriorUpwindStencilInfo structs with
  position-dependent weight+offset matrices that fold boundary stencils
  from DerivativeOperator into a single ArrayOp covering ALL interior points
- Conditional frame elimination: when equation has only centered+upwind
  derivatives (no nonlinlap/spherical/WENO/periodic), the frame per-point
  loop is skipped entirely
- Nonlinlap/spherical/WENO equations correctly retain frame per-point behavior
- Phase 14 tests (E1-E8) covering PDAE, full-interior centered/upwind,
  higher-order, 2D, non-uniform, mixed, and nonlinlap regression

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Eliminate `has_nonlinlap` from the full-interior gate condition so that
nonlinear Laplacian PDEs (Dx(expr * Dx(u))) use position-dependent
weight+tap matrices in a single ArrayOp covering all interior points,
removing the per-point frame equations at boundary-proximity points.

Key changes:
- Add FullNonlinlapInfo struct and precompute_full_nonlinlap() to build
  3D weight+tap matrices indexed by (j_outer, j_inner/interp, grid_point)
- Add _nonlinlap_full_template() using Const-wrapped 3D matrices
- Fix padded outer tap 0/0 division: set first interp weight to 1.0 for
  zero-weight padded taps so expr_sym with negative powers (e.g. u^(-1))
  doesn't produce NaN
- Gate condition now: !(has_spherical || has_weno || nonlinlap_nonuniform)
  && !any(is_periodic)
- Spherical and non-uniform nonlinlap deferred to future phases
- Add Phase 15 tests (E1-E8) for nonlinlap/spherical full-interior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove non-uniform grid guard from precompute_full_nonlinlap, enabling
  full-interior mode for nonlinlap on non-uniform grids
- Add D1 stencil caching in precompute_stencils for spherical variables
  so precompute_full_interior_stencils builds FullInteriorStencilInfo for
  the first derivative used by _spherical_template
- Update gate condition to only block on WENO and periodic dimensions
- Compute fi_nonlinlap when has_spherical (spherical templates use
  nonlinlap infrastructure internally)
- Add Phase 16 tests: non-uniform nonlinlap (1D, 2D), spherical
  non-uniform, and spherical+centered 2D

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enable periodic BCs on uniform grids to use the full-interior ArrayOp path,
eliminating boundary-proximity frame equations. Non-uniform periodic grids
fall back to the standard path.

Changes:
- Gate condition: `all_full_interior` now allows periodic on uniform grids
- Stencil precomputation: periodic branches in centered, upwind, and nonlinlap
  precompute functions always use interior stencils
- ArrayOp wrapping: apply `_maybe_wrap` to full-interior index expressions
- Validation skip: skip `_equations_match` for periodic (different wrapping structure)
- Add `_wrap_grid_periodic` and `_wrap_half_periodic` integer wrapping helpers
- Add Phase 17 tests (17a-17g): structural, numerical, upwind, 2D, mixed, nonlinlap

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ase 18)

The standard ArrayOp path had an index mapping bug for periodic non-uniform
grids: weight matrices had N-2*bpc columns but iteration covered N points,
causing out-of-bounds access. Fix by building extended N-column weight
matrices with wrapped grid positions for boundary-proximity stencils, and
adjusting point_idx to map directly without bpc offset for periodic dims.

Changes:
- Add _periodic_stencil_positions, _build_periodic_centered_wmat, and
  _build_periodic_upwind_wmat helper functions
- Fix centered stencil path in _build_interior_arrayop for periodic non-uniform
- Fix upwind stencil path in _upwind_stencil_expr for periodic non-uniform
- Fix mixed derivative path in _build_mixed_derivative_rules for periodic non-uniform
- Add Phase 18 tests (18a-18f) covering structural, diffusion, upwind,
  2D mixed, higher-order, and regression scenarios
- All 300 array_disc_tests pass with zero regressions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Staggered grids are the last remaining discretization scheme without
ArrayOp support. This adds ArrayOp-based discretization for staggered
grids used in hyperbolic PDE systems (e.g., wave equations) where two
dependent variables live on offset sub-grids.

- Add StaggeredStencilInfo struct and precompute_staggered_stencils
- Generate fixed-offset stencils based on variable alignment
  (CenterAligned → [0,1], EdgeAligned → [-1,0])
- Gate staggered into can_template; disable upwind rules when active
- Add Phase 19 tests: structural, numerical (mixed/periodic BCs),
  and SplitODEProblem compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Per SciML style guide, untyped struct fields must explicitly be annotated
with ::Any rather than left implicitly untyped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add _ConstSR type alias to replace 34 verbose type references
- Extract _stencil_coefs_to_matrix utility (replaces 10 hcat/splat patterns)
- Merge duplicate periodic wmat builders into single _build_periodic_wmat
- Replace non-deterministic rand() in _equations_match with fixed test offsets
- Hoist allocations out of precompute loops (stencil weights, grid collects)
- Cache _weno_template calls to avoid duplicate computation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ctessum

ctessum commented Mar 3, 2026

Copy link
Copy Markdown
Contributor
combined_speedup

@ctessum

ctessum commented Mar 3, 2026

Copy link
Copy Markdown
Contributor

Second attempt!

I don't know exactly what's going on with the brusselator, and I think there are some opportunities for refactoring, but I figured this would be a good point to start a conversation. It's not yet quite at constant time discretization, but it seems categorically faster than the pointwise implementation.

I am told that @arrayop and @makearray don't work here because the indexes are determined at runtime rather than parse-time (?), so ArrayOp and MakeArray are used instead.

Let me know what you think.

@ctessum-claude ctessum-claude marked this pull request as ready for review March 3, 2026 15:44
offsets = interior_offsets
elseif g <= bpc
# Lower frame: use low_boundary_coefs[g]
weights = Vector{Float64}(D_op.low_boundary_coefs[g])

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-generic?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be fixed by 4623342

ctessum-claude and others added 4 commits March 3, 2026 21:28
Replace hardcoded Float64 with type parameter T<:Real in 7 structs
(StencilInfo, UpwindStencilInfo, NonlinlapStencilInfo, WENOStencilInfo,
FullInteriorStencilInfo, FullInteriorUpwindStencilInfo, FullNonlinlapInfo)
and ~30 allocation sites. This enables Float32 or other Real subtypes
to propagate through DerivativeOperator{T} into weight matrices.

Key changes:
- Add {T<:Real} type parameter to all struct definitions
- Replace Vector{Float64}(...) with collect(...) to preserve element type
- Replace zeros(Float64,...) with zeros(T,...) using T from operator
- Add _op_eltype helper to extract T from DerivativeOperator{T}
- Use explicit type parameter at construction sites with nothing fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6 tests covering the type-generic changes:
- Unit tests for struct construction and helper functions with Float32
- 1D centered diffusion with Float32 grid spacing
- Float32 vs Float64 solution comparison
- Upwind advection with Float32
- Nonlinear diffusion (nonlinlap) with Float32
- WENO advection with Float32 (using exact binary dx=0.125)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move StableRNGs from [deps] to test-only (it's unused in src/)
- Extract _validate_arrayop_or_fallback helper to deduplicate validation logic
- Move _contains_time_diff to module level (pure utility, no closure capture)
- Remove dead code: unused bs/haslower/hasupper in BPC loop
- Remove unnecessary enumerate() where index was discarded
- Tighten StaggeredStencilInfo.alignment from ::Any to ::Type{<:AbstractVarAlign}

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Equations without spatial derivatives (e.g., algebraic equations like
R(t,x) ~ f(params)) were incorrectly falling back to per-point scalar
discretization when the system also contained PDEs with odd-order spatial
derivatives. This happened because `can_template` used the global
`derivweights.orders[x]` which includes derivative orders from ALL PDEs,
and the scheme-specific caches (WENO, upwind, etc.) had no entries for
variables without spatial derivatives.

Add `_contains_spatial_diff` helper and an early-return fast path in
`generate_array_interior_eqs` that bypasses the `can_template` gate and
boundary frame computation for equations with no spatial derivatives,
producing a single ArrayOp instead of N×M scalar equations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ctessum-claude and others added 12 commits March 27, 2026 00:03
…tude)

Move upwind fallback rules from termlevel_rules to fd_rules so that
bare derivatives like Dx(u) inside nonlinear expressions (e.g.
sqrt(Dx(u)^2 + Dy(u)^2)) are substituted via pde_substitute, matching
the scalar path's generate_winding_rules fallback behaviour.

Also extend precompute_stencils and the fd_rules loop to include
odd-order centered stencils, and broaden can_template for non-
FunctionalScheme advection schemes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace `boundary_f = [nothing, nothing]` in WENOScheme with actual
boundary stencil functions so WENO frame points produce correct
derivative approximations instead of falling back to extrapolation.

The four boundary functions use 3rd-order one-sided finite differences
(Fornberg weights on a 4-point stencil) at the 2 grid points nearest
each domain boundary where the full 5-point WENO5 stencil does not fit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For multi-dimensional problems, periodic BC equations are now generated
as a single ArrayOp expression instead of one scalar equation per edge
point. This reduces symbolic processing overhead for 2D+ periodic BCs.

The ArrayOp tiles `disc1[boundary, tang...] ~ disc2[boundary+offset, tang...]`
along the tangential dimensions. Falls back to the scalar path for 1D
(single-point edges).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
For multi-dimensional problems on CenterAligned and Staggered grids,
truncating BC equations (Dirichlet, Neumann, Robin) are now generated as
single ArrayOp expressions instead of one scalar equation per edge point.

The implementation directly builds symbolic substitution rules using
ArrayOp index variables for tangential dimensions, handling:
- Variable value maps (u at boundary -> Const(discvar)[boundary, tang...])
- Derivative stencils (Dx(u) -> weighted sum with symbolic tangential indices)
- Grid coordinate maps (x -> boundary value, y -> Const(grid)[tang...])

Validates against the scalar path at the first edge point and falls back
to scalar for 1D, EdgeAlignedGrid, or HigherOrderInterfaceBoundary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace simple isequal check with the robust _equations_match function
(symbolic + numerical comparison) for validating BC ArrayOp equations
against the scalar path. Falls back to scalar if validation fails.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In generate_array_interior_eqs, the fast path for equations without spatial
derivatives was calling precompute_stencils a second time instead of reusing
the already-computed stencil_cache. Also hoist lo_vec/hi_vec extraction
before the branching logic so both the fast path and the N-D ArrayOp path
share a single computation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add `validate` field to ArrayDiscretization (default: true for
  correctness). When false, skips scalar-path validation of ArrayOp
  equations and boundary conditions, saving ~5-15% discretization time.
  Usage: ArrayDiscretization(validate=false)

- Defer stencil cache precomputation (upwind, nonlinlap, WENO,
  staggered) until after the fast-path returns for equations without
  spatial derivatives and the can_template check. This avoids
  unnecessary work for algebraic equations and equations that fall
  back to per-point scalar discretization.

- Use lightweight checks (halfoffsetmap key existence, staggeredvars
  presence, advection_scheme type) for the can_template decision,
  then refine with actual cache emptiness after precomputation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Applies the review fixes from the PR feedback without changing any
numerical behaviour:

- generate_array_fd_rules.jl: rework `_validate_arrayop_or_fallback`
  to sample up to three local points (first / mid / last) via a
  closure returned from `_build_interior_arrayop`, so position-
  dependent stencil bugs can be caught instead of only checking the
  first centered point.  Also initialises `eq_scalar` deterministically
  and removes the short-circuit dependency for periodic dims.
- _equations_match: expose the numerical comparison tolerance as a
  keyword and document its rationale.
- _wrap_periodic_idx: expand the docstring to spell out the 2:N
  canonical-range convention and how downstream consumers should read
  index 1.
- WENO.jl / advection_schemes.md: call out that `WENOScheme()` now
  uses 3rd-order one-sided Fornberg stencils at non-periodic
  boundaries, so users running convergence studies know the boundary
  order is 3 rather than 5.
- upwind_diff_weights.jl / upwind_difference.jl: add explanatory
  comments around the `offside = 0` removal and the non-uniform
  `stencil_coefs[II[j] - D.offside]` indexing, tying them back to the
  bug fix that made ArrayOp non-uniform upwind possible.
- spherical_laplacian.jl: update the remaining `s.discvars[v][I]`
  spot in `rsubs` to go through `_disc_gather` for consistency.
- array_disc_tests.jl: relax the KdV Array-vs-Scalar tolerance to
  `atol = 2e-2`.  `rtol = 1e-10` is unachievable for adaptive FBDF
  on a nonlinear dispersive PDE because tiny RHS differences (the
  scalar per-point build vs. ArrayOp-flattened build have different
  associativity) get amplified by timestep selection.  The same
  failure reproduces on the pre-fix branch.

Verified with GROUP=ArrayDisc (358 passed), GROUP=Higher_Order,
GROUP=Convection_WENO (309 passed), GROUP=Nonlinlap_ADV,
GROUP=Wave_Eq_Staggered, GROUP=Burgers, GROUP=Convection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Five low-risk cleanups that reduce duplication and remove a couple of
internal-API dependencies without changing any numerical behaviour.

1. Project.toml: bump PDEBase compat from 0.1.20 to 0.1.22.  The PR
   already uses `pde_substitute`, which was only added in PDEBase
   0.1.22 — previous minimums would resolve to a version that crashes
   at discretization time with an UndefVarError.

2. Replace `SymbolicUtils.BSImpl.Const{SymReal}` with
   `SymbolicUtils.Const{SymReal}` throughout.  The top-level
   `SymbolicUtils.Const` alias is the least-internal path to the
   Const ctor (Symbolics.jl imports it directly at line 28); reaching
   through `BSImpl` ties us to Moshi's `@data` module layout, which
   is the most likely identifier to move in a SymbolicUtils minor
   rewrite.  Also drops the redundant local `_ConstSR` alias in
   generate_bc_eqs.jl in favour of the one defined in
   generate_array_fd_rules.jl.

3. Replace hand-rolled `_contains_time_diff` / `_contains_spatial_diff`
   tree walkers with thin wrappers over `PDEBase.differential_order`.
   Drops ~20 lines and aligns the recursion with the rest of the
   PDEBase pipeline so upstream fixes (e.g. Symbolics v7 `.order`
   handling) are picked up automatically.

4. Introduce `ArrayOpContext` and `StencilCaches` immutable structs
   to bundle the state that every `_build_*_rules` helper repeats in
   its signature.  Before: 15 positional args on
   `_build_interior_arrayop`, an 8-tuple copy-pasted across eight
   rule-builder signatures, and `is_periodic=falses(length(bases))`
   default placeholders duplicated 8 times.  After: all rule builders
   take `(ctx::ArrayOpContext, caches::StencilCaches, pde, ...)` plus
   a small handful of call-site-specific args.  The `sample_at`
   closure in `_build_interior_arrayop` now captures `ctx` by
   reference, so the old tuple of loose parameters threaded through
   every helper is gone.

5. Extract a single `_tap_expr(ctx, u_c, u_spatial[, x, off])` helper
   to replace 11 copies of the
   `taps = map(offsets) do off; idx_exprs = map(u_spatial) do xv … end`
   idiom across `_build_interior_arrayop`, `_build_upwind_rules`,
   `_build_mixed_derivative_rules`, `_nonlinlap_template`,
   `_nonlinlap_full_template`, `_spherical_template`, and
   `_weno_template`.  Also deletes the parallel
   `_upwind_stencil_expr`/`_upwind_full_interior_expr` inner helpers
   and folds their work through `ctx`.  Three-method dispatch:
   - no-args variant for `var_rules` (at-point lookup, no wrap),
   - `(x, off)` variant for single-dim stencil shifts (wrapped),
   - `(shifts::Dict)` variant for mixed cross-derivatives (wrapped).

   Note on wrapping: an earlier version of this refactor routed
   `var_rules` through `_maybe_wrap` too, which silently broke the
   periodic tests — wrapping the at-point index introduces
   `IfElse.ifelse` branches that `pde_substitute`'s `maketerm`
   reconstruction cannot traverse (the same workaround reason as
   `_substitute_terms`).  The no-shift `_tap_expr` therefore
   intentionally skips wrapping, and its docstring explains why.

Review refactors 6 (struct-zoo parametric cleanup), 7 (collapse
full-interior vs standard code paths), and 8 (file split into an
array_fd/ subdirectory) are deliberately deferred to a follow-up PR
— 7 interacts with 6 and would land a ~250-line rewrite of the
most intricate part of the ArrayOp builder, and 8 is a mechanical
move that is best done on top of a stable 6+7 base.

Verified with GROUP=ArrayDisc (358 passed / 0 failed / 0 errored).
No other test groups were re-run; refactors 1-5 are all pure refactors
(no behaviour changes) so the earlier broader verification from the
previous commit still applies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`UpwindStencilInfo` previously flattened seven parallel fields
(`D_neg`, `D_pos`, `neg_offsets`, `pos_offsets`, `is_uniform`,
`neg_weight_matrix`, `pos_weight_matrix`) into one struct.  Each half
is shape-identical to a centred `StencilInfo`, so the struct now
holds two `StencilInfo` fields — `neg::StencilInfo{T}` and
`pos::StencilInfo{T}` — and the tap-building helpers treat both wind
directions uniformly.

Mechanical changes:
- `precompute_upwind_stencils` builds two `StencilInfo`s and wraps
  them in an `UpwindStencilInfo`.
- `precompute_full_interior_upwind`, `_build_interior_arrayop`'s
  per-dim boundary-point-count computation, and `_build_upwind_rules`
  (both primary and fallback paths) update their field accesses:
  `usi.D_neg` → `usi.neg.D_op`, `usi.neg_offsets` → `usi.neg.offsets`,
  `usi.neg_weight_matrix` → `usi.neg.weight_matrix`, etc.  `is_uniform`
  now lives on each half (the two halves always have the same value).
- `StencilInfo`'s docstring is updated to reflect that it is now
  reused for upwind halves in addition to centred stencils; the
  field comments describe what "offsets" means for each case.

Benefit: removes 6 parallel field names, eliminates the duplicated
`is_uniform::Bool` field, and clears a small piece of structural
duplication in `_upwind_stencil_expr`'s call sites.  No behaviour
change.

Verified with GROUP=ArrayDisc (358 passed / 0 failed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous monolithic file was 2955 LOC and mixed stencil
precomputation, PDE pattern detection, rule-building for seven
different discretization schemes, ArrayOp construction, validation,
and fallback logic.  Split into thirteen thematic files under
`src/discretization/array_fd/`:

- stencil_info.jl       (103 LOC)  — StencilInfo / UpwindStencilInfo /
                                     NonlinlapStencilInfo / WENOStencilInfo /
                                     StaggeredStencilInfo
- context.jl            (145 LOC)  — _ConstSR alias, ArrayOpContext,
                                     StencilCaches, _tap_expr variants,
                                     _op_eltype
- precompute.jl         (860 LOC)  — all precompute_* caches, the
                                     FullInterior* structs and their
                                     precomputers, _half_op/_half_inner
                                     helpers, stencil_weights_and_taps
- pattern_detection.jl  (105 LOC)  — _detect_nonlinlap_terms,
                                     _detect_spherical_terms
- validation.jl         (143 LOC)  — _equations_match, _local_sample_indices,
                                     _validate_arrayop_or_fallback
- interior_driver.jl    (275 LOC)  — generate_array_interior_eqs orchestrator
- substitution_helpers.jl (85 LOC) — _substitute_terms, _wrap_periodic_idx,
                                     _maybe_wrap, _contains_*_diff
- arrayop_builder.jl    (276 LOC)  — _build_interior_arrayop
- rules_upwind.jl       (192 LOC)
- rules_mixed.jl         (95 LOC)
- rules_nonlinlap.jl    (341 LOC)  — _nonlinlap_template,
                                     _nonlinlap_full_template,
                                     _build_nonlinlap_rules
- rules_spherical.jl    (156 LOC)
- rules_weno.jl         (152 LOC)

`generate_array_fd_rules.jl` is now a 68-line stub that `include`s
each subfile in a dependency-respecting order, with a comment
explaining why the order matters (struct-dispatched functions need
their types defined upstream; function calls resolve at runtime and
so tolerate forward references).

No behaviour change — this is a purely mechanical reshuffle.  The
new layout mirrors the existing `discretization/schemes/*/` pattern
so future contributors have one file per scheme to add/modify instead
of one 3000-line monolith.

Verified with GROUP=ArrayDisc (358 passed / 0 failed).

Refactor 7 (collapse full-interior vs standard code paths) is
deliberately deferred — it has plausible but unverified performance
implications (position-dependent matrix-indexed weights vs constant
dot products in the hot path) and the code-complexity win is more
than offset by the risk during a session without a benchmarking
harness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JET's `@report_opt target_modules=(MethodOfLines,)` on a cold
`discretize(pdesys, ArrayDiscretization())` flagged two concrete
runtime-dispatch sites inside the PR's own code.  Both are build-time
latency issues, not hot-path; even unfixed, ArrayDiscretization
already runs 3.6× faster with 3× fewer bytes than the scalar
baseline on a 32-point 1D diffusion test.  These patches remove the
dispatch churn anyway.

1. `_to_scalar_interiormap` (src/array_discretization.jl) — the
   output dict was constructed as plain `Dict()`, so every loop
   iteration hit runtime dispatch on `indexed_iterate`, `isempty`,
   `first`, `setindex!`, and `iterate` (JET flagged 9 sites in the
   ~15-line function).  Narrowed to `Dict{Any, CartesianIndices}`.
   The key stays `Any` because the upstream `InteriorMap.I` uses
   heterogeneous PDE-equation keys.  Also refactored the loop body
   so `scalar_I[pde] = CartesianIndices(...)` always assigns the
   same value type — the previous branch assigned a bare `ranges`
   in the fallback arm, which would have forced a type widening.

2. `_validate_arrayop_or_fallback`
   (src/discretization/array_fd/validation.jl) — all positional
   arguments were typed as `Any`, so
   `ntuple(d -> lo[d] + local_idx[d] - 1, ndim)` widened to `Tuple`
   and the downstream `CartesianIndex(...) ->
   discretize_equation_at_point(II::Any, ...)` chain hit runtime
   dispatch.  Added `n_region/lo/hi::AbstractVector{Int}`,
   `ndim::Int`, `is_periodic::AbstractVector{Bool}`, `validate::Bool`,
   and wrapped `ntuple` in `Val(ndim)` so it specializes on the
   static dimension count.

Several medium-severity struct field types (`StencilCaches`
`Dict{Any, Any}`, `NonlinlapStencilInfo.{outer,inner,interp}_weights::Any`,
`StaggeredStencilInfo.alignment::Type{<:AbstractVarAlign}`,
`ArrayOpContext.idxs::Vector` untyped eltype) are deliberately left
for a follow-up.  They are all build-time only, and the biggest
upstream blocker — `DiscreteSpace.discvars::Any` — is outside this
PR's scope.

Verified with GROUP=ArrayDisc (358 passed / 0 failed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants