Skip to content

Refactor C translation around an explicit ABI and pointer model#1

Merged
DominicPM merged 58 commits intomainfrom
pointer-struct-field
Mar 24, 2026
Merged

Refactor C translation around an explicit ABI and pointer model#1
DominicPM merged 58 commits intomainfrom
pointer-struct-field

Conversation

@DominicPM
Copy link
Owner

No description provided.

Introduce `C_Translation_Model` and move the C frontend toward a
locale-provided pointer/memory interface instead of hard-wiring one
concrete pointer representation into translation.

This change:
- adds `c_translation_model` with pointer arithmetic, comparisons,
  pointer/int casts, and ABI parameters
- extends `C_To_Core_Translation` to track the reference universe
  (`addr`/`gv`) explicitly and resolve pointer-heavy types against it
- improves typedef resolution, record type resolution, and support for
  parametric/generated struct types
- distinguishes pure helper calls from side-effecting calls during
  translation
- adds signed pointer shifting support (`c_ptr_shift_signed`)
- tightens pointer, cast, member-access, and array-index translation,
  including struct-field access through pointers and raw `void *`
  handling
- updates the memory smoke tests to exercise the nat-addressed model
  explicitly

This is groundwork for translating C against an abstract machine model
while keeping pointer semantics and ABI assumptions explicit in the
target locale.

Signed-off-by: Dominic Mulligan <dominic.p.mulligan@gmail.com>
type_rank gave CLong/CULong and CLongLong/CULongLong the same rank (4),
violating C11 6.3.1.1p1 which requires long long to rank higher than
long. Under ILP32 where long is 32-bit and long long is 64-bit, this
caused usual_arith_conv to return the wrong (narrower) type.

Also fix the mixed-signedness branch of usual_arith_conv to check actual
bit widths per C11 6.3.1.8 rules 2+3. The old code blindly picked the
signed type without checking whether it could represent all unsigned
values — wrong under ILP32 where e.g. long and unsigned int are both
32-bit.

Both bugs were masked under LP64 (long == long long == 64-bit).
…lations

Smoke tests for struct-with-array-field patterns (e.g. &h->vec[i],
array decay from struct fields) failed with "Undeclared class:
Separation_Algebra.sepalg". The struct record declaration succeeded but
the subsequent function translation needed the sepalg class in scope.

Adding Separation_Algebra as a direct import to the two affected smoke
test theories fixes all 16 errors. The session dependency is already
transitively available via Shallow_Micro_C.
Tests long+long_long rank ordering and unsigned_int+long mixed-signedness
conversion under abi: ilp32-le, where long is 32-bit and long long is
64-bit. Also includes sizeof sanity checks and ABI profile assertions.

Added to both ROOT and AutoCorrode.thy so it runs in batch builds and
jEdit sessions.
c_signed_shr models GCC/Clang arithmetic (sign-extending) right shift,
which is implementation-defined per C11 6.5.7p5. The new
c_signed_shr_conservative variant aborts with SignedOverflow when the
operand is negative, for use with the conservative compiler profile
where no specific implementation-defined behavior is assumed.

Includes equational WP rule (wp_c_signed_shr_conservative) and intro
rule (wp_c_signed_shr_conservativeI) following the existing pattern.
c_scast silently truncates when narrowing (two's complement), which is
implementation-defined per C11 6.3.1.3p3. The new c_scast_checked
variant aborts with SignedOverflow when sint(w) does not fit in the
target type's signed range, for use with the conservative compiler
profile.

Includes equational WP rule (wp_c_scast_checked) and intro rule
(wp_c_scast_checkedI) following the existing pattern.
Adds C_Compiler structure with profiles controlling three C11
implementation-defined behaviors:
- char signedness (signed on x86, unsigned on ARM)
- signed right shift (arithmetic vs conservative/abort)
- signed narrowing cast (truncating vs checked/abort)

Known profiles: gcc-x86_64, clang-x86_64 (signed char, arithmetic shr,
truncating), gcc-aarch64, clang-aarch64 (unsigned char, arithmetic shr,
truncating), conservative (unsigned char, abort on negative shr, abort
on narrowing overflow).

Default (no compiler: specified) preserves existing behavior: unsigned
char, arithmetic shr, truncating narrowing.

Also refactors micro_c_file to reuse micro_c_translate option parsing
instead of duplicating it.
Tests char signedness (gcc-x86_64 vs gcc-aarch64), signed right shift
(arithmetic vs conservative), and signed narrowing cast (truncating vs
checked) under different compiler: profiles.

Added to both ROOT and AutoCorrode.thy.
Tests basic switch/case/default, fallthrough between cases, and switch
without default. This was a significant gap — switch had translation
infrastructure but zero test coverage.
alignof_c_type returned min(sizeof, 8) = 8 for CInt128/CUInt128, but
Isabelle c_alignof_int128 returns 16. Pattern-match these two cases
to return 16, matching the Isabelle side and real ABI requirements.
Two changes:

1. c_schar (8 sword) had no sizeof/alignof instances, unlike every other
   C numeric type. Add c_sizeof_schar (= 1) and c_alignof_schar (= 1)
   with simp lemmas.

2. Fix sporadic "missing struct field accessor constant" concurrency bug.
   The ML translation pipeline uses ~30 global Unsynchronized.ref cells
   across C_Ast_Utils, C_Translate, C_Compiler, and C_Def_Gen. When
   Isabelle processes multiple theories in parallel, concurrent
   micro_c_translate commands clobber each other's state. Serialize all
   translations through a Synchronized.var mutex.
C_ABI.char_is_signed is not used by the translation pipeline; plain-char
signedness is controlled by C_Compiler.get_compiler_profile (set via the
compiler: option). Update comments at both sites to clarify this.
Both signed and unsigned division/modulo hardcode c_abort (to construct
DivisionByZero), unlike the other unsigned operations which are
polymorphic in 'abort. Add text blocks explaining why and noting the
consequence for locale abort type unification.
Both functions had 3 identical copies across C_Ast_Utils, C_Translate,
and C_Def_Gen. Move the canonical definitions to C_Ast_Utils (exported
in the signature) and replace copies with val aliases.
Smoke tests for string literals, compound literals, sizeof, _Alignof,
struct-array access, and _Generic selection. These features were
previously untested at the translation level.
Extract the duplicated option processing + global ref setup from
micro_c_translate and micro_c_file into a shared helper. Both commands
call setup_translation_context, then do their source-specific work
(inline source vs file loading + manifest).
Introduce fold_c_expr, fold_c_init, and fold_c_stmt — generic
post-order folds over C AST nodes that factor out the recursive
traversal structure shared by all AST analysis passes.

Rewrite 7 walkers (expr_reads_vars, expr_written_vars,
find_assigned_vars, find_goto_targets, find_called_functions,
find_indexed_base_vars, find_named_calls_with_args) as thin
handler functions over the generic fold, removing ~250 lines of
duplicated pattern matching. Remove dead find_loop_written_vars
from C_Ast_Utils (superseded by find_loop_written_vars_local in
C_Def_Gen).

Complex walkers (expr_has_side_effect_with, find_list_backed_aliases)
are left unchanged — they thread context or environment state that
doesn't fit the simple fold pattern.
Exercises c_signed_add end-to-end: C source → translation →
contract with c_signed_in_range precondition → crush_boot/crush_base
proof. This fills the gap identified in the audit where no signed
arithmetic operation had a fully verified example demonstrating the
overflow precondition pattern.
…tion

Exercises c_unsigned_div end-to-end: C source → translation →
contract with b ≠ 0 precondition → crush_boot/crush_base proof.
This fills the gap where no division/modulo operation had a fully
verified example.
Verify u_call_helper (identity) and u_call_caller (calls helper
twice and adds results) with composing contracts. The caller's
proof automatically uses the helper's [crush_specs]-tagged spec,
demonstrating the cross-function verification pattern.
C_ABI_And_Compiler.thy      (~125 lines) — ABI and compiler profiles
C_Ast_Utilities.thy         (~1340 lines) — AST walking, type resolution, generic fold
C_Translation_Engine.thy    (~5510 lines) — translation context, term building, expr/stmt translation
C_Definition_Generation.thy (~1070 lines) — function definition generation, command registration

C_To_Core_Translation.thy is retained as a re-export stub so that
all downstream imports continue to work unchanged.
The abort: parameter for micro_c_translate/micro_c_file was never
meaningfully used with a type other than c_abort. Remove the option
and extract the abort type directly from reference_types in the
locale context (falling back to c_abort outside locales). This
removes the TranslateAbortTy variant, the abort field from the
options record, and the abort: keyword.

The one usage in C_Union_Examples (abort: 'abort) was unnecessary
since the locale's reference_types already constrains the abort
type correctly.
The upstream AFP Isabelle_C (mirror-afp-2025-1) has a bug in
C_Parser_Language.thy where transformDeclaration errors on function
parameter declarations with 3+ type specifiers (e.g. unsigned long
long). The local dependencies/afp/Isabelle_C copy already has a
fix, but CI checks out the AFP separately and hits the bug.

Add a patch file and apply it in the setup-isabelle-action after
AFP checkout. Document the requirement in dependencies/afp/README.md
so users setting up AutoCorrode locally know to apply it.
…files

- Translate pointer relational comparisons (< <= > >=) using c_ptr_less/
  c_ptr_le/c_ptr_greater/c_ptr_ge from C_Memory_Operations, with focused
  type unwrapping following the pointer subtraction pattern
- Add verified UB detection examples: signed overflow (SignedOverflow),
  division by zero (DivisionByZero), shift out of range (ShiftOutOfRange)
  using make_function_contract_with_abort
- Add LLP64_LE (Windows x64: long=32bit, ptr=64bit) and ILP32_BE (32-bit
  big-endian) ABI profiles with smoke tests and metadata validation
- Add ABI-parameterized c_sizeof_c_long/c_sizeof_c_pointer in c_abi_model
  locale for correct manual specifications across ABIs
- Document c_signed_mod quotient overflow check (C11 6.5.5p6: INT_MIN%-1)
- Add smoke tests for nested loops with break/continue, multi-function
  calls, enum constants in expressions, pointer subtraction, and pointer
  relational comparisons
…gnas, builtins, CChars0

Features implemented:

1. _Static_assert: evaluate constant expressions at translation time
   (sizeof, _Alignof, comparisons, arithmetic). Handled at both
   top-level (C_Definition_Generation) and block-level
   (C_Translation_Engine). New eval_const_int_expr + extract_string_literal
   utilities in C_Ast_Utilities.

2. Range case labels (case 1 ... 5:): new case_label datatype
   (SingleCase/RangeCase/DefaultCase) replaces option encoding.
   CCases0 handled in unwrap_case_labels, extract_switch_groups,
   make_switch_cond, make_any_case_match. Generates lo <= x AND x <= hi.

3. _Alignas specifier: explicit CAlignSpec0/CFunSpec0 patterns in
   resolve_c_type accumulator (was already silently ignored by catch-all,
   now documented).

4. __builtin_types_compatible_p: resolves type compatibility at
   translation time, produces constant 0/1. Other GCC builtins
   get explicit rejection message.

5. Multi-character constants (CChars0): explicit rejection with clear
   error message in translate_expr and init_scalar_const_value (both
   C_Translation_Engine and C_Definition_Generation).

Smoke tests in C_Misc_Examples.thy with verified contracts.
The makarius/isabelle container doesn't have git installed, and
actions/checkout downloads a tarball (no .git dir) for external repos.
Replace `git apply` with `patch -p1` which handles unified diffs.
- Add verified param_compound (compound assign x += y) and param_inc
  (post-increment x++) examples demonstrating parameter auto-promotion
- Fix range_case in C_Misc_Examples: use result-variable pattern instead
  of return-inside-switch which caused FunctionBody type clash
- Register C_Misc_Examples in root theory
- Fix mk_ptr_is_null: use unwrap_focused + c_ptr_to_uintptr instead of
  adhoc-overloaded address constant (which was ambiguous in locales)
- Add is_zero_int_const checks to CPtr/CPtr equality/inequality handlers
  so (void*)0 comparisons route through mk_ptr_is_null
- Add verified is_null example: p == (void*)0 with non-null precondition
- Add ptr_eq smoke test: p != q pointer inequality
When try_translate_pure_nat_expr fails on the bound but succeeds on init,
evaluate the bound expression monadically and use unat as the range limit.
This enables for-loops like for(i=0; i < n+1; i++) where n+1 is not a
literal or parameter. Verified with noop_loop example using
wp_raw_for_loop_framedI' with trivial invariant.
- Handle nested CInitList0 in array initializer: build list-of-lists for
  2D arrays like unsigned int mat[2][3] = {{1,2,3},{4,5,6}}
- Handle chained CIndex0 reads: when arr expr is itself a CIndex0,
  apply nth directly to the sub-list value without dereferencing
- Extend c_uint_arr_verification_ctx with c_uint_list_list_prism for 2D
- Verified mat_read example: mat[1][2] == 6 proved via crush_base
- Replace blanket CTypeQual0 wildcard with granular patterns:
  _Atomic now errors (atomic semantics not modeled), const and
  restrict are individually documented with C11 section references
- Add rationale comment for _Alignas ignore (C11 §6.7.5: alignment
  does not affect computation semantics; verified in C_Misc_Examples)
- Separate extern from blanket CStorageSpec0 with its own pattern
  and comment
Replace 70-line pattern-match chain with a declarative binop_table
mapping each C binary operator to its (signed, unsigned) Isabelle
constants. CShrOp0 remains a special case due to compiler-profile-
dependent dispatch (ArithmeticShift vs ConservativeShift).
Extract the nested if/else chain into resolve_from_flags with
pattern matching on the (signed, unsigned, char, short, int,
long_count, void, struct) flag tuple. Each pattern is annotated
with the C11 §6.7.2p2 specifier combination it represents.
- Add comment block before hol_type_of explaining the ABI indirection
  strategy: type synonyms are fixed-width, CLong/CULong are mapped
  to c_int/c_uint on ILP32 vs c_long/c_ulong on LP64, CChar is
  always unsigned with signedness resolved by resolve_c_type
- Add text block in C_Numeric_Types.thy clarifying that type synonyms
  do not vary with ABI and sizeof correctness follows from the
  hol_type_of indirection
- Delete 6 unreferenced functions: constrain_expr_arrow,
  constrain_expr_arrow_from_tm, constrain_function_body_arrow,
  resolve_ptr_to_uintptr_const, resolve_uintptr_to_ptr_const,
  struct_field_offset
- Remove 2 unused list_var bindings (shadowed by typed versions)
- Discard unused ctxt binding in pointer-to-integer cast
- Wildcard unused label parameter in resolve_pointer_model_const
- Add missing ParamListPtr case to 2 non-exhaustive matches in
  translate_expr (CAssign0) and translate_lvalue_location (CVar0)

Reduces ML warnings from 12 to 0.
C_Ast_Utilities.thy (13 warnings -> 0):
- Delete redundant catch-all pattern in resolve_c_type accumulator
- Wildcard unused pure_tab parameter in 5 trivially-true clauses
  of expr_has_side_effect_with
- Delete dead function add_decl_struct_bindings (never called)
- Wildcard unused parameters in expr_is_list_backed_in_env clauses
- Wildcard unused parametric_structs in CPtr clause

C_Definition_Generation.thy (5 warnings -> 0):
- Add catch-all to tfree_subst map (non-exhaustive match)
- Wildcard unused abr_str and target_cty in string literal pattern
- Rename variable o to po to avoid SML infix operator conflict
- Wildcard unused f in duplicate ManifestOpt error clause
Introduce shared implementations parameterized by CStructTag0/CUnionTag0
for 4 function pairs that previously differed only by tag:

- extract_aggregate_type_from_specs (was extract_{struct,union}_type_from_specs)
- extract_aggregate_type_from_decl_full (was extract_{struct,union}_type_from_decl_full)
- extract_aggregate_def_with_types_from_decl (was extract_{struct,union}_def_with_types_from_decl)
- extract_aggregate_defs_with_types (was extract_{struct,union}_defs_with_types)

The struct/union names remain as thin wrappers for API compatibility.
Document why _full variants and struct-only functions are kept separate.
Deduplicate the identical Resources.provide + ERROR handling pattern
that appeared at both the main source file and manifest file registration
sites in micro_c_file.
Add comments explaining:
- Why extract_member_field_info falls back from CVoid to CInt
  (struct/union fields cannot have void type; CVoid from opaque typedefs)
- Error convention in C_Translation_Engine: error "micro_c_translate: ..."
  for internal errors vs unsupported "..." for missing features
Extract 4 named phases from the 332-line monolith:
- setup_translation_state: reset mutable refs, read config
- extract_type_definitions: struct/union/enum/typedef extraction
- extract_and_order_functions: signature computation, topo-sort,
  purity analysis
- analyze_list_backed_params: call-site analysis for array parameters

process_translation_unit is now a ~40-line orchestrator that calls
these phases and threads results into definition generation.
register and auto are already silently ignored by the catch-all in
accumulate. Add smoke test in C_Translation_Smoke_Types and verified
proof example in C_Misc_Examples documenting this.
Add CBuiltinOffsetOf0 handler in translate_expr that computes byte
offsets using raw_struct_field_offset, with support for nested struct
field designator chains. Smoke test and verified proof (offset_of_b
returns 4 for second int field).
Handle CRangeDesig0 ([lo...hi] = val) in both global
(C_Definition_Generation) and local (C_Translation_Engine) array
initialization by expanding ranges into multiple index-value pairs.
Smoke test and verified proof (range_arr with [0..1]=10, [2..3]=20).
Handle (struct T){field1, field2, ...} compound literals in
translate_expr by reusing the struct field init logic from
translate_decl. Uses monadic path with bind variables for each
field expression. Smoke test and translation example verified.
Add CStatExpr0 handler using a forward-ref pattern to bridge
translate_expr (earlier mutual recursion block) with
translate_compound_items_expr (later block). The new function
processes declarations via Ref::new binds and returns the last
expression's value and type. Smoke test and verified proof.
Both unsigned __int128 addition and signed __int128 subtraction
translate correctly. Proof example deferred pending c_uint128
verification locale.
Replace hardcoded 2D array init with recursive build_nested that
handles 3D+ arrays by unwrapping CPtr levels and matching dimension
sizes. Includes build_zero for padding incomplete initializer lists.
Smoke test verifies 3D array int[2][2][2] translation.
Remove the hard error for volatile in accumulate, treating it like
const/restrict. This unblocks translation of volatile-qualified C
code. Sound yield-based volatile modeling (where volatile reads
produce nondeterministic values via Yield) is deferred as a
follow-up. Smoke tests for volatile pointer deref and volatile local.
When a function definition introduces extra type variables (e.g., from
unresolved adhoc overloading in switch bodies or void pointer casts),
the locale morphism turns these into schematic type variables (TVar) in
phantom TYPE arguments of the registered constant.  Later functions that
reference this constant then fail with "Illegal schematic type variable".

Fix: in define_c_function, detect TYPE args containing TVars and replace
them with TYPE(unit).  This prevents the cascade while preserving locale
type parameters (which appear as TFree, not TVar).
Add a case for CInitList0 with non-struct target types, enabling
translation of multi-dimensional array initializers like
int a[2][2][2] = {{{1,2},{3,4}},{{5,6},{7,8}}}.

Previously this hit the catch-all "unsupported non-constant global
initializer shape" error.  The new case recursively builds nested
HOL lists from the sub-array initializer elements.
…tracing

Add a 'verbose' keyword option to micro_c_translate and micro_c_file.
When enabled, functions with extra type variables (TVars or TFrees beyond
locale parameters) emit diagnostic output showing:
- The variable names and sorts
- Sub-term locations (Free/Const/lambda) that introduce them

Usage:
  micro_c_translate verbose ‹...›
  micro_c_file verbose ‹path.c›

This is a debugging aid for pinpointing the source of phantom type
variables in translated C functions.
- Increase sub-term traversal depth from 3 to 20 to reach deeply nested
  terms in complex functions like fdt_next_tag
- Include locale type parameters from the expression constraint and
  addr/gv refs in the "used" set, preventing false positive reports for
  simple functions whose only TFrees are the locale's own 's, 'i, 'o
- Fix crash on bound variables by checking lambda binding types directly
  instead of using type_of on the body (which fails on Bound terms)
Previously, CSubOp0 with a pointer LHS and integer RHS fell through
to the usual_arith_conv catch-all, which called type_rank on the
pointer type and failed with "pointer type has no integer conversion
rank".

Fix: add a case for (CSubOp0, CPtr elem_cty, <non-pointer>) that
negates the integer offset (0 - n) and delegates to the existing
pointer addition machinery (mk_raw_ptr_add / mk_list_ptr_add).

This unblocks fdt_find_string_len_ which uses:
  const char *last = strtab + tabsize - (slen + 1);
… parameter calls

Four fixes enabling translation of C functions that call locale-provided
stdlib functions (e.g. memcmp/memmove) and return pointer types:

1. void→T* scalar cast: use typed prism and function type skeleton for
   c_cast_from_void so fastype_of succeeds and adhoc overloading resolves.

2. Pointer increment (LocalPtr): give c_cast_from_void a focused result
   type so mk_ptr_shifted_term detects the focused value and extracts
   the raw pointer via unwrap_focused.

3. Function call constraint: pin the return type to function_body with
   dummyT side types, so funcall's shared type variables propagate
   through the function reference to the argument expressions.

4. FunctionBody value=return constraint: for pointer-returning functions,
   construct the focused return type directly (pointer-first dispatch)
   and constrain both value and return positions to match.
@DominicPM DominicPM force-pushed the pointer-struct-field branch from 345c6e4 to 00a347e Compare March 24, 2026 20:51
@DominicPM DominicPM self-assigned this Mar 24, 2026
@DominicPM DominicPM added bug Something isn't working enhancement New feature or request labels Mar 24, 2026
@DominicPM DominicPM merged commit 81097f2 into main Mar 24, 2026
9 checks passed
@DominicPM DominicPM deleted the pointer-struct-field branch March 24, 2026 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant