Refactor C translation around an explicit ABI and pointer model#1
Merged
Refactor C translation around an explicit ABI and pointer model#1
Conversation
Introduce `C_Translation_Model` and move the C frontend toward a locale-provided pointer/memory interface instead of hard-wiring one concrete pointer representation into translation. This change: - adds `c_translation_model` with pointer arithmetic, comparisons, pointer/int casts, and ABI parameters - extends `C_To_Core_Translation` to track the reference universe (`addr`/`gv`) explicitly and resolve pointer-heavy types against it - improves typedef resolution, record type resolution, and support for parametric/generated struct types - distinguishes pure helper calls from side-effecting calls during translation - adds signed pointer shifting support (`c_ptr_shift_signed`) - tightens pointer, cast, member-access, and array-index translation, including struct-field access through pointers and raw `void *` handling - updates the memory smoke tests to exercise the nat-addressed model explicitly This is groundwork for translating C against an abstract machine model while keeping pointer semantics and ABI assumptions explicit in the target locale. Signed-off-by: Dominic Mulligan <dominic.p.mulligan@gmail.com>
type_rank gave CLong/CULong and CLongLong/CULongLong the same rank (4), violating C11 6.3.1.1p1 which requires long long to rank higher than long. Under ILP32 where long is 32-bit and long long is 64-bit, this caused usual_arith_conv to return the wrong (narrower) type. Also fix the mixed-signedness branch of usual_arith_conv to check actual bit widths per C11 6.3.1.8 rules 2+3. The old code blindly picked the signed type without checking whether it could represent all unsigned values — wrong under ILP32 where e.g. long and unsigned int are both 32-bit. Both bugs were masked under LP64 (long == long long == 64-bit).
…lations Smoke tests for struct-with-array-field patterns (e.g. &h->vec[i], array decay from struct fields) failed with "Undeclared class: Separation_Algebra.sepalg". The struct record declaration succeeded but the subsequent function translation needed the sepalg class in scope. Adding Separation_Algebra as a direct import to the two affected smoke test theories fixes all 16 errors. The session dependency is already transitively available via Shallow_Micro_C.
Tests long+long_long rank ordering and unsigned_int+long mixed-signedness conversion under abi: ilp32-le, where long is 32-bit and long long is 64-bit. Also includes sizeof sanity checks and ABI profile assertions. Added to both ROOT and AutoCorrode.thy so it runs in batch builds and jEdit sessions.
c_signed_shr models GCC/Clang arithmetic (sign-extending) right shift, which is implementation-defined per C11 6.5.7p5. The new c_signed_shr_conservative variant aborts with SignedOverflow when the operand is negative, for use with the conservative compiler profile where no specific implementation-defined behavior is assumed. Includes equational WP rule (wp_c_signed_shr_conservative) and intro rule (wp_c_signed_shr_conservativeI) following the existing pattern.
c_scast silently truncates when narrowing (two's complement), which is implementation-defined per C11 6.3.1.3p3. The new c_scast_checked variant aborts with SignedOverflow when sint(w) does not fit in the target type's signed range, for use with the conservative compiler profile. Includes equational WP rule (wp_c_scast_checked) and intro rule (wp_c_scast_checkedI) following the existing pattern.
Adds C_Compiler structure with profiles controlling three C11 implementation-defined behaviors: - char signedness (signed on x86, unsigned on ARM) - signed right shift (arithmetic vs conservative/abort) - signed narrowing cast (truncating vs checked/abort) Known profiles: gcc-x86_64, clang-x86_64 (signed char, arithmetic shr, truncating), gcc-aarch64, clang-aarch64 (unsigned char, arithmetic shr, truncating), conservative (unsigned char, abort on negative shr, abort on narrowing overflow). Default (no compiler: specified) preserves existing behavior: unsigned char, arithmetic shr, truncating narrowing. Also refactors micro_c_file to reuse micro_c_translate option parsing instead of duplicating it.
Tests char signedness (gcc-x86_64 vs gcc-aarch64), signed right shift (arithmetic vs conservative), and signed narrowing cast (truncating vs checked) under different compiler: profiles. Added to both ROOT and AutoCorrode.thy.
Tests basic switch/case/default, fallthrough between cases, and switch without default. This was a significant gap — switch had translation infrastructure but zero test coverage.
alignof_c_type returned min(sizeof, 8) = 8 for CInt128/CUInt128, but Isabelle c_alignof_int128 returns 16. Pattern-match these two cases to return 16, matching the Isabelle side and real ABI requirements.
Two changes: 1. c_schar (8 sword) had no sizeof/alignof instances, unlike every other C numeric type. Add c_sizeof_schar (= 1) and c_alignof_schar (= 1) with simp lemmas. 2. Fix sporadic "missing struct field accessor constant" concurrency bug. The ML translation pipeline uses ~30 global Unsynchronized.ref cells across C_Ast_Utils, C_Translate, C_Compiler, and C_Def_Gen. When Isabelle processes multiple theories in parallel, concurrent micro_c_translate commands clobber each other's state. Serialize all translations through a Synchronized.var mutex.
C_ABI.char_is_signed is not used by the translation pipeline; plain-char signedness is controlled by C_Compiler.get_compiler_profile (set via the compiler: option). Update comments at both sites to clarify this.
Both signed and unsigned division/modulo hardcode c_abort (to construct DivisionByZero), unlike the other unsigned operations which are polymorphic in 'abort. Add text blocks explaining why and noting the consequence for locale abort type unification.
Both functions had 3 identical copies across C_Ast_Utils, C_Translate, and C_Def_Gen. Move the canonical definitions to C_Ast_Utils (exported in the signature) and replace copies with val aliases.
Smoke tests for string literals, compound literals, sizeof, _Alignof, struct-array access, and _Generic selection. These features were previously untested at the translation level.
Extract the duplicated option processing + global ref setup from micro_c_translate and micro_c_file into a shared helper. Both commands call setup_translation_context, then do their source-specific work (inline source vs file loading + manifest).
Introduce fold_c_expr, fold_c_init, and fold_c_stmt — generic post-order folds over C AST nodes that factor out the recursive traversal structure shared by all AST analysis passes. Rewrite 7 walkers (expr_reads_vars, expr_written_vars, find_assigned_vars, find_goto_targets, find_called_functions, find_indexed_base_vars, find_named_calls_with_args) as thin handler functions over the generic fold, removing ~250 lines of duplicated pattern matching. Remove dead find_loop_written_vars from C_Ast_Utils (superseded by find_loop_written_vars_local in C_Def_Gen). Complex walkers (expr_has_side_effect_with, find_list_backed_aliases) are left unchanged — they thread context or environment state that doesn't fit the simple fold pattern.
Exercises c_signed_add end-to-end: C source → translation → contract with c_signed_in_range precondition → crush_boot/crush_base proof. This fills the gap identified in the audit where no signed arithmetic operation had a fully verified example demonstrating the overflow precondition pattern.
…tion Exercises c_unsigned_div end-to-end: C source → translation → contract with b ≠ 0 precondition → crush_boot/crush_base proof. This fills the gap where no division/modulo operation had a fully verified example.
Verify u_call_helper (identity) and u_call_caller (calls helper twice and adds results) with composing contracts. The caller's proof automatically uses the helper's [crush_specs]-tagged spec, demonstrating the cross-function verification pattern.
C_ABI_And_Compiler.thy (~125 lines) — ABI and compiler profiles C_Ast_Utilities.thy (~1340 lines) — AST walking, type resolution, generic fold C_Translation_Engine.thy (~5510 lines) — translation context, term building, expr/stmt translation C_Definition_Generation.thy (~1070 lines) — function definition generation, command registration C_To_Core_Translation.thy is retained as a re-export stub so that all downstream imports continue to work unchanged.
The abort: parameter for micro_c_translate/micro_c_file was never meaningfully used with a type other than c_abort. Remove the option and extract the abort type directly from reference_types in the locale context (falling back to c_abort outside locales). This removes the TranslateAbortTy variant, the abort field from the options record, and the abort: keyword. The one usage in C_Union_Examples (abort: 'abort) was unnecessary since the locale's reference_types already constrains the abort type correctly.
The upstream AFP Isabelle_C (mirror-afp-2025-1) has a bug in C_Parser_Language.thy where transformDeclaration errors on function parameter declarations with 3+ type specifiers (e.g. unsigned long long). The local dependencies/afp/Isabelle_C copy already has a fix, but CI checks out the AFP separately and hits the bug. Add a patch file and apply it in the setup-isabelle-action after AFP checkout. Document the requirement in dependencies/afp/README.md so users setting up AutoCorrode locally know to apply it.
…files - Translate pointer relational comparisons (< <= > >=) using c_ptr_less/ c_ptr_le/c_ptr_greater/c_ptr_ge from C_Memory_Operations, with focused type unwrapping following the pointer subtraction pattern - Add verified UB detection examples: signed overflow (SignedOverflow), division by zero (DivisionByZero), shift out of range (ShiftOutOfRange) using make_function_contract_with_abort - Add LLP64_LE (Windows x64: long=32bit, ptr=64bit) and ILP32_BE (32-bit big-endian) ABI profiles with smoke tests and metadata validation - Add ABI-parameterized c_sizeof_c_long/c_sizeof_c_pointer in c_abi_model locale for correct manual specifications across ABIs - Document c_signed_mod quotient overflow check (C11 6.5.5p6: INT_MIN%-1) - Add smoke tests for nested loops with break/continue, multi-function calls, enum constants in expressions, pointer subtraction, and pointer relational comparisons
…gnas, builtins, CChars0 Features implemented: 1. _Static_assert: evaluate constant expressions at translation time (sizeof, _Alignof, comparisons, arithmetic). Handled at both top-level (C_Definition_Generation) and block-level (C_Translation_Engine). New eval_const_int_expr + extract_string_literal utilities in C_Ast_Utilities. 2. Range case labels (case 1 ... 5:): new case_label datatype (SingleCase/RangeCase/DefaultCase) replaces option encoding. CCases0 handled in unwrap_case_labels, extract_switch_groups, make_switch_cond, make_any_case_match. Generates lo <= x AND x <= hi. 3. _Alignas specifier: explicit CAlignSpec0/CFunSpec0 patterns in resolve_c_type accumulator (was already silently ignored by catch-all, now documented). 4. __builtin_types_compatible_p: resolves type compatibility at translation time, produces constant 0/1. Other GCC builtins get explicit rejection message. 5. Multi-character constants (CChars0): explicit rejection with clear error message in translate_expr and init_scalar_const_value (both C_Translation_Engine and C_Definition_Generation). Smoke tests in C_Misc_Examples.thy with verified contracts.
The makarius/isabelle container doesn't have git installed, and actions/checkout downloads a tarball (no .git dir) for external repos. Replace `git apply` with `patch -p1` which handles unified diffs.
- Add verified param_compound (compound assign x += y) and param_inc (post-increment x++) examples demonstrating parameter auto-promotion - Fix range_case in C_Misc_Examples: use result-variable pattern instead of return-inside-switch which caused FunctionBody type clash - Register C_Misc_Examples in root theory
- Fix mk_ptr_is_null: use unwrap_focused + c_ptr_to_uintptr instead of adhoc-overloaded address constant (which was ambiguous in locales) - Add is_zero_int_const checks to CPtr/CPtr equality/inequality handlers so (void*)0 comparisons route through mk_ptr_is_null - Add verified is_null example: p == (void*)0 with non-null precondition - Add ptr_eq smoke test: p != q pointer inequality
When try_translate_pure_nat_expr fails on the bound but succeeds on init, evaluate the bound expression monadically and use unat as the range limit. This enables for-loops like for(i=0; i < n+1; i++) where n+1 is not a literal or parameter. Verified with noop_loop example using wp_raw_for_loop_framedI' with trivial invariant.
- Handle nested CInitList0 in array initializer: build list-of-lists for
2D arrays like unsigned int mat[2][3] = {{1,2,3},{4,5,6}}
- Handle chained CIndex0 reads: when arr expr is itself a CIndex0,
apply nth directly to the sub-list value without dereferencing
- Extend c_uint_arr_verification_ctx with c_uint_list_list_prism for 2D
- Verified mat_read example: mat[1][2] == 6 proved via crush_base
- Replace blanket CTypeQual0 wildcard with granular patterns: _Atomic now errors (atomic semantics not modeled), const and restrict are individually documented with C11 section references - Add rationale comment for _Alignas ignore (C11 §6.7.5: alignment does not affect computation semantics; verified in C_Misc_Examples) - Separate extern from blanket CStorageSpec0 with its own pattern and comment
Replace 70-line pattern-match chain with a declarative binop_table mapping each C binary operator to its (signed, unsigned) Isabelle constants. CShrOp0 remains a special case due to compiler-profile- dependent dispatch (ArithmeticShift vs ConservativeShift).
Extract the nested if/else chain into resolve_from_flags with pattern matching on the (signed, unsigned, char, short, int, long_count, void, struct) flag tuple. Each pattern is annotated with the C11 §6.7.2p2 specifier combination it represents.
- Add comment block before hol_type_of explaining the ABI indirection strategy: type synonyms are fixed-width, CLong/CULong are mapped to c_int/c_uint on ILP32 vs c_long/c_ulong on LP64, CChar is always unsigned with signedness resolved by resolve_c_type - Add text block in C_Numeric_Types.thy clarifying that type synonyms do not vary with ABI and sizeof correctness follows from the hol_type_of indirection
- Delete 6 unreferenced functions: constrain_expr_arrow, constrain_expr_arrow_from_tm, constrain_function_body_arrow, resolve_ptr_to_uintptr_const, resolve_uintptr_to_ptr_const, struct_field_offset - Remove 2 unused list_var bindings (shadowed by typed versions) - Discard unused ctxt binding in pointer-to-integer cast - Wildcard unused label parameter in resolve_pointer_model_const - Add missing ParamListPtr case to 2 non-exhaustive matches in translate_expr (CAssign0) and translate_lvalue_location (CVar0) Reduces ML warnings from 12 to 0.
C_Ast_Utilities.thy (13 warnings -> 0): - Delete redundant catch-all pattern in resolve_c_type accumulator - Wildcard unused pure_tab parameter in 5 trivially-true clauses of expr_has_side_effect_with - Delete dead function add_decl_struct_bindings (never called) - Wildcard unused parameters in expr_is_list_backed_in_env clauses - Wildcard unused parametric_structs in CPtr clause C_Definition_Generation.thy (5 warnings -> 0): - Add catch-all to tfree_subst map (non-exhaustive match) - Wildcard unused abr_str and target_cty in string literal pattern - Rename variable o to po to avoid SML infix operator conflict - Wildcard unused f in duplicate ManifestOpt error clause
Introduce shared implementations parameterized by CStructTag0/CUnionTag0
for 4 function pairs that previously differed only by tag:
- extract_aggregate_type_from_specs (was extract_{struct,union}_type_from_specs)
- extract_aggregate_type_from_decl_full (was extract_{struct,union}_type_from_decl_full)
- extract_aggregate_def_with_types_from_decl (was extract_{struct,union}_def_with_types_from_decl)
- extract_aggregate_defs_with_types (was extract_{struct,union}_defs_with_types)
The struct/union names remain as thin wrappers for API compatibility.
Document why _full variants and struct-only functions are kept separate.
Deduplicate the identical Resources.provide + ERROR handling pattern that appeared at both the main source file and manifest file registration sites in micro_c_file.
Add comments explaining: - Why extract_member_field_info falls back from CVoid to CInt (struct/union fields cannot have void type; CVoid from opaque typedefs) - Error convention in C_Translation_Engine: error "micro_c_translate: ..." for internal errors vs unsupported "..." for missing features
Extract 4 named phases from the 332-line monolith: - setup_translation_state: reset mutable refs, read config - extract_type_definitions: struct/union/enum/typedef extraction - extract_and_order_functions: signature computation, topo-sort, purity analysis - analyze_list_backed_params: call-site analysis for array parameters process_translation_unit is now a ~40-line orchestrator that calls these phases and threads results into definition generation.
register and auto are already silently ignored by the catch-all in accumulate. Add smoke test in C_Translation_Smoke_Types and verified proof example in C_Misc_Examples documenting this.
Add CBuiltinOffsetOf0 handler in translate_expr that computes byte offsets using raw_struct_field_offset, with support for nested struct field designator chains. Smoke test and verified proof (offset_of_b returns 4 for second int field).
Handle CRangeDesig0 ([lo...hi] = val) in both global (C_Definition_Generation) and local (C_Translation_Engine) array initialization by expanding ranges into multiple index-value pairs. Smoke test and verified proof (range_arr with [0..1]=10, [2..3]=20).
Handle (struct T){field1, field2, ...} compound literals in
translate_expr by reusing the struct field init logic from
translate_decl. Uses monadic path with bind variables for each
field expression. Smoke test and translation example verified.
Add CStatExpr0 handler using a forward-ref pattern to bridge translate_expr (earlier mutual recursion block) with translate_compound_items_expr (later block). The new function processes declarations via Ref::new binds and returns the last expression's value and type. Smoke test and verified proof.
Both unsigned __int128 addition and signed __int128 subtraction translate correctly. Proof example deferred pending c_uint128 verification locale.
Replace hardcoded 2D array init with recursive build_nested that handles 3D+ arrays by unwrapping CPtr levels and matching dimension sizes. Includes build_zero for padding incomplete initializer lists. Smoke test verifies 3D array int[2][2][2] translation.
Remove the hard error for volatile in accumulate, treating it like const/restrict. This unblocks translation of volatile-qualified C code. Sound yield-based volatile modeling (where volatile reads produce nondeterministic values via Yield) is deferred as a follow-up. Smoke tests for volatile pointer deref and volatile local.
When a function definition introduces extra type variables (e.g., from unresolved adhoc overloading in switch bodies or void pointer casts), the locale morphism turns these into schematic type variables (TVar) in phantom TYPE arguments of the registered constant. Later functions that reference this constant then fail with "Illegal schematic type variable". Fix: in define_c_function, detect TYPE args containing TVars and replace them with TYPE(unit). This prevents the cascade while preserving locale type parameters (which appear as TFree, not TVar).
Add a case for CInitList0 with non-struct target types, enabling
translation of multi-dimensional array initializers like
int a[2][2][2] = {{{1,2},{3,4}},{{5,6},{7,8}}}.
Previously this hit the catch-all "unsupported non-constant global
initializer shape" error. The new case recursively builds nested
HOL lists from the sub-array initializer elements.
…tracing Add a 'verbose' keyword option to micro_c_translate and micro_c_file. When enabled, functions with extra type variables (TVars or TFrees beyond locale parameters) emit diagnostic output showing: - The variable names and sorts - Sub-term locations (Free/Const/lambda) that introduce them Usage: micro_c_translate verbose ‹...› micro_c_file verbose ‹path.c› This is a debugging aid for pinpointing the source of phantom type variables in translated C functions.
- Increase sub-term traversal depth from 3 to 20 to reach deeply nested terms in complex functions like fdt_next_tag - Include locale type parameters from the expression constraint and addr/gv refs in the "used" set, preventing false positive reports for simple functions whose only TFrees are the locale's own 's, 'i, 'o - Fix crash on bound variables by checking lambda binding types directly instead of using type_of on the body (which fails on Bound terms)
Previously, CSubOp0 with a pointer LHS and integer RHS fell through to the usual_arith_conv catch-all, which called type_rank on the pointer type and failed with "pointer type has no integer conversion rank". Fix: add a case for (CSubOp0, CPtr elem_cty, <non-pointer>) that negates the integer offset (0 - n) and delegates to the existing pointer addition machinery (mk_raw_ptr_add / mk_list_ptr_add). This unblocks fdt_find_string_len_ which uses: const char *last = strtab + tabsize - (slen + 1);
… parameter calls Four fixes enabling translation of C functions that call locale-provided stdlib functions (e.g. memcmp/memmove) and return pointer types: 1. void→T* scalar cast: use typed prism and function type skeleton for c_cast_from_void so fastype_of succeeds and adhoc overloading resolves. 2. Pointer increment (LocalPtr): give c_cast_from_void a focused result type so mk_ptr_shifted_term detects the focused value and extracts the raw pointer via unwrap_focused. 3. Function call constraint: pin the return type to function_body with dummyT side types, so funcall's shared type variables propagate through the function reference to the argument expressions. 4. FunctionBody value=return constraint: for pointer-returning functions, construct the focused return type directly (pointer-first dispatch) and constrain both value and return positions to match.
345c6e4 to
00a347e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.