
Split deserializer constructor to avoid JVM code-too-large error #601

Open
nikhil-zlai wants to merge 1 commit into linkedin:master from nikhil-zlai:fix/constructor-splitting-large-schemas

Conversation

@nikhil-zlai

Summary

When a schema has many fields with distinct complex types (e.g., 1000+ fields each containing a unique record type in a union), the generated FastDeserializer constructor can exceed the JVM's 64KB method bytecode limit, causing a "code too large" compilation error at runtime.

The existing method-splitting logic (the populate_ helpers from PRs #251, #507, #514, and #525) applies only to the deserialization methods, not to the constructor. The constructor assigns this.fieldN = readerSchema.getField("...").schema() for every unique schema encountered, and each assignment generates roughly 14 bytes of bytecode, so with 5000+ unique schema variables the constructor alone exceeds 64KB.
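The arithmetic behind the failure is simple back-of-envelope math; the 65535-byte cap comes from the class-file format's method size limit, and the ~14-byte per-assignment cost is the figure cited above:

```java
public class BytecodeBudget {
    // The JVM class-file format caps any single method body at 65535 bytes.
    static final int METHOD_BYTECODE_LIMIT = 65535;
    // Rough per-assignment cost cited in the PR description (~14 bytes).
    static final int BYTES_PER_SCHEMA_ASSIGNMENT = 14;

    public static void main(String[] args) {
        int uniqueSchemaVars = 5000; // e.g., 1000 fields x 5 sub-fields
        int constructorBytes = uniqueSchemaVars * BYTES_PER_SCHEMA_ASSIGNMENT;
        // 70000 > 65535, so javac/janino reports "code too large".
        System.out.println(constructorBytes > METHOD_BYTECODE_LIMIT);
    }
}
```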

Changes

  • Add SCHEMA_VARS_PER_CONSTRUCTOR_METHOD constant (500) to control the split threshold
  • Every 500 schema variable declarations, create a new initSchemaFields_N(Schema readerSchema) helper method
  • The constructor delegates to these helpers sequentially
  • Drop the FINAL modifier on schema fields so they can be assigned from helper methods (these are private fields on generated classes; no functional or performance impact)
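The resulting shape of the generated class can be sketched as below. The helper naming mirrors the PR (initSchemaFields_N, batches of SCHEMA_VARS_PER_CONSTRUCTOR_METHOD = 500), but the Schema stub and field names are illustrative stand-ins, not the real generated output:

```java
public class SplitConstructorSketch {
    // Stand-in for org.apache.avro.Schema so the sketch is self-contained.
    static class Schema { }

    // No longer FINAL: assigned inside helper methods, not the constructor.
    private Schema field0;
    private Schema field500;

    public SplitConstructorSketch(Schema readerSchema) {
        // Constructor stays tiny: one delegating call per batch of <= 500 assignments.
        initSchemaFields_0(readerSchema);
        initSchemaFields_1(readerSchema);
    }

    private void initSchemaFields_0(Schema readerSchema) {
        this.field0 = readerSchema; // ... up to 500 assignments per helper
    }

    private void initSchemaFields_1(Schema readerSchema) {
        this.field500 = readerSchema; // next batch of assignments
    }

    public static void main(String[] args) {
        SplitConstructorSketch d = new SplitConstructorSketch(new Schema());
        System.out.println(d.field0 != null && d.field500 != null);
    }
}
```

Each helper stays comfortably under the 64KB method limit, and the constructor's size now grows only with the number of batches, not the number of schema variables.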

Test

Added shouldBeAbleToReadVeryLargeSchemaWithDistinctRecordFields: creates a schema with 1000 fields × 5 sub-fields, each field a union of [null, uniqueRecordType]. Without the fix, WARM_FAST_AVRO fails with FastSerdeGeneratorException: Unable to compile. With the fix, all three implementations pass.
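The shape of the test schema can be sketched as follows. The counts (1000 fields, 5 sub-fields each) come from the test description; the field and record names (field_N, Record_N, sub_N) and the int sub-field type are assumptions for illustration:

```java
import java.util.StringJoiner;

public class LargeSchemaSketch {
    public static void main(String[] args) {
        StringJoiner fields = new StringJoiner(",");
        for (int i = 0; i < 1000; i++) {
            StringJoiner subFields = new StringJoiner(",");
            for (int j = 0; j < 5; j++) {
                subFields.add("{\"name\":\"sub_" + j + "\",\"type\":\"int\"}");
            }
            // Each top-level field is a union of null and a record type
            // unique to that field, so no two fields share a schema.
            fields.add("{\"name\":\"field_" + i + "\",\"type\":[\"null\","
                + "{\"type\":\"record\",\"name\":\"Record_" + i
                + "\",\"fields\":[" + subFields + "]}]}");
        }
        String schema = "{\"type\":\"record\",\"name\":\"VeryLarge\",\"fields\":["
            + fields + "]}";
        // 1000 unique record schemas x 5 sub-fields -> 5000+ distinct
        // schema variables in the generated deserializer.
        System.out.println(schema.contains("Record_999"));
    }
}
```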

Test plan

  • New test shouldBeAbleToReadVeryLargeSchemaWithDistinctRecordFields passes (all 3 variants)
  • Full existing avro-fastserde-tests111 test suite passes
  • Regenerated codegen output committed

When a schema has many fields with distinct complex types (e.g., 1000+
fields each containing a unique record type in a union), the generated
FastDeserializer constructor can exceed the JVM's 64KB method bytecode
limit, causing a "code too large" compilation error.

The existing method-splitting logic (populate_ helpers) only applies to
the deserialization methods, not the constructor. This change applies the
same splitting pattern to the constructor: every 500 schema variable
declarations, a new initSchemaFields_N() helper method is created, and
the constructor delegates to these helpers.

Changes:
- Add SCHEMA_VARS_PER_CONSTRUCTOR_METHOD constant (500)
- Track constructor body separately to redirect assignments to helpers
- Drop FINAL modifier on schema fields to allow assignment in helpers
- Add test with 1000 fields × 5 sub-fields of distinct record types
nikhil-zlai added a commit to zipline-ai/chronon that referenced this pull request Mar 24, 2026
LinkedIn's fastserde generates a deserializer constructor that exceeds
JVM's 64KB method bytecode limit for schemas with many distinct complex
fields (e.g., 1000+ fields each with a unique record type). This causes
runtime compilation failure and a TimeoutException in the fetcher.

Vendor a patched build of avro-fastserde (0.4.39-SNAPSHOT) that splits
the constructor into initSchemaFields_N() helper methods every 500
schema variables, mirroring the existing populate_ splitting pattern.

Upstream PR: linkedin/avro-util#601
