
Split deserializer constructor to avoid JVM code-too-large error #601

Open
nikhil-zlai wants to merge 1 commit into linkedin:master from nikhil-zlai:fix/constructor-splitting-large-schemas

Conversation

@nikhil-zlai

Summary

When a schema has many fields with distinct complex types (e.g., 1000+ fields each containing a unique record type in a union), the generated FastDeserializer constructor can exceed the JVM's 64KB method bytecode limit, causing a "code too large" compilation error at runtime.

The existing method-splitting logic (the populate_ helpers from PRs #251, #507, #514, and #525) applies only to the deserialization methods, not to the constructor. The constructor assigns this.fieldN = readerSchema.getField("...").schema() for every unique schema encountered, and each assignment generates roughly 14 bytes of bytecode, so with 5000+ unique schema variables the constructor alone exceeds 64KB.
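The arithmetic behind the failure is simple back-of-envelope math; the 65535-byte cap comes from the class-file format's method size limit, and the ~14-byte per-assignment cost is the figure cited above:

```java
public class BytecodeBudget {
    // The JVM class-file format caps any single method body at 65535 bytes.
    static final int METHOD_BYTECODE_LIMIT = 65535;
    // Rough per-assignment cost cited in the PR description (~14 bytes).
    static final int BYTES_PER_SCHEMA_ASSIGNMENT = 14;

    public static void main(String[] args) {
        int uniqueSchemaVars = 5000; // e.g., 1000 fields x 5 sub-fields
        int constructorBytes = uniqueSchemaVars * BYTES_PER_SCHEMA_ASSIGNMENT;
        // 70000 > 65535, so javac/janino reports "code too large".
        System.out.println(constructorBytes > METHOD_BYTECODE_LIMIT);
    }
}
```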

Changes

  • Add SCHEMA_VARS_PER_CONSTRUCTOR_METHOD constant (500) to control the split threshold
  • Every 500 schema variable declarations, create a new initSchemaFields_N(Schema readerSchema) helper method
  • The constructor delegates to these helpers sequentially
  • Drop the FINAL modifier on schema fields so they can be assigned from helper methods (these are private fields on generated classes; no functional or performance impact)
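The resulting shape of the generated class can be sketched as below. The helper naming mirrors the PR (initSchemaFields_N, batches of SCHEMA_VARS_PER_CONSTRUCTOR_METHOD = 500), but the Schema stub and field names are illustrative stand-ins, not the real generated output:

```java
public class SplitConstructorSketch {
    // Stand-in for org.apache.avro.Schema so the sketch is self-contained.
    static class Schema { }

    // No longer FINAL: assigned inside helper methods, not the constructor.
    private Schema field0;
    private Schema field500;

    public SplitConstructorSketch(Schema readerSchema) {
        // Constructor stays tiny: one delegating call per batch of <= 500 assignments.
        initSchemaFields_0(readerSchema);
        initSchemaFields_1(readerSchema);
    }

    private void initSchemaFields_0(Schema readerSchema) {
        this.field0 = readerSchema; // ... up to 500 assignments per helper
    }

    private void initSchemaFields_1(Schema readerSchema) {
        this.field500 = readerSchema; // next batch of assignments
    }

    public static void main(String[] args) {
        SplitConstructorSketch d = new SplitConstructorSketch(new Schema());
        System.out.println(d.field0 != null && d.field500 != null);
    }
}
```

Each helper stays comfortably under the 64KB method limit, and the constructor's size now grows only with the number of batches, not the number of schema variables.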

Test

Added shouldBeAbleToReadVeryLargeSchemaWithDistinctRecordFields: creates a schema with 1000 fields × 5 sub-fields, each field a union of [null, uniqueRecordType]. Without the fix, WARM_FAST_AVRO fails with FastSerdeGeneratorException: Unable to compile. With the fix, all three implementations pass.
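The shape of the test schema can be sketched as follows. The counts (1000 fields, 5 sub-fields each) come from the test description; the field and record names (field_N, Record_N, sub_N) and the int sub-field type are assumptions for illustration:

```java
import java.util.StringJoiner;

public class LargeSchemaSketch {
    public static void main(String[] args) {
        StringJoiner fields = new StringJoiner(",");
        for (int i = 0; i < 1000; i++) {
            StringJoiner subFields = new StringJoiner(",");
            for (int j = 0; j < 5; j++) {
                subFields.add("{\"name\":\"sub_" + j + "\",\"type\":\"int\"}");
            }
            // Each top-level field is a union of null and a record type
            // unique to that field, so no two fields share a schema.
            fields.add("{\"name\":\"field_" + i + "\",\"type\":[\"null\","
                + "{\"type\":\"record\",\"name\":\"Record_" + i
                + "\",\"fields\":[" + subFields + "]}]}");
        }
        String schema = "{\"type\":\"record\",\"name\":\"VeryLarge\",\"fields\":["
            + fields + "]}";
        // 1000 unique record schemas x 5 sub-fields -> 5000+ distinct
        // schema variables in the generated deserializer.
        System.out.println(schema.contains("Record_999"));
    }
}
```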

Test plan

  • New test shouldBeAbleToReadVeryLargeSchemaWithDistinctRecordFields passes (all 3 variants)
  • Full existing avro-fastserde-tests111 test suite passes
  • Regenerated codegen output committed

When a schema has many fields with distinct complex types (e.g., 1000+
fields each containing a unique record type in a union), the generated
FastDeserializer constructor can exceed the JVM's 64KB method bytecode
limit, causing a "code too large" compilation error.

The existing method-splitting logic (populate_ helpers) only applies to
the deserialization methods, not the constructor. This change applies the
same splitting pattern to the constructor: every 500 schema variable
declarations, a new initSchemaFields_N() helper method is created, and
the constructor delegates to these helpers.

Changes:
- Add SCHEMA_VARS_PER_CONSTRUCTOR_METHOD constant (500)
- Track constructor body separately to redirect assignments to helpers
- Drop FINAL modifier on schema fields to allow assignment in helpers
- Add test with 1000 fields × 5 sub-fields of distinct record types
nikhil-zlai added a commit to zipline-ai/chronon that referenced this pull request Mar 24, 2026
LinkedIn's fastserde generates a deserializer constructor that exceeds
JVM's 64KB method bytecode limit for schemas with many distinct complex
fields (e.g., 1000+ fields each with a unique record type). This causes
runtime compilation failure and a TimeoutException in the fetcher.

Vendor a patched build of avro-fastserde (0.4.39-SNAPSHOT) that splits
the constructor into initSchemaFields_N() helper methods every 500
schema variables, mirroring the existing populate_ splitting pattern.

Upstream PR: linkedin/avro-util#601
