Skip to content

fix: add depth limit to get_sample_values for nested struct columns #9383

Draft
weiguangli-io wants to merge 2 commits into marimo-team:main from weiguangli-io:codex/marimo-9378-nested-struct-perf

@weiguangli-io

This pull request was authored by a coding agent.

Summary

  • Add a MAX_NESTING_DEPTH=5 limit to the to_primitive() helper inside NarwhalsTableManager.get_sample_values(), preventing exponential blowup when serializing deeply nested Polars Struct/List columns during dataset registration
  • Once the depth limit is reached, the helper falls back to str() for the remaining nested value instead of recursing further
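The two bullets above describe a depth-capped recursive conversion. The sketch below shows that pattern in plain Python. Only the cap value (5) comes from this PR; the function body, the `nest_lists` helper, and the handling of non-container scalars are hypothetical and are not marimo's actual `NarwhalsTableManager.get_sample_values()` code. It assumes sample values arrive as Python dicts (Polars Struct) and lists (Polars List).

```python
from typing import Any

# The cap value comes from the PR; the rest is a hypothetical sketch,
# not marimo's actual NarwhalsTableManager.get_sample_values() code.
MAX_NESTING_DEPTH = 5


def to_primitive(value: Any, depth: int = 0) -> Any:
    """Convert a nested list/dict sample value into plain Python primitives.

    A container reached at recursion depth >= MAX_NESTING_DEPTH is rendered
    with str() instead of being recursed into.
    """
    if isinstance(value, (list, tuple, dict)) and depth >= MAX_NESTING_DEPTH:
        # Depth budget exhausted: stringify the remaining subtree once.
        return str(value)
    if isinstance(value, (list, tuple)):
        return [to_primitive(item, depth + 1) for item in value]
    if isinstance(value, dict):
        return {key: to_primitive(item, depth + 1) for key, item in value.items()}
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    # Other scalars (dates, decimals, ...) are stringified.
    return str(value)


def nest_lists(depth: int) -> Any:
    """Hypothetical helper: build [[...[0]...]] with `depth` levels of lists."""
    value: Any = 0
    for _ in range(depth):
        value = [value]
    return value


print(to_primitive({"a": [1, {"b": 2}]}))  # {'a': [1, {'b': 2}]}
print(to_primitive(nest_lists(7)))  # [[[[['[[0]]']]]]]
```

With seven levels of lists, the first five are converted normally and the remaining two-level subtree is emitted as the single string `'[[0]]'`.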

Context

Named Polars DataFrames with deeply nested struct columns become extremely slow when registered as datasets (#9378). The root cause is that to_primitive() recursively stringifies nested Python list/dict values without a depth cap, which becomes pathological for deeply nested struct/list payloads. Timings by nesting depth:

  • depth 5: ~0.01s
  • depth 6: ~0.08s
  • depth 7: ~0.63s
  • depth 8: ~5.5s

With this fix, even depth 20+ completes in <1ms.
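The timings above grow by a roughly constant factor (about 8x) per extra nesting level, which is what makes the slowdown exponential. The quick check below uses only the four numbers reported in this PR. The depth-10 figure is a naive extrapolation that assumes the same growth factor keeps holding; it is not a measurement.

```python
# Timings reported in this PR for the uncapped helper, in seconds, keyed by depth.
timings = {5: 0.01, 6: 0.08, 7: 0.63, 8: 5.5}

# Ratio of each depth's time to the previous depth's time.
ratios = {d: timings[d] / timings[d - 1] for d in (6, 7, 8)}
for depth, ratio in ratios.items():
    print(f"depth {depth - 1} -> {depth}: x{ratio:.1f}")

# Geometric-mean growth factor per level across depths 5..8.
growth = (timings[8] / timings[5]) ** (1 / 3)

# Naive extrapolation (assumption, not a measurement): keep multiplying by `growth`.
estimate_depth_10 = timings[8] * growth ** 2
print(f"growth ~x{growth:.1f} per level; depth 10 ~{estimate_depth_10:.0f}s")
```

Under that assumption an uncapped depth-10 payload would take several minutes, versus the sub-millisecond time reported with the cap.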

Test plan

  • Verified with a manual benchmark that deeply nested payloads (depth 10, 20) complete in <0.001s
  • Existing pandas get_sample_values tests pass
  • The fix preserves the existing behavior for nesting depths <= 5 (no change in output)

Closes #9378

🤖 Generated with Claude Code

Deeply nested Polars Struct/List columns caused exponential blowup in
dataset registration because `to_primitive()` recursively serialized
without a depth cap. Add a MAX_NESTING_DEPTH=5 limit that falls back
to `str()` for deeply nested values, preventing the pathological
slowdown described in marimo-team#9378.

Closes marimo-team#9378

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

vercel Bot commented Apr 25, 2026

The latest updates on your projects.

  • marimo-docs: deployment Ready (Preview, Comment), updated Apr 25, 2026 5:48am UTC

Development

Successfully merging this pull request may close these issues.

Named Polars DataFrames with deeply nested struct columns become extremely slow when registered as datasets
