Output mypy cache as a zip file instead of a tree artifact #138

Open
dizzy57 wants to merge 1 commit into bazel-contrib:main from dizzy57:cache_in_zip

dizzy57 commented May 21, 2026

Previously, the output of `mypy_runner.py` was declared as a tree artifact (`declare_directory`). This PR changes it to a single zip file (`declare_file`) instead, which works better with the Bazel remote cache.

Zip was chosen over other archive formats (for example, `tar.gz`) because it supports random access, allowing individual entries to be skipped or extracted without reading the entire archive. File compression is explicitly disabled (`ZIP_STORED`) because each target produces a full accumulated cache containing mypy results for all transitive dependencies. Enabling compression would force every target to recompress the outputs of its entire transitive closure on each build, wasting CPU time.
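
To illustrate the two properties described above, here is a minimal sketch using Python's standard `zipfile` module. It is not the code from this PR: the helper names `pack_cache`, `read_entry`, and `demo`, and the `pkg/a.meta.json` layout, are hypothetical and chosen only for the example. It writes a cache directory into an archive with `ZIP_STORED` and then reads a single entry back without extracting the rest.

```python
import tempfile
import zipfile
from pathlib import Path


def pack_cache(cache_dir: Path, zip_path: Path) -> None:
    """Write every file under cache_dir into zip_path without compression."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        # Sort so the entry order does not depend on filesystem listing order.
        for path in sorted(p for p in cache_dir.rglob("*") if p.is_file()):
            zf.write(path, arcname=path.relative_to(cache_dir).as_posix())


def read_entry(zip_path: Path, name: str) -> bytes:
    """Read one entry; zipfile locates it through the central directory."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(name)


def demo() -> dict:
    """Build a tiny fake cache (hypothetical layout), pack it, and inspect it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        (cache / "pkg").mkdir(parents=True)
        (cache / "pkg" / "a.meta.json").write_text('{"id": "pkg.a"}')
        (cache / "pkg" / "a.data.json").write_text('{"names": {}}')

        archive = root / "cache.zip"
        pack_cache(cache, archive)

        with zipfile.ZipFile(archive) as zf:
            names = zf.namelist()
            stored = all(
                info.compress_type == zipfile.ZIP_STORED for info in zf.infolist()
            )
        meta = read_entry(archive, "pkg/a.meta.json")
        return {"names": names, "stored": stored, "meta": meta}
```

Because the entries are stored rather than deflated, packing is essentially a copy, which is the CPU saving the description refers to. Note that `ZipFile.write` records each file's modification time, so this sketch makes no claim about byte-for-byte reproducible archives.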

A follow-up PR will address the cache accumulation issue by outputting only the cache entries produced by an individual rule.

This implementation showed a ~10% speedup in local runs, likely because it reduces the number of files Bazel needs to copy between sandboxes.