
fix: preserve image EXIF orientation and PDF /Rotate through compression #154

Merged
dholbach merged 1 commit into main from fix/media-compression-preserve-rotation on Jul 2, 2026

@dholbach (Owner) commented Jul 2, 2026

Summary

  • Root cause: Ghostscript's pdfwrite device silently strips /Rotate page attributes from PDFs; Pillow drops the EXIF Orientation tag when re-saving images. Both compression paths (upload-time and compress_media management command) were affected, leaving scanned documents displayed upside-down.
  • PDF fix: Added _read_page_rotations() / _restore_page_rotations() helpers that snapshot page /Rotate values before Ghostscript runs and write them back into the output via pypdf. Wired into both _compress_pdf_bytes (upload path) and compress_pdf_inplace (management command).
  • Image fix: Added ImageOps.exif_transpose() after Image.open() in both compress_image_upload and compress_image_inplace so scanner EXIF orientation is baked into pixels before the tag is stripped on re-save.
  • Remediation tools added to compress_media management command:
    • --force: bypass image size threshold to reprocess already-compressed images (fixes lost EXIF orientation)
    • --rotate-pages DEGREES [--path <file-or-dir>]: set /Rotate on every page of matching PDFs; accepts a single file or a directory (fixes PDFs whose rotation metadata was already stripped)
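
The two remediation flags can be sketched with plain `argparse`. This is a minimal illustration, not the command from this PR: the real `compress_media` is a Django management command, and the helper names `rotation_degrees`, `build_parser`, and `matching_pdfs`, as well as the multiple-of-90 validation, are assumptions made for the example.

```python
import argparse
from pathlib import Path


def rotation_degrees(value: str) -> int:
    """Parse DEGREES. Rejecting non-multiples of 90 is an assumption of this sketch."""
    degrees = int(value)
    if degrees % 90 != 0:
        raise argparse.ArgumentTypeError("DEGREES must be a multiple of 90")
    return degrees % 360


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the management command's argument setup."""
    parser = argparse.ArgumentParser(prog="compress_media")
    parser.add_argument("--force", action="store_true",
                        help="reprocess images even below the size threshold")
    parser.add_argument("--rotate-pages", type=rotation_degrees, metavar="DEGREES",
                        help="set /Rotate on every page of matching PDFs")
    parser.add_argument("--path", type=Path,
                        help="single file or directory to process")
    return parser


def matching_pdfs(path: Path) -> list[Path]:
    """Return the PDF itself for a file path, or every PDF below a directory."""
    if path.is_file():
        return [path] if path.suffix.lower() == ".pdf" else []
    return sorted(p for p in path.rglob("*.pdf") if p.is_file())
```

Accepting either a file or a directory through one `--path` option keeps bulk and single-file remediation on the same code path.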

Test plan

  • Full Django + JS test suite passes
  • Manual functional test: JPEG with EXIF Orientation=3 correctly re-saved with pixels rotated and tag absent after compress_image_inplace
  • Verified on production data: the ML client's 3 scanned PDFs were fixed via --rotate-pages 180 --path clients/ml; a generated PDF was then corrected individually via --rotate-pages 0 --path clients/ml/2026/<file>.pdf
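
The manual Orientation=3 check above could be reproduced roughly as follows. This is a sketch using Pillow only; `recompress_jpeg` and `make_test_jpeg` are hypothetical helpers written for this example and are not the `compress_image_inplace` function from the PR.

```python
import io

from PIL import Image, ImageOps

ORIENTATION_TAG = 0x0112  # EXIF Orientation tag ID


def recompress_jpeg(data: bytes, quality: int = 80) -> bytes:
    """Bake the EXIF orientation into the pixels, then re-save as JPEG."""
    with Image.open(io.BytesIO(data)) as img:
        # exif_transpose returns a copy rotated/flipped per the Orientation tag
        upright = ImageOps.exif_transpose(img)
        out = io.BytesIO()
        upright.save(out, format="JPEG", quality=quality)
    return out.getvalue()


def make_test_jpeg(orientation: int) -> bytes:
    """Build a 40x20 JPEG, red on the left and blue on the right, with the given tag."""
    img = Image.new("RGB", (40, 20), "red")
    img.paste((0, 0, 255), (20, 0, 40, 20))
    exif = Image.Exif()
    exif[ORIENTATION_TAG] = orientation
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=95, exif=exif.tobytes())
    return buf.getvalue()


# Orientation=3 means the stored pixels are rotated 180 degrees.
result = Image.open(io.BytesIO(recompress_jpeg(make_test_jpeg(3))))
left = result.convert("RGB").getpixel((5, 10))
print(result.size, result.getexif().get(ORIENTATION_TAG), left)
```

After the round trip the blue half should be on the left and the Orientation tag should be gone, which matches the behaviour described in the test plan.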

🤖 Generated with Claude Code

Ghostscript's pdfwrite device silently strips /Rotate page attributes,
leaving scanned PDFs displayed upside-down after compression. Pillow
similarly drops the EXIF Orientation tag when re-saving, so scanned
images lose their intended orientation.

file_processing.py:
- Add ImageOps.exif_transpose() after Image.open() in both
  compress_image_upload and compress_image_inplace so scanner EXIF
  orientation is baked into pixels before the tag is stripped
- compress_image_inplace now also detects orientation-only fixes so
  small files below the skip threshold are still corrected
- Add _read_page_rotations() / _restore_page_rotations() helpers that
  snapshot page /Rotate values before Ghostscript and write them back
  into the output via pypdf; wired into both _compress_pdf_bytes
  (upload path) and compress_pdf_inplace (management command path)

compress_media.py:
- --force flag: bypass image size threshold (remediate already-compressed
  images with lost EXIF orientation)
- --rotate-pages DEGREES flag: set /Rotate on every page of all PDFs
  under --path; accepts both a directory and a single file path
  (remediate PDFs whose rotation metadata was already stripped)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dholbach dholbach merged commit d3b83be into main Jul 2, 2026
1 check passed
@dholbach dholbach deleted the fix/media-compression-preserve-rotation branch July 2, 2026 18:44