
fix(docker): replace PyPI opencv wheel with ffmpeg-free build [security]#569

Merged
lawrence-u10d merged 1 commit into main from ffmpeg-fix on Apr 22, 2026
@lawrence-u10d lawrence-u10d commented Apr 22, 2026

Summary

Mirrors Unstructured-IO/unstructured#4336 in this repo so the quay.io/unstructured-io/unstructured-api image no longer ships the 14 ffmpeg 5.1.x CVEs bundled in PyPI opencv-python wheels.

After uv sync, the Dockerfile now:

  • Downloads the architecture-specific opencv-contrib-python-headless wheel (built with WITH_FFMPEG=OFF + ENABLE_CONTRIB=1 + ENABLE_HEADLESS=1) from the upstream Unstructured-IO/unstructured GitHub release (opencv-4.12.0.88)
  • SHA-256-verifies against the hashes published by the upstream build-opencv-wheels.yml workflow
  • Uninstalls any installed PyPI opencv variants and installs the verified wheel with --no-deps
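The download-and-verify step above can be sketched in Python. This is a minimal illustration, not the Dockerfile's actual commands: the digests in `EXPECTED_SHA256` are placeholders (the real values are the ones published by the upstream `build-opencv-wheels.yml` workflow and are not reproduced here), and `normalize_arch`, `sha256_of`, and `verify_wheel` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Placeholder digests (assumption): substitute the hashes published upstream.
EXPECTED_SHA256 = {
    "x86_64": "0" * 64,
    "aarch64": "1" * 64,
}

# Common spellings of the two architectures the image is built for.
_ARCH_ALIASES = {
    "amd64": "x86_64",
    "x86_64": "x86_64",
    "arm64": "aarch64",
    "aarch64": "aarch64",
}


def normalize_arch(machine: str) -> str:
    """Map a machine string (e.g. from platform.machine()) to a wheel architecture."""
    try:
        return _ARCH_ALIASES[machine.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine}") from None


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so a large wheel is never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str) -> None:
    """Raise if the downloaded wheel does not match the pinned digest."""
    actual = sha256_of(path)
    if actual != expected:
        raise RuntimeError(f"SHA-256 mismatch for {path.name}: got {actual}")
```

A build script would call `verify_wheel(wheel_path, EXPECTED_SHA256[normalize_arch(platform.machine())])` before installing, so a tampered or truncated download stops the build instead of reaching the image.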

The contrib-headless variant is a strict superset of the cv2 API exposed by opencv-python, opencv-python-headless, and opencv-contrib-python, so a single wheel transparently replaces whichever variant is present.

One deviation from upstream

Upstream uninstalls all four opencv variants in a single uv pip uninstall … call because their image pulls all four transitively (via unstructured-paddleocr). Our uv.lock currently only resolves opencv-python, so a single combined uninstall would fail on the three that aren't installed. Replaced with a per-package loop using || true — same end state, robust if transitive deps change.
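The per-package loop can be sketched in Python as follows. The Dockerfile itself uses a shell loop with `|| true`; this version only mirrors that control flow. `uninstall_each` and `run_uv_uninstall` are hypothetical names, and the uninstall callable is injectable so the logic can be exercised without `uv` present.

```python
import subprocess
from typing import Callable, Iterable, List

# The four PyPI distributions that install the `cv2` module.
OPENCV_VARIANTS = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def run_uv_uninstall(package: str) -> int:
    """Run `uv pip uninstall <package>` and return its exit code."""
    result = subprocess.run(["uv", "pip", "uninstall", package], check=False)
    return result.returncode


def uninstall_each(
    packages: Iterable[str],
    uninstall: Callable[[str], int] = run_uv_uninstall,
) -> List[str]:
    """Try each package separately; a failure for one never aborts the rest."""
    removed = []
    for package in packages:
        if uninstall(package) == 0:
            removed.append(package)
        # A non-zero exit code is deliberately ignored, like `|| true` in shell.
    return removed
```

With only `opencv-python` installed, the loop removes that one package and skips the other three, which is the end state described above.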

Version / Changelog

  • Bumps service version 0.1.3 → 0.1.4
  • CHANGELOG.md entry under 0.1.4 → Security
  • No uv lock changes needed; the lockfile still resolves opencv-python 4.13.0.92, and we overlay the 4.12.0.88 contrib-headless wheel only at image build time (upstream 4.13.0.92 has no sdist on PyPI, which is why the build-from-source workflow is pinned to 4.12.0.88).

Test plan

  • make docker-build succeeds on amd64 and arm64; the opencv replacement step resolves the architecture-specific wheel and the SHA-256 check passes
  • docker run … python -c "import cv2; print(cv2.__version__)" prints 4.12.0.88 inside the built image
  • make docker-test passes against the rebuilt image
  • Container scan of the rebuilt image no longer flags the 14 ffmpeg CVEs called out by upstream PR #4336

🤖 Generated with Claude Code


Note

Medium Risk
Medium risk because it changes a core binary dependency (opencv) at image build time via an external wheel download and forced uninstall/reinstall, which could impact image build reliability or runtime CV2 behavior across architectures.

Overview
Updates the Docker build to remove vulnerable ffmpeg-bundled PyPI OpenCV wheels by downloading an arch-specific, SHA-256-verified opencv-contrib-python-headless wheel built with WITH_FFMPEG=OFF, uninstalling any installed OpenCV variants, and reinstalling the verified wheel.

Bumps the service version to 0.1.4 and adds a CHANGELOG.md security entry documenting the OpenCV/ffmpeg CVE mitigation.

Reviewed by Cursor Bugbot for commit 7e23afc. Bugbot is set up for automated code reviews on this repo. Configure here.

Mirrors Unstructured-IO/unstructured#4336. After uv sync, the Dockerfile
now downloads a source-built opencv-contrib-python-headless wheel
(WITH_FFMPEG=OFF) from the upstream release, hash-verifies it, and
substitutes it for the PyPI opencv variant installed from uv.lock. This
eliminates the 14 bundled ffmpeg 5.1.x CVEs shipped in PyPI opencv wheels.

Bumps service version 0.1.3 -> 0.1.4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lawrence-u10d lawrence-u10d requested a review from qued April 22, 2026 16:26
@lawrence-u10d lawrence-u10d merged commit 03b57e0 into main Apr 22, 2026
12 checks passed
@lawrence-u10d lawrence-u10d deleted the ffmpeg-fix branch April 22, 2026 21:14
lawrence-u10d added a commit that referenced this pull request Apr 22, 2026
## Summary
Follow-up to #569 (v0.1.4). That PR replaced the PyPI `opencv-python`
wheel with an ffmpeg-free build, but image scanners were still flagging
the 14 ffmpeg CVEs against v0.1.4. Root cause is scanner scope, not a
broken replacement.

## Root cause
`uv pip uninstall` only drops a package from `site-packages`. The
extracted wheel archive stays in the uv cache. Inspecting the pushed
v0.1.4 image:

- ✅ `cv2.__version__` reports `4.12.0` (our replacement wheel)
- ✅ `site-packages/cv2/` has no `.libs/` directory
- ❌ `/home/notebook-user/.cache/uv/archive-v0/<hash>/opencv_python.libs/`
still contains the old wheel's extracted ffmpeg libraries:
  - `libavcodec-*.so.59.37.100`
  - `libavformat-*.so.59.27.100`
  - `libavutil-*.so.57.28.100`
  - plus `libavfilter`, `libavdevice`, `libswscale`, `libswresample`

SO-version suffixes (avcodec 59.37 / avformat 59.27 / avutil 57.28) are
ffmpeg 5.1.x — matching the CVE set the upstream PR called out. Scanners
walk the whole filesystem and flag these even though nothing links
against them at runtime. `UV_LINK_MODE=copy` (set globally in this
Dockerfile) compounds it — the cache keeps its own copy independent of
`site-packages`.
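
The SO-version check described above can be sketched in Python. The version table holds only the three pairs quoted in this description; the filename pattern (`<lib>-<hash>.so.<major>.<minor>.<patch>`) is an assumption based on the names listed, and the hash values in the usage note are made up.

```python
import re
from typing import Optional, Tuple

# (major, minor) SO versions quoted above as belonging to ffmpeg 5.1.x.
FFMPEG_5_1_SOVERSIONS = {
    "libavcodec": (59, 37),
    "libavformat": (59, 27),
    "libavutil": (57, 28),
}

# Assumed naming scheme for bundled libraries: libname-<hash>.so.X.Y.Z
_BUNDLED_LIB = re.compile(r"^(lib[a-z]+)-[^.]+\.so\.(\d+)\.(\d+)\.(\d+)$")


def parse_bundled_lib(filename: str) -> Optional[Tuple[str, Tuple[int, int]]]:
    """Return (library name, (major, minor)) or None if the name does not match."""
    match = _BUNDLED_LIB.match(filename)
    if match is None:
        return None
    return match.group(1), (int(match.group(2)), int(match.group(3)))


def looks_like_ffmpeg_5_1(filename: str) -> bool:
    """True when the file is one of the three libraries at its 5.1.x SO version."""
    parsed = parse_bundled_lib(filename)
    if parsed is None:
        return False
    name, version = parsed
    return FFMPEG_5_1_SOVERSIONS.get(name) == version
```

For example, `looks_like_ffmpeg_5_1("libavcodec-deadbeef.so.59.37.100")` is true, while a name such as `libavif-abc123.so.16` does not match the pattern at all.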

## Fix
Add `uv cache clean` to the end of the opencv replacement `RUN` to wipe
the cache (including the old opencv wheel archive) from the final image
layer. Single minimal change — scoped to the opencv-fix RUN, not a
broader image-slimming pass.

Safe because `UV_LINK_MODE=copy` means the live venv copies files out of
cache — wiping the cache doesn't affect the installed packages.
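
The safety argument can be illustrated with a small filesystem simulation in Python. It does not invoke `uv`; the directory layout and the function name are hypothetical, and `shutil.rmtree` stands in for `uv cache clean`.

```python
import shutil
from pathlib import Path


def simulate_copy_install_then_cache_clean(root: Path) -> bool:
    """Copy a file out of a fake cache, delete the cache, check the copy survives."""
    cache_dir = root / "cache" / "archive-v0" / "example"
    site_dir = root / "venv" / "site-packages" / "pkg"
    cache_dir.mkdir(parents=True)
    site_dir.mkdir(parents=True)

    cached_file = cache_dir / "module.py"
    cached_file.write_text("VALUE = 1\n")

    # Copy mode: the installed file is an independent copy of the cached one.
    installed_file = site_dir / "module.py"
    shutil.copy2(cached_file, installed_file)

    # Stand-in for the cache purge: remove the whole cache tree.
    shutil.rmtree(root / "cache")

    return installed_file.read_text() == "VALUE = 1\n"
```

The function returns `True` because the installed file no longer depends on the cache. A symlink into the cache would behave differently, since the link would dangle once its target is deleted.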

## False positives ignored (not fixed here)
Two other `libav*` filenames in the image that are **not** ffmpeg and
don't trigger these CVEs:
- `/usr/lib/libreoffice/program/libavmedia{gst,lo}.so`: LibreOffice's
"avmedia" framework shim
- `pillow.libs/libavif-*.so.16`: AV1 image codec
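
The difference between real ffmpeg libraries and the look-alike names above can be sketched as a name filter plus a directory walk. This is an illustration of the idea, not how any particular scanner works; `is_ffmpeg_lib` and `find_ffmpeg_libs` are hypothetical helpers, and the stem list is the set of ffmpeg libraries named in this description.

```python
import os
from pathlib import Path
from typing import Iterator

# ffmpeg component libraries named in the root-cause section.
FFMPEG_LIB_STEMS = (
    "libavcodec",
    "libavformat",
    "libavutil",
    "libavfilter",
    "libavdevice",
    "libswscale",
    "libswresample",
)


def is_ffmpeg_lib(filename: str) -> bool:
    """True when the name is an ffmpeg stem followed by '-', '.', or nothing."""
    for stem in FFMPEG_LIB_STEMS:
        if filename.startswith(stem):
            rest = filename[len(stem):]
            if rest == "" or rest[0] in "-.":
                return True
    return False


def find_ffmpeg_libs(root: Path) -> Iterator[Path]:
    """Walk the tree under root and yield every file that looks like an ffmpeg library."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_ffmpeg_lib(name):
                yield Path(dirpath) / name
```

Matching on the full stem is what keeps `libavmediagst.so`, `libavmedialo.so`, and `libavif-*.so.16` out of the results, whereas a loose `libav*` glob would include them.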

## Version / Changelog
- Bumps service version `0.1.4` → `0.1.5`
- `CHANGELOG.md` entry under `0.1.5` → Security
- No `uv lock` changes

## Test plan
- [ ] `make docker-build` succeeds on `amd64` and `arm64`
- [ ] In the rebuilt image,
`find / -name "libavcodec*" -o -name "libavformat*" -o -name "libswscale*"`
returns nothing under `/home/notebook-user/.cache/uv/` and nothing under
`site-packages/cv2/.libs/`
- [ ] `cv2.__version__` still reports `4.12.0.88` and `import cv2;
cv2.imdecode(...)` smoke check works
- [ ] Container scan of the rebuilt image no longer flags the 14 ffmpeg
CVEs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: a single Docker build-step cleanup (`uv cache clean`) plus
version/changelog bumps; main risk is unintended impact on Docker layer
caching or build time, not runtime behavior.
> 
> **Overview**
> Removes leftover ffmpeg `.so` files from the built image by adding `uv
cache clean` after uninstalling/reinstalling OpenCV wheels in the
Dockerfile, preventing scanners from flagging CVEs from cached wheel
contents.
> 
> Bumps the service version to `0.1.5` and adds a matching
`CHANGELOG.md` security entry describing the cache purge.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
f73143d. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tylorbayer added a commit to SchoolAI/unstructured-api that referenced this pull request Jun 22, 2026
* Bump packages, clean up uv commands (Unstructured-IO#564)

<!-- CURSOR_SUMMARY -->
> [!NOTE]
> **Medium Risk**
> Moderate risk because this updates the runtime dependency set and
changes build/CI/Docker provisioning (uv lock enforcement and spaCy
model preloading), which can cause install or runtime regressions if the
new `unstructured`/model behavior differs.
> 
> **Overview**
> Bumps the release to `0.1.2` and refreshes dependencies (including
adding `python-multipart`), aligning with an `unstructured` update that
replaces NLTK usage with spaCy.
> 
> Updates Dockerfile, Makefile, and GitHub workflows to **pre-download
spaCy models** (replacing `download_nltk_packages`) and standardizes
dependency installs by switching `uv sync` from `--frozen` to `--locked`
across CI, Docker, and local install targets.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
d9d6362. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

* fix(deps): upgrade vulnerable transitive dependencies [security] (Unstructured-IO#566)

## Summary

Automated scan found CVEs in transitive dependencies locked in `uv.lock`
files.
These packages were upgraded to patched versions.

### Remediated vulnerabilities

| Package | From | To | Severity | CVE |
|---|---|---|---|---|
| cryptography | 46.0.6 | 46.0.7 | Medium | CVE-2026-39892 |
| pypdf | 6.9.2 | 6.10.0 | Medium | CVE-2026-40260 |
| starlette | 0.41.2 | 0.47.2 | Medium | CVE-2025-54121 |
| starlette | 0.41.2 | 0.49.1 | High | CVE-2025-62727 |

### What this PR does
1. Scans all `uv.lock` files with
[grype](https://github.com/anchore/grype) for known CVEs
2. Runs `uv lock --upgrade-package <pkg>` for each fixable vulnerability
(skips major bumps)
3. Bumps component versions (patch) and updates CHANGELOGs via
`version-bump`
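
The "skips major bumps" rule in step 2 can be sketched in Python. The workflow's actual rule is not shown here, so this assumes a major bump means an increase in the first dotted component; `release_tuple` and `is_major_bump` are hypothetical helpers, and the rows are copied from the table above.

```python
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Split a plain dotted version such as '6.10.0' into integers."""
    return tuple(int(part) for part in version.split("."))


def is_major_bump(current: str, target: str) -> bool:
    """Assumed rule: a major bump raises the first version component."""
    return release_tuple(target)[0] > release_tuple(current)[0]


# (package, from, to) rows from the remediation table.
UPGRADES = [
    ("cryptography", "46.0.6", "46.0.7"),
    ("pypdf", "6.9.2", "6.10.0"),
    ("starlette", "0.41.2", "0.47.2"),
    ("starlette", "0.41.2", "0.49.1"),
]

# Upgrades that would be applied under the assumed rule.
applicable = [row for row in UPGRADES if not is_major_bump(row[1], row[2])]
```

Under this rule every row in the table is applied, while a jump such as `0.41.2` to `1.0.0` would be skipped.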

> Created by
[lockfile-security-scan](https://github.com/Unstructured-IO/infra/actions/workflows/lockfile-security-scan.yml).
> Targets **transitive dependencies** that Renovate cannot reach.

Co-authored-by: utic-renovate[bot] <utic-renovate[bot]@users.noreply.github.com>

* fix(docker): replace PyPI opencv wheel with ffmpeg-free build [security] (Unstructured-IO#569)

* fix(docker): purge uv wheel cache after opencv swap [security] (Unstructured-IO#570)

* fix: remediate CVEs for unstructured-api (Unstructured-IO#571)

## Summary

- **starlette** 0.41.2 → 1.0.0: remediates CVE-2025-54121 (MEDIUM) and
CVE-2025-62727 (HIGH). Removes the `starlette==0.41.2` constraint pin
from `[tool.uv]` — the only middleware in this repo is FastAPI's
built-in CORS middleware, which is compatible with starlette 1.0.0.
- **python-multipart** 0.0.22 → 0.0.27: remediates CVE-2026-40347
(MEDIUM).
- Bumps service version from 0.1.5 → 0.1.6.
- Does **not** touch lxml (handled by PR Unstructured-IO#525).

## Test plan

- [x] `uv sync --locked` succeeds (lockfile is consistent)
- [x] `make check-src` passes (ruff format, ruff check, mypy)
- [ ] CI lint + unit tests pass
- [ ] Docker smoke tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Primarily dependency/version changes, but removing the
`starlette==0.41.2` constraint can introduce runtime incompatibilities
due to a major Starlette upgrade affecting FastAPI/middleware behavior.
> 
> **Overview**
> Updates the service to `0.1.6` and documents a new security release in
`CHANGELOG.md`.
> 
> Removes the `starlette==0.41.2` constraint from `pyproject.toml`
(allowing Starlette to upgrade to remediate CVE-2025-54121 and
CVE-2025-62727) and bumps `python-multipart` to a non-vulnerable release
to address CVE-2026-40347.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
ddaeefc. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remediate starlette, lxml, and python-multipart CVEs for unstructured-api (Unstructured-IO#573)

## Summary

- **Bump starlette** 1.0.0 → 1.1.0 (transitive via fastapi) — fixes
CVE-2025-62727 (HIGH, SLA breach +36d) and CVE-2025-54121 (MEDIUM, SLA
breach +20d)
- **Bump lxml** 6.1.0 → 6.1.1 (transitive via unstructured) — fixes
CVE-2026-41066 (HIGH, SLA breach +20d)
- **Bump python-multipart** 0.0.27 → 0.0.29 (direct dep) — fixes
CVE-2026-40347 (MEDIUM, SLA breach +4d)
- **Rebuild** picks up latest python-3.12 apk — resolves CVE-2025-12781
(MEDIUM)

All 5 CVEs are in SLA breach.

### Changes

- `pyproject.toml`: bumped `python-multipart` minimum from `>=0.0.18` to
`>=0.0.29`; added `starlette>=1.1.0` and `lxml>=6.1.1` to `[tool.uv]
constraint-dependencies` to pin transitive dep floors
- `uv.lock`: regenerated with upgraded packages
- `prepline_general/api/__version__.py`: patch bump 0.1.6 → 0.1.7
- `CHANGELOG.md`: added 0.1.7 security entry

## Test plan

- [x] `make install-test` succeeds with `--locked`
- [x] `make test` — 133 passed, 0 failures
- [ ] CI passes on this PR
- [ ] Image build + scan confirms CVEs resolved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic
Remediates five SLA-breached CVEs by updating `starlette`, `lxml`, and
`python-multipart`, rebuilding for the latest Python 3.12 APK, and
tightening transitive floors. Also fixes CI by regenerating `uv.lock`
with the real `uv` binary.

- **Dependencies**
  - `starlette` 1.0.0 → 1.1.0 — fixes CVE-2025-62727, CVE-2025-54121.
  - `lxml` 6.1.0 → 6.1.1 — fixes CVE-2026-41066.
  - `python-multipart` 0.0.27 → 0.0.29 — fixes CVE-2026-40347.
- Rebuild image to include latest Python 3.12 APK — fixes
CVE-2025-12781.
- Add `[tool.uv]` constraints (`starlette>=1.1.0`, `lxml>=6.1.1`);
regenerate `uv.lock` with real `uv` to fix `uv sync --locked` in CI.
  - Bump version to 0.1.7 and update `CHANGELOG.md`.

<sup>Written for commit 73f74ba.
Summary will update on new commits. <a
href="https://cubic.dev/pr/Unstructured-IO/unstructured-api/pull/573?utm_source=github">Review
in cubic</a></sup>

<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* remove cicd

---------

Co-authored-by: Emily Voss <github@emilyvoss.dev>
Co-authored-by: utic-github-cicd-token-generator[bot] <258069197+utic-github-cicd-token-generator[bot]@users.noreply.github.com>
Co-authored-by: utic-renovate[bot] <utic-renovate[bot]@users.noreply.github.com>
Co-authored-by: Lawrence Elitzer (LoLo) <lawrence@unstructured.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>