fix: remediate starlette, lxml, and python-multipart CVEs for unstructured-api #573
Merged
Bump starlette (1.0.0 -> 1.1.0), lxml (6.1.0 -> 6.1.1), and python-multipart (0.0.27 -> 0.0.29) to resolve 5 SLA-breached CVEs:

- CVE-2025-62727 (starlette, HIGH)
- CVE-2025-54121 (starlette, MEDIUM)
- CVE-2026-41066 (lxml, HIGH)
- CVE-2026-40347 (python-multipart, MEDIUM)
- CVE-2025-12781 (python-3.12 apk, MEDIUM; resolved by rebuild)

Adds constraint-dependencies for starlette and lxml (transitive deps) to prevent version regression. Bumps python-multipart minimum in direct dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
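As a rough illustration of what the version floors above mean in practice, the following sketch checks an installed environment against them. The floor values come from this PR; the helper names (`parse_version`, `meets_floor`, `check_environment`) are hypothetical and not part of the repository, and the parser assumes plain dotted releases such as `0.0.29` (no pre-release or local suffixes).

```python
from importlib import metadata

# Floors taken from the PR description; helper names are hypothetical.
FLOORS = {
    "starlette": "1.1.0",
    "lxml": "6.1.1",
    "python-multipart": "0.0.29",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as '0.0.29' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_floor(installed: str, floor: str) -> bool:
    """Return True when the installed release is at or above the floor."""
    return parse_version(installed) >= parse_version(floor)


def check_environment(floors: dict[str, str]) -> dict[str, str]:
    """Map each package that is missing or below its floor to a short reason."""
    problems = {}
    for name, floor in floors.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_floor(installed, floor):
            problems[name] = f"{installed} < {floor}"
    return problems
```

Calling `check_environment(FLOORS)` inside the synced virtual environment would return an empty dict when all three floors are satisfied. This only inspects what is installed; it does not replace `uv sync --locked` or an image scan.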
No issues found across 4 files
Shadow auto-approve: would auto-approve. This PR addresses critical CVEs by bumping three dependencies to patched versions with no breaking changes, and all tests pass successfully.
The previous lockfile was generated through the uv-wrapper which injected Azure DevOps registry URLs, causing `uv sync --locked` to fail in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
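A minimal sketch of how a lockfile regression like this one could be caught before CI, assuming that only the public PyPI index should appear in a committed `uv.lock`. The allowed-registry set and the function name are assumptions for illustration; the check relies on `uv.lock` recording each package's index as `source = { registry = "..." }` and does not inspect the per-wheel `url` fields.

```python
import re

# Assumption: only the public PyPI index is expected in the committed lockfile.
ALLOWED_REGISTRIES = {"https://pypi.org/simple"}

# Matches the registry URL inside entries like: source = { registry = "..." }
REGISTRY_PATTERN = re.compile(r'registry\s*=\s*"([^"]+)"')


def unexpected_registries(lock_text: str) -> set[str]:
    """Return every registry URL in the lockfile text that is not allowed."""
    return {
        url
        for url in REGISTRY_PATTERN.findall(lock_text)
        if url not in ALLOWED_REGISTRIES
    }
```

A pre-commit hook could read `uv.lock`, pass its text to `unexpected_registries`, and fail when the result is non-empty.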
william-u10d approved these changes on May 27, 2026
tylorbayer added a commit to SchoolAI/unstructured-api that referenced this pull request on Jun 22, 2026
* Bump packages, clean up uv commands (Unstructured-IO#564)

<!-- CURSOR_SUMMARY -->
> [!NOTE]
> **Medium Risk**
> Moderate risk because this updates the runtime dependency set and changes build/CI/Docker provisioning (uv lock enforcement and spaCy model preloading), which can cause install or runtime regressions if the new `unstructured`/model behavior differs.
>
> **Overview**
> Bumps the release to `0.1.2` and refreshes dependencies (including adding `python-multipart`), aligning with an `unstructured` update that replaces NLTK usage with spaCy.
>
> Updates Dockerfile, Makefile, and GitHub workflows to **pre-download spaCy models** (replacing `download_nltk_packages`) and standardizes dependency installs by switching `uv sync` from `--frozen` to `--locked` across CI, Docker, and local install targets.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit d9d6362. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

* fix(deps): upgrade vulnerable transitive dependencies [security] (Unstructured-IO#566)

## Summary

Automated scan found CVEs in transitive dependencies locked in `uv.lock` files. These packages were upgraded to patched versions.

### Remediated vulnerabilities

| Package | From | To | Severity | CVE |
|---|---|---|---|---|
| cryptography | 46.0.6 | 46.0.7 | Medium | CVE-2026-39892 |
| pypdf | 6.9.2 | 6.10.0 | Medium | CVE-2026-40260 |
| starlette | 0.41.2 | 0.47.2 | Medium | CVE-2025-54121 |
| starlette | 0.41.2 | 0.49.1 | High | CVE-2025-62727 |

### What this PR does

1. Scans all `uv.lock` files with [grype](https://github.com/anchore/grype) for known CVEs
2. Runs `uv lock --upgrade-package <pkg>` for each fixable vulnerability (skips major bumps)
3. Bumps component versions (patch) and updates CHANGELOGs via `version-bump`

> Created by [lockfile-security-scan](https://github.com/Unstructured-IO/infra/actions/workflows/lockfile-security-scan.yml).
> Targets **transitive dependencies** that Renovate cannot reach.

Co-authored-by: utic-renovate[bot] <utic-renovate[bot]@users.noreply.github.com>

* fix(docker): replace PyPI opencv wheel with ffmpeg-free build [security] (Unstructured-IO#569)

## Summary

Mirrors [Unstructured-IO/unstructured#4336](Unstructured-IO/unstructured#4336) in this repo so the `quay.io/unstructured-io/unstructured-api` image no longer ships the 14 ffmpeg 5.1.x CVEs bundled in PyPI `opencv-python` wheels.

After `uv sync`, the Dockerfile now:

- Downloads the architecture-specific `opencv-contrib-python-headless` wheel (built with `WITH_FFMPEG=OFF` + `ENABLE_CONTRIB=1` + `ENABLE_HEADLESS=1`) from the upstream `Unstructured-IO/unstructured` GitHub release (`opencv-4.12.0.88`)
- SHA-256-verifies against the hashes published by the upstream `build-opencv-wheels.yml` workflow
- Uninstalls any installed PyPI opencv variants and installs the verified wheel with `--no-deps`

The contrib-headless variant is a strict superset of the `cv2` API exposed by `opencv-python`, `opencv-python-headless`, and `opencv-contrib-python`, so a single wheel transparently replaces whichever variant is present.

## One deviation from upstream

Upstream uninstalls all four opencv variants in a single `uv pip uninstall …` call because their image pulls all four transitively (via `unstructured-paddleocr`). Our `uv.lock` currently only resolves `opencv-python`, so a single combined uninstall would fail on the three that aren't installed. Replaced with a per-package loop using `|| true`: same end state, robust if transitive deps change.

## Version / Changelog

- Bumps service version `0.1.3` → `0.1.4`
- `CHANGELOG.md` entry under `0.1.4` → Security
- No `uv lock` changes needed; the lockfile still resolves `opencv-python 4.13.0.92`, and we overlay the 4.12.0.88 contrib-headless wheel only at image build time (upstream 4.13.0.92 has no sdist on PyPI, which is why the build-from-source workflow is pinned to 4.12.0.88).

## Test plan

- [ ] `make docker-build` succeeds on `amd64` and `arm64`; the opencv replacement step resolves the architecture-specific wheel and the SHA-256 check passes
- [ ] `docker run … python -c "import cv2; print(cv2.__version__)"` prints `4.12.0.88` inside the built image
- [ ] `make docker-test` passes against the rebuilt image
- [ ] Container scan of the rebuilt image no longer flags the 14 ffmpeg CVEs called out by upstream PR #4336

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Medium risk because it changes a core binary dependency (`opencv`) at image build time via an external wheel download and forced uninstall/reinstall, which could impact image build reliability or runtime CV2 behavior across architectures.
>
> **Overview**
> Updates the Docker build to **remove vulnerable ffmpeg-bundled PyPI OpenCV wheels** by downloading an arch-specific, SHA-256-verified `opencv-contrib-python-headless` wheel built with `WITH_FFMPEG=OFF`, uninstalling any installed OpenCV variants, and reinstalling the verified wheel.
>
> Bumps the service version to `0.1.4` and adds a `CHANGELOG.md` security entry documenting the OpenCV/ffmpeg CVE mitigation.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 7e23afc. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): purge uv wheel cache after opencv swap [security] (Unstructured-IO#570)

## Summary

Follow-up to Unstructured-IO#569 (v0.1.4). That PR replaced the PyPI `opencv-python` wheel with an ffmpeg-free build, but image scanners were still flagging the 14 ffmpeg CVEs against v0.1.4. Root cause is scanner scope, not a broken replacement.

## Root cause

`uv pip uninstall` only drops a package from `site-packages`. The extracted wheel archive stays in the uv cache. Inspecting the pushed v0.1.4 image:

- ✅ `cv2.__version__` reports `4.12.0` (our replacement wheel)
- ✅ `site-packages/cv2/` has no `.libs/` directory
- ❌ `/home/notebook-user/.cache/uv/archive-v0/<hash>/opencv_python.libs/` still contains the full extracted old wheel:
  - `libavcodec-*.so.59.37.100`
  - `libavformat-*.so.59.27.100`
  - `libavutil-*.so.57.28.100`
  - plus `libavfilter`, `libavdevice`, `libswscale`, `libswresample`

SO-version suffixes (avcodec 59.37 / avformat 59.27 / avutil 57.28) are ffmpeg 5.1.x, matching the CVE set the upstream PR called out. Scanners walk the whole filesystem and flag these even though nothing links against them at runtime. `UV_LINK_MODE=copy` (set globally in this Dockerfile) compounds it: the cache keeps its own copy independent of `site-packages`.

## Fix

Add `uv cache clean` to the end of the opencv replacement `RUN` to wipe the cache (including the old opencv wheel archive) from the final image layer. Single minimal change, scoped to the opencv-fix RUN, not a broader image-slimming pass. Safe because `UV_LINK_MODE=copy` means the live venv copies files out of cache; wiping the cache doesn't affect the installed packages.

## False positives ignored (not fixed here)

Two other `libav*` filenames in the image that are **not** ffmpeg and don't trigger these CVEs:

- `/usr/lib/libreoffice/program/libavmedia{gst,lo}.so`: LibreOffice's "avmedia" framework shim
- `pillow.libs/libavif-*.so.16`: AV1 image codec

## Version / Changelog

- Bumps service version `0.1.4` → `0.1.5`
- `CHANGELOG.md` entry under `0.1.5` → Security
- No `uv lock` changes

## Test plan

- [ ] `make docker-build` succeeds on `amd64` and `arm64`
- [ ] In the rebuilt image, `find / -name "libavcodec*" -o -name "libavformat*" -o -name "libswscale*"` returns nothing under `/home/notebook-user/.cache/uv/` and nothing under `site-packages/cv2/.libs/`
- [ ] `cv2.__version__` still reports `4.12.0.88` and `import cv2; cv2.imdecode(...)` smoke check works
- [ ] Container scan of the rebuilt image no longer flags the 14 ffmpeg CVEs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Low Risk**
> Low risk: a single Docker build-step cleanup (`uv cache clean`) plus version/changelog bumps; main risk is unintended impact on Docker layer caching or build time, not runtime behavior.
>
> **Overview**
> Removes leftover ffmpeg `.so` files from the built image by adding `uv cache clean` after uninstalling/reinstalling OpenCV wheels in the Dockerfile, preventing scanners from flagging CVEs from cached wheel contents.
>
> Bumps the service version to `0.1.5` and adds a matching `CHANGELOG.md` security entry describing the cache purge.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit f73143d. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: remediate CVEs for unstructured-api (Unstructured-IO#571)

## Summary

- **starlette** 0.41.2 → 1.0.0: remediates CVE-2025-54121 (MEDIUM) and CVE-2025-62727 (HIGH). Removes the `starlette==0.41.2` constraint pin from `[tool.uv]`: the only middleware in this repo is FastAPI's built-in CORS middleware, which is compatible with starlette 1.0.0.
- **python-multipart** 0.0.22 → 0.0.27: remediates CVE-2026-40347 (MEDIUM).
- Bumps service version from 0.1.5 → 0.1.6.
- Does **not** touch lxml (handled by PR Unstructured-IO#525).

## Test plan

- [x] `uv sync --locked` succeeds (lockfile is consistent)
- [x] `make check-src` passes (ruff format, ruff check, mypy)
- [ ] CI lint + unit tests pass
- [ ] Docker smoke tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Primarily dependency/version changes, but removing the `starlette==0.41.2` constraint can introduce runtime incompatibilities due to a major Starlette upgrade affecting FastAPI/middleware behavior.
>
> **Overview**
> Updates the service to `0.1.6` and documents a new security release in `CHANGELOG.md`.
>
> Removes the `starlette==0.41.2` constraint from `pyproject.toml` (allowing Starlette to upgrade to remediate CVE-2025-54121 and CVE-2025-62727) and bumps `python-multipart` to a non-vulnerable release to address CVE-2026-40347.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit ddaeefc. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remediate starlette, lxml, and python-multipart CVEs for unstructured-api (Unstructured-IO#573)

## Summary

- **Bump starlette** 1.0.0 → 1.1.0 (transitive via fastapi): fixes CVE-2025-62727 (HIGH, SLA breach +36d) and CVE-2025-54121 (MEDIUM, SLA breach +20d)
- **Bump lxml** 6.1.0 → 6.1.1 (transitive via unstructured): fixes CVE-2026-41066 (HIGH, SLA breach +20d)
- **Bump python-multipart** 0.0.27 → 0.0.29 (direct dep): fixes CVE-2026-40347 (MEDIUM, SLA breach +4d)
- **Rebuild** picks up latest python-3.12 apk: resolves CVE-2025-12781 (MEDIUM)

All 5 CVEs are in SLA breach.

### Changes

- `pyproject.toml`: bumped `python-multipart` minimum from `>=0.0.18` to `>=0.0.29`; added `starlette>=1.1.0` and `lxml>=6.1.1` to `[tool.uv] constraint-dependencies` to pin transitive dep floors
- `uv.lock`: regenerated with upgraded packages
- `prepline_general/api/__version__.py`: patch bump 0.1.6 → 0.1.7
- `CHANGELOG.md`: added 0.1.7 security entry

## Test plan

- [x] `make install-test` succeeds with `--locked`
- [x] `make test`: 133 passed, 0 failures
- [ ] CI passes on this PR
- [ ] Image build + scan confirms CVEs resolved

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated description by cubic. -->
---
## Summary by cubic

Remediates five SLA-breached CVEs by updating `starlette`, `lxml`, and `python-multipart`, rebuilding for the latest Python 3.12 APK, and tightening transitive floors. Also fixes CI by regenerating `uv.lock` with the real `uv` binary.

- **Dependencies**
  - `starlette` 1.0.0 → 1.1.0: fixes CVE-2025-62727, CVE-2025-54121.
  - `lxml` 6.1.0 → 6.1.1: fixes CVE-2026-41066.
  - `python-multipart` 0.0.27 → 0.0.29: fixes CVE-2026-40347.
  - Rebuild image to include latest Python 3.12 APK: fixes CVE-2025-12781.
  - Add `[tool.uv]` constraints (`starlette>=1.1.0`, `lxml>=6.1.1`); regenerate `uv.lock` with real `uv` to fix `uv sync --locked` in CI.
  - Bump version to 0.1.7 and update `CHANGELOG.md`.

<sup>Written for commit 73f74ba. Summary will update on new commits. <a href="https://cubic.dev/pr/Unstructured-IO/unstructured-api/pull/573?utm_source=github">Review in cubic</a></sup>
<!-- End of auto-generated description by cubic. -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* remove cicd

---------

Co-authored-by: Emily Voss <github@emilyvoss.dev>
Co-authored-by: utic-github-cicd-token-generator[bot] <258069197+utic-github-cicd-token-generator[bot]@users.noreply.github.com>
Co-authored-by: utic-renovate[bot] <utic-renovate[bot]@users.noreply.github.com>
Co-authored-by: Lawrence Elitzer (LoLo) <lawrence@unstructured.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
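The leftover-library check from the Unstructured-IO#570 test plan in the commit message above, together with its two documented false positives, can be sketched roughly as follows. The library stems are the ones named in that commit message; the function names and the exact matching rule (a stem followed by `-`, `.`, or end of name) are assumptions for illustration, not the check the authors ran.

```python
import os
import re

# Library stems the commit message lists as shipped in the old opencv wheel.
FFMPEG_STEMS = (
    "libavcodec",
    "libavformat",
    "libavutil",
    "libavfilter",
    "libavdevice",
    "libswscale",
    "libswresample",
)

# Require '-', '.', or end of name after the stem, so libavif-*.so and
# libavmedia{gst,lo}.so (the documented false positives) do not match.
FFMPEG_PATTERN = re.compile(r"^(?:%s)(?:[-.]|$)" % "|".join(FFMPEG_STEMS))


def is_ffmpeg_library(filename: str) -> bool:
    """Return True when a bare file name looks like a bundled ffmpeg library."""
    return FFMPEG_PATTERN.match(filename) is not None


def find_ffmpeg_libraries(root: str) -> list[str]:
    """Walk a directory tree and return sorted paths of ffmpeg-looking files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_ffmpeg_library(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Run against the image filesystem (for example `find_ffmpeg_libraries("/home/notebook-user/.cache/uv")`), an empty list would be consistent with the cache purge having worked. A container scanner remains the authoritative check.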
Summary

All 5 CVEs are in SLA breach.

Changes

- `pyproject.toml`: bumped `python-multipart` minimum from `>=0.0.18` to `>=0.0.29`; added `starlette>=1.1.0` and `lxml>=6.1.1` to `[tool.uv] constraint-dependencies` to pin transitive dep floors
- `uv.lock`: regenerated with upgraded packages
- `prepline_general/api/__version__.py`: patch bump 0.1.6 → 0.1.7
- `CHANGELOG.md`: added 0.1.7 security entry

Test plan

- `make install-test` succeeds with `--locked`
- `make test`: 133 passed, 0 failures

🤖 Generated with Claude Code
Summary by cubic

Remediates five SLA-breached CVEs by updating `starlette`, `lxml`, and `python-multipart`, rebuilding for the latest Python 3.12 APK, and tightening transitive floors. Also fixes CI by regenerating `uv.lock` with the real `uv` binary.

- `starlette` 1.0.0 → 1.1.0: fixes CVE-2025-62727, CVE-2025-54121.
- `lxml` 6.1.0 → 6.1.1: fixes CVE-2026-41066.
- `python-multipart` 0.0.27 → 0.0.29: fixes CVE-2026-40347.
- Rebuild image to include latest Python 3.12 APK: fixes CVE-2025-12781.
- Add `[tool.uv]` constraints (`starlette>=1.1.0`, `lxml>=6.1.1`); regenerate `uv.lock` with real `uv` to fix `uv sync --locked` in CI.
- Bump version to 0.1.7 and update `CHANGELOG.md`.

Written for commit 73f74ba. Summary will update on new commits.