Rust-backed PDF text extraction library for Python.
- Detect and remove headers and footers
- Clean bilingual PDFs
- Mark headings in bold (basic Markdown)
- High accuracy
- Performance
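To make the header and footer feature concrete, here is a minimal Python sketch of one common approach: treat a page's first or last line as boilerplate when the same line recurs on most pages. This only illustrates the general idea and is not this library's implementation; the function name `strip_repeated_edges`, the `min_ratio` threshold, and the list-of-lines page representation are all assumptions made for the example.

```python
from collections import Counter


def strip_repeated_edges(pages: list[list[str]], min_ratio: float = 0.6) -> list[list[str]]:
    """Drop first/last lines that recur on at least `min_ratio` of the pages.

    Hypothetical helper for illustration; `pages` is a list of pages,
    each page being a list of text lines.
    """
    # With fewer than two pages there is nothing to compare against.
    if len(pages) < 2:
        return [list(page) for page in pages]

    # Count how often each line appears as the first or last line of a page.
    first = Counter(page[0] for page in pages if page)
    last = Counter(page[-1] for page in pages if page)
    threshold = min_ratio * len(pages)
    headers = {line for line, count in first.items() if count >= threshold}
    footers = {line for line, count in last.items() if count >= threshold}

    cleaned = []
    for page in pages:
        lines = list(page)
        if lines and lines[0] in headers:
            lines = lines[1:]
        if lines and lines[-1] in footers:
            lines = lines[:-1]
        cleaned.append(lines)
    return cleaned
```

Exact matching misses footers that embed a changing page number, so a real detector would likely normalize digits or compare lines fuzzily.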
uv sync --only-dev
# run tests (it rebuilds automatically)
uv run python -m unittest
# updating dependencies
cargo update
uv lock --upgrade
- Check the latest published version.
python - <<'PY'
import json
import urllib.request
with urllib.request.urlopen("https://pypi.org/pypi/fast-pdf-extract/json") as response:
    data = json.load(response)
print(data["info"]["version"])
PY
- Bump the version in Cargo.toml.
[package]
version = "0.6.1"
- Refresh lockfiles and run checks.
cargo check
uv lock
just test
- Build the release artifacts.
rm -rf target/wheels dist
uv run maturin build --release
- Publish to PyPI.
# MATURIN_PYPI_TOKEN must be set in the environment.
uv run maturin publish --skip-existing
- Verify PyPI shows the new version.
python - <<'PY'
import json
import urllib.request
with urllib.request.urlopen("https://pypi.org/pypi/fast-pdf-extract/json") as response:
    data = json.load(response)
print(data["info"]["version"])
PY
- Commit the version bump.
git add Cargo.toml Cargo.lock
git commit -m "Bump version to <version>"
If cargo build complains about a missing Python version:
cargo clean
cargo build