PDF processing fails with "PdfiumError: Data format error" on /general/v0/general #572

Description

@Asutorufa

Describe the bug
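Sending a PDF file to the /general/v0/general endpoint returns 500 Internal Server Error. The server log shows pypdfium2 raising "PdfiumError: Failed to load document (PDFium: Data format error)" while opening the file.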

To reproduce:

  1. Send a POST request to /general/v0/general with a PDF file.
  2. The API returns a 500 error.
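
For anyone trying to reproduce this, the request can be sent with the standard library alone. This is a minimal sketch, not the exact client used for the report: the form field name "files", the host and port in the usage line, and the helper names build_multipart and post_pdf are all assumptions for illustration.

```python
import urllib.error
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes, boundary: str = ""):
    """Encode one file as a multipart/form-data body.

    Returns (body, content_type). A random boundary is generated when none is given.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post_pdf(url: str, path: str) -> int:
    """POST the PDF at `path` to `url` and return the HTTP status code."""
    with open(path, "rb") as fh:
        # "files" is an assumed field name; adjust it to match the deployment.
        body, content_type = build_multipart("files", "document.pdf", fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # A 500 response arrives here as an exception; report its status code.
        return exc.code
```

Usage, assuming the container is reachable on localhost port 8000: post_pdf("http://localhost:8000/general/v0/general", "document.pdf").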

Environment:

  • Self-hosted, running in Docker

Additional context

  • The same PDF can be opened correctly in Chrome

Server log:

2026-05-11 09:29:52,614 unstructured_inference INFO Reading PDF for file: /tmp/tmpkn1jn_hm/document ...
2026-05-11 09:29:52,725 172.16.50.230:55376 POST /general/v0/general HTTP/1.1 - 500 Internal Server Error
2026-05-11 09:29:52,726 uvicorn.error ERROR Exception in ASGI application
Traceback (most recent call last):
  File "/home/notebook-user/.local/lib/python3.12/site-packages/uvicorn/protocols/http/h11_impl.py", line 410, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/fastapi/applications.py", line 1138, in __call__
    await super().__call__(scope, receive, send)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/middleware/errors.py", line 187, in __call__
    raise exc
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/middleware/errors.py", line 165, in __call__
    return await self.app(scope, receive, _send)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await self.app(scope, receive, send)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/_exception_handler.py", line 42, in __call__
    await app(scope, receive, sender)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, sender)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, sender)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/fastapi/routing.py", line 121, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/notebook-user/.local/lib/python3.12/site-packages/fastapi/routing.py", line 107, in app
    response = await f(request)
               ^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/fastapi/routing.py", line 426, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/fastapi/routing.py", line 316, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *values)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 2502, in run_sync_in_worker_thread
    return await future
               ^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 986, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/prepline_general/api/general.py", line 759, in general_partition
    list(response_generator(is_multipart=False))[0]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/prepline_general/api/general.py", line 693, in response_generator
    response = pipeline_api(
               ^^^^^^^^^^^^^
  File "/home/notebook-user/prepline_general/api/general.py", line 387, in pipeline_api
    elements = partition(**partition_kwargs)  # type: ignore # pyright: ignore[reportGeneralTypeIssues]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured/partition/auto.py", line 212, in partition
    elements = partition_pdf(
               ^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured/partition/common/metadata.py", line 161, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured/chunking/dispatch.py", line 74, in wrapper
    elements = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured/partition/pdf.py", line 225, in partition_pdf
    return partition_pdf_or_image(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured/partition/pdf.py", line 340, in partition_pdf_or_image
    elements = _partition_pdf_or_image_local(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured/utils.py", line 216, in wrapper
    return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured/partition/pdf.py", line 706, in _partition_pdf_or_image_local
    inferred_document_layout = process_data_with_model(
                               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured_inference/inference/layout.py", line 356, in process_data_with_model
    layout = process_file_with_model(
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured_inference/inference/layout.py", line 395, in process_file_with_model
    else DocumentLayout.from_file(
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured_inference/inference/layout.py", line 66, in from_file
    _image_paths = convert_pdf_to_image(
                   ^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/unstructured_inference/inference/layout.py", line 426, in convert_pdf_to_image
    with pdfium.PdfDocument(filename or file, password=password) as pdf:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/pypdfium2/_helpers/document.py", line 73, in __init__
    self.raw, to_hold, to_close = _open_pdf(self._input, self._password, self._autoclose)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/notebook-user/.local/lib/python3.12/site-packages/pypdfium2/_helpers/document.py", line 549, in _open_pdf
    raise PdfiumError(f"Failed to load document (PDFium: {pdfium_i.ErrorToStr.get(err_code)}).", err_code=err_code)
pypdfium2._helpers.misc.PdfiumError: Failed to load document (PDFium: Data format error).
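
The traceback ends in pypdfium2's _open_pdf, and PDFium's "Data format error" generally means the input is not in PDF format or is corrupted. Since the same file opens in Chrome, one thing worth ruling out is that the bytes reaching the server differ from the original file, for example a truncated upload or an HTML error page saved with a .pdf name. The sketch below only looks at the structural markers a PDF normally carries (a "%PDF-" header near the start, "startxref" and "%%EOF" near the end). The helper name inspect_pdf_bytes and the 1024-byte search window are assumptions for illustration, and passing this check does not guarantee that PDFium will accept the file.

```python
def inspect_pdf_bytes(data: bytes) -> dict:
    """Report basic structural markers of a PDF byte string.

    header_offset is -1 when no "%PDF-" header is found in the first 1024 bytes;
    a value above 0 means junk bytes precede the header.
    """
    tail = data[-1024:]
    return {
        "size": len(data),
        "header_offset": data.find(b"%PDF-", 0, 1024),
        "has_startxref": b"startxref" in tail,
        "has_eof_marker": b"%%EOF" in tail,
    }


def inspect_pdf_file(path: str) -> dict:
    """Read the file at `path` and inspect its bytes."""
    with open(path, "rb") as fh:
        return inspect_pdf_bytes(fh.read())
```

Running inspect_pdf_file on the original document and on the copy the server receives, then comparing the two results, would show whether the file was altered or truncated in transit.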
