Calling unstructured api within another api call takes more time #496

Description

@Cyril-19-gen

Hey, I have code that uses the partition_via_api() function from unstructured to extract images and text. When I run this code from the terminal (python3 extractor.py), it completes within 1 minute. Now I'm putting the same function in a FastAPI endpoint. When the endpoint is called via a request, the same function is triggered, but the unstructured API call takes drastically longer. Any ideas on why this happens or how to solve it?

Sample function used:

from unstructured.partition.api import partition_via_api


async def pdf_extracted_images(file_path, filename):
    # partition_via_api is a synchronous call to the hosted Unstructured API
    chunks = partition_via_api(
        filename=str(file_path),
        api_key="api_key",
        api_url="https://api.unstructuredapp.io/general/v0/general",
        strategy="hi_res",
        split_pdf_page=True,
        split_pdf_concurrency_level=15,
        infer_table_structure=True,
        extract_image_block_types=["Image"],
        extract_image_block_to_payload=True,
        chunking_strategy="basic",
        max_characters=20000,
        combine_text_under_n_chars=6000,
        new_after_n_chars=6000,
    )
    return chunks
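
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the function above is declared with async def but makes a synchronous call, so inside FastAPI it runs directly on the server's event loop and blocks it for the whole request. A minimal sketch of one workaround is to hand the blocking call to a worker thread with asyncio.to_thread. Here blocking_partition is a hypothetical stand-in for partition_via_api(...) so the example runs without an API key, and the heartbeat task exists only to show that the event loop stays responsive while the call is in flight.

```python
import asyncio
import time


def blocking_partition(file_path: str) -> list[str]:
    """Hypothetical stand-in for the synchronous partition_via_api(...) call."""
    time.sleep(0.2)  # simulates the blocking network round trip
    return [f"chunk from {file_path}"]


async def pdf_extracted_images(file_path: str, filename: str) -> list[str]:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_partition, str(file_path))


async def main() -> tuple[list[str], int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the event loop gets to run while we wait.
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    chunks = await pdf_extracted_images("sample.pdf", "sample.pdf")
    hb.cancel()
    return chunks, ticks


if __name__ == "__main__":
    chunks, ticks = asyncio.run(main())
    print(chunks, "heartbeat ticks while waiting:", ticks)
```

An alternative with the same intent is to declare the endpoint and this function with plain def instead of async def, since FastAPI runs non-async path operation functions in its thread pool. Whether either change restores the terminal timings would need to be measured against the real API.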
