Questions on the inference code #2

@ld-lab-pulsia

Description

Hello, thank you for sharing all this, I had a few questions about the inference.

1 -> Is it normal that the prompt uses all this indentation in the docstring? Did you really train with this prompt, or is there a missing textwrap.dedent?
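For reference, this is the kind of cleanup I mean (the prompt text here is a made-up stand-in, not your actual prompt):

```python
import textwrap

# Hypothetical prompt defined as an indented docstring inside a function/class,
# which is how source-code indentation leaks into the string.
raw_prompt = """
    You are a document parser.
    Convert the page to markdown.
    """

# textwrap.dedent removes the common leading whitespace from every line,
# so the model receives the prompt without the source-code indentation.
clean_prompt = textwrap.dedent(raw_prompt).strip()
print(clean_prompt)
```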

2 -> max_model_len=32768, max_tokens=8192 -> Maybe you could reduce 32768, since your prompt is not all that big? It could save KV-cache space.
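A back-of-the-envelope sketch of what I mean (the prompt budget is an assumption on my side, not a measured number):

```python
# max_model_len bounds prompt tokens + generated tokens per sequence.
# If the text prompt plus image tokens stay well under the budget,
# a smaller max_model_len frees KV-cache blocks for more concurrent sequences.

max_tokens = 8192             # generation budget from the current config
prompt_budget = 4096          # ASSUMED upper bound on prompt + image tokens
suggested_max_model_len = prompt_budget + max_tokens
print(suggested_max_model_len)  # well below the current 32768
```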

3 -> Could you explain why you run multiprocessing over vLLM? Won't it duplicate the model in memory? The usual approach is instead to run a single separate server and make asynchronous calls to it.
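To illustrate the pattern I have in mind: one model process (e.g. an OpenAI-compatible server started with vllm serve) and many concurrent client calls. call_server below is a hypothetical stand-in for the actual HTTP request:

```python
import asyncio

# Sketch of the server + async-client pattern: all requests are in flight
# at once against a single server process, which batches them internally,
# instead of multiprocessing that loads a copy of the weights per worker.

async def call_server(page_id: int) -> str:
    # Placeholder for network I/O (an HTTP POST to the inference server).
    await asyncio.sleep(0)
    return f"parsed page {page_id}"

async def main() -> list[str]:
    # Fan out all pages concurrently; the single server handles batching.
    return await asyncio.gather(*(call_server(i) for i in range(4)))

results = asyncio.run(main())
print(results)
```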

4 -> limit_mm_per_prompt={"image": 50}, -> does that mean you intend to run the model over multiple pages simultaneously?
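In other words, I am wondering whether a single request is meant to carry several page images at once, roughly like this (filenames and prompt are placeholders I made up):

```python
# With limit_mm_per_prompt={"image": 50}, one request could in principle
# carry up to 50 images. A multi-page request would then look roughly like:
request = {
    "prompt": "<placeholder prompt covering all pages>",
    "multi_modal_data": {"image": [f"page_{i}.png" for i in range(3)]},
}
print(len(request["multi_modal_data"]["image"]))  # 3 pages in one prompt
```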

I am trying to integrate your model into this inference library: vlmparse, and I want to make sure I am doing it right.
