Hi, I have been working on adding async and polling task support for LitServe (related to #405 and #348). Instead of opening a large unsolicited PR right away, I built it as a small standalone package to explore the idea first.
In our research project we run a relatively small CPU-bound model, but the input preparation and processing pipeline can take a couple of minutes before inference even begins. Right now we use Celery together with FastAPI. That setup works, but as the project grows, with more models, possible GPU workloads, and higher scalability needs, we started looking at LitServe as a cleaner long-term direction.
Some inference workloads, such as heavy preprocessing, large model generation, or video and audio processing, can simply take too long for a synchronous HTTP response. Keeping a connection open for minutes is fragile and does not scale well. A task-based pattern lets clients submit work and receive a `task_id` immediately. They can then poll for status and fetch the result when it is ready. This fits naturally with async workflows and makes it easier to replace something like Celery without losing the fire-and-forget model.
The implementation adds four endpoints through a `TaskSpec(LitSpec)`:

| Method | Path | Description |
|--------|------|-------------|
| POST | `/tasks` | Submit a task and return `{"task_id": "..."}` immediately |
| GET | `/tasks/{task_id}` | Poll task status |
| GET | `/tasks/{task_id}/result` | Fetch the completed result |
| DELETE | `/tasks/{task_id}` | Explicit cleanup |
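For illustration, a client-side flow against these endpoints might look like the sketch below. The host/port, the request payload, and the `status` field and its `"completed"` value are assumptions on my part; only the paths and `task_id` come from the table above.

```python
import time

import requests

BASE_URL = "http://localhost:8000"  # assumed server address

# Submit a task; the server responds with a task_id immediately
task_id = requests.post(f"{BASE_URL}/tasks", json={"input": "..."}).json()["task_id"]

# Poll until the task is finished (status field/value are assumed)
while requests.get(f"{BASE_URL}/tasks/{task_id}").json().get("status") != "completed":
    time.sleep(1)

# Fetch the completed result, then clean up explicitly
result = requests.get(f"{BASE_URL}/tasks/{task_id}/result").json()
requests.delete(f"{BASE_URL}/tasks/{task_id}")
```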
Usage is straightforward:
```python
api = MyLitAPI(spec=TaskSpec())
```
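For context, a full server definition would look roughly like the sketch below. `MyLitAPI` is just a placeholder LitServe API, and the `litserve_tasks` import path is a guess based on the package name; the only part taken from above is passing `spec=TaskSpec()` to the API.

```python
import litserve as ls

from litserve_tasks import TaskSpec  # import path assumed from the package name


class MyLitAPI(ls.LitAPI):
    def setup(self, device):
        # Load the (small, CPU-bound) model once per worker
        self.model = lambda x: {"output": x}

    def decode_request(self, request):
        return request["input"]

    def predict(self, x):
        # Long-running preprocessing + inference happens here
        return self.model(x)

    def encode_response(self, output):
        return output


api = MyLitAPI(spec=TaskSpec())
server = ls.LitServer(api)
server.run(port=8000)
```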
The package is here:
https://github.com/dacrystal/litserve-tasks
Disclaimer: This is currently being tested in our research project at a small scale. It should be considered a proof of concept rather than production-ready code.
Would this be something you might want to bring upstream? I'd be happy to open a PR if this direction makes sense.