
Ideas for bq enhancements #4

@mahmoud

Description

Hey again! Per #3, I have a few ideas/observations after 3 months of using bq. For a bit of background, at SimpleLegal my team built a very similar transactional queuing system on Postgres, but one locked into Django land.

These days I can't go back to Django from SQLAlchemy and bq has been such a find, and fills an important gap in the Python queuing ecosystem.

Hoping to get your reactions before breaking ground on PRs. Happy to discuss each:

1. More reliable metrics

In my production setup I've got monitoring hooked up to the metrics server as a healthcheck. It was such a pleasant surprise to find that already built. But after 3 months, I've observed inconsistent response times (5+ seconds) that require restarting the worker, at least once every couple of days. The worker itself is fine, but in my experience the Python builtins for HTTP serving are pretty flaky. So, one idea is to improve this, whether by introducing a lightweight (pure-Python) dependency or by some other means.
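
One low-effort option before reaching for a dependency: a sketch of serving the healthcheck with the stdlib's `ThreadingHTTPServer`, which handles each request on its own thread so a single slow or stuck client can't hold up later probes. The endpoint path and JSON payload here are made up for illustration, not bq's actual metrics API.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical payload; a real handler would report worker state.
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep request logging out of the worker logs


def serve(port: int = 0) -> ThreadingHTTPServer:
    # port=0 lets the OS pick a free port; each request gets its own thread.
    server = ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Whether this actually fixes the stalls I've been seeing is an open question; it's just the cheapest experiment I can think of.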

2. Enhanced graceful shutdown

When running under docker compose and then exiting with an interrupt, all my other Python processes terminate gracefully almost immediately, but I've noticed the bq worker sometimes waits out a full heartbeat interval first. I suspect we could be doing some improved signal handling. Curious if you've seen this.

3. ULIDs

This is my main reason for using custom Task types (thank goodness bq supports them). I'm fine continuing to use my custom approach, but I think there's enough growing consensus around ULIDs as an ideal primary key that it might make sense to upstream a version for others to use? Happy to share what I'm doing if this is interesting.

4. Concurrency

Basically, this is the TODO already in the code.

This is probably the biggest place where I feel like it's hard to sell others on bq. Right now I'm running multiple instances of bq to scale, which is a little expensive, but manageable. There are so many ways to approach this, but here are the main thoughts I've had.

  1. We'd like to keep it native (processes/threads/async) to avoid introducing heavy dependencies or unfamiliar concurrency paradigms.
  2. Multiple processes have advantages but will at the very least require a better healthcheck, something more like a supervisord-style architecture. In my past experience, we did this by forking and having a central process monitor the children and provide healthchecks/restarts.
  3. When a fully loaded worker is 200MB+ as baseline (as is the case with my application), I start to wish I had threads/async for my lighter weight IO-bound tasks.
  4. Async is appealing because SQLAlchemy has async session support, and some applications only have an async driver available. In my application, I have both async and sync DB drivers, and while transactions can't easily be shared between them, I do use the async driver on the service side to serve a mix of async and sync endpoints.
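
To make point 2 concrete, a rough, heavily simplified sketch of the supervisord-style shape: a parent launches N worker commands and restarts any that exit. Everything here is hypothetical; a real version would add restart backoff, a healthcheck channel to each child, and orderly shutdown, and `cmd` would be whatever starts a bq worker.

```python
import subprocess
import time


def supervise(cmd, n_workers=2, poll=0.05, max_restarts=5):
    """Keep n_workers copies of cmd running; restart any that exit.

    max_restarts is only a cap so this sketch terminates; a real
    supervisor would run until signaled.
    """
    procs = [subprocess.Popen(cmd) for _ in range(n_workers)]
    restarts = 0
    while restarts < max_restarts:
        time.sleep(poll)
        for i, p in enumerate(procs):
            if p.poll() is not None:  # child has exited
                restarts += 1
                procs[i] = subprocess.Popen(cmd)
    for p in procs:
        p.terminate()
        p.wait()
    return restarts
```

Using subprocesses rather than `fork()` keeps it portable and sidesteps the usual fork-after-threads hazards, at the cost of each child re-importing the app (and re-paying that 200MB+ baseline).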

Alright, that's it for now. Feels great to finally get that out there. We may not be able to fix it overnight, but hoping with a bit of discussion we can start small and go big! :)
