Replace multiprocessing DataLoader with thread-based embedding pipeline #41

Merged
NetZissou merged 1 commit into main from fix/embed-explore-image-folder-import on Jun 15, 2026

@NetZissou

Collaborator

Image preprocessing is GIL-light because PIL and torchvision release the GIL, so a thread pool parallelizes it near-linearly without process spawning, pickling, or fork/spawn divergence...

This commit introduces `shared/utils/image_pipeline.py`: thread-pool decode and preprocess, overlapped with a batch forward pass on the main thread.
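For readers who have not opened the diff, the overlap pattern can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in `shared/utils/image_pipeline.py`; `embed_paths`, `batched`, `fake_preprocess`, and `fake_forward` are hypothetical names, and the stand-in functions take the place of real image decoding and a real model.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")
V = TypeVar("V")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def embed_paths(
    paths: Sequence[str],
    preprocess: Callable[[str], U],
    forward: Callable[[List[U]], List[V]],
    batch_size: int = 4,
    workers: int = 8,
) -> List[V]:
    """Preprocess in worker threads; run `forward` on the calling thread."""
    results: List[V] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit everything up front so workers keep preprocessing later
        # batches while the calling thread is busy inside `forward`.
        futures = [pool.submit(preprocess, p) for p in paths]
        for chunk in batched(futures, batch_size):
            # Blocks only until the items of this batch are ready.
            batch = [f.result() for f in chunk]
            results.extend(forward(batch))
    return results


# Hypothetical stand-ins for image decoding and the model forward pass.
def fake_preprocess(path: str) -> int:
    return len(path)


def fake_forward(batch: List[int]) -> List[int]:
    return [x * 2 for x in batch]


if __name__ == "__main__":
    print(embed_paths(["a", "bb", "ccc"], fake_preprocess, fake_forward, batch_size=2))
```

Output order follows input order because the futures are consumed in submission order. Submitting every item at once keeps all preprocessed inputs in memory, so a real pipeline would likely bound the number of in-flight items.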

The embedding service implements device-aware concurrency:

  • GPU = wide pool + intra-op threads pinned to 1
  • CPU = small pool, forward keeps the cores
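The two rules above could be expressed as a small planning function along these lines. This is only a sketch: `ConcurrencyPlan` and `plan_concurrency` are hypothetical names, and the specific limits (a cap of 32 workers on GPU, a quarter of the cores up to 4 on CPU) are assumptions for illustration, not the values used in the embedding service.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConcurrencyPlan:
    pool_workers: int                 # threads for decode + preprocess
    intra_op_threads: Optional[int]   # None = leave the framework default


def plan_concurrency(device: str, cpu_count: Optional[int] = None) -> ConcurrencyPlan:
    """Pick thread counts from the target device (illustrative numbers only)."""
    cores = cpu_count or os.cpu_count() or 1
    if device.startswith("cuda"):
        # GPU: the forward pass runs on the device, so the CPU cores can go
        # to a wide preprocessing pool; intra-op threads are pinned to 1.
        return ConcurrencyPlan(pool_workers=min(32, cores), intra_op_threads=1)
    # CPU: keep the pool small so the forward pass keeps most of the cores.
    return ConcurrencyPlan(pool_workers=max(1, min(4, cores // 4)), intra_op_threads=None)


if __name__ == "__main__":
    print(plan_concurrency("cuda", cpu_count=16))
    print(plan_concurrency("cpu", cpu_count=16))
```

With PyTorch, a non-`None` `intra_op_threads` value would typically be applied through `torch.set_num_threads` before running the model.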

This commit also drops the hpc-inference git dependency entirely, which unblocks the PyPI release.

Please see the inline comments and docstrings for implementation details.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@NetZissou NetZissou requested a review from egrace479 June 15, 2026 17:37
@NetZissou NetZissou self-assigned this Jun 15, 2026
@NetZissou NetZissou added the bug label Jun 15, 2026
@NetZissou NetZissou linked an issue Jun 15, 2026 that may be closed by this pull request

@egrace479 egrace479 left a comment

Member

I can run it now (with more than 5 images).

@NetZissou NetZissou merged commit a6bac3f into main Jun 15, 2026
12 checks passed
@NetZissou NetZissou deleted the fix/embed-explore-image-folder-import branch June 15, 2026 20:36

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

emb-embed-explore Image Folder Import Error

2 participants